Spring 2023
Smith College
To form project teams and start planning.
The creator of each project idea has the option to speak for ~2 mins to explain their idea.
You are selling the idea to potential teammates.
Only ~8 of the top 10 projects will form valid teams of 3–4 members.
Things to cover:
My package will help users budget better. To solve this problem, users will input their monthly income and how much money they would like to spend. The package will calculate how much the user should spend on housing, food, transportation, entertainment, healthcare, savings, and other expenses. The program will also provide users with lists of recipes to help them stay on budget, and will let them compare the costs of different transportation options. Personally, I am interested in this project because I love working with numbers. The output will be a detailed budget plan with tips for staying on budget.
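The allocation step described above could start as a simple percentage split. Below is a minimal R sketch; the function name and the category shares are illustrative assumptions, not part of the pitch.

```r
# Hypothetical sketch: split a monthly income across budget categories.
# The share values here are placeholders for illustration only.
budget_plan <- function(monthly_income,
                        shares = c(housing = 0.30, food = 0.15,
                                   transportation = 0.10, entertainment = 0.05,
                                   healthcare = 0.10, savings = 0.20,
                                   other = 0.10)) {
  # Guard against shares that do not sum to the whole income
  stopifnot(abs(sum(shares) - 1) < 1e-8)
  round(monthly_income * shares, 2)
}

budget_plan(3000)
```

A real version would let the user override the shares and would attach the recipe and transportation suggestions to each category.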
The package would generate a travel itinerary including transportation and housing, based on input from a user about their destination, days of travel, budget, a genre of activities they enjoy, and how fast-paced they’d like their schedule to be.
The R package would use data from various sources (e.g., travel websites, tourism boards, etc., so web scraping would be involved) to generate a list of potential travel options that match the user’s preferences. I imagine this would also involve a lot of regex to parse through all the site data.
I’ve wasted a lot of time planning trips before, even short-term ones. There are so many moving parts and complications that arise at any second, so some kind of package to help assist with that seemed like a useful idea. I think developing it would exercise a lot of the skills we’ve learned in class and help me learn new ones, even if the final package isn’t one a user would execute for data science purposes.
As for the end result of the package: users could then select the generated options they like best, and the package would generate a full itinerary, complete with daily activities, travel times, etc., displayed in a clean, easy-to-understand manner. There would then be an option to export the itinerary (PDF? CSV?) so that it could easily be shared with other people.
Web Scraping Archive.org to Study the Environmental History of the Global South
I believe it will take not only scientists but also humanists working together to better understand and ultimately transform humanity’s relationship with the rest of nature. That is why I am an environmental historian of the Middle East, and that is why I want to leverage our data science skills to improve our understanding of humanity’s experiences with past global climate crises.
The project I have in mind will allow people to easily explore past human experiences of major climatic events with an R package for creating machine-readable text corpora out of public domain publications from www.archive.org. Archive.org has an amazing collection of historical publications, but the problem is that it is time-consuming to search each individual document. So the goal would be to create an R package that makes it easy to pick a historical publication, search its full text for the dates of keywords like flood, drought, and famine, and then produce tidy tables and visualizations to explore the results. This would also be useful for text analyses. The test case that I have in mind is a global publication from the 1800s and early 1900s that brings together observations from across East Asia, Southeast Asia, South Asia, the Middle East, and Africa.
The skills I anticipate using are web scraping, data wrangling, cleaning, and visualization. The ultimate goal is to render the text of historical publications in Archive.org searchable and visualizable for researchers interested in climate science, political ecology, and environmental history (but the corpora could have many other uses as well!). The primary output would be a package that can produce a machine-readable corpus. Depending on how that goes, secondary outputs would be using text analysis to generate visualizations and data for keywords that we could pair with historic climate data.
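The keyword-search step described above could be sketched as follows in base R. This assumes the publication's text has already been scraped into a character vector `pages` (one element per page); the function name and output shape are illustrative, not the package's actual design.

```r
# Hypothetical sketch: count keyword occurrences per page and return a
# tidy data frame (one row per page-keyword pair), ready for ggplot2.
count_keywords <- function(pages, keywords = c("flood", "drought", "famine")) {
  count_one <- function(page, k) {
    m <- gregexpr(k, tolower(page), fixed = TRUE)[[1]]
    if (m[1] == -1) 0L else length(m)  # gregexpr returns -1 on no match
  }
  do.call(rbind, lapply(seq_along(pages), function(i) {
    data.frame(
      page    = i,
      keyword = keywords,
      hits    = vapply(keywords, function(k) count_one(pages[i], k), integer(1))
    )
  }))
}

pages <- c("A great flood and a drought.", "No rain; drought, drought.")
count_keywords(pages)
```

The tidy output makes it easy to pipe the counts straight into a dplyr summary or a ggplot2 time-series plot once page numbers are mapped to dates.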
An advanced to-do list that lets you set an estimated completion time for each task (starting a timer when the task begins), prioritize tasks by importance using your choice of color palettes, and check off tasks as they are completed.
I believe someone working on this project would need strong organizational skills and some experience with website design or art.
I am interested in this project because I have never been able to find a to-do list that motivates me to do my work. I have tried Amazon to-do lists and spreadsheets, but I would use them for a few weeks and never touch them again. This to-do list has a lot of features that ensure that people stay on top of their work.
Given a user’s Discover Weekly playlist, the package organizes songs by their “category ID” given by the Spotify API.
This package will organize songs for listeners who do not know how to organize their listening habits. It also will give listeners insight into what Spotify thinks they would like to listen to.
The skill sets needed are data frame manipulation and familiarity with URLs.
I am interested in learning more about listening habits of myself and others.
The output would be a data frame with the songs in the Discover Weekly playlist, along with each song’s category ID, URL, and other features.
The package will generate meal prep ideas from NYT cooking (or any other good cooking website) based on the ingredient lists you add. If the ingredient list is more than 3 items, you will have the opportunity to rank your three most important ingredients or specify the cuisine.
The return will be a list of the top 10 meal ideas (title, link, instructions) ranked by their ratings on the website. The strategy used to solve the problem would be searching the NYT Cooking website using the ingredient lists that the user inputs.
The skill sets needed for this project idea would be web scraping, regex, iteration, etc. As someone interested in cooking and preparing meals, I am excited to build a new R package useful for other home cooks.
A package to search and scrape the syllabus PDFs we input, returning a list of due dates sorted by date and a ggplot showing how many hours each class will take.
My project idea is to create a package that will help students track their major progress. If you input your major, it will output a data frame that has all the requirements needed to graduate with that major. Additionally, there will be a list of all possible classes that could satisfy those requirements, along with whether each is offered in the fall or spring and whether there are any prerequisites. This package would greatly help any student who wishes to organize and plan for their future at Smith. If this is too difficult, it could be scaled down to focus on only one or two majors.
This package would look at the most recent course catalog on the Smith website, so web scraping skills should be used. Data wrangling / cleaning data would also be needed.
The problem my package will solve is having to search hard for details about a professor when deciding whether to take their course.
The package will look through https://www.ratemyprofessors.com/ and a campus website to give information about a professor. The package requires lots of web scraping and the problem solving that comes with it.
I am interested in the project because I use Rate My Professors for every class I take, but want some help deciding whether the professor is good.
The output would give key words about the professor, what they teach, example classes, and their rating.
A movie recommendation package. This package will recommend movies to watch next based on the user’s input of a movie. The movie generator will recommend a list of movies that have a similar genre.
Input: the name of a movie
Output: a list of 5–10 movies the user may like based on the movie entered
Using the Gale–Shapley algorithm, I match people with teams based on the survey results.
I am treating every project as if it had a preference of 0 toward everyone except the project creator.
There is some randomness, but the goal is to find a stable optimal solution so everyone is on a project they enjoy.
Link to GitHub Classroom Final Assignment
We will cover how to start coding collaboratively using GitHub on Monday.
What is the absolute minimum this package needs to do to “work”?
From the syllabus:
You will need to work with your group outside of class to get this project done.
We will have a fair amount of in-class time, but to make the most of that time (getting quick answers to coding or planning stumbles) you need to come prepared.
Adv Git
SDS 270: Advanced Programming for Data Science