Lab 3. Iteration & Apply

Author

Dr. Jared Joseph

Introduction

Click here to access the lab on Github Classroom: Github Classroom Assignment for Lab 3: Iteration & Apply

Both for() loops and the apply functions greatly expand the scope of tasks we can accomplish. Combined with our new knowledge of functions, these tools drastically increase what we are able to do. The ability to solve a problem, and then automatically apply that solution to an arbitrary number of cases is amazing.

Today we will be practicing this increase in scale by working with some data on banned books. While (I think) this data is interesting, it is important to keep in mind the skill we are focusing on is learning to iterate in R, rather than data analysis. The thing we are iterating on is largely arbitrary; though each case will offer its own challenges. Just keep in mind that this dataframe could easily be a collection of files to modify, a list of web addresses to scrape, or configurations for simulations to run.

The Data

The data we will be using today comes from PEN America’s Book Bans database. This database attempts to catalog all books banned in US schools between July 2021 and June 2022. While it is impossible to know if this is all of the bans, and it does not account for books being un-banned later, it is still quite the dataset to look through.

Tip

I’ve provided a CSV version of the database in the data/ directory of this project, as well as a copy of the data documentation in the docs/ directory. Be sure to refer to it as you work.

Question 1

Load in the banned books CSV and assign it to an object named banned_books

# <REPLACE THIS COMMENT WITH YOR ANSWER>

Tidying the Title

We have some un-tidy data in our banned books dataframe, namely that the Title column contains both the title, and the title of the series if it is part of one. We should fix that.

Question 2

Use either a for() loop or an apply family function to go over the Title column of banned_books. Your final output should be a dataframe with new columns that contain each of the following:

  • A logical if it was part of a series
  • The name of the series if it had one
  • A clean title for the book

The following functions will be helpful; be sure to read about them and test them out:

  • strsplit()
  • trimws()
  • grepl()
  • gsub()
# <REPLACE THIS COMMENT WITH YOR ANSWER>

Splitting the Data

The challenge question will be looking at banned books by state, so we are going to split the main dataframe in preparation. Rather than typing out and sub-setting each manually, we will use a for() loop to do so.

Question 3

Use a for() loop to iterate through each state, and sub-set the banned_books dataframe by that state. You should store your results in the state_books list provided.

You want to end up with a list, state_books, that has an element for each state that appeared in our dataset, with the content of that element being a dataframe of all the books banned in that state.

HINT: The unique() function will be helpful for building your loop. When given a vector it will return all the unique values in that vector.

# Create output list
state_books = list()

#<REPLACE THIS COMMENT WITH YOR ANSWER>

Getting Meta-Data

We know the names of these banned books, but that doesn’t tell us much. We’re going to use the Google API to try and get more information. I’m going to provide a function for you to use below. What it does is query the Google books API, and return the first result for a search on a book’s name. You will use it in the following question. You will need the jsonlite package installed.

# Function to get book info

get_book_info = function(book_name, author_name){
  
  # construct api call
  api_call = paste0("https://www.googleapis.com/books/v1/volumes?maxResults=1&q=name=", book_name, "&inauthor=", author_name)
  
  # get the raw json data from books api
  book_info = jsonlite::fromJSON(URLencode(api_call))
  
  # get the info elements
  info_list = as.list(book_info$items$volumeInfo)
  
  # return
  return(info_list)
}

Look at the example output to understand the structure of the returned object.

example = get_book_info(book_name = "Dune", author_name = "Frank Herbert")

View(example)
CHALLANGE QUESTION

Using the provided get_book_info() function, get some meta-data for each of the banned books in the database for Indiana only (it takes forever otherwise). We want the publisher, ISBN 13 number, page count, maturity rating, and description.

# <REPLACE THIS COMMENT WITH YOR ANSWER>