Lab 2. Functions and Flow

Author

Dr. Jared Joseph

Introduction

Click here to access the lab on GitHub Classroom: GitHub Classroom Assignment for Lab 2: Functions and Flow

Under the hood, functions are a set of instructions that we can apply to our data. They save us a lot of time by letting us run code that performs a complex task with a single line. But functions aren't something special built into R; we can make our own! Today we will be doing just that. We will create and then gradually refine a single function. First, we will write a function that lists all the files in a directory along with some information about them. We will then expand what it can look at and adapt it to different inputs.
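To preview the syntax, here is a minimal sketch of a user-defined R function: a name, the `function()` keyword with its arguments (optionally with default values), and a body whose last expression is returned. `bytes_to_mb()` is a hypothetical helper invented for this example, not part of base R, and it assumes 1 MB = 1024^2 bytes (some tools use 1,000,000 bytes instead).

```r
# Hypothetical helper: convert a size in bytes to megabytes.
# `digits` has a default value, so callers may leave it out.
bytes_to_mb <- function(bytes, digits = 2) {
  # The last evaluated expression is the function's return value
  round(bytes / 1024^2, digits)
}

bytes_to_mb(5242880)      # returns 5
bytes_to_mb(1500000, 1)   # returns 1.4
```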

The Data

For today’s lab we will be using data from your own computer! One of the coolest things about learning to code is that you can use your new skills to solve problems. Today we will walk through an exercise that helped me solve a problem of my own, namely finding what files were filling up my hard drive.

In the code chunk below, change the file path to some directory on your computer with a lot of files in it. Your downloads folder or “my documents” folder might be a good start. You can always change it later.

# Set directory we will be looking at
# Windows Example: directory = "C:/Users/jared/Downloads/"
# Mac Example: directory = "/Users/jared/Downloads/"

directory = "<PATH TO DIRECTORY HERE>"

Planning our Function

The first step of writing any good function is figuring out what we want it to do. The function you will write today will accomplish a few tasks. Its input will be the directory you assigned above (directory), and its output will be a dataframe with information about the files in that directory: the full path, name, file size, and last modification date of every file in the input directory, with room to add more later.

Here are a few functions in R that we can use to learn about the files on our computer. Skim through the help pages of each to get an idea of what they can do. We will use these in the body of our function.

  • list.files()
  • basename()
  • file.size()
  • file.mtime()
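A quick way to see what these four functions return, without touching your real files, is to try them on a throwaway folder. This sketch uses only base R; the folder and file names (`demo_dir`, `notes.txt`, `data.csv`) are made up for the demo. You can open each help page with `?list.files`, `?basename`, `?file.size`, and `?file.mtime`.

```r
# Build a throwaway folder so the demo does not depend on your own files;
# tempfile() only generates a unique name, dir.create() makes the folder
demo_dir <- tempfile("file_info_demo_")
dir.create(demo_dir)
writeLines(rep("some text", 100), file.path(demo_dir, "notes.txt"))
writeLines(c("a,b", "1,2"), file.path(demo_dir, "data.csv"))

# full.names = TRUE returns complete paths instead of bare file names
paths <- list.files(demo_dir, full.names = TRUE)
paths

basename(paths)    # strips the folder part: "data.csv" "notes.txt"
file.size(paths)   # size of each file in bytes
file.mtime(paths)  # last-modified time of each file (a POSIXct date-time)
```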
Important

DO NOT play around with file.remove(). It will delete your files for good. No recycle bin, no way to get them back, gone. You have been warned.

Load in the following data from this project repo to get an example of what the eventual output of your finished function should look like. The columns are as follows:

  • path: Full path to the file
  • name: Name of the file
  • size_mb: Size of the file in megabytes (MB)
  • last_modified: Last time that file was modified
example_output = read.csv("data/example_output.csv")
path                                              name                     size_mb  last_modified
C:/Users/jared/Downloads/data_1.csv               data_1.csv                 21.20  2022-10-11 12:56:04 CDT
C:/Users/jared/Downloads/data_2.csv               data_2.csv                  2.56  2022-10-11 12:56:04 CDT
C:/Users/jared/Downloads/data_3.csv               data_3.csv                  0.30  2022-10-09 12:56:04 CDT
C:/Users/jared/Downloads/doc_1.pdf                doc_1.pdf                   0.04  2022-10-11 12:55:58 CDT
C:/Users/jared/Downloads/doc_2.pdf                doc_2.pdf                   0.30  2022-10-11 10:55:21 CDT
C:/Users/jared/Downloads/reading_1.pdf            reading_1.pdf               0.90  2022-10-11 10:41:17 CDT
C:/Users/jared/Downloads/song_i_like.flac         song_i_like.flac           29.90  2022-07-06 12:27:15 CDT
C:/Users/jared/Downloads/random_file.html         random_file.html            0.60  2022-09-05 22:08:32 CDT
C:/Users/jared/Downloads/minecraft_mod.jar        minecraft_mod.jar           3.28  2022-09-05 22:27:25 CDT
C:/Users/jared/Downloads/memes_for_slides.gif     memes_for_slides.gif        2.23  2022-10-11 10:39:48 CDT
C:/Users/jared/Downloads/3d_print_file.stl        3d_print_file.stl           0.07  2022-07-06 12:52:44 CDT
C:/Users/jared/Downloads/3d_print_file_fixed.stl  3d_print_file_fixed.stl     0.08  2022-07-15 13:57:54 CDT
C:/Users/jared/Downloads/datapack.zip             datapack.zip                3.50  2022-10-11 12:51:11 CDT

Creating the Steps

Now that we have an idea of our inputs and outputs, we can start creating the body of our function. We want to create an output similar to the one provided above in example_output. Write some code using the functions I listed to create that output. We will turn that code into our function later.
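Once you have the pieces, `data.frame()` can bind equal-length vectors into the columns of a single table. The values below are hypothetical stand-ins for what `basename()`, `file.size()`, and `file.mtime()` would return for two files, and the conversion again assumes 1 MB = 1024^2 bytes.

```r
# Hypothetical stand-ins for basename(), file.size() (bytes) and file.mtime()
file_names  <- c("report.pdf", "song.flac")
file_bytes  <- c(1048576, 3145728)
file_mtimes <- as.POSIXct(c("2022-10-11 12:56:04", "2022-07-06 12:27:15"))

# data.frame() turns equal-length vectors into named columns;
# dividing by 1024^2 converts bytes to megabytes (assumed definition of MB)
info <- data.frame(
  name          = file_names,
  size_mb       = round(file_bytes / 1024^2, 2),
  last_modified = file_mtimes
)

# Largest files first
info[order(info$size_mb, decreasing = TRUE), ]
```

Sorting by `size_mb` is optional, but it puts the biggest space hogs at the top, which is the whole point of the exercise.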

Question 1

In the following code chunk, develop the steps your function will take. Do not create the function yet; we will do that as a separate step next.

# <REPLACE THIS COMMENT WITH YOUR ANSWER>
Question 2

In the following code chunk, convert the code you developed above into a function named file_info(). It should accept a directory path as input and output something similar to the example_output dataframe above.

# <REPLACE THIS COMMENT WITH YOUR ANSWER>

Provide Some Options

That's neat, but it doesn't do much beyond what we can already see in File Explorer or Finder. For the second part of this lab, we will add an argument to our function that tells it to look at files recursively, meaning it will look not only at the files in the directory we provide, but also at all files inside the folders within that directory (and the folders within those).

As an example, say you have a folder of music inside your downloads directory. Right now, if you pointed our file_info() function at your downloads folder, it would not take the extra step of looking inside this music folder. It would list the music folder as a single entry, and the size R reports for a folder does not reflect the files stored inside it. If we tell it to look recursively, it will look inside that folder at the individual files and list them all out for us.
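The difference is easiest to see on a small nested folder. This sketch builds a temporary directory (the `music` and `notes.txt` names are invented for the demo) and compares `list.files()` with and without `recursive = TRUE`.

```r
demo_dir <- tempfile("recursive_demo_")
# recursive = TRUE here lets dir.create() make the parent folder as well
dir.create(file.path(demo_dir, "music"), recursive = TRUE)
invisible(file.create(file.path(demo_dir, "notes.txt")))
invisible(file.create(file.path(demo_dir, "music", "song.flac")))

# Default: only the top level, so the subfolder appears as a single entry
list.files(demo_dir)
# returns "music" "notes.txt"

# recursive = TRUE descends into subfolders and lists the files inside them;
# folders themselves are left out unless include.dirs = TRUE
list.files(demo_dir, recursive = TRUE)
# returns "music/song.flac" "notes.txt"
```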

Question 3

Copy and paste your function definition from above into the following code chunk. Then modify it by adding an argument that, when TRUE, makes it look through folders recursively (check the help files again for ideas). This argument should default to FALSE. Make sure to inspect your output carefully!

# <REPLACE THIS COMMENT WITH YOUR ANSWER>

With some patience, you now have a function that you can point at any directory on your computer to find where the largest files are, no matter how many folders deep they are hiding. File bloat be gone!

Make it Flex

For the last part of this lab, we will modify our function to give it a bit more flexibility. Using conditions, make our file_info() function accept either a path to a directory, as it currently does, or a path to a single file. If the provided path is a single file, I want your function to output all the same information, but as a named vector for that single file. You can see an example below:

file_info(path = "C:/Users/jared/Downloads/data_1.csv")
path                                 name        size_mb  last_modified
C:/Users/jared/Downloads/data_1.csv  data_1.csv     0.28  2022-08-02 15:50:02
Question 4

In the following code chunk, develop the steps your modified function will take. You will turn it into a function in the next question. The following functions will come in handy:

  • file.exists()
  • dir.exists()
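Here is a small sketch of how these two functions differ and how they might drive an `if`/`else` branch. `path_kind()` is a hypothetical helper name, and the named vector at the end is one possible way to build the single-file output. Note that `c()` turns every element into a character string when you mix numbers and text.

```r
demo_dir  <- tempfile("flex_demo_")
dir.create(demo_dir)
demo_file <- file.path(demo_dir, "data_1.csv")
writeLines("x,y", demo_file)

# file.exists() is TRUE for files AND folders; dir.exists() only for folders,
# so check dir.exists() first when telling the two apart
file.exists(demo_file)  # TRUE
dir.exists(demo_file)   # FALSE
file.exists(demo_dir)   # TRUE
dir.exists(demo_dir)    # TRUE

# Hypothetical helper: classify a path
path_kind <- function(path) {
  if (dir.exists(path)) {
    "directory"
  } else if (file.exists(path)) {
    "file"
  } else {
    "missing"
  }
}

# One way to build a named vector for a single file (all values become text)
single <- c(
  path          = demo_file,
  name          = basename(demo_file),
  size_mb       = round(file.size(demo_file) / 1024^2, 2),
  last_modified = format(file.mtime(demo_file))
)
single
```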
# <REPLACE THIS COMMENT WITH YOR ANSWER>
Question 5

In the following code chunk, convert the code you developed above into a function. Now that it is in a function, make sure that the function reports what it is doing with warnings or error messages if appropriate.
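Base R gives functions three main ways to report back: `stop()` halts with an error, `warning()` flags a problem but keeps going, and `message()` prints informational text. The sketch below uses all three in a hypothetical `check_path()` helper; which situations deserve an error rather than a warning is a design choice for your own function.

```r
# Hypothetical helper: validate a path and report what it will do
check_path <- function(path) {
  if (!is.character(path) || length(path) != 1) {
    stop("`path` must be a single character string.")  # halts immediately
  }
  if (!file.exists(path)) {
    stop("Nothing found at: ", path)
  }
  if (dir.exists(path)) {
    message("Treating ", path, " as a directory.")     # informational only
  } else {
    message("Treating ", path, " as a single file.")
    if (file.size(path) == 0) {
      warning("File is empty: ", path)                 # keeps going, but flags it
    }
  }
  invisible(path)
}

empty <- tempfile(fileext = ".txt")
invisible(file.create(empty))
check_path(empty)  # prints a message, then a warning

# tryCatch() captures the error so the rest of a script can keep running
tryCatch(check_path("no/such/file"), error = function(e) conditionMessage(e))
```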

# <REPLACE THIS COMMENT WITH YOUR ANSWER>
CHALLENGE QUESTION

Modify our file_info() function so that it can accept any number of file and directory paths in any combination, and return a single unified dataframe as output (or a named vector for a single file input). For example, it should be able to accept:

file_info("C:/Users/jared/Documents",
          list.files("C:/Users/jared/Downloads", full.names = TRUE),
          "C:/Users/jared/Music/tetris.mp3",
          list(
            "photos" = list.files("C:/Users/jared/Pictures", full.names = TRUE),
            "videos" = list.files("C:/Users/jared/Videos", full.names = TRUE)
            ),
          recursive = TRUE
          )
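One possible starting point for the challenge is R's `...` argument, which collects any number of inputs. The hypothetical `gather_paths()` helper below covers only the input-gathering half: it flattens nested vectors and lists with `unlist()` and swaps each directory for its contents. Building the final dataframe from the returned paths would follow the same steps as before.

```r
# Hypothetical helper: gather any mix of paths into one character vector
gather_paths <- function(..., recursive = FALSE) {
  # list(...) captures every argument; unlist() flattens nested lists
  paths <- unlist(list(...), use.names = FALSE)
  # Replace each directory with the files inside it; keep other paths as-is
  expanded <- lapply(paths, function(p) {
    if (dir.exists(p)) list.files(p, full.names = TRUE, recursive = recursive) else p
  })
  # Drop duplicates in case the same file was supplied twice
  unique(unlist(expanded, use.names = FALSE))
}

gather_paths("a.csv", c("b.csv", "c.csv"),
             list(photos = c("d.png", "e.png"), videos = "f.mp4"))
# returns "a.csv" "b.csv" "c.csv" "d.png" "e.png" "f.mp4"
```

Because `recursive` comes after `...`, it can only be set by its full name (as with `recursive = TRUE` in the call above); R will not match it by position or by a partial name.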
# <REPLACE THIS COMMENT WITH YOUR ANSWER>