# Set directory we will be looking at
# Windows Example: directory = "C:/Users/jared/Downloads/"
# Mac Example: directory = "/Users/jared/Downloads/"
directory = "<PATH TO DIRECTORY HERE>"
Lab 2. Functions and Flow
Introduction
Access the lab on GitHub Classroom: GitHub Classroom Assignment for Lab 2: Functions and Flow
Under the hood, functions are sets of instructions that we can apply to our data. Functions save us a lot of time by letting us run code that does a complex task in a single line. But functions aren’t something special built into R; we can make our own! Today we will be doing just that. We will create, and then gradually refine, a single function. First we will create a function that lists all the files in a directory along with some information about them. We will then expand what it can look at and adapt it to different inputs.
The Data
For today’s lab we will be using data from your own computer! One of the coolest things about learning to code is that you can use your new skills to solve problems. Today we will walk through an exercise that helped me solve a problem of my own, namely finding what files were filling up my hard drive.
In the code chunk that sets directory, change the file path to a directory on your computer with a lot of files in it. Your Downloads folder or “My Documents” folder might be a good start. You can always change it later.
Planning our Function
The first step in writing any good function is figuring out what we want it to do. The function you will write today will accomplish a few tasks. The input will be the directory you assigned above (directory), and the output will be a dataframe with information about the files in that directory. You will get the name, size, and last modification date of every file in the input directory, with room for extensions.
Here are a few functions in R that we can use to learn about the files on our computer. Skim through the help pages of each to get an idea of what they can do. We will use these in the body of our function.
list.files()
basename()
file.size()
file.mtime()
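To get a feel for these four functions before reading their help pages, here is a minimal sketch that runs them on a throwaway folder. The folder name file_info_demo and the two files in it are made up for illustration and are not part of the lab.

```r
# Build a small throwaway directory so the example does not depend on your own files.
# (The directory and file names here are hypothetical.)
demo_dir <- file.path(tempdir(), "file_info_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("a,b\n1,2", file.path(demo_dir, "data_1.csv"))
writeLines("hello", file.path(demo_dir, "notes.txt"))

# full.names = TRUE returns complete paths instead of bare file names.
paths <- list.files(demo_dir, full.names = TRUE)

basename(paths)    # file names without the leading directory
file.size(paths)   # sizes in bytes, not megabytes
file.mtime(paths)  # last modification times as POSIXct date-times
```

Note that file.size() reports bytes, so a conversion is needed to match the size_mb column described in this lab.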
Load in the following data from this project repo to get an example of what the eventual output of your finished function should look like. The columns are as follows:
- path: Full path to the file
- name: Name of the file
- size_mb: Size of the file in megabytes (MB)
- last_modified: Last time that file was modified
example_output = read.csv("data/example_output.csv")
path | name | size_mb | last_modified |
---|---|---|---|
C:/Users/jared/Downloads/data_1.csv | data_1.csv | 21.20 | 2022-10-11 12:56:04 CDT |
C:/Users/jared/Downloads/data_2.csv | data_2.csv | 2.56 | 2022-10-11 12:56:04 CDT |
C:/Users/jared/Downloads/data_3.csv | data_3.csv | 0.30 | 2022-10-09 12:56:04 CDT |
C:/Users/jared/Downloads/doc_1.pdf | doc_1.pdf | 0.04 | 2022-10-11 12:55:58 CDT |
C:/Users/jared/Downloads/doc_2.pdf | doc_2.pdf | 0.30 | 2022-10-11 10:55:21 CDT |
C:/Users/jared/Downloads/reading_1.pdf | reading_1.pdf | 0.90 | 2022-10-11 10:41:17 CDT |
C:/Users/jared/Downloads/song_i_like.flac | song_i_like.flac | 29.90 | 2022-07-06 12:27:15 CDT |
C:/Users/jared/Downloads/random_file.html | random_file.html | 0.60 | 2022-09-05 22:08:32 CDT |
C:/Users/jared/Downloads/minecraft_mod.jar | minecraft_mod.jar | 3.28 | 2022-09-05 22:27:25 CDT |
C:/Users/jared/Downloads/memes_for_slides.gif | memes_for_slides.gif | 2.23 | 2022-10-11 10:39:48 CDT |
C:/Users/jared/Downloads/3d_print_file.stl | 3d_print_file.stl | 0.07 | 2022-07-06 12:52:44 CDT |
C:/Users/jared/Downloads/3d_print_file_fixed.stl | 3d_print_file_fixed.stl | 0.08 | 2022-07-15 13:57:54 CDT |
C:/Users/jared/Downloads/datapack.zip | datapack.zip | 3.50 | 2022-10-11 12:51:11 CDT |
Creating the Steps
Now that we have an idea of our inputs and outputs, we can start creating the body of our function. We want to create an output similar to the one provided above in example_output. Write some code using the functions listed earlier to create that output. We will turn that code into our function later.
# <REPLACE THIS COMMENT WITH YOUR ANSWER>
# <REPLACE THIS COMMENT WITH YOUR ANSWER>
Provide Some Options
That’s neat, but it doesn’t do much beyond what we can already see in the file browser or Finder. For the second part of this lab, we will add an argument to our function that tells it to look at files recursively: it will look at all the files in the directory we provide, and also at all the files inside the folders of that directory.
As an example, say you have a folder of music inside your downloads directory. Right now, if you pointed our file_info() function at your downloads folder, it would not take the extra step of looking inside this music folder. It would only list the music folder itself as a single entry, and the size reported for a folder does not reflect the files inside it. If we tell it to look recursively, it will look inside that folder at the individual files and list them all out for us.
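The difference between a flat and a recursive listing can be seen with list.files() on its own. The sketch below uses a made-up folder (recursive_demo, with a nested music folder) as a stand-in for your downloads directory; how you wire the option into your own function is up to you.

```r
# Throwaway directory with one nested folder, standing in for Downloads/music.
# (All names here are hypothetical.)
demo_dir <- file.path(tempdir(), "recursive_demo")
dir.create(file.path(demo_dir, "music"), recursive = TRUE, showWarnings = FALSE)
writeLines("top level", file.path(demo_dir, "readme.txt"))
writeLines("nested", file.path(demo_dir, "music", "song.txt"))

# Default: only the top level is listed, and the folder itself shows up as an entry.
top_only <- list.files(demo_dir)

# recursive = TRUE descends into subfolders and lists the files inside them;
# the folders themselves are no longer listed as separate entries.
all_files <- list.files(demo_dir, recursive = TRUE)

top_only
all_files
```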
# <REPLACE THIS COMMENT WITH YOUR ANSWER>
With some patience, you will now have a function that you can point at any directory on your computer, and find out where the largest files are, no matter how many folders deep they are hiding. File bloat be gone!
Make it Flex
For the last part of this lab, we will modify our function to give it a bit more flexibility. Using conditions, make it so that our file_info() function can accept either a path to a directory, as it currently does, or a path to a single file. If the provided path is a single file, I want your function to output all the same information, but in a named vector for that single file. You can see an example below:
file_info(path = "C:/Users/jared/Downloads/data_1.csv")
path | name | size_mb | last_modified |
---|---|---|---|
C:/Users/jared/Downloads/data_1.csv | data_1.csv | 0.28 | 2022-08-02 15:50:02 |
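One way to branch on the kind of path is with dir.exists() and file.exists(). The sketch below is only an illustration of the condition itself: describe_path() is a hypothetical helper, not part of the lab, and it returns a label rather than file information.

```r
# Hypothetical helper (not part of the lab): reports what kind of path it was given.
describe_path <- function(path) {
  # Check for a directory first, because file.exists() is also TRUE for directories.
  if (dir.exists(path)) {
    "directory"
  } else if (file.exists(path)) {
    "file"
  } else {
    stop("Path does not exist: ", path)
  }
}

# A temporary file to test with, so the example does not depend on your own files.
demo_file <- tempfile(fileext = ".csv")
writeLines("a,b\n1,2", demo_file)

describe_path(tempdir())   # "directory"
describe_path(demo_file)   # "file"
```

Inside your own function, each branch would build the appropriate output instead of returning a label.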
# <REPLACE THIS COMMENT WITH YOUR ANSWER>
# <REPLACE THIS COMMENT WITH YOUR ANSWER>
# <REPLACE THIS COMMENT WITH YOUR ANSWER>