Objects in R

Overview

Objects in R are the thing we typically work with. We load data into objects, manipulate those objects, and run analyses on them. But for that seamless workflow to happen a surprising amount of work is happening behind the scenes. Today we will be looking behind the curtain a bit to see how R understands objects. We will also be spending some time working with lists.

For each of the following questions, write the answer in an R script or Quarto document. The answers are due at the start of the next class day, but will not be turned in. Rather, the answers to these worksheet act as valuable reference for the labs on Friday. Further, if you are not able to answer all of the questions here, you will have a difficult time on the more complex labs. Answers for worksheets will be shared after lecture the day they are due.

Class

The most common way we thing of vectors in R relates to the three big classes: character, numeric, and logical. A vector can only contain one of these types, and R will force that to be the case if it is not already.

Can you predict what class each of the following vectors will be before running them?

# example 1
c(TRUE, 8)

# example 2
c("TRUE", FALSE)

# example 3
c(F, F, F, T)

# example 4
c(TRUE, TRUE, TRUE, 1)
  • Example 1: Numeric
  • Example 2: Character
  • Example 3: Logical
  • Example 4: Numeric

It is possible to force a vector into a specific type though coercion. R will basically take the vector, and do it’s best to present it in the way you ask. You can do this using the as.xxxx() family of functions. For example:

as.numeric(c("1", "8", "10"))
[1]  1  8 10
class(as.numeric(c("1", "8", "10")))
[1] "numeric"

However, this doesn’t always work. This coercion is actually the cause of one of the more common warnings you may have seen.

What will happen when I run the following?

as.numeric(c("Will", "I", "work?"))
[1] NA NA NA
Warning message:
NAs introduced by coercion 

Words are not numbers, so it fails!

Here is where we can really start to do odd things given our knowledge of R. the as.xxxx() functions are basically just telling R to look at a vector a different way. If we know what is happening under the hood, we can do it ourselves without the need of the function.

Try running the following. What happened? Explain the process.

test_vec = c(1, 3, 3, 7)
class(test_vec) = "character"

Just like how we can subset elemetns from a vector or assign to it depending on what side of the equal sign our objects are, we can do the same with object classes. Rather than use an as.xxxx() function to turn this numeric vector into a character, we changed by editing the class ourselves.

Attributes

Attributes are another way R understands objects. You may not have knowingly interacted with them very often, but they are one of the main ways R decides what to do when given an object. There are a few pre-defined attributes, the most common of which being names. The names attribute is what makes named vectors work. For example:

# make a named vector (weather forecast in degrees F)
named_vec = c("Mon" = 39, "Tues" = 36, "Wed" = 44, "Thur" = 36, "Fri" = 44, "Sat" = 45, "Sun" = 37)
named_vec
 Mon Tues  Wed Thur  Fri  Sat  Sun 
  39   36   44   36   44   45   37 

Each of the values in the vector are named, such that when they are shown, the corresponding name will be shown above them. We can still using it like a normal numeric vector in every way though!

# quick maths
named_vec * 2
 Mon Tues  Wed Thur  Fri  Sat  Sun 
  78   72   88   72   88   90   74 

We can see how these names are stored using the str() or “structure” function. You may have used this to preview dataframes before. In this case, we can see the vector, and beneath it is shows all the attributes attached to it.

str(named_vec)
 Named num [1:7] 39 36 44 36 44 45 37
 - attr(*, "names")= chr [1:7] "Mon" "Tues" "Wed" "Thur" ...

names is a special attribute that R recognizes, but you can create your own custom attributes too.

Using the attr() function, add a new attribute to the named_vec from above.

attr(named_vec, “happy_score”) = c(1, 1, 3, 3, 4, 5, 2)

Lists

Thus far we have been working with vectors, which are commonly used as components of larger data structures. The one you are probably most familiar with is dataframes. Take for example the following:

# get the built in mtcars data
cars = mtcars

# subset suing $
mtcars$mpg
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
# subset using []
mtcars[, "mpg"]
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4

Both of these are ways to subset out one column of a dataframe, which is a vector.

Working with a shallow list, one with only one layer of depth, is much the same. Take for example the list below. It contains three elements: 1) a numeric vector, 2) a character vector, and 3) a dataframe.

# make an example list to work with
example_list = list("numbers" = c(1, 2, 5, 7, 3, 5),
                    "letters" = c("y", "u", "n", "r", "t", "b"),
                    "df" = mtcars)

When trying to subset elements of a list, we can similarly use both $ and []. However, there is one major difference, and that is in how we use the brackets []. When using the brackets on a list, one set of brackets (list_name[]) will get us the element of the list we request, while two sets (list_name[[]]) will get the contents of that element. See for example the differences below:

# get element 1
example_list[1]
$numbers
[1] 1 2 5 7 3 5
# get contents of element 1
example_list[[1]]
[1] 1 2 5 7 3 5

The first call returns a list of length 1, which contains the numbers vector, while the second actually returns the numbers vector itself. Forgetting this will cause you a lot of grief later on!

With that importance difference in mind, we can use any of the following strategies to get the numbers vector out of our list.

# using $
example_list$numbers
[1] 1 2 5 7 3 5
# using name
example_list[["numbers"]]
[1] 1 2 5 7 3 5
# using position
example_list[[1]]
[1] 1 2 5 7 3 5

Now that we can get out a vector, we can subset as normal from there.

From the example list we just made, subset the following:

  • The number 7 from the number vector
  • The letters “y” and “t” from the character vector
  • This entire mtcars dataframe
  • example_list$numbers[4] / example_list[[“numbers]][4] / example_list$numbers[4]
  • example_list$letters[c(1, 5)] / example_list[[“letters”]][c(1, 5)] / example_list[[2]][c(1, 5)]
  • example_list$df / example_list[[“df”]] / example_list[[3]]