Day 20 - Package Creation

Spring 2023

Smith College

Overview

Timeline

  • Package Purpose
  • Anatomy of a Package
  • Rules for Package Development

Goal

To understand what a package is and the basics of creating one.

Package Purpose

Themed Code

Packages bundle a suite of functions that work together.


Some packages are small, others are huge, but all are cohesive in their purpose.


You (probably) wouldn’t include a function on analyzing the stock market in a package about penguins.

Example themes:

  • dplyr: Data manipulation
  • ggplot2: Plotting
  • plotly: Interactive plotting
  • rvest: Web scraping
  • pdftools: PDF tools

Documentation

These themed code collections are documented.


Every help file you’ve opened in R was written by someone! If you think they were bad, do better!


Packages also often have vignettes or other examples.

Share-able

Packages are share-able, meaning they can be installed in standardized ways.

CRAN

GitHub

Open Source

With rare exceptions based on dependencies, all R code is open-source.


This means other people will be able to see everything related to your package.


This has pros and cons, but means that the whole community can contribute and learn.

Anatomy of a Package

Overview

All packages share a common structure.


This structure is built into the file layout of the package directory.


This means you must name files and place them exactly as expected, or things will break.

R Project Name
.
├── R/
│   └── func1.R
├── man/
│   └── func1.Rd
├── inst/
│   ├── data/
│   │   └── example_data1.csv
│   └── other
├── vignettes/
│   └── v1.rmd
├── docs/
│   └── v1.html
├── tests/
│   └── testthat/
│       └── test-func1.R
├── NAMESPACE
├── DESCRIPTION
├── LICENSE
├── NEWS.md
├── README.Rmd 
├── .gitignore
└── .Rbuildignore

./R/ - R code

A package is a vehicle to share R code.


Everything else supports that function.


R code must go into the R directory, within *.R scripts (no Quarto or R Markdown files!).


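All R code in this directory is run when the package is built and installed, and the objects it creates (mostly functions) become available when the package is loaded.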
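
As a rough sketch, a file such as R/func1.R would usually hold nothing but function definitions. The function name and body below are hypothetical, invented purely for illustration.

```r
# Hypothetical contents of R/func1.R: only function definitions,
# no analysis code and no calls that run at the top level.

# Convert a temperature from degrees Fahrenheit to degrees Celsius.
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}
```

Once a package containing this file is installed and loaded, users could call fahrenheit_to_celsius() like any other function.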


./man/ - Help Files

The man directory contains all of the help files for your code.


These are the files loaded when you use the ? operator or the help() function in R.


You need to write them, but we have tools to help with the formatting later.
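
The formatting tool is not named here. One widely used option is the roxygen2 package, which generates man/*.Rd files from specially formatted comments placed above each function. Assuming roxygen2, and using a hypothetical function, the source might look roughly like this:

```r
# Assumption: roxygen2 is the documentation tool. The #' lines are
# ordinary comments to R; roxygen2 turns them into an .Rd help file.

#' Convert Fahrenheit to Celsius
#'
#' @param temp_f Numeric vector of temperatures in degrees Fahrenheit.
#' @return Numeric vector of temperatures in degrees Celsius.
#' @examples
#' fahrenheit_to_celsius(c(32, 212))
#' @export
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}
```

Under that assumption, running roxygen2::roxygenise() in the package directory would write the matching help file into man/.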


./inst/ - Other Resources

inst contains other things you want your package to have access to in R.


This is where the example data lives for packages.


You have probably used data packages like palmerpenguins; bundling files with the package is how they ship their example data.


inst can also contain other incidentals.
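
Files under inst/ are copied to the top level of the installed package, and system.file() looks them up. A minimal sketch: the first call uses the stats package, which ships with R; the second uses a hypothetical package name, "mypackage", with the layout shown earlier.

```r
# Locate a file shipped inside an installed package.
# "stats" comes with R, so this path resolves on any installation.
desc_path <- system.file("DESCRIPTION", package = "stats")
file.exists(desc_path)

# Hypothetical: inst/data/example_data1.csv in the source becomes
# data/example_data1.csv in the installed copy of "mypackage".
csv_path <- system.file("data", "example_data1.csv", package = "mypackage")

# system.file() returns "" when the package or file is not found,
# so check before reading.
if (nzchar(csv_path)) {
  example_data <- read.csv(csv_path)
}
```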


./vignettes/ - Detailed Walkthroughs

Vignettes are longer explanatory articles that explain the package to readers.


They often contain a minimal example of the package in action, and explain how this example works one step at a time.


Writing these can be difficult, but they are immensely useful to users.


./tests/ - Unit Tests

tests is an “optional” directory which contains unit-tests for your code.


Unit tests essentially run your code and check the results against pre-determined outputs.


If you change something and one of these known outputs changes, you know something broke.
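
The tests/testthat/ directory in the tree suggests the testthat package. Assuming testthat is installed, a test file might look roughly like the sketch below. The function under test is hypothetical and is defined inline only so the example runs on its own.

```r
# Hypothetical function under test (it would normally live in R/func1.R).
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# Hypothetical contents of tests/testthat/test-func1.R:
# run the function and compare the result with known answers.
testthat::test_that("known temperatures convert correctly", {
  testthat::expect_equal(fahrenheit_to_celsius(32), 0)
  testthat::expect_equal(fahrenheit_to_celsius(212), 100)
})
```

If a later edit changed the formula by mistake, one of these expectations would fail and point to the broken function.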


Metadata Files

The loose files in the R package are critical.


These files hold license information, name the package, and list the other packages it depends on.


They also let you include your author information and other such attribution.
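
To see what this metadata looks like in practice, one option is to read the DESCRIPTION file of a package that is already installed. The sketch below uses stats, which ships with R; the choice of fields is only an illustration.

```r
# DESCRIPTION is a plain-text file of "Field: value" pairs.
# read.dcf() parses that format into a character matrix.
desc_path <- system.file("DESCRIPTION", package = "stats")
meta <- read.dcf(desc_path, fields = c("Package", "Version", "License"))

# One row per package record, one column per requested field.
print(meta)
```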


Rules for Package Development

You are not Writing for Yourself!

If you are putting the effort into making a package, you want other people to use it!


Thus, things like good documentation, clean files, and clear examples become critical.


What would you like if you were trying to use a package like yours?

Don’t Pollute the Environment

You never want to alter the environment of the user, as you never know how that will impact them.


At best it would be annoying and inconsiderate; at worst it could seriously change the results of their code.


To that end, you should never change things like options, the working directory, the random seed, or other environment conditions.

Avoid these:

  • library()
  • require()
  • source()
  • options()
  • par()
  • setwd()
  • Sys.setenv()
  • Sys.setlocale()
  • set.seed()
  • Etc.
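
When a function genuinely needs one of these, a common pattern is to restore the previous state before returning, and to reach into other packages with pkg::fun() instead of library(). A minimal sketch, with hypothetical function names:

```r
# Temporarily change an option and guarantee it is restored when
# the function exits, even if an error occurs along the way.
format_pi <- function(digits = 3) {
  old <- options(digits = digits)    # options() returns the old values
  on.exit(options(old), add = TRUE)  # put them back on exit
  format(pi)
}

# Call another package's function with pkg::fun() rather than
# attaching the whole package with library().
middle_value <- function(x) {
  stats::median(x)
}

# The user's setting is the same before and after the call.
digits_before <- getOption("digits")
format_pi(3)
digits_after <- getOption("digits")
identical(digits_before, digits_after)
```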

No Side Effects

Unlike many other languages, R typically expects that a function takes its inputs as arguments and returns a single output, with no other impact on the environment.


While you can break this convention, it is generally not a good idea, especially in a package where other people will be expecting typical execution.

output <- some_function(input)


Nothing else!
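
The contrast can be sketched as follows. All names are hypothetical; the first function is the pattern to avoid, the second is the pattern to prefer.

```r
total <- 0

# Avoid: this function has a side effect. It silently rewrites
# `total`, a variable that lives outside the function.
add_to_total <- function(x) {
  total <<- total + x
  invisible(NULL)
}

# Prefer: everything comes in through the arguments and goes out
# through the return value. Nothing outside the function changes.
add <- function(total, x) {
  total + x
}

# The caller decides what, if anything, gets overwritten.
total <- add(total, 5)
```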

Code-Along

For Next Time