Spring 2023
Smith College
Learn some tools to measure and compare code segments.
A benchmark is a controlled test to see how well your code runs.
You are trying to quantify the efficiency of your code, so you can compare it against other methods.
You are also often looking for bottlenecks, or parts of your code that slow down the whole process.
You may recall we’ve actually done some simple benchmarks before!
When we were talking about parallelizing code, we used tictoc to see how long it took code to run.
That’s pretty helpful on its own!
Running sequentially, the code takes about 37 seconds on my desktop.
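As a reminder of the pattern, here is a minimal sketch of timing a sequential run with tictoc. The original workload is not shown here, so `slow_square()` is a hypothetical stand-in that just sleeps briefly; it finishes in well under a second rather than 37.

```r
library(tictoc)

# Hypothetical stand-in for a slow task (not the original workload)
slow_square <- function(x) {
  Sys.sleep(0.05)  # pretend to do 50 ms of work
  x^2
}

tic("sequential run")                      # start the timer
results <- lapply(1:10, slow_square)       # run the task sequentially
timing <- toc()                            # stop the timer and print the elapsed time

# toc() invisibly returns the start and stop timestamps
elapsed <- as.numeric(timing$toc - timing$tic)
```

Keeping the value returned by `toc()` lets you store the elapsed time instead of only reading it off the console.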
We want efficient code, not just fast.
bench Package
The bench package provides a simple function, mark(), to track how long code takes to run.
It will run the contained code several times to get an average.
It will also tell you how much memory the code uses.
Each run still takes about 37 seconds on my desktop, but the code is run multiple times to get an average.
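A minimal sketch of what a single `mark()` call can look like. The expression being timed (`sum(sqrt(x))`) is a small illustrative stand-in, not the 37-second workload, and `min_iterations = 10` is just one way to ask for at least ten runs.

```r
library(bench)

x <- runif(1e5)  # illustrative input data

# Time one expression; bench decides how many iterations to run,
# but we ask for at least 10 here
result <- bench::mark(
  sqrt_sum = sum(sqrt(x)),
  min_iterations = 10
)

# Timing, throughput, and memory columns, plus the number of iterations used
result[, c("min", "median", "itr/sec", "mem_alloc", "n_itr")]
```

The `mem_alloc` column is the memory report mentioned above, and `n_itr` shows how many runs went into the summary.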
Three approaches are compared: Apply, For Loop, and Tidy (purrr).
Type | min | median | itr/sec | mem_alloc |
---|---|---|---|---|
Apply | 4.84ms | 4.98ms | 196.80240 | 121.9KB |
For Loop | 14.43ms | 14.85ms | 65.95779 | 96.1KB |
purrr | 4.92ms | 5.06ms | 193.52338 | 393.6KB |
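A comparison like the one in the table could be set up roughly as follows. This is a sketch under assumptions: the code behind the table is not shown, so `square_plus_one()` and `values` are hypothetical, and the timings you get will differ from the numbers above.

```r
library(bench)
library(purrr)

values <- 1:10000

# Hypothetical task applied to each element
square_plus_one <- function(v) v^2 + 1

comparison <- bench::mark(
  Apply = sapply(values, square_plus_one),
  `For Loop` = {
    out <- numeric(length(values))          # pre-allocate the result vector
    for (i in seq_along(values)) {
      out[i] <- square_plus_one(values[i])
    }
    out
  },
  purrr = map_dbl(values, square_plus_one)
)

comparison[, c("expression", "min", "median", "itr/sec", "mem_alloc")]
```

By default `mark()` also checks that every expression returns an equal result, so you know you are comparing methods that do the same job.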
The difference between benchmarking and profiling is a semantic one.
Broadly, benchmarks look at individual functions, profiling looks at whole code flows.
Think of it as the difference between tracking a race and a marathon.
Benchmark == Race
Profile == Marathon
Here is the profile of a section of my research code on asset forfeiture.
The top part of the diagram shows how long each line of code takes, as well as memory usage.
The bottom plots how many functions are being run nested inside each other.
There are several tools for profiling in R. I recommend profvis.
It creates interactive profile plots that give you an at-a-glance view of code sequences.
I’ve included a small example here:
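The original example is not reproduced here, so the following is a stand-in sketch of how a `profvis()` call is typically written. The simulated data and the model are hypothetical; the workload is only there to run long enough for the profiler to collect samples.

```r
library(profvis)

# Wrap the code you want to profile in profvis({ ... })
p <- profvis({
  n <- 1e6
  df <- data.frame(x = rnorm(n))       # simulate some data
  df$y <- 2 * df$x + rnorm(n)
  fit <- lm(y ~ x, data = df)          # fit a simple model
  res <- sort(residuals(fit))          # sort the residuals
})

# In an interactive session, print(p) opens the interactive profile view
```

Assigning the result to `p` keeps the profile around, so you can reopen it without rerunning the code.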
A few rules of thumb for more efficient code:
Lab 8
SDS 270: Advanced Programming for Data Science