Spring 2023
Smith College
Learn the basics of remote computing.
Let’s compare an average laptop with the Smith R Studio Server:
Servers are just on a whole different level than personal machines
While working on a remote server can be a huge boon to large tasks, it is not a simple endeavor
Beyond needing a server to exist in the first place, using one adds some friction to the data science workflow
However, sometimes you just need more compute
Working on a server comes with some responsibilities.
There are only so many resources; the more you take, the less others have
If you are too greedy, you can crash the server for everyone, killing other people’s jobs
The college actually has several servers which you may not know about
Each serves a specific role, and you need permission to use any of them
I’ve registered you all for the R Studio server for this class
Smith College hosts an R Studio server students and faculty can use
You need an account to do so (I’ve made one for all of you)
You can access it by going to rstudio.smith.edu in your browser
We don’t want to use the GUI (most servers won’t have one)
We use ssh (secure shell) to create an encrypted connection from our machine to the server
We can log in to the server from the command line using our Smith credentials, but we must be on the same network
We can make things easier on ourselves by using the SSH key we created at the start of the class
Remember, SSH keys are identifiers for your computer, so they can act like an alternative password that identifies your machine
Just make sure you keep your keys safe!
Copy SSH Key
ssh-copy-id jjoseph34@rstudio.smith.edu
Create Shortcut (~/.ssh/config)
Host srs
    HostName rstudio.smith.edu
    User jjoseph34
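Before relying on the shortcut, it can help to check what ssh will actually do with it. The sketch below assumes OpenSSH 6.8 or newer (for the -G flag) and writes the entry to a scratch file so the real ~/.ssh/config is left alone; the variable names cfg and resolved are only illustrative.

```shell
# Write the shortcut to a scratch file (assumption: we do not want to
# modify the real ~/.ssh/config while experimenting).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host srs
    HostName rstudio.smith.edu
    User jjoseph34
EOF

# ssh -G prints the settings ssh would use for a host, without connecting.
# -F points ssh at our scratch config instead of the default one.
resolved=$(ssh -G -F "$cfg" srs | grep -E '^(hostname|user) ')
echo "$resolved"

rm -f "$cfg"
```

Once the same entry is in ~/.ssh/config, `ssh srs` should behave like `ssh jjoseph34@rstudio.smith.edu`.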
It is important to recognize that on a remote server, you are a user
This means you have limited access to files, what code you can run, and what you can download and install
Often, this means you will need the server admins to do some tasks for you, such as installing new software or creating shared directories
Most of your regular commands will work on a server (ls, cd, etc.)
Some become more important, however, such as top or htop
These will help you know what is currently running on the server and how many resources are free
htop
htop -u <YOUR USERNAME>
lshw -short
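htop is interactive, so a quick non-interactive check can be handy before starting a large job. This is a minimal sketch, assuming a Linux server with GNU coreutils (for nproc) and a readable /proc/loadavg; the variable names are illustrative.

```shell
# Number of CPU cores available to this process (GNU coreutils).
cores=$(nproc)

# 1-minute load average; /proc/loadavg is Linux-specific, so fall back
# to "unknown" on systems that do not provide it.
if [ -r /proc/loadavg ]; then
    read -r load1 _ < /proc/loadavg
else
    load1="unknown"
fi

echo "cores: $cores, 1-minute load average: $load1"
```

As a rough rule of thumb, a load average close to or above the core count suggests the server is already busy.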
We can use a program called sftp to move files between local and remote
It works similarly to ssh in that it creates a connection between the computers
However, it allows you to navigate both computers at the same time for the purpose of transferring files
sftp
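Commands
ls/lls (list remote/local files)
cd/lcd (change remote/local directory)
get (download a file from remote to local)
put (upload a file from local to remote)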
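The same commands can be collected in a batch file and handed to sftp with its -b flag, which avoids retyping them for every transfer. The sketch below only builds the batch file and prints the command to run. The directory and file names (data, project, input.csv, results.csv) are hypothetical, and srs is assumed to be the ssh shortcut defined earlier.

```shell
# Build a batch file of sftp commands (all paths here are hypothetical).
batch=$(mktemp)
cat > "$batch" <<'EOF'
lcd data
cd project
put input.csv
get results.csv
EOF

# Show what would be sent to the server.
cat "$batch"

# Batch mode needs non-interactive authentication, such as an SSH key.
echo "run with: sftp -b $batch srs"
```

In batch mode sftp stops at the first command that fails, so a missing directory ends the transfer early instead of silently continuing.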
You can run code normally on a remote server
However, given the shared nature, you can also be NICE
NICE assigns a priority to a job, so you can signal how important a specific command is
nice -n <NUM> <COMMAND>
Lower values == More Critical
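To see the effect of nice without starting a real job, one option is to have nice report its own priority. This sketch assumes GNU coreutils, where nice with no arguments prints the current niceness; values run from -20 (most critical) to 19 (least), and regular users can normally only raise the number.

```shell
# Niceness of the current shell (usually 0).
base=$(nice)

# Run `nice` itself with the niceness raised by 10 and capture what it reports.
lowered=$(nice -n 10 nice)

echo "current niceness: $base, inside nice -n 10: $lowered"
```

A long-running script could be started the same way, for example `nice -n 10 Rscript analysis.R`, where analysis.R is a hypothetical file name.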
A common workflow for using remote servers would be as follows:
Remember, SSH keys are identifiers for specific machines
Because the remote server is a different machine, you will need to set up a new SSH key on it
You will then need to add this SSH key to your GitHub account
Otherwise, you won’t be able to push or pull anything
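Run on the server, the key setup could look roughly like the sketch below. It assumes OpenSSH's ssh-keygen is installed and uses the default ed25519 key path. The comment string is a made-up label, and the empty passphrase (-N "") is only there to keep the sketch non-interactive; leaving it out makes ssh-keygen prompt for one, which is safer.

```shell
# Default location ssh looks for an ed25519 key.
key_file="$HOME/.ssh/id_ed25519"

# Make sure ~/.ssh exists; -m 700 only applies if the directory is created here.
mkdir -p -m 700 "$HOME/.ssh"

# Generate a key only if one is not already present, so nothing is overwritten.
if [ ! -f "$key_file" ]; then
    ssh-keygen -q -t ed25519 -C "smith-rstudio-server" -f "$key_file" -N ""
fi

# Print the public half; this is the text to paste into GitHub's SSH key settings.
cat "$key_file.pub"
```

After the public key has been added to GitHub, `ssh -T git@github.com` is a quick way to check that authentication works.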
Recap & Mid-Semester Review
SDS 270: Advanced Programming for Data Science