Day 18 - Remote Servers

Spring 2023

Smith College

Overview

Timeline

  • What is “Remote”
  • Getting on Remote
  • Working on Remote

Goal

Learn the basics of remote computing.

What is “Remote”

Local vs. Remote

Local

Remote

Why Use Remote Compute

Let’s compare an average laptop with the Smith R Studio Server:


Local

  • ~8 Cores
  • ~16 Threads
  • ~8GB RAM
  • ~500GB Storage

Smith R Studio Server

  • 48 Cores
  • 96 Threads
  • 64GB RAM
  • 6.6TB Storage


Servers are just on a whole different level than personal machines

Considerations with Remote

While working on a remote server can be a huge boon to large tasks, it is not a simple endeavor


Beyond the fact that a server has to exist in the first place, using one adds some friction to the data science workflow


However, sometimes you just need more compute

Limitations

  • Server needs to exist ($$$)
  • You need access and permissions
  • Often need others to do tasks for you
  • Sometimes offline/broken
  • Need to be on the same network
  • Need to make your code work on any machine
  • Need to work in a shared environment

Responsibilities of a Shared Resource

Working on a server comes with some responsibilities.


There are only so many resources; the more you take, the less others have


If you are too greedy, you can crash the server for everyone, killing other people’s jobs

How to not be a Jerk

  • Check what is running already and how much of the server is free
  • Don’t use too many cores
  • Don’t use too much RAM
  • Be NICE with your jobs (more later)
  • Keep your user folder clean; delete old files

Smith Servers

The college actually has several servers which you may not know about


Each serves a specific role, and you need permission to use any of them


I’ve registered you all for the R Studio server for this class

Smith Servers

  • R Studio
  • R Studio 2
  • General Linux
  • Faculty Linux
  • Jupyter
  • Gitlab
  • More?

Getting on Remote

A Primer on the R Studio Server

Smith College hosts an R Studio server students and faculty can use


You need an account to do so (I’ve made one for all of you)


You can access it by going to rstudio.smith.edu in your browser

Login with Password

We don’t want to use the GUI (most servers won’t have one)


We use ssh (secure shell) to create an encrypted connection from our machine to the server


We can log in to the server from the command line using our Smith credentials, but we must be on the same network as the server

Login with SSH

We can make things easier on ourselves by using the SSH key we created at the start of the class


Remember, SSH keys are identifiers for your computer, so a key can act as an alternative to a password by identifying your machine


Just make sure you keep your keys safe!

Copy SSH Key

ssh-copy-id jjoseph34@rstudio.smith.edu


Create Shortcut (~/.ssh/config)

Host srs
    HostName rstudio.smith.edu
    User jjoseph34
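
The shortcut above can also be added from the command line. This is a minimal sketch, assuming a hypothetical helper function called add_ssh_host (it is not a standard tool) that appends the entry only when the alias is not already defined. The demo writes to a temporary file so your real ~/.ssh/config is left alone.

```shell
# add_ssh_host is a hypothetical helper, not a standard command.
# usage: add_ssh_host ALIAS HOSTNAME USER [CONFIG_FILE]
add_ssh_host() {
  name="$1"; host="$2"; user="$3"
  config="${4:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config")"
  touch "$config"
  chmod 600 "$config"   # keep the config private to your account
  # Skip if an entry for this alias already exists
  if grep -q "^Host[[:space:]][[:space:]]*${name}[[:space:]]*\$" "$config"; then
    echo "Host ${name} already defined in ${config}"
    return 0
  fi
  printf 'Host %s\n    HostName %s\n    User %s\n' "$name" "$host" "$user" >> "$config"
  echo "Added Host ${name} to ${config}"
}

# Demo against a temporary file so nothing real is modified
demo_config="$(mktemp)"
add_ssh_host srs rstudio.smith.edu jjoseph34 "$demo_config"
add_ssh_host srs rstudio.smith.edu jjoseph34 "$demo_config"   # second call is a no-op
cat "$demo_config"
```

Once the entry is in your real config file, typing ssh srs is equivalent to typing ssh jjoseph34@rstudio.smith.edu, and the same alias works with sftp.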

Working on Remote

Being a User

It is important to recognize that on a remote server, you are a user


This means you have limited access to files, what code you can run, and what you can download and install


Often, this means you will need the server admins to do some tasks for you, such as installing new software or creating shared directories

Understanding the Machine

Most of your regular commands will work on a server (ls, cd, etc.)


Some become more important, however, such as top or htop


These will help you know what is currently running on the server and how many resources are free

Useful Commands

htop
See the current jobs running on the server and how many resources they are using
htop -u <YOUR USERNAME>
You can specify your username to limit the display to just your processes
lshw -short
This lets you know what hardware the machine is running
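
htop is interactive; for a quick check before starting a job, a few non-interactive commands give a similar picture. This is a minimal sketch, assuming a Linux server. The "at most a quarter of the cores" rule is an illustrative assumption, not a Smith policy, and the variable names are made up for this example.

```shell
# Count the cores visible to you (nproc is part of GNU coreutils)
count_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    getconf _NPROCESSORS_ONLN
  fi
}

total_cores="$(count_cores)"

# Assumed courtesy rule (not an official policy): take at most a quarter
# of the cores, but always at least one.
polite_cores=$(( total_cores / 4 ))
if [ "$polite_cores" -lt 1 ]; then
  polite_cores=1
fi

echo "Cores on this machine: ${total_cores}"
echo "Cores to request:      ${polite_cores}"

# Load averages over the last 1, 5, and 15 minutes; values close to the
# core count suggest the machine is already busy.
if [ -r /proc/loadavg ]; then
  cut -d ' ' -f 1-3 /proc/loadavg
fi
```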

Moving Files to/from Remote

We can use a program called sftp to move files between local and remote


It works similarly to ssh in that it creates a connection between the computers


However, it allows you to navigate both computers at the same time for the purpose of transferring files

Useful sftp Commands

ls/lls
Lists files on remote/local
cd/lcd
Change directory on remote/local
get
Copy files from remote to local
put
Copy files from local to remote
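
These commands can also be run without typing them interactively by placing them in a batch file and passing it to sftp with the -b option. The sketch below only writes the batch file and prints the command you would run; it does not connect anywhere. The directory and file names (data, project/data, survey.csv, results.csv) are hypothetical, and srs is assumed to be a host alias defined in your SSH config. Batch mode also assumes key-based login, since it cannot prompt for a password.

```shell
# Write the sftp commands to a temporary batch file
batch_file="$(mktemp)"
cat > "$batch_file" <<'EOF'
lcd data
cd project/data
put survey.csv
get results.csv
EOF

# Show what would be sent to the server
cat "$batch_file"

# The transfer itself would be started with:
echo "sftp -b ${batch_file} srs"
```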

Running Code

You can run code normally on a remote server


However, given the shared nature, you can also be NICE


NICE assigns a priority to a job, letting you rank how important a specific command is

Being NICE

nice -n <NUM> <COMMAND>

Lower values == More Critical
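
On Linux, niceness runs from -20 (highest priority) to 19 (lowest), and ordinary users can generally only raise it. The value given to -n is added to the current niceness. A small sketch to see this in action, assuming GNU nice, which prints the current niceness when run without arguments; analysis.R is a hypothetical script name.

```shell
# Niceness of the current shell (GNU nice prints it when run without arguments)
base_niceness="$(nice)"

# Run a command with niceness raised by 10; here the command is `nice` itself,
# so it reports the niceness it was started with.
job_niceness="$(nice -n 10 nice)"

echo "Shell niceness: ${base_niceness}"
echo "Job niceness:   ${job_niceness}"

# Typical real-world use (analysis.R is a hypothetical script name):
#   nice -n 10 Rscript analysis.R
```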

Remote Workflow

A common workflow for using remote servers would be as follows:

  1. Code your project on local
  2. Push to GitHub
  3. Clone to remote
  4. Copy data to remote
  5. Run code on remote
  6. Retrieve results
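
Steps 2 through 6 might look roughly like the sketch below. It is a dry run: the hypothetical run helper only prints each command, so nothing is pushed or copied. The branch name main, the USER/PROJECT repository, the file paths, and analysis.R are all placeholders, and srs is assumed to be a host alias in your SSH config. scp is used here in place of sftp only because it fits on one line per transfer.

```shell
# Dry-run helper (hypothetical): prints each command instead of executing it.
# Change DRY_RUN to 0 to execute for real, which requires network access.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remote_workflow() {
  run git push origin main                                    # 2. push to GitHub
  run ssh srs "git clone git@github.com:USER/PROJECT.git"     # 3. clone to remote
  run scp data/input.csv srs:PROJECT/data/                    # 4. copy data to remote
  run ssh srs "cd PROJECT && nice -n 10 Rscript analysis.R"   # 5. run code on remote
  run scp srs:PROJECT/output/results.csv output/              # 6. retrieve results
}

workflow_log="$(remote_workflow)"
echo "$workflow_log"
```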

A Note about GitHub

Remember, SSH keys are identifiers for specific machines


Because the remote server is a different machine, you will need to set up a new SSH key on it


You will then need to add this SSH key to your GitHub account


Otherwise, you won’t be able to push or pull anything
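
Generating the new key on the server could look like the sketch below. For demonstration it writes the key to a temporary directory with an empty passphrase; on the server you would normally accept the default path (~/.ssh/id_ed25519) and choose a passphrase. The key type ed25519 and the comment rstudio-server are assumptions chosen for this example.

```shell
# Demo only: temporary directory and empty passphrase
key_dir="$(mktemp -d)"
key_file="${key_dir}/id_ed25519"

if command -v ssh-keygen >/dev/null 2>&1; then
  # -q quiet, -t key type, -N passphrase, -C comment, -f output file
  ssh-keygen -q -t ed25519 -N "" -C "rstudio-server" -f "$key_file"
  # The .pub file is what gets pasted into GitHub; the private key stays put
  cat "${key_file}.pub"
else
  echo "ssh-keygen not found; install OpenSSH first"
fi

# After adding the public key to your GitHub account, test the connection with:
#   ssh -T git@github.com
```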

Code-Along

For Next Time

Topic

Recap & Mid-Semester Review

To-Do

  • Complete Worksheet