Hosting VS Code on SageMaker

Step-by-step guide to set it up in your environment

Sofian Hamiti
Towards Data Science

Joint post with Prayag Singh and Phil Meakins.

ML teams need the flexibility to choose between notebooks and a full-fledged IDE when working on a project. They may even use multiple IDEs in the same project.

It’s a bit like climbing a mountain with the appropriate equipment. It makes the climb easier and improves your chances of reaching the summit.

Photo by Raimond Klavins on Unsplash

So far in SageMaker you could pick between Jupyter and RStudio. In this post, I will show how you can host VS Code, first in a Studio environment and then on Notebook Instances.

Walkthrough overview

We will tackle this with the following steps:

In Option A: Hosting VS Code on Studio

  • We will first see where VS Code can run in the SageMaker Studio architecture, and key considerations for this approach.
  • Then, we will go through the VS Code install process for a Studio user.
  • Finally, I will show how Studio admins can use a lifecycle configuration to automate the install across all Studio users.

In Option B: We will install VS Code on a Notebook Instance.

Prerequisites

To go through this example, make sure you have the following:

For Option A: Hosting VS Code on Studio

  1. Familiarity with the underlying architecture of SageMaker Studio. You can visit Dive deep into Amazon SageMaker Studio Notebooks architecture if it sounds new to you.
  2. We will use code-server to install and run VS Code in the cloud. Make sure you are familiar with this implementation.
  3. In Step 3, we will automate the VS Code install using a lifecycle configuration. Make sure you have read Customize Amazon SageMaker Studio using Lifecycle Configurations before continuing.

For Option B: Install VS Code on a Notebook Instance

  1. Access to a SageMaker Notebook Instance. Make sure it runs Amazon Linux 2.
  2. This shell script on the Notebook Instance. It will help us install VS Code and set up the Jupyter Proxy so we can access it.

Option A: Hosting VS Code on Studio

Step 1: Where should VS Code be installed on Studio?

Architecture overview

Code-server allows us to access VS Code from a browser, while having it hosted in Studio. So first, we need to figure out where in Studio we will install and run it.

Below is the architecture overview for the setup:

Image by author

SageMaker Studio runs the JupyterLab UI in a JupyterServer, decoupled from notebook kernels. The JupyterServer comes with a proxy that lets us access VS Code in a browser. This is where we will host code-server.

Key considerations for this approach

Here are 4 things about the JupyterServer you should keep in mind:

  • It is hosted on a small, fixed-size instance allowing minimal local compute.
  • It runs inside a docker container and you cannot do docker-in-docker.
  • The container image is managed and reinitialized each time you stop/start the JupyterServer app.
  • You can install libraries using yum or pip.

Image by author: Leave JupyterServer for development, and run the larger compute in SageMaker jobs.

Run pip install sagemaker in the JupyterServer to get the SageMaker Python SDK.

Step 2: Installing and running VS Code for a Studio user

Now that you know where VS Code will be hosted, let’s install and run it in a Studio user profile. We will do it in 3 steps:

Launch a System Terminal from your Studio launcher

“System Terminal” is the terminal for the JupyterServer.

Image by author: Click on the Terminal icon in your Studio launcher

Install and run code-server

You can run the script below in your System Terminal to install code-server:

The script should take a few seconds to run and that is all for this part. When the install is done, you can run code-server with the following command:

code-server --auth none --disable-telemetry

Make sure you run the command from /home/sagemaker-user, the main folder for Studio users.

Image by author: You should see the following after running the code-server command.

Notes

  • We create a vscode/ folder to avoid permission errors. Code-server does not actually install in it. We also use a default install; feel free to customize yours if needed.
  • We run code-server with --auth none for illustrative purposes. You may use a password or other means to secure access in your environment. See the code-server documentation for more details.
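As an illustration of the password option mentioned in the notes, here is a minimal sketch that writes a code-server config file with password authentication enabled. The helper name write_code_server_config, the scratch directory, and the randomly generated password are assumptions made for this example and are not part of the original setup. How the login page behaves behind the Studio proxy should be verified in your own environment.

```shell
# Hypothetical helper: writes a code-server config.yaml with password auth.
# The target directory and generated password are illustrative choices.
write_code_server_config() {
    config_dir="$1"
    # 16 random bytes rendered as 32 hex characters
    password="$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"
    mkdir -p "$config_dir"
    cat > "$config_dir/config.yaml" <<EOF
bind-addr: 127.0.0.1:8080
auth: password
password: $password
cert: false
EOF
    # Keep the password readable by the current user only
    chmod 600 "$config_dir/config.yaml"
    echo "$config_dir/config.yaml"
}

# Example: write the config to a scratch directory and show where it went.
config_path="$(write_code_server_config "$(mktemp -d)")"
echo "Config written to $config_path"
# To use it: code-server --config "$config_path" --disable-telemetry
```

By default code-server reads its config from ~/.config/code-server/config.yaml, so writing the file there would avoid the need for the --config flag.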

Access VS Code on your Browser

Now, all you need to do is copy your Studio URL, change it a bit, and paste it into a new tab:

Images by author: On the left is my original Studio URL. I replaced /lab? with /proxy/8080/ and pasted it in a new browser tab.
Image by author: VS Code will load and you can open the /home/sagemaker-user folder in it.
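The URL change described above can be sketched as a small shell function. The function name studio_to_vscode_url is made up for this example, and the domain ID and region in the sample URL are placeholders. The sketch assumes code-server listens on port 8080, as in the post.

```shell
# Turn a Studio JupyterLab URL into the code-server proxy URL.
# Assumes code-server listens on port 8080 unless a second argument is given.
studio_to_vscode_url() {
    studio_url="$1"
    port="${2:-8080}"
    # Drop everything from "/lab" onward, then append the proxy path.
    printf '%s/proxy/%s/\n' "${studio_url%%/lab*}" "$port"
}

# The domain ID and region below are placeholders, not real values.
studio_to_vscode_url "https://d-xxxxxxxxxxxx.studio.us-east-1.sagemaker.aws/jupyter/default/lab?"
```

If you start code-server on another port, pass it as the second argument so the proxy path matches.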

You can install the Python extension and get to work on your ML project!

Step 3: Automating the install across Studio users

As a Studio admin, you can let each user install VS Code on their own. The setup does not require changes to the domain. However, if you support hundreds of data scientists, you can automate the setup for all of them.

Here we will create a lifecycle configuration from our script, and attach it to a Studio domain. The install will be inherited by all users in the domain.

Convert the install script to a base64 encoded string

First, we need to download the install script into our environment and convert it to a base64 string using the following command:

LCC_CONTENT="$(openssl base64 -A -in on-jupyter-server-start.sh)"
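Before sending the encoded string to AWS, it can be worth checking that it decodes back to the original script. The sketch below does this round trip. It uses the base64 utility instead of openssl so that it runs on a stock Linux system, and it encodes a small stand-in file rather than the real on-jupyter-server-start.sh. The helper name encode_script is made up for this example.

```shell
# Encode a script to single-line base64 and verify the round trip.
encode_script() {
    # tr strips the line wrapping that base64 adds by default
    base64 < "$1" | tr -d '\n'
}

# Stand-in for on-jupyter-server-start.sh so the example is self-contained.
demo_script="$(mktemp)"
printf '#!/bin/bash\necho "installing code-server"\n' > "$demo_script"

LCC_CONTENT="$(encode_script "$demo_script")"

# Decode and compare against the original file.
if printf '%s' "$LCC_CONTENT" | base64 -d | cmp -s - "$demo_script"; then
    echo "round trip OK"
else
    echo "round trip FAILED" >&2
fi
```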

Attach the lifecycle configuration to the Studio domain

Then, we use the AWS CLI to create the domain lifecycle configuration:

aws sagemaker create-studio-lifecycle-config \
    --studio-lifecycle-config-name install-vscode-on-jupyterserver \
    --studio-lifecycle-config-content "$LCC_CONTENT" \
    --studio-lifecycle-config-app-type JupyterServer
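The command above returns the ARN of the new lifecycle configuration. To make it the default for users in the domain, that ARN then needs to be set in the domain's default user settings. The following is a minimal sketch of that step, assuming the update-domain call with a JupyterServerAppSettings payload. The domain ID and ARN are placeholders, the helper name build_user_settings is made up for this example, and the command is printed rather than executed so you can review it first.

```shell
# Build the --default-user-settings payload for `aws sagemaker update-domain`.
build_user_settings() {
    lcc_arn="$1"
    printf '{"JupyterServerAppSettings":{"DefaultResourceSpec":{"InstanceType":"system","LifecycleConfigArn":"%s"},"LifecycleConfigArns":["%s"]}}\n' "$lcc_arn" "$lcc_arn"
}

# Placeholder values: use your own domain ID and the ARN returned
# by create-studio-lifecycle-config.
DOMAIN_ID="d-xxxxxxxxxxxx"
LCC_ARN="arn:aws:sagemaker:us-east-1:111122223333:studio-lifecycle-config/install-vscode-on-jupyterserver"

SETTINGS="$(build_user_settings "$LCC_ARN")"

# Printed rather than executed, so nothing changes until you run it yourself.
echo "aws sagemaker update-domain --domain-id $DOMAIN_ID --default-user-settings '$SETTINGS'"
```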

You can also use this Boto3 command if needed. Just make sure to execute the command in the same region as your domain.

Note - Existing Studio users will need to stop/start their JupyterServer app to see the change.

Option B: Hosting VS Code on Notebook Instances

The setup on Notebook Instances is quick. You can launch a terminal from the Jupyter or JupyterLab interface of your Notebook Instance.

If you use Jupyter, the button should be in the top-right corner:

Image by author

Then, you will need to execute this script in the terminal with the following command:

sudo sh install_vscode.sh

The command should take a few seconds to run.

Image by author: You can close the terminal tab when you see the following.

Now reload the Jupyter page and check the New button on the top-right corner:

Image by author: You should see a VS Code option under the New button.

Clicking on the VS Code button will open VS Code in a new browser tab. You can install the Python extension for VS Code and get to work on your ML project!

Image by author

You can also add the script to a Lifecycle Configuration, so the setup runs automatically each time the instance restarts.
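As a rough sketch of that automation, the snippet below prepares the --on-start argument for the create-notebook-instance-lifecycle-config command. The configuration name install-vscode, the helper name build_on_start_arg, and the location of install_vscode.sh under /home/ec2-user/SageMaker are assumptions made for this example. The stand-in wrapper only illustrates the shape of an on-start script, and the command is printed rather than executed.

```shell
# Build the --on-start argument for create-notebook-instance-lifecycle-config.
build_on_start_arg() {
    # The API expects the script as single-line base64
    printf 'Content=%s\n' "$(base64 < "$1" | tr -d '\n')"
}

# Stand-in on-start script; the path to install_vscode.sh is an assumption.
demo_script="$(mktemp)"
printf '#!/bin/bash\nset -e\nsh /home/ec2-user/SageMaker/install_vscode.sh\n' > "$demo_script"

ON_START="$(build_on_start_arg "$demo_script")"

# Printed rather than executed, so you can review it before running.
echo "aws sagemaker create-notebook-instance-lifecycle-config --notebook-instance-lifecycle-config-name install-vscode --on-start $ON_START"
```

Lifecycle configuration scripts on Notebook Instances run as root, so the sudo used in the manual command is not needed here.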

Conclusion

ML teams need the flexibility to work with notebooks or a full-fledged IDE when working on a project.

In this post, I showed how you can install VS Code on both SageMaker Studio and Notebook Instances, for a single user or for hundreds of them using a lifecycle configuration.

You can also read Industrializing an ML platform with Amazon SageMaker Studio, and learn how enterprise ML platform teams can organize, standardize, and expedite the provisioning of Studio environments.
