SageMaker Studio Lab & Jupyter Made Easy

Dana Crane
5 min read · Apr 28, 2023


The Artificial Intelligence (AI) gold rush that followed the staggering success of ChatGPT has created some unforeseen challenges: global chip shortages, combined with growing demand for Machine Learning (ML) model training, mean that cloud-based AI app creators face longer wait times due to limited server availability.

Even before ChatGPT, demand for AI workloads was growing. According to KPMG, for example, AI deployments among retailers increased by 29% from late 2019 to early 2021, while those in financial services rose by 37%.

Hardware supply constraints in a field growing as explosively as AI/ML mean it's more important than ever to optimize the way you work. Waiting for GPU resources to become available, only to watch your routine fail because of something as simple as a misconfigured runtime, is a poor use of an increasingly valuable resource.

While Amazon Web Services (AWS) is not immune to these shortages, its SageMaker platform is one of the most popular ways to create, train, and deploy ML models in the cloud, and SageMaker Studio Lab offers a free way to get started. Whether you're working with pre-trained models, training on your own data, or developing your own algorithms in SageMaker's Jupyter Notebook instances, it's always best to start with an optimal Python runtime environment.

Just as developers are not necessarily ML experts, data scientists are not necessarily Python experts with the time and know-how to create (and, too often, troubleshoot) their own runtime environments. This post shows you how to avoid the pitfalls and get your Studio Lab environment up and running faster, with no programming experience or command-line expertise required.

SageMaker Studio Lab Python Runtimes Made Easy

SageMaker Studio Lab is a free JupyterLab environment with CPU or GPU compute (no credit card required) that you can use to get your ML project off the ground quickly, and then migrate to AWS SageMaker Studio when you're ready to deploy.

SageMaker Studio Lab provides:

  • GitHub integration.
  • Preconfigured ML tools, frameworks, and libraries.
  • Automatic saving of your work, so you don't have to start over between sessions.

While the Studio Lab setup process offers AWS's "default Python," it provides little beyond the required Jupyter packages. Of course, you can always try to "pip install" other packages, but that can lead to conflicts and corrupted environments, and it requires installing prebuilt binaries, which pose a supply chain security issue if you intend to take your concept to production.
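If you do go the pip route, pip's built-in `pip check` command reports installed packages whose declared dependencies are missing or at incompatible versions, which is a quick way to spot a conflict before it surfaces mid-training. The sketch below runs it from a notebook cell; it assumes pip is available to the kernel's interpreter, and the helper name `check_dependency_conflicts` is hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def check_dependency_conflicts() -> tuple[int, str]:
    """Run `pip check` for the interpreter executing this code.

    Returns pip's exit code (0 means no broken requirements were found)
    and its combined stdout/stderr text.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "check"],  # same Python as the kernel
        capture_output=True,
        text=True,
    )
    return result.returncode, (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    code, report = check_dependency_conflicts()
    if code == 0:
        print("No broken requirements found.")
    else:
        print("pip reported dependency problems:")
        print(report)
```

Using `sys.executable -m pip` rather than a bare `pip` command ensures the check runs against the same environment the notebook kernel is using.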

Alternatively, you can avoid all these issues by using an ActiveState Python runtime, which automatically builds packages at mutually compatible versions from vetted source code, eliminating conflicts and corruption while delivering enterprise-grade software that you won't need to abandon when your idea goes into production.

It just takes 5 steps to get started:

Step 1: Create a free account on www.platform.activestate.com/create-account

Step 2: Once logged in, click on the following URL:

https://platform.activestate.com/ActiveStateSE/SagemakerStudioLabStarter

Then click the Fork It button.

Give the fork any name you like and place it in your organization.

Step 3: You can now modify this runtime environment to add whichever packages your project needs, either by clicking the "add Packages" button and searching for them, or by copying and pasting your requirements.txt into the dialog that appears when you click the "Import from File" button. The advantages include:

  • The Platform will automatically resolve and pull in all dependencies, ensuring against conflicts.
  • The Platform will build all components from vetted source code, ensuring security.
  • You can easily share your environment with all your colleagues.

Make sure you don't delete the "Jupyter" package, since SageMaker Studio Lab requires it.
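If you import a requirements.txt, it can help to confirm beforehand that Jupyter is still listed. The sketch below is a deliberately simplified parser: it skips comments and pip options such as `-r` or `-e`, does not understand URL or VCS requirements, and normalizes names the way PEP 503 does so that `Jupyter` and `jupyter` match. The names `REQUIRED`, `requirement_names`, and `missing_required` are hypothetical, not part of any ActiveState tool.

```python
from __future__ import annotations

import re

# Packages this sketch treats as mandatory (assumption: only "jupyter",
# mirroring the Studio Lab requirement described above).
REQUIRED = {"jupyter"}


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, and runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(text: str) -> set[str]:
    """Extract normalized package names from requirements.txt content."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name ends at the first extras bracket, version specifier, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def missing_required(text: str) -> set[str]:
    """Return required packages that are absent from the requirements text."""
    return {normalize(r) for r in REQUIRED} - requirement_names(text)


if __name__ == "__main__":
    example = """
    # hypothetical requirements.txt
    numpy>=1.24
    pandas
    scikit-learn==1.2.2
    """
    print(missing_required(example))  # {'jupyter'}
```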

For an example of how the runtime can be modified, check out my fork, to which I've added a number of common ML and data science packages, such as:

  • Bokeh — for data visualization
  • Keras — simplifies the creation of neural networks
  • Matplotlib — for data visualization
  • NLTK — for natural language processing
  • NumPy — for working with arrays
  • Pandas — for cleaning and manipulating data sets
  • Pillow — for working with images
  • Pluggy — the plugin framework that pytest is built on
  • PyTorch — for working with neural networks
  • Scikit-image — for image processing
  • Scikit-learn — for classical ML algorithms such as classification, regression, and clustering
  • SciPy — for scientific computing on top of NumPy (optimization, statistics, signal processing)
  • Seaborn — for making statistical graphics
  • TensorFlow — for working with neural networks
  • XGBoost — for optimized gradient boosting
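Once the runtime is installed in Studio Lab, you could confirm that packages like these are visible to your notebook kernel by looking up each module without importing it. Several import names differ from the package names (Pillow is imported as `PIL`, PyTorch as `torch`, scikit-image as `skimage`, scikit-learn as `sklearn`). The `PACKAGES` mapping and the `missing_modules` helper are illustrative assumptions; adjust them to match your own fork.

```python
from __future__ import annotations

import importlib.util

# Display name -> top-level import name (several differ from the package name).
PACKAGES = {
    "Bokeh": "bokeh",
    "Keras": "keras",
    "Matplotlib": "matplotlib",
    "NLTK": "nltk",
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Pillow": "PIL",
    "Pluggy": "pluggy",
    "PyTorch": "torch",
    "Scikit-image": "skimage",
    "Scikit-learn": "sklearn",
    "SciPy": "scipy",
    "Seaborn": "seaborn",
    "TensorFlow": "tensorflow",
    "XGBoost": "xgboost",
}


def missing_modules(packages: dict[str, str]) -> list[str]:
    """Return display names whose top-level module cannot be found.

    importlib.util.find_spec locates a top-level module without executing it,
    so the check stays fast even for heavy libraries such as TensorFlow.
    """
    return [
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(PACKAGES)
    if missing:
        print("Not found in this kernel:", ", ".join(missing))
    else:
        print("All packages found.")
```

Running this before launching a long GPU job is a cheap way to catch a misconfigured runtime early.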

Step 4: Go to the "Downloads Builds" tab of your project and make sure the Linux option is selected in the tabs on the left. Click the "Install via Terminal" tab and follow the directions to copy the command shown.

Open a terminal in SageMaker Studio Lab, paste the command, and press Enter. Your environment will be downloaded and installed in a virtual environment in SageMaker Studio Lab.

Step 5: Reload your Studio Lab instance by refreshing your browser, and your shiny new ActiveState environment will be waiting for you in the Launcher.
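After the refresh, it's worth confirming that a notebook opened from the new Launcher entry is really running on the ActiveState interpreter rather than Studio Lab's default Python. A minimal check using only the standard library is shown below; the exact path to expect depends on where the install placed the runtime, so no specific path is assumed, and the helper name `runtime_info` is hypothetical.

```python
from __future__ import annotations

import platform
import sys


def runtime_info() -> dict[str, str]:
    """Describe the interpreter executing this notebook kernel."""
    return {
        "executable": sys.executable,          # path of the running Python binary
        "version": platform.python_version(),  # major.minor.micro string
        "prefix": sys.prefix,                  # root of the active environment
    }


if __name__ == "__main__":
    for key, value in runtime_info().items():
        print(f"{key:>10}: {value}")
```

If `executable` still points at Studio Lab's default Python, the notebook was most likely opened with the default kernel rather than the new one.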

Conclusions — Enterprise Strength Python for SageMaker

Too many data scientists make the mistake of assuming that what they're working on will never see the light of day, or that developers will "fix things" before it gets to production. But unless you're doing pure research, this is rarely the case.

Starting with enterprise-strength ActiveState Python is the best way to ensure you can go from concept to production as smoothly as possible. And with GPU instances at a premium these days, avoiding the need to swap runtimes, retrain, retest, and redeploy can save significant time and money.


Written by Dana Crane

With 25+ years in software, I've had my share of both crossing and falling into the chasm. I'm currently the Product Marketing Manager at ActiveState Software.
