I have recently published a step-by-step guide to serverless model deployments with Amazon SageMaker Pipelines, Amazon API Gateway, and AWS Lambda.
With AWS Lambda, you pay only for what you use. Lambda charges based on the number of requests, execution duration, and amount of memory allocated to the function. So how much memory should you allocate to your inference function?
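Here is a minimal sketch of the usual Lambda cost arithmetic that shows this trade-off. The bill is a per-request charge plus a compute charge in GB-seconds, which is allocated memory in GB multiplied by billed duration in seconds. The default rates are illustrative placeholders, not quoted prices, so check the AWS Lambda pricing page for your region and architecture. The free tier is ignored, and the names `LambdaRates` and `monthly_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LambdaRates:
    # Illustrative placeholder rates (USD); check current pricing for your region/architecture.
    per_million_requests: float = 0.20
    per_gb_second: float = 0.0000166667


def monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int,
                 rates: LambdaRates = LambdaRates()) -> float:
    """Estimate a monthly Lambda bill from request count, average duration and memory.

    The free tier is ignored. Memory is converted to GB with 1 GB = 1024 MB.
    """
    gb_seconds = requests * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    request_cost = requests / 1_000_000 * rates.per_million_requests
    compute_cost = gb_seconds * rates.per_gb_second
    return request_cost + compute_cost


if __name__ == "__main__":
    # Hypothetical durations for the same workload at two memory sizes.
    for memory_mb, duration_ms in [(1024, 400), (2048, 180)]:
        cost = monthly_cost(10_000_000, duration_ms, memory_mb)
        print(f"{memory_mb} MB, {duration_ms} ms: ${cost:.2f}/month")
```

With these made-up durations, the 2,048 MB configuration comes out cheaper because doubling the memory more than halves the duration. Only measured durations, for example from a load test, can tell you whether that holds for a real function.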
In this post, I will show how you can use SageMaker hyperparameter optimization (HPO) jobs and a load-testing tool to automatically optimize the price/performance ratio of your serverless inference service.
Deploying some of your ML models into serverless architectures allows you to create scalable inference services, eliminate operational overhead, and move faster to production. In two earlier posts, I showed how you can adopt such an architecture in your projects.
In this post, we will go a step further and automate the deployment of such a serverless inference service using Amazon SageMaker Pipelines.
With SageMaker Pipelines, you can accelerate the delivery of end-to-end ML projects. It brings ML workflow orchestration, model registry, and CI/CD under one umbrella so you can quickly get your models into production.
I have recently published a post explaining core concepts on how to deploy an ML model in a serverless inference service using AWS Lambda, Amazon API Gateway, and the AWS CDK.
For some use cases, you and your ML team may need to implement a more complex inference workflow where predictions come from multiple models and are orchestrated with a DAG. With AWS Step Functions, Synchronous Express Workflows allow you to easily build that orchestration layer for your real-time inference services.
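Here is a rough illustration of such an orchestration layer. It builds an Amazon States Language definition in which two model functions run in parallel and a third function combines their outputs, then calls the state machine synchronously. The Lambda ARNs, state names, and helper names (`build_inference_dag`, `predict`) are hypothetical. The definition would be deployed as an `EXPRESS` state machine, because synchronous execution is only available for Express Workflows.

```python
import json
from typing import Any


def build_inference_dag(model_a_arn: str, model_b_arn: str, combine_arn: str) -> dict:
    """Return an ASL definition: two model Lambdas in parallel, then a combiner Lambda."""
    def single_task_branch(name: str, arn: str) -> dict:
        return {"StartAt": name, "States": {name: {"Type": "Task", "Resource": arn, "End": True}}}

    return {
        "Comment": "Hypothetical multi-model inference DAG",
        "StartAt": "RunModels",
        "States": {
            "RunModels": {
                # Each branch receives the same request payload; the Parallel state
                # outputs a list with one element per branch, in branch order.
                "Type": "Parallel",
                "Branches": [
                    single_task_branch("ModelA", model_a_arn),
                    single_task_branch("ModelB", model_b_arn),
                ],
                "Next": "Combine",
            },
            # The combiner Lambda receives [model_a_output, model_b_output].
            "Combine": {"Type": "Task", "Resource": combine_arn, "End": True},
        },
    }


def predict(sfn_client: Any, state_machine_arn: str, payload: dict) -> Any:
    """Run one synchronous Express execution and return its parsed JSON output."""
    response = sfn_client.start_sync_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(payload),
    )
    if response["status"] != "SUCCEEDED":
        raise RuntimeError(f"Execution ended with status {response['status']}")
    return json.loads(response["output"])
```

In practice, `sfn_client` would be `boto3.client("stepfunctions")`, and the serialized definition could be registered with that client's `create_state_machine` call using `type="EXPRESS"`.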
Amazon SageMaker Studio is a fully integrated IDE unifying the tools needed for ML development. With Studio you can write code, track experiments, visualize data, and perform debugging and monitoring in a JupyterLab-based interface. SageMaker manages the creation of the underlying instances and resources so you can get started quickly in your ML project.
When creating or launching a Notebook, an Interactive Shell, or a Terminal based on a SageMaker Image, the resources run as Apps on Amazon EC2 instances for which you incur cost, and you must shut them down to stop the metering.
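One way to stop that metering from a script uses the boto3 SageMaker client's `list_apps` and `delete_app` calls. This sketch assumes Studio (classic), where image-based notebooks, shells, and terminals run as `KernelGateway` apps, and it leaves the `JupyterServer` app alone. The helper name, domain ID, and user profile are placeholders. Deleting an app also discards any in-memory kernel state, so save your work first.

```python
def stop_kernel_gateway_apps(sm_client, domain_id: str, user_profile: str) -> list:
    """Delete every InService KernelGateway app for one Studio user profile.

    Assumes Studio (classic), where image-based notebooks, shells and terminals
    run as KernelGateway apps. Returns the names of the apps it deleted.
    """
    deleted = []
    kwargs = {"DomainIdEquals": domain_id, "UserProfileNameEquals": user_profile}
    while True:
        page = sm_client.list_apps(**kwargs)
        for app in page.get("Apps", []):
            # Skip JupyterServer and apps that are already deleting or deleted.
            if app["AppType"] == "KernelGateway" and app["Status"] == "InService":
                sm_client.delete_app(
                    DomainId=domain_id,
                    UserProfileName=user_profile,
                    AppType=app["AppType"],
                    AppName=app["AppName"],
                )
                deleted.append(app["AppName"])
        # list_apps is paginated; follow NextToken until it is absent.
        token = page.get("NextToken")
        if not token:
            return deleted
        kwargs["NextToken"] = token
```

Pass `boto3.client("sagemaker")` as `sm_client`. You could run this on a schedule, though deciding which apps are truly idle is left out of this sketch.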
After training an R model, you and your ML team might explore ways to deploy it as an inference service. AWS offers many options for this, so you can adapt the deployment scenario to your needs. Among them, adopting a serverless architecture allows you to build a scalable R inference service while freeing your team from infrastructure management.
Jupyter Notebooks provide useful environments to interactively explore and experiment during an ML project. However, while helping many teams deliver ML solutions for large enterprises on AWS, I often noticed a point in the project when data scientists and ML engineers needed a full-fledged cloud-based IDE with better code completion and debugging capabilities for containers running in SageMaker.
Amazon SageMaker is a fully managed service bringing together a broad set of capabilities to help…
ML Specialist Solutions Architect