In most large enterprises, central IT teams are responsible for standardizing, provisioning, and governing ML environments.
I have recently published a guide to continuous delivery of custom images that IT teams can use when setting up SageMaker Studio for their end users.
In this post, we will go a step further and use the AWS Service Catalog to enable self-service provisioning of approved SageMaker Studio environments.
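To give a flavour of the self-service experience, here is a minimal boto3 sketch of how an end user could provision a pre-approved Studio product from the catalog. The product name, version label, and provisioning parameters are placeholders for whatever the central IT team publishes, not the exact setup from the post.

```python
import boto3

servicecatalog = boto3.client("servicecatalog")

# Provision a pre-approved SageMaker Studio product from the catalog.
# Product name, version, and parameters below are placeholders for the
# values the central IT team shares with end users.
response = servicecatalog.provision_product(
    ProductName="sagemaker-studio-domain",      # hypothetical product name
    ProvisioningArtifactName="v1.0",            # hypothetical version label
    ProvisionedProductName="my-studio-environment",
    ProvisioningParameters=[
        {"Key": "UserProfileName", "Value": "data-scientist-1"},  # assumed parameter
    ],
)
print(response["RecordDetail"]["RecordId"])
```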
NVIDIA Triton Inference Server is open-source software that ML teams can use to deploy their models. It supports model formats from TensorFlow, PyTorch, ONNX, and other popular frameworks, and provides a set of features to manage them.
You can host Triton on AWS in multiple ways. In this post, I show how you can deploy it on Amazon ECS, using the AWS CDK.
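As a rough preview, here is a minimal CDK sketch (in Python) of such a deployment: the public Triton image running as a CPU-only Fargate service behind a load balancer. The image tag and S3 model repository are placeholders, and a real stack would also need a task role with access to that bucket.

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_ecs_patterns as ecs_patterns
from constructs import Construct


class TritonEcsStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Run the public Triton image behind an Application Load Balancer.
        # Fargate here means CPU-only inference; tag and bucket are placeholders.
        ecs_patterns.ApplicationLoadBalancedFargateService(
            self,
            "TritonService",
            cpu=4096,
            memory_limit_mib=8192,
            desired_count=1,
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_registry(
                    "nvcr.io/nvidia/tritonserver:22.07-py3"
                ),
                container_port=8000,  # Triton's HTTP/REST endpoint
                command=[
                    "tritonserver",
                    "--model-repository=s3://my-bucket/models",  # placeholder bucket
                ],
            ),
        )


app = App()
TritonEcsStack(app, "triton-ecs")
app.synth()
```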
Amazon SageMaker Studio is a fully integrated IDE unifying the tools needed for managing your ML projects and collaborating with your team members.
Alongside providing pre-built images for running your notebooks, SageMaker Studio allows you to create containers with your favourite libraries and attach them as custom images to your domain.
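Under the hood, attaching a custom image comes down to a few SageMaker API calls. A minimal boto3 sketch, with placeholder names, ARNs, and domain ID:

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Register a container image (already pushed to ECR) as a Studio custom image.
# Image name, ECR URI, role ARN, and domain ID below are placeholders.
sagemaker.create_image(
    ImageName="my-custom-image",
    RoleArn="arn:aws:iam::111122223333:role/StudioImageRole",
)
sagemaker.create_image_version(
    ImageName="my-custom-image",
    BaseImage="111122223333.dkr.ecr.eu-west-1.amazonaws.com/my-custom-image:latest",
)
sagemaker.create_app_image_config(
    AppImageConfigName="my-custom-image-config",
    KernelGatewayImageConfig={
        "KernelSpecs": [{"Name": "python3", "DisplayName": "Python 3"}],
    },
)

# Attach the image to the Studio domain so users can select it.
sagemaker.update_domain(
    DomainId="d-xxxxxxxxxxxx",
    DefaultUserSettings={
        "KernelGatewayAppSettings": {
            "CustomImages": [
                {
                    "ImageName": "my-custom-image",
                    "AppImageConfigName": "my-custom-image-config",
                }
            ]
        }
    },
)
```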
In most large enterprises, ML platform administrators manage those custom images to ensure that only approved libraries are used by Studio users. Done manually, this can become an operational burden for admins.
In this post, I show how you can automate a Studio domain setup by implementing simple…
I have recently published a step-by-step guide to serverless model deployments with Amazon SageMaker Pipelines, Amazon API Gateway, and AWS Lambda.
With AWS Lambda, you pay only for what you use: Lambda charges based on the number of requests, the execution duration, and the amount of memory allocated to the function. So how much memory should you allocate to your inference function?
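A back-of-the-envelope cost model makes the trade-off concrete. The unit prices below are the publicly listed us-east-1 x86 prices at the time of writing; check the current pricing page before relying on them.

```python
# Back-of-the-envelope Lambda cost model.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # USD per request (us-east-1, at time of writing)
PRICE_PER_GB_SECOND = 0.0000166667     # USD per GB-second (us-east-1, at time of writing)


def monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimate the monthly bill for an inference function."""
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return requests * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


# More memory often means faster runs: 1M requests at 250 ms on 2048 MB
# cost the same as 500 ms on 1024 MB, with half the latency.
print(f"{monthly_cost(1_000_000, 500, 1024):.2f}")   # ~8.53
print(f"{monthly_cost(1_000_000, 250, 2048):.2f}")   # ~8.53
```

This is exactly why memory allocation is worth tuning: the bill stays flat only if the duration drops proportionally, and the sweet spot depends on your model.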
In this post, I show how you can use SageMaker hyperparameter tuning (HPO) jobs and a load-testing tool to automatically optimize the price/performance ratio of your serverless inference service.
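As a rough illustration of the idea, the sketch below wires a hypothetical load-testing container into a SageMaker tuning job, with the Lambda memory size as the hyperparameter and a cost metric scraped from the job logs. All names, ARNs, the container, and the metric regex are assumptions, not the exact setup from the post.

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import CategoricalParameter, HyperparameterTuner

# Hypothetical container that deploys the function with the given memory
# size, load-tests it, and logs a line like "cost_per_invocation=0.0000021".
estimator = Estimator(
    image_uri="111122223333.dkr.ecr.eu-west-1.amazonaws.com/load-tester:latest",  # placeholder
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.large",
)

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="cost_per_invocation",
    objective_type="Minimize",
    metric_definitions=[
        {"Name": "cost_per_invocation", "Regex": "cost_per_invocation=([0-9\\.]+)"}
    ],
    hyperparameter_ranges={
        "memory_size": CategoricalParameter([512, 1024, 2048, 3072])
    },
    max_jobs=4,
    max_parallel_jobs=2,
)
tuner.fit()
```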
Deploying some of your ML models into serverless architectures allows you to create scalable inference services, eliminate operational overhead, and move faster to production. I have published examples here and here showing how you can adopt such an architecture in your projects.
In this post, we will go a step further and automate the deployment of such a serverless inference service using Amazon SageMaker Pipelines.
With SageMaker Pipelines, you can accelerate the delivery of end-to-end ML projects. It brings ML workflow orchestration, a model registry, and CI/CD under one umbrella so you can get your models into production quickly.
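For a flavour of what the deployment automation could look like, here is a minimal Pipelines sketch with a single LambdaStep calling a hypothetical deployment function; the function ARN, inputs, and role are placeholders, and a real pipeline would include training, evaluation, and approval steps before it.

```python
from sagemaker.lambda_helper import Lambda
from sagemaker.workflow.lambda_step import LambdaStep
from sagemaker.workflow.pipeline import Pipeline

# Hypothetical deployment step: a Lambda function that updates the
# serverless inference stack with a newly approved model package.
deploy_step = LambdaStep(
    name="DeployServerlessEndpoint",
    lambda_func=Lambda(
        function_arn="arn:aws:lambda:eu-west-1:111122223333:function:deploy-model"  # placeholder
    ),
    inputs={"model_package_arn": "arn:aws:sagemaker:..."},  # placeholder
)

pipeline = Pipeline(name="serverless-deployment", steps=[deploy_step])
pipeline.upsert(role_arn="arn:aws:iam::111122223333:role/SageMakerPipelineRole")  # placeholder
pipeline.start()
```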
I have recently published a post explaining the core concepts of deploying an ML model in a serverless inference service using AWS Lambda, Amazon API Gateway, and the AWS CDK.
For some use cases, you and your ML team may need to implement a more complex inference workflow, where predictions come from multiple models and are orchestrated as a directed acyclic graph (DAG). On AWS, Step Functions Synchronous Express Workflows allow you to easily build that orchestration layer for your real-time inference services.
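Invoking such a workflow from a client is a single synchronous call. A minimal boto3 sketch, assuming a placeholder state machine ARN and input schema for a workflow that fans out to several models and merges their outputs:

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Invoke an Express state machine synchronously and wait for the
# combined prediction. ARN and payload shape are placeholders.
response = sfn.start_sync_execution(
    stateMachineArn="arn:aws:states:eu-west-1:111122223333:stateMachine:inference-dag",
    input=json.dumps({"features": [0.1, 0.4, 0.7]}),
)
prediction = json.loads(response["output"])
print(prediction)
```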
Amazon SageMaker Studio is a fully integrated IDE unifying the tools needed for ML development. With Studio, you can write code, track experiments, visualize data, and perform debugging and monitoring in a JupyterLab-based interface. SageMaker manages the creation of the underlying instances and resources, so you can get started quickly on your ML project.
When you create or launch a notebook, an interactive shell, or a terminal based on a SageMaker image, the resources run as apps on Amazon EC2 instances. You incur costs for as long as those apps run, and you must shut them down to stop the metering.
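If you prefer to clean up programmatically rather than through the Studio UI, a small boto3 script can list and delete the running apps. A minimal sketch with a placeholder domain ID, targeting only the KernelGateway apps that run on billed instances:

```python
import boto3

sagemaker = boto3.client("sagemaker")

DOMAIN_ID = "d-xxxxxxxxxxxx"  # placeholder domain ID

# Delete the running KernelGateway apps in the domain to stop the metering.
# User data on the attached EFS volume is not affected.
for app in sagemaker.list_apps(DomainIdEquals=DOMAIN_ID)["Apps"]:
    if app["AppType"] == "KernelGateway" and app["Status"] == "InService":
        sagemaker.delete_app(
            DomainId=DOMAIN_ID,
            UserProfileName=app["UserProfileName"],
            AppType=app["AppType"],
            AppName=app["AppName"],
        )
```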
After training an R model, you and your ML team might explore ways to deploy it as an inference service. AWS offers many options for this, so you can adapt the deployment scenario to your needs. Among them, adopting a serverless architecture allows you to build a scalable R inference service while freeing your team from infrastructure management.
Jupyter Notebooks provide useful environments for interactive exploration and experimentation during an ML project. However, while helping many teams deliver ML solutions for large enterprises on AWS, I have often seen a point in projects where data scientists and ML engineers needed a full-fledged cloud-based IDE offering better code-completion and debugging capabilities for containers running in SageMaker.
Amazon SageMaker is a fully managed service bringing together a broad set of capabilities to help…
ML Solutions Architect