MLOps with MLflow and Amazon SageMaker Pipelines
Step-by-step guide to using MLflow with SageMaker projects
Earlier this year, I published a step-by-step guide to deploying MLflow on AWS Fargate, and using it with Amazon SageMaker. This can help streamline the experimental phase of an ML project.
In this post, we will go a step further and automate an end-to-end ML lifecycle using MLflow and Amazon SageMaker Pipelines.
SageMaker Pipelines combines ML workflow orchestration, model registry, and CI/CD under one umbrella so you can quickly get your models into production.
We will create an MLOps project for model building, training, and deployment to train an example Random Forest model and deploy it into a SageMaker Endpoint. We will update the modelBuild side of the project so it can log models into the MLflow model registry, and the modelDeploy side so it can ship them to production.
Walkthrough overview
We will tackle this in 3 steps:
- We will first deploy MLflow on AWS and launch an MLOps project in SageMaker.
- Then we will update the modelBuild pipeline so we can log models into our MLflow model registry.
- Finally, I will show how you can deploy the MLflow models into production with the modelDeploy pipeline.
Below is the architecture overview for the project:
Prerequisites
To go through this example, make sure you have the following:
- Read Introducing Amazon SageMaker Pipelines if SageMaker Pipelines are new to you.
- Familiarity with Managing your machine learning lifecycle with MLflow and Amazon SageMaker and its example lab.
- Access to an Amazon SageMaker Studio environment and familiarity with the Studio user interface.
- Docker to build and push the MLflow inference container image to ECR.
- This GitHub repository cloned into your Studio environment.
Step 1: Deploying MLflow on AWS and launching the MLOps project in SageMaker
Deploying MLflow on AWS Fargate
First, we need to set up a central MLflow tracking server so we can use it in our MLOps project.
If you don’t have one, you can follow the instructions from my earlier blog post to deploy the open source version of MLflow on AWS Fargate.
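Once the Fargate service is up, it can be worth verifying that the tracking server answers before wiring it into the project. Below is a minimal sanity check; the load balancer URI is a placeholder you should replace with the DNS name from your own deployment, and the `normalize_tracking_uri` helper is just a convenience I am assuming here, not part of the repository.

```python
"""Quick sanity check that the MLflow tracking server is reachable."""
from urllib.parse import urlparse


def normalize_tracking_uri(uri: str) -> str:
    """Ensure the URI has a scheme and no trailing slash."""
    if "://" not in uri:
        uri = "http://" + uri
    return uri.rstrip("/")


if __name__ == "__main__":
    import mlflow  # requires `pip install mlflow`

    # Placeholder: replace with your load balancer DNS name.
    tracking_uri = normalize_tracking_uri(
        "your-mlflow-load-balancer.us-east-1.elb.amazonaws.com"
    )
    mlflow.set_tracking_uri(tracking_uri)
    # Listing experiments confirms the server answers; a fresh
    # deployment always has the Default experiment.
    print(mlflow.tracking.MlflowClient().list_experiments())
```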
Launching your MLOps project
Now, we need to launch a SageMaker project based on the MLOps template for model building, training, and deployment.
You can follow Julien Simon’s walkthrough video to do this:
The project template will create 2 CodeCommit repos for modelBuild and modelDeploy, 2 CodePipeline pipelines for CI and CD, CodeBuild projects for packaging and testing artifacts, and other resources to run the project.
Allowing the project to access the MLflow artifact store
We use Amazon S3 as the artifact store for MLflow, so you will need to update the MLOps project role to give it access to the MLflow S3 bucket.
The role is called AmazonSageMakerServiceCatalogProductsUseRole and you can update its permissions like I did below:
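As a sketch of what that permissions update looks like, the snippet below attaches an inline policy to the role with boto3. The bucket name `mlflow-artifact-store` and the policy name are placeholders; use the bucket from your MLflow deployment, and run this with credentials that can modify IAM roles.

```python
"""Grant the SageMaker project role access to the MLflow artifact bucket."""
import json


def mlflow_s3_policy(bucket: str) -> dict:
    """Inline IAM policy allowing read/write on the MLflow S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    import boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName="AmazonSageMakerServiceCatalogProductsUseRole",
        PolicyName="mlflow-artifact-store-access",  # placeholder name
        PolicyDocument=json.dumps(mlflow_s3_policy("mlflow-artifact-store")),
    )
```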
Step 2: Updating the modelBuild pipeline to log models into MLflow
After cloning the modelBuild repository into your environment, you can replace its code with the contents of the model_build folder.
You can find the example ML pipeline in pipeline.py. It has 2 simple steps:
- PrepareData gets the dataset from sklearn and splits it into train and test sets.
- TrainEvaluateRegister trains a Random Forest model, logs parameters, metrics, and the model into MLflow.
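The TrainEvaluateRegister step can be sketched roughly as below. This is an illustrative outline, not the exact code from the repository: the experiment name, the registered model name `rf-model`, the hyperparameters, and the toy dataset standing in for the PrepareData output are all assumptions.

```python
"""Sketch of a TrainEvaluateRegister step logging to an MLflow server."""


def regression_metrics(y_true, y_pred):
    """Plain-Python MAE and R^2, logged to MLflow below."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return {"mae": mae, "r2": 1 - ss_res / ss_tot}


if __name__ == "__main__":
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    mlflow.set_tracking_uri("http://your-mlflow-load-balancer")  # placeholder
    mlflow.set_experiment("sagemaker-mlops-demo")  # placeholder name

    # Toy data standing in for the output of the PrepareData step.
    X, y = make_regression(n_samples=500, n_features=8, noise=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    params = {"n_estimators": 100, "max_depth": 6}
    with mlflow.start_run():
        model = RandomForestRegressor(**params).fit(X_train, y_train)
        mlflow.log_params(params)
        mlflow.log_metrics(regression_metrics(list(y_test), list(model.predict(X_test))))
        # registered_model_name creates a new version in the model registry.
        mlflow.sklearn.log_model(model, "model", registered_model_name="rf-model")
```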
At line 22 of pipeline.py, make sure you set your MLflow load balancer URI in the pipeline parameter. It is passed to TrainEvaluateRegister so the step knows where to reach MLflow.
You can now push the updated code to the main branch of the repo.
From now on, the pipeline will register a new model version in MLflow at each execution.
To automate further, you can schedule the pipeline with Amazon EventBridge, or use other types of triggers with the start_pipeline_execution method.
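For instance, a trigger can start the pipeline via boto3, overriding the tracking URI parameter at execution time. The pipeline name and the parameter name `MLflowTrackingURI` below are placeholders; match them to your project.

```python
"""Trigger the modelBuild pipeline programmatically with boto3."""


def pipeline_parameters(tracking_uri: str) -> list:
    """Build the PipelineParameters payload for StartPipelineExecution."""
    return [{"Name": "MLflowTrackingURI", "Value": tracking_uri}]


if __name__ == "__main__":
    import boto3

    sm = boto3.client("sagemaker")
    sm.start_pipeline_execution(
        PipelineName="your-modelbuild-pipeline",  # placeholder name
        PipelineParameters=pipeline_parameters("http://your-mlflow-load-balancer"),
    )
```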
Step 3: Deploying the MLflow models into production with the modelDeploy pipeline
We can now bring new model versions to the MLflow model registry, and will use the modelDeploy side of the MLOps project to deploy them into production.
Pushing the inference container image to ECR
Alongside the ML model, we need a container image to handle inference in our SageMaker Endpoint. Let’s push the one provided by MLflow to ECR. Make sure you do this in the same AWS Region as your MLOps project.
In my case, I push it from my laptop using the following commands:
pip install -q mlflow==1.23.1
mlflow sagemaker build-and-push-container
Updating the modelDeploy repo
Next, you can update the modelDeploy repo with the code from this folder.
In buildspec.yml, you can define the model version to deploy into production. You will also need to input your MLflow load balancer URI and inference container URI.
I updated build.py to get the chosen model version binary from MLflow, and upload its model.tar.gz to S3.
This is done by mlflow_handler.py, which also transitions model stages in MLflow, as models go through the modelDeploy Pipeline.
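To make that concrete, here is a rough sketch of what such a handler can do: fetch the chosen model version from the registry, repackage it as the model.tar.gz that SageMaker expects, upload it to S3, and transition the version's stage. This is not the repository's mlflow_handler.py; the model name `rf-model`, the version number, and the bucket are all placeholders.

```python
"""Sketch of an MLflow handler for the modelDeploy side."""
import os
import tarfile


def package_model(model_dir: str, output_path: str) -> str:
    """Tar a downloaded MLflow model directory into SageMaker's model.tar.gz."""
    with tarfile.open(output_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname keeps files at the tarball root, as SageMaker expects.
            tar.add(os.path.join(model_dir, name), arcname=name)
    return output_path


if __name__ == "__main__":
    import boto3
    from mlflow.tracking import MlflowClient

    client = MlflowClient(tracking_uri="http://your-mlflow-load-balancer")
    version = client.get_model_version(name="rf-model", version="3")
    # Download the run's "model" artifacts locally before repackaging.
    local_dir = client.download_artifacts(version.run_id, "model", dst_path=".")
    tarball = package_model(local_dir, "model.tar.gz")
    boto3.client("s3").upload_file(tarball, "your-deploy-bucket", "models/model.tar.gz")
    # Promote this version and archive whatever was in Production before.
    client.transition_model_version_stage(
        name="rf-model",
        version="3",
        stage="Production",
        archive_existing_versions=True,
    )
```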
Triggering the deployment
You can now push the code to the main branch of the repo, which will trigger the modelDeploy pipeline in CodePipeline. Once testing is successful in staging, you can navigate to the CodePipeline console and manually approve the endpoint to go to production.
When deploying a new version, the pipeline will archive the previous one.
And you can see your SageMaker Endpoints ready to generate predictions.
Conclusion
Amazon SageMaker Pipelines brings MLOps tooling under one umbrella, reducing the effort of running end-to-end MLOps projects.
In this post, we used a SageMaker MLOps project and the MLflow model registry to automate an end-to-end ML lifecycle.
To go further, you can also learn how to deploy a Serverless Inference Service Using Amazon SageMaker Pipelines.