MLOps with MLflow and Amazon SageMaker Pipelines

Step-by-step guide to using MLflow with SageMaker projects

Sofian Hamiti
Towards Data Science


Earlier this year, I published a step-by-step guide to deploying MLflow on AWS Fargate, and using it with Amazon SageMaker. This can help streamline the experimental phase of an ML project.

Photo by Artur Kornakov on Unsplash

In this post, we will go a step further and automate an end-to-end ML lifecycle using MLflow and Amazon SageMaker Pipelines.

SageMaker Pipelines brings ML workflow orchestration, a model registry, and CI/CD under one umbrella so you can quickly get your models into production.

Image by author

We will create an MLOps project for model building, training, and deployment to train an example Random Forest model and deploy it to a SageMaker Endpoint. We will update the modelBuild side of the project so it can log models into the MLflow model registry, and the modelDeploy side so it can ship them to production.

Walkthrough overview

We will tackle this in 3 steps:

  • We will first deploy MLflow on AWS and launch an MLOps project in SageMaker.
  • Then we will update the modelBuild pipeline so we can log models into our MLflow model registry.
  • Finally, we will deploy the MLflow models into production with the modelDeploy pipeline.

Below is the architecture overview for the project:

Image by author: Architecture overview

Prerequisites

To go through this example, make sure you have the following:

  1. Visited Introducing Amazon SageMaker Pipelines if SageMaker Pipelines is new to you.
  2. Familiarity with Managing your machine learning lifecycle with MLflow and Amazon SageMaker and its example lab.
  3. Access to an Amazon SageMaker Studio environment and familiarity with the Studio user interface.
  4. Docker to build and push the MLflow inference container image to ECR.
  5. This GitHub repository cloned into your Studio environment.

Step 1: Deploying MLflow on AWS and launching the MLOps project in SageMaker

Deploying MLflow on AWS Fargate

First, we need to set up a central MLflow tracking server so we can use it in our MLOps project.

If you don’t have one, you can follow the instructions and explanations from my previous blog post to deploy the open-source version of MLflow on AWS Fargate.

Image by author: Hosting MLflow on AWS Fargate, with Amazon S3 as the artifact store and Amazon RDS for MySQL as the backend store.

Once deployed, make sure you keep the load balancer URI somewhere. We will use it in our MLOps project so the pipelines can talk to MLflow.
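
As a quick sanity check, you can point the MLflow client at the tracking server from a notebook. A minimal sketch, where the load balancer DNS is a placeholder for your own:

import mlflow

# Placeholder: replace with your MLflow load balancer URI
mlflow.set_tracking_uri("http://<your-load-balancer-dns>")
print(mlflow.get_tracking_uri())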

Launching your MLOps project

Now, we need to launch a SageMaker project based on the MLOps template for model building, training, and deployment.

You can follow Julien Simon’s walkthrough video to do this.

The project template will create 2 CodeCommit repos for modelBuild and modelDeploy, 2 CodePipeline pipelines for CI and CD, CodeBuild projects for packaging and testing artifacts, and other resources to run the project.

Image by author: You can clone the repos in your environment.

Allowing the project to access the MLflow artifact store

We use Amazon S3 as the artifact store for MLflow, and you will need to update the MLOps project role so it can access the MLflow S3 bucket.

The role is called AmazonSageMakerServiceCatalogProductsUseRole, and you can update its permissions as I did below:

Image by author: I use managed policies for this example. You may tighten permissions in your environment.
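
If you prefer to script this, here is a minimal sketch of attaching a managed policy to the role with boto3. AmazonS3FullAccess is an assumption for the example; in your environment you would likely scope permissions down to the MLflow bucket:

import boto3

iam = boto3.client("iam")

# Broad managed policy for the example only; tighten to your MLflow bucket in practice
iam.attach_role_policy(
    RoleName="AmazonSageMakerServiceCatalogProductsUseRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)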

Step 2: Updating the modelBuild pipeline to log models into MLflow

After cloning the modelBuild repository into your environment, you can replace its code with the contents of the model_build folder.

Images by author: How your modelBuild repo should look before (left) and after updating it (right)

You can find the example ML pipeline in pipeline.py. It has 2 simple steps:

  • PrepareData gets the dataset from sklearn and splits it into train/test sets.
  • TrainEvaluateRegister trains a Random Forest model and logs parameters, metrics, and the model into MLflow (see the sketch after this list).
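
For reference, the core of a training step like TrainEvaluateRegister typically looks like the following sketch. The tracking URI, experiment name, registered model name, and dataset here are hypothetical stand-ins, not the repo’s actual values:

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical values: use your load balancer URI and your own names
mlflow.set_tracking_uri("http://<your-load-balancer-dns>")
mlflow.set_experiment("mlops-example")

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=100)
    model.fit(X_train, y_train)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("test_r2", model.score(X_test, y_test))
    # Logging with registered_model_name creates a new version in the model registry
    mlflow.sklearn.log_model(model, "model", registered_model_name="mlops-example-model")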

At line 22 of pipeline.py, make sure you add your MLflow load balancer URI to the pipeline parameter. It will be passed to TrainEvaluateRegister so the step knows where to find MLflow.
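
In SageMaker Pipelines, such a value is typically exposed as a ParameterString. A minimal sketch, with a hypothetical parameter name:

from sagemaker.workflow.parameters import ParameterString

# Hypothetical parameter name; set the default to your load balancer URI
mlflow_tracking_uri = ParameterString(
    name="MLflowTrackingURI",
    default_value="http://<your-load-balancer-dns>",
)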

You can now push the updated code to the main branch of the repo.

Image by author: Your pipeline will be updated and take a few minutes to execute.

From now on, the pipeline will register a new model version in MLflow at each execution.

Image by author

To further automate, you can schedule the pipeline with Amazon EventBridge, or use other types of triggers with the start_pipeline_execution method.
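
For example, a script or a Lambda function could start an execution through the SageMaker API. A minimal sketch, where the pipeline name is a placeholder for the one created by your project:

import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder name: use the modelBuild pipeline created by your MLOps project
sagemaker.start_pipeline_execution(PipelineName="my-project-modelbuild")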

Step 3: Deploying the MLflow models into production with the modelDeploy pipeline

We can now bring new model versions to the MLflow model registry, and will use the modelDeploy side of the MLOps project to deploy them into production.

Pushing the inference container image to ECR

Alongside the ML model, we need a container image to handle the inference in our SageMaker Endpoint. Let’s push the one provided by MLflow into ECR. Make sure you do this in the same AWS Region as your MLOps project.

In my case, I push it from my laptop using the following commands:

pip install -q mlflow==1.23.1
mlflow sagemaker build-and-push-container

Updating the modelDeploy repo

Next, you can update the modelDeploy repo with the code from this folder.

Images by author: How your modelDeploy repo should look before (left) and after updating it (right)

In buildspec.yml, you can define the model version to deploy into production. You will also need to input your MLflow load balancer URI and inference container URI.

Image by author

I updated build.py to get the chosen model version’s binary from MLflow and upload its model.tar.gz to S3.

This is done by mlflow_handler.py, which also transitions model stages in MLflow as models go through the modelDeploy pipeline.
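
Under the hood, a handler like this relies on the MLflow client API. A minimal sketch of the two calls involved, with a hypothetical model name and version:

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://<your-load-balancer-dns>")

# Locate the chosen version's artifacts in the S3 artifact store
source_uri = client.get_model_version_download_uri(name="mlops-example-model", version="1")

# Move the version through stages as it advances in the pipeline,
# e.g. Staging during tests, Production after manual approval
client.transition_model_version_stage(name="mlops-example-model", version="1", stage="Staging")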

Triggering the deployment

You can now push the code to the main branch of the repo, which will trigger the modelDeploy pipeline in CodePipeline. Once testing is successful in staging, you can navigate to the CodePipeline console and manually approve the endpoint to go to production.

Images by author: Model stages will be transitioned in MLflow as the model progresses through the pipeline

When deploying a new version, the pipeline will archive the previous one.

Image by author

And you can see your SageMaker Endpoints ready to generate predictions.

Image by author
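
Once an endpoint is InService, you can test it with a quick invocation. The endpoint name and payload below are placeholders, and the exact JSON layout expected depends on your MLflow container version (MLflow 1.x accepts pandas split-oriented JSON):

import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Placeholder feature names and values for the example payload
payload = {"columns": ["feature_1", "feature_2"], "data": [[0.5, 1.2]]}

response = runtime.invoke_endpoint(
    EndpointName="my-project-staging",  # placeholder endpoint name
    ContentType="application/json; format=pandas-split",
    Body=json.dumps(payload),
)
print(response["Body"].read().decode())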

Conclusion

Amazon SageMaker Pipelines brings MLOps tooling under one umbrella, reducing the effort of running end-to-end MLOps projects.

In this post, we used a SageMaker MLOps project and the MLflow model registry to automate an end-to-end ML lifecycle.

To go further, you can also learn how to deploy a Serverless Inference Service Using Amazon SageMaker Pipelines.
