FROM ML EXPERIMENT TO MODEL SERVING API WITHIN 1 HOUR
At Essent, we empower data scientists to be fully in control of the end-to-end lifecycle of data science models. Most of this lifecycle is covered within Databricks. However, for API deployments of machine learning models, our MLOps engineers have created a custom solution with which a data scientist can go from experiment to production-grade API in less than one hour.
DATA SCIENCE ON DATABRICKS
The day-to-day work of data scientists within Essent mainly happens on Databricks. This includes experimenting in notebooks, tracking data lineage and governance through Unity Catalog, and registering models in MLflow as the model registry.
While Databricks provides data scientists with the tooling to easily run models within batch job workflows, deploying a model as a low-latency microservice API on AWS requires a different set of skills beyond traditional data science work. It involves knowledge of DevOps practices such as containerization, provisioning cloud resources, and setting up API endpoints.
To bridge this gap and make data scientists' lives easier, we have created an API framework that enables them to deploy their machine learning models as microservice APIs on AWS.
HOW THE ML MODEL SERVING API FRAMEWORK WORKS
The API framework enables data scientists to create and deploy inference API servers for any ML model stored in (Databricks) MLflow. The framework aims to be completely self-service, enabling data scientists to deploy ML models without having to deal with the underlying infrastructure at all. It is set up in a modular way and consists of three main components: a FastAPI application that serves ML models, a Terraform module that provisions all necessary cloud resources, and a CI/CD template that takes care of deployments.
FASTAPI APPLICATION
The "core" of the API framework is a lightweight FastAPI application. FastAPI is a web framework for building APIs with Python. With FastAPI, we have created an application that is capable of loading in any ML model that is stored by data scientists in MLflow. The API exposes the model’s predict method via a route to make predictions. This application, in an empty state (without any ML model), is wrapped inside a Docker container and stored inside a container registry. That image can be used by data scientists as a base image, to which only a model needs to be added to have a fully working API that can be deployed to any platform such as AWS.
The model's input and output signature, as logged in the model's metadata, is automatically transformed into an OpenAPI JSON schema. With this schema, FastAPI performs input and output validation, ensuring that the API only accepts input that the model can process.
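As an illustration of the idea, here is a sketch of how a request schema could be derived from the logged signature. The type mapping and function name are our own assumptions, and MLflow's signature types are richer than what is shown here.

```python
# Sketch: turn an MLflow model signature into a Pydantic model.
# FastAPI then generates the OpenAPI JSON schema for the route from
# this Pydantic model, so validation follows the logged signature.
import mlflow.models
from pydantic import create_model

# Rough mapping from MLflow signature types to Python types (assumed,
# and covering only scalar column types, not tensors).
MLFLOW_TO_PYTHON = {
    "long": int,
    "integer": int,
    "double": float,
    "float": float,
    "boolean": bool,
    "string": str,
}


def request_model_from_signature(model_uri: str):
    """Build a Pydantic model describing one input row of the ML model."""
    signature = mlflow.models.get_model_info(model_uri).signature
    fields = {
        spec.name: (MLFLOW_TO_PYTHON[spec.type.name], ...)
        for spec in signature.inputs.inputs  # column specs of the input schema
    }
    return create_model("PredictionInput", **fields)
```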
TERRAFORM MODULE
A deployed model serving API makes use of several (AWS) infrastructure components that need to be provisioned. For easy reuse, the framework comes with a Terraform module that can be used in data science project repositories. The Terraform module is responsible for creating every resource necessary to deploy the API.
Because the APIs are containerized, they can run on any cloud service that can run containers, such as AWS ECS, EKS, or Lambda. Alongside the resources to run the API, the Terraform module also creates all the networking components needed to integrate the API's endpoints into the Essent IT landscape. Furthermore, observability is covered out of the box by configuring logging, monitoring, and alerting.
CI/CD TEMPLATE
To make deployment easy, we have created a CI/CD template. Data scientists can import this template, which is responsible for building the API's Docker image and deploying the Terraform infrastructure. Within CI/CD, we retrieve the model from MLflow, package it in a container, and run a series of automated checks and tests. Examples include checking that the model's signature has been set in its metadata and that the model's requirements contain no known vulnerabilities.
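The sketch below shows what two such checks might look like. The model URI and requirements file are assumed to be passed in by the pipeline, and pip-audit is one possible scanner; we are not claiming these are the exact checks the template runs.

```python
# Sketch of two CI/CD checks: signature presence and dependency scanning.
# Arguments and tooling (pip-audit) are assumptions for illustration.
import subprocess
import sys

import mlflow.models


def check_signature_present(model_uri: str) -> None:
    """Fail the pipeline if no input/output signature was logged."""
    info = mlflow.models.get_model_info(model_uri)
    if info.signature is None:
        sys.exit("Model has no signature logged in its metadata.")


def check_requirements_vulnerabilities(requirements_file: str) -> None:
    """Scan the model's pinned requirements for known vulnerabilities."""
    result = subprocess.run(
        ["pip-audit", "--requirement", requirements_file],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        sys.exit(f"Vulnerable dependencies found:\n{result.stdout}")


if __name__ == "__main__":
    check_signature_present(sys.argv[1])
    check_requirements_vulnerabilities(sys.argv[2])
```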
BENEFITS OF THE ML MODEL SERVING API FRAMEWORK
With the API framework, data scientists can take their models from experiment to production-grade API within one hour, allowing for faster iteration, experimentation and delivery.
The framework simplifies the deployment process by abstracting complex DevOps tasks away from data scientists, so that they can focus more on their core work.
By offering the framework, we ensure that all model deployments follow a consistent structure.
Finally, the framework offers enhanced performance compared to third-party managed services. A key advantage is the ability to deploy the APIs within Essent's internal network, which allows for better integration, security, and control over networking. Because the APIs can live in the same VPCs as their consumers, traffic does not need to traverse the public internet, and we benefit from reduced latency.