You don't know what "model deployment" means? Even when you try to understand what it means, you end up searching for the meaning of too many baffling tech words like "CI/CD", "REST HTTPS API", "Kubernetes clusters", "WSGI servers"... and you feel overwhelmed or discouraged by this pile of new concepts?
Maybe you think that "deploying to production" is an "IT people" problem and not a Data Scientist's problem, because your job is only about working on data and ML modeling, and that is your only area of interest?
Then, this page is for you! It will help you to understand in simple words what deploying a model means, why it is useful and why you need to be able to do it yourself.
ML modeling isn't everything
Assume you're a Data Scientist working at a public transportation company to develop a supervised ML model which performs fault detection on train equipment (like doors or brakes, for example). You worked on historical data coming from various sensors to explore it, analyze it, preprocess it... Then, you chose between various algorithms like Random Forest or Gradient Boosting to build your model, using cross-validation to tune their hyperparameters. And finally, you're happy! Indeed, you obtained a model which has a good F1-score, i.e. you found a balance between high precision (few false alarms) and high recall (few undetected faults).
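As a reminder of what this score measures, here is a minimal sketch that computes precision, recall and F1 from raw counts. The function name and the counts are hypothetical and only serve as an illustration:

```python
def precision_recall_f1(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall, F1) computed from confusion-matrix counts."""
    predicted_faults = true_positives + false_positives  # every alarm raised
    actual_faults = true_positives + false_negatives     # every real fault
    precision = true_positives / predicted_faults if predicted_faults else 0.0
    recall = true_positives / actual_faults if actual_faults else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 80 faults caught, 20 false alarms, 20 faults missed.
precision, recall, f1 = precision_recall_f1(80, 20, 20)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it only gets high when both precision and recall are high, which is why it is a convenient single number for a fault detection model.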
Then, you meet the head of the maintenance department in your company to show her your accomplishment: you solved her problem with your wonderful ML model! You can even show her your Jupyter notebook, with some great plots you prepared to demonstrate how well your model predicts faults.
So you expect her to be satisfied with your work... and you're wrong!
In fact, although she finds your model satisfactory, she doesn't understand at all how she will use it! Indeed, when you show her your notebook to explain how to use your model to get fault alarms, she is confused at best, and maybe even scared: all these lines of code seem cryptic to her. And it's perfectly normal! After all, she is a railway maintenance specialist, not a Python expert or a Data Scientist! Her job is to manage a team of maintainers so that trains can run, so you cannot expect her to use your Python code.
Moreover, your job is to help her team and your company by detecting faults before they turn into failures, hence reducing maintenance costs: as a Data Scientist, your job is first and foremost a business-oriented job, and your models should be easy for business stakeholders to use and (normally) generate some sort of ROI for them and their company.
Seems like something crucial is missing! You need a way to make the model predictions available to the user: that's precisely what is called "deploying your model"! Deploying a model simply means building a tool which can be used to get predictions from your ML model.
Let's see what it can mean in practice.
Deploying an ML model
In practice, you should first ask yourself some basic questions like:
- Who needs to have access to the predictions? How many people does it represent? It may be only a few business stakeholders in your company or thousands of external people.
- How will they access these predictions?
- How often will they need these predictions?
- and maybe some other practical questions...
Depending on the answers, your deployment method may be quite different from a technical point of view. For instance, if your model is embedded in a Web application with thousands or millions of users who expect instant answers, it should be prepared to support many simultaneous requests (which implies specific infrastructure needs) and to perform inference very quickly! Indeed, your users will definitely not wait 40 seconds to get access to the content simply because your prediction takes too long to compute and return. This may even impact your modeling choices: a huge Deep Learning model may not be the first choice in this case, at least if you don't have the necessary computation power... However, if you only need to send predictions once a week to a few people in your company, you will probably not need a huge infrastructure.
Thus, there are several ways to deploy a model, depending on your needs. For example, in the predictive maintenance use case we saw above, maybe the maintenance team just needs an alerting system sending them emails or instant messages only when your model detects a potential failure. Or they may prefer having access to a Web page containing a dashboard to follow some technical KPIs about their assets.
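To make the alerting idea more concrete, here is a minimal sketch, assuming your model outputs a fault probability for each piece of equipment. The names `build_alerts` and `build_alert_email`, the 0.8 threshold, the equipment identifiers and the email addresses are all hypothetical:

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Optional


@dataclass
class Alert:
    equipment_id: str
    fault_probability: float


def build_alerts(fault_probabilities: dict, threshold: float = 0.8) -> list:
    """Keep only the equipment whose predicted fault probability reaches the threshold."""
    alerts = [
        Alert(equipment_id, probability)
        for equipment_id, probability in fault_probabilities.items()
        if probability >= threshold
    ]
    # Most likely faults first, so the maintenance team sees them at the top.
    return sorted(alerts, key=lambda alert: alert.fault_probability, reverse=True)


def build_alert_email(alerts: list, sender: str, recipient: str) -> Optional[EmailMessage]:
    """Build the email to send, or return None when there is nothing to report."""
    if not alerts:
        return None
    message = EmailMessage()
    message["Subject"] = f"{len(alerts)} potential fault(s) detected"
    message["From"] = sender
    message["To"] = recipient
    lines = [
        f"- {alert.equipment_id}: fault probability {alert.fault_probability:.0%}"
        for alert in alerts
    ]
    message.set_content("\n".join(lines))
    return message


# Hypothetical model outputs for three pieces of equipment.
alerts = build_alerts({"door-12": 0.93, "brake-07": 0.12, "door-03": 0.85})
email = build_alert_email(alerts, "ml-alerts@example.com", "maintenance@example.com")
```

Actually sending the message is left out on purpose: with the standard library, it could go through `smtplib.SMTP(...).send_message(email)`, using the mail server of your company.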
In other cases, you may also need to deploy your model via a Web API so that it can be integrated as a part of a larger Web application. This means that you need to code "routes", i.e. URLs which give access to predictions when they are requested without giving direct access to your code, your data or your model. In this case, you'll also need to create a Web server, i.e. a computer (probably in a Cloud service like Azure, AWS...) which is always up and running and can execute these requests when the corresponding URLs are called.
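Here is a minimal sketch of such a route, written as a plain WSGI application with the Python standard library only. The `/predict` URL, the `vibration` feature and the placeholder `predict_fault_probability` function are assumptions made for the example; a real deployment would load a trained model and would usually rely on a Web framework and a production-grade server:

```python
import json
from wsgiref.simple_server import make_server


def predict_fault_probability(features: dict) -> float:
    """Placeholder standing in for a trained model (hypothetical rule)."""
    return 0.9 if features.get("vibration", 0.0) > 5.0 else 0.1


def _json_response(start_response, status: str, payload: dict):
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def app(environ, start_response):
    """WSGI application exposing a single route: POST /predict."""
    if environ.get("PATH_INFO") != "/predict":
        return _json_response(start_response, "404 Not Found", {"error": "unknown route"})
    if environ.get("REQUEST_METHOD") != "POST":
        return _json_response(start_response, "405 Method Not Allowed", {"error": "use POST"})
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        features = json.loads(environ["wsgi.input"].read(length) or b"{}")
    except ValueError:
        features = None
    if not isinstance(features, dict):
        return _json_response(start_response, "400 Bad Request", {"error": "expected a JSON object"})
    # The caller only ever sees the prediction, never the code, the data or the model.
    probability = predict_fault_probability(features)
    return _json_response(start_response, "200 OK", {"fault_probability": probability})


def serve(port: int = 8000) -> None:
    """Run a development server (blocks until interrupted)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts a local server, and a POST request to `http://localhost:8000/predict` with a JSON body such as `{"vibration": 7.2}` would then return the prediction as JSON.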
You may also encounter situations where the best choice is to compute and send predictions to business stakeholders on a regular basis (each day or each week, for instance). Just think about the example of forecasting product sales in a supermarket: if the store is re-supplied every week, it's useless to send hourly or daily forecasts to the supply manager!
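A batch deployment of this kind can be as simple as a script that a scheduler (a cron job, for instance) runs once a week. The sketch below is only an illustration: the product names and sales figures are made up, and `forecast_next_week` is a naive placeholder (the average of the last four weeks) standing in for a real forecasting model:

```python
import csv
import io
from statistics import mean


def forecast_next_week(weekly_sales: list) -> float:
    """Naive placeholder forecast: average of the last four weeks."""
    return mean(weekly_sales[-4:])


def build_weekly_report(sales_history: dict) -> str:
    """Return a CSV report with one forecast per product."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["product", "forecast_next_week"])
    for product, weekly_sales in sorted(sales_history.items()):
        writer.writerow([product, f"{forecast_next_week(weekly_sales):.1f}"])
    return buffer.getvalue()


# Hypothetical weekly sales per product (oldest week first).
report = build_weekly_report({
    "milk": [120, 130, 125, 135, 140],
    "bread": [80, 82, 78, 84],
})
print(report)
```

The resulting CSV can then be attached to an email or dropped in a shared folder for the supply manager: no Web server has to stay up between two runs.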
In any case, you'll need many tools to get a manageable deployed model:
- a data pipeline to get and transform all the necessary data from various sources
- some data validation checks
- various monitoring tools to assess if your service is available and gives good results
- etc. (see also the MLOps stack described in our previous article).
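As an example of what a data validation check can look like, here is a minimal sketch that verifies a sensor reading before it is sent to the model. The field names and the accepted ranges are hypothetical:

```python
def validate_reading(reading: dict, expected_ranges: dict) -> list:
    """Return the list of problems found in a reading (empty if it is valid)."""
    problems = []
    for field, (low, high) in expected_ranges.items():
        value = reading.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: not a number")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems


# Hypothetical accepted ranges for two sensors.
EXPECTED_RANGES = {"temperature_c": (-40.0, 120.0), "vibration": (0.0, 50.0)}

print(validate_reading({"temperature_c": 35.0, "vibration": 3.2}, EXPECTED_RANGES))
print(validate_reading({"temperature_c": 300.0}, EXPECTED_RANGES))
```

Rejecting or flagging invalid inputs before inference avoids silently sending meaningless predictions to your users.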
Data Scientists should deploy their own models
Of course, it can be overwhelming for a Data Scientist to acquire all these skills and perform all these tasks alone (and some of these tasks can be the role of a Data Engineer or a DevOps Engineer). Deploying an ML model may become a lengthy process, and it can be highly error-prone for a Data Science team with little MLOps experience.
But it's important that each Data Scientist becomes responsible for the deployment of the models they create. Indeed, as a Data Scientist, being in charge of the deployment of your own models gives you more autonomy and visibility on the whole process: you wait less often for other experts, such as DevOps engineers, which also saves you time. Being able to deploy their own models thus empowers Data Scientists, who can control the full lifecycle of their ML models (from design to deployment and monitoring). Moreover, this responsibility gives you a more concrete impact on your end user: when you've successfully deployed your own model, you feel the joy of having created an application that your end user can rely on every day. In some sense, it's your baby who came to life!
To deploy their models easily, Data Scientists need tools that reduce deployment and infrastructure complexity, so that any Data Scientist can put their models in production autonomously and quickly, with only a few lines of Python code and without painfully rewriting their own modeling source code.
Our MLOps platform is such a tool: you can test it here.