MLOps is the combination of Machine Learning and Operations. Just as DevOps did for software, attaching the agile execution methodology "Ops" to "ML" signals that Machine Learning is coming of age.
MLOps makes it possible to systematize and automate the deployment of Machine Learning models to production, by addressing the problems that have emerged in recent years as ML has spread:
- The difficulty of running an AI in production vs. its Proof Of Concept in a test environment.
- The high failure rate and the necessary iterations in the development of models.
- The profile of data scientists, who rarely come from the world of software development and are not trained to run software in production.
- The need to involve many different profiles (IT, ML Engineer, Data Engineer, etc.) in extended working groups, with complex, time-consuming and therefore risky project management.
MLOps brings together a set of methods and tools to help develop, deploy and manage machine learning models. This methodology makes it possible to de-risk and accelerate deployment to production, and also to significantly reduce costs. All phases of an ML project, from design to the maintenance of the model in production, including repeated training runs, are simplified and standardized. Similarly, MLOps covers infrastructure management from the development phases through to production, including scaling up.
10 years of ML development
To understand why the emergence of MLOps matters, we need to go back ten years, to the development of platforms such as DataRobot (2012), Databricks (2013) and Dataiku (2013). The craze for data was reaching new heights: the Harvard Business Review ran an article titled "Data Scientist: The Sexiest Job of the 21st Century", and these new "Platform-as-a-Service" offerings gave these armies of data professionals their first tools. On a single interface, they could:
- Collect and prepare the data
- Choose their algorithms such as linear regression, decision trees, neural networks, gradient boosting, etc.
- Train the models and adjust the parameters to minimize prediction error.
- Validate models and evaluate their performance.
- Optimize the models, modify them and re-train them.
And all this with a "no-code" or even "auto-ML" approach aimed at making Data Science accessible to a much wider audience.
The main role of the Data Scientist was then to find the best algorithmic recipe to solve business problems, with open-source tools and Data Science platforms to facilitate prototyping.
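The prototyping loop described above (prepare data, choose an algorithm, train, validate, keep the best candidate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic dataset, the two candidate "recipes" and the function names are hypothetical, and real platforms wrap far richer versions of each step.

```python
import random
from statistics import mean


def make_dataset(n=200, seed=0):
    """Collect and prepare: synthetic (x, y) pairs where y is roughly 3x + 2 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3.0 * x + 2.0 + rng.gauss(0, 0.5)) for x in xs]


def split(rows, train_ratio=0.8):
    """Hold out the last part of the data for validation."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def fit_linear(rows):
    """Candidate 1: one-feature linear regression (closed-form least squares)."""
    x_bar = mean(x for x, _ in rows)
    y_bar = mean(y for _, y in rows)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    var = sum((x - x_bar) ** 2 for x, _ in rows)
    slope = cov / var
    return slope, y_bar - slope * x_bar


def fit_mean(rows):
    """Candidate 2: a baseline that always predicts the mean target."""
    return 0.0, mean(y for _, y in rows)


def mse(model, rows):
    """Validate: mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    return mean((slope * x + intercept - y) ** 2 for x, y in rows)


train, valid = split(make_dataset())
candidates = {"baseline_mean": fit_mean(train), "linear_regression": fit_linear(train)}
scores = {name: mse(model, valid) for name, model in candidates.items()}
best = min(scores, key=scores.get)  # the "best algorithmic recipe" on held-out data
print(best, scores)
```

Note that nothing in this loop says how the selected model will be served, scaled or monitored, which is exactly the gap the rest of the article describes.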
However, industrialization has often been neglected and it has been somewhat forgotten that the purpose of AI is to go into production. This means that AI must be ready to be used in a real environment, fully integrated into a product or business process, and accessible to end users. This implies that the AI must be functional, reliable and capable of handling the expected load, whether it is a large amount of data or intensive use.
After validating the algorithmic recipe, the model should go through the following steps to move from PoC to production AI:
- Confront the model with real-time data.
- Optimize the code for production, which implies rewriting it to support scaling while preserving the specifications of the prototype (agility, execution speed, explainability, etc.).
- Create an infrastructure environment that is suitable for production and that will allow for smooth scaling.
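One way to check that a production rewrite "preserves the specifications of the prototype" is a parity test: run both versions on the same inputs and compare outputs, then time the production path. The sketch below assumes a hypothetical prototype (`prototype_predict`) and a hypothetical batch rewrite (`production_predict_batch`); the tolerance and the sample inputs are placeholders, not recommended values.

```python
import time


def prototype_predict(x):
    """Stand-in for the validated prototype (hypothetical model: y = 3x + 2)."""
    return 3.0 * x + 2.0


def production_predict_batch(xs):
    """Stand-in for the production rewrite, which scores a whole batch at once."""
    return [3.0 * x + 2.0 for x in xs]


def check_parity(xs, tolerance=1e-9):
    """Return the inputs on which the rewrite disagrees with the prototype."""
    production = production_predict_batch(xs)
    return [
        x
        for x, got in zip(xs, production)
        if abs(got - prototype_predict(x)) > tolerance
    ]


def measure_latency(xs):
    """Wall-clock seconds needed to score one batch with the production code."""
    start = time.perf_counter()
    production_predict_batch(xs)
    return time.perf_counter() - start


sample = [i / 10 for i in range(1000)]
mismatches = check_parity(sample)
latency = measure_latency(sample)
print(f"mismatches: {len(mismatches)}, batch latency: {latency:.6f}s")
```

In practice the sample would be replayed real-time data rather than a synthetic range, and the latency figure would be compared against whatever budget the product imposes.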
Design is not deployment
Integration with production information systems and scaling up require specialized skills, outside the traditional scope of data scientists. This is where ML Engineers, DevOps or Developers come in, considerably increasing the initial costs of the project and introducing complex project management.
A manual approach to these steps does get a model into production, but it is costly, slow and risky. The difficulties increase further when the model has to be modified and updated, since a model is rarely static once it is in production. Each redeployment means reworking every step one by one. Changes are so complex that unsuitable, inefficient solutions are left in place.
These difficulties often lead to solutions being abandoned: with costs this high, projects have little chance of achieving a return on investment.
Finally, if the data teams manage to go into production, permanent supervision is necessary to diagnose and correct malfunctions in the model(s). There are two types of malfunction: either users no longer receive results, or they receive poor quality forecasts.
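The two kinds of malfunction can be told apart with a simple check over a monitoring window. The sketch below is only illustrative: the function name `diagnose`, the status labels and both thresholds are assumptions, and it presumes that the true values eventually become available to compare against.

```python
from statistics import mean


def diagnose(predictions, actuals, min_answer_rate=0.9, max_error=1.0):
    """Classify a deployed model's state over one monitoring window.

    predictions: floats, with None wherever the service returned nothing.
    actuals: observed values, aligned with predictions.
    """
    answered = [(p, a) for p, a in zip(predictions, actuals) if p is not None]
    answer_rate = len(answered) / len(predictions) if predictions else 0.0
    if not answered or answer_rate < min_answer_rate:
        return "unavailable"  # users no longer receive results
    error = mean(abs(p - a) for p, a in answered)
    if error > max_error:
        return "degraded_quality"  # users receive poor-quality forecasts
    return "healthy"


print(diagnose([10.2, None, 9.8, 10.1], [10.0, 10.0, 10.0, 10.0]))
print(diagnose([14.0, 15.5, 13.0, 16.0], [10.0, 10.0, 10.0, 10.0]))
print(diagnose([10.2, 9.9, 9.8, 10.1], [10.0, 10.0, 10.0, 10.0]))
```

The first window fails on availability (only three answers out of four), the second on quality, and the third passes both checks. A "degraded_quality" result is the kind of signal that could trigger re-training.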
It takes all of these skills and tools for AI to deliver reliable results. This goes well beyond the traditional job description of a data scientist, however versatile.
AI projects are no longer simply the domain of data scientists; they are becoming cross-functional workgroups that include IT departments, which have the skills in infrastructure management and large-scale code execution.
This is where MLOps comes in, with its set of practices and tools that facilitate the execution of an AI project from start to finish. Its primary aim is to shorten the development cycle, accelerate and de-risk deployment, and improve the reliability and stability of the model by automating as many steps of the ML workflow as possible. Thus, when you develop your model within an MLOps stack, the move from experimentation to production takes a few clicks, where previously it could take six months.
These new automation, optimization and monitoring practices involve a large number of tools, each of which is a component of a process that must be orchestrated with the others. This is the MLOps stack that needs to be put in place. It generally covers the following stages:
- Source code management
- Feature storage
- Training and selection of models
- Creation of pipelines
- Joint management of code versions, data, models, metrics, etc.
- Deployment of models
- Automated testing
- Continuous integration and deployment (CI/CD)
- Hosting and production release
- Monitoring and steering of deployed models
- Automated re-training
It is by orchestrating all of these actions that an MLOps approach can be implemented. There is a multitude of independent tools and libraries, some open-source, covering each phase of this cycle. It is up to ML teams to orchestrate these tools to create their own stack. Integrating MLOps tools in this way brings order to chaotic and tedious projects. These tools give data scientists greater autonomy and turn complex "Ops" issues such as infrastructure management, API creation and model deployment into packaged, repeatable services. In this way, MLOps facilitates collaboration between all profiles within a single shared environment.
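To make the idea of orchestration concrete, here is a minimal sketch that chains three of the stages listed above: training, joint versioning of data, parameters and model, and an automated test that gates deployment. It is a toy under stated assumptions: the "model" is just the mean of the training targets, and `fingerprint`, `run_pipeline` and the error threshold are hypothetical names and values, not the API of any particular MLOps tool.

```python
import hashlib
import json


def fingerprint(artifact):
    """Version any JSON-serializable artifact by hashing its content."""
    payload = json.dumps(artifact, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def train(train_data, params):
    """Hypothetical training step: the model is the mean of the targets."""
    return {"mean": sum(train_data) / len(train_data), "params": params}


def evaluate(model, holdout):
    """Mean absolute error of the constant prediction on held-out data."""
    return sum(abs(model["mean"] - y) for y in holdout) / len(holdout)


def run_pipeline(train_data, holdout, params, max_error):
    """Run the stages in order and record what was built from what."""
    model = train(train_data, params)
    error = evaluate(model, holdout)
    return {
        "data_version": fingerprint(train_data),
        "params_version": fingerprint(params),
        "model_version": fingerprint(model),
        "error": error,
        "deployed": error <= max_error,  # automated test gating the release
    }


run = run_pipeline([1.0, 2.0, 3.0], [2.0, 2.5], {"algorithm": "mean"}, max_error=0.5)
print(run)
```

Because the versions are derived from content, re-running the pipeline on unchanged inputs yields the same record, while any change to the data or parameters produces new identifiers that can be traced back from a deployed model.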
How to create your MLOps workflow?
We have identified several important criteria to take into account when choosing your tools:
- Compatibility with the existing stack in the company
- Flexibility and customization: can you set up your tools to meet the specific needs of your teams?
- User experience: without rapid adoption of the chosen solutions, the whole concept of streamlining through MLOps falls apart.
- Security: are your data and code sufficiently protected when going into production?
- Explainability: how can you industrialize the deployment of large-scale AI projects without careful monitoring of the executions?
- Follow-up: does the selected service provide you with qualified resources and contacts to solve problems and share best practices?
Orchestrate your stack with an all-in-one platform-as-a-service
Craft AI allows you to deploy your code at scale, in a few clicks, without code refactoring or DevOps skills. Data Science teams can now take control of their AI project from start to finish, independently and in record time. Accelerate and de-risk your deployments in dedicated cloud environments. To learn more, ask for a demo.
Written by Hélen d'Argentré