Advancing Machine Learning with DevOps
Allowing failure is a basic prerequisite for innovation. If you are not prepared to fail, you will not be able to create anything truly new. As the German CTO of a Japanese IT service provider with a strong culture of innovation, I am deeply convinced of this. But if only slightly more than a tenth of machine learning projects ever go live, something is wrong.
After all, machine learning (ML) is one of the central applications of artificial intelligence (AI) and the basis of numerous future technologies such as autonomous driving, smart cities and the Industrial Internet of Things (IIoT). To advance ML and other AI technologies more quickly, we therefore need a new form of collaboration between the people who develop solutions and the people who operate them, built on DevOps principles: ML-Ops for short.
Why ML-Ops? Because AI is different. In classical IT, the code determines the behavior of the system. The functionality of the system can be tested and evaluated step by step.
In artificial intelligence applications, on the other hand, data determines the behavior of the system. The difficulty is that the source data keeps changing while machine learning and other AI processes run, so a model that passed its tests once can behave differently later. We therefore need to monitor the behavior of ML models continuously.
This process corresponds to the principle of continuous integration (CI) in classical software development. ML-Ops experts call it continuous evaluation. Besides the technical know-how needed to automate evaluation, it requires ongoing, close collaboration with the company's data scientists.
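To make the idea of continuous evaluation more concrete, here is a minimal sketch in Python. It is an illustration only, not the tooling used in the projects described in this article: the function names, the toy threshold model, the sample batches and the accuracy threshold of 0.8 are all assumptions chosen for the example. The point is simply that every new batch of labeled data is scored against the deployed model, and a drop below an agreed threshold is flagged for the data scientists.

```python
from dataclasses import dataclass


@dataclass
class EvaluationResult:
    batch_index: int
    accuracy: float
    needs_attention: bool


def accuracy(model, batch):
    """Share of (features, label) pairs in the batch that the model gets right."""
    correct = sum(1 for features, label in batch if model(features) == label)
    return correct / len(batch)


def continuous_evaluation(model, batches, threshold=0.8):
    """Score the model on each incoming batch and flag batches below the threshold.

    The threshold of 0.8 is an arbitrary assumption for this sketch.
    """
    results = []
    for index, batch in enumerate(batches):
        score = accuracy(model, batch)
        results.append(EvaluationResult(index, score, score < threshold))
    return results


def threshold_model(value):
    """Hypothetical deployed model: predicts class 1 for values of 5 or more."""
    return 1 if value >= 5 else 0


# Invented data: the first batch matches what the model was built for,
# the second batch has drifted (class 1 now only starts around 8).
batches = [
    [(1, 0), (3, 0), (6, 1), (9, 1)],
    [(5, 0), (6, 0), (7, 0), (9, 1)],
]

results = continuous_evaluation(threshold_model, batches)
for result in results:
    print(result)
```

In a real pipeline the flag would trigger an alert or a retraining job rather than a print statement; which reaction is appropriate is exactly the kind of decision that needs the data scientists in the loop.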
ML-Ops in practice
A typical use case for this kind of ML-Ops is quality improvement. A Japanese automotive company, for example, launched a project in which machine learning helps improve vehicle quality on the basis of complaint letters written in natural language. ML is used here to analyze the meaning of the complaint texts. A particular challenge was keeping the analyses accurate even when new products are introduced.
Here, we created a simple and fast way to update the classification models, which are based on "bag-of-words" and "gradient boosting". The immediate result: across data processing, design and deployment, the lead time was reduced by a total of six weeks. Among other things, the high speed at which complaints can now be checked had a positive effect. At the same time, the model is much easier and more economical to maintain throughout its entire lifecycle.
Similarly, in an AI project at an internationally operating insurance company, the development and operation of the solution were simplified and automated to such an extent that IT no longer has to provide support for day-to-day operation or continuous evaluation. The data scientists can concentrate fully on their data experiments, without restrictions from the IT infrastructure.
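For readers unfamiliar with the two techniques, the following sketch shows how bag-of-words features and gradient boosting fit together. It is a deliberately tiny, pure-Python illustration and not the model from the project: the complaint texts, the two categories, the function names and the settings (20 rounds, learning rate 0.5, squared-error loss, one-word decision stumps) are all assumptions made for the example. A production system would more likely rely on an established library.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Represent a text as word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def fit_stump(X, residuals):
    """Find the word whose presence/absence best explains the residuals."""
    best = None
    for j in range(len(X[0])):
        absent = [r for x, r in zip(X, residuals) if x[j] == 0]
        present = [r for x, r in zip(X, residuals) if x[j] > 0]
        if not absent or not present:
            continue
        absent_mean = sum(absent) / len(absent)
        present_mean = sum(present) / len(present)
        error = (sum((r - absent_mean) ** 2 for r in absent)
                 + sum((r - present_mean) ** 2 for r in present))
        if best is None or error < best[0]:
            best = (error, j, absent_mean, present_mean)
    return None if best is None else best[1:]


def fit_gradient_boosting(X, y, rounds=20, learning_rate=0.5):
    """Gradient boosting with squared-error loss: each stump fits the residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [target - pred for target, pred in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, absent_mean, present_mean = stump
        stumps.append(stump)
        predictions = [
            pred + learning_rate * (present_mean if x[j] > 0 else absent_mean)
            for x, pred in zip(X, predictions)
        ]
    return base, stumps, learning_rate


def predict_score(model, x):
    """Score close to 1 means category 1, close to 0 means category 0."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (present_mean if x[j] > 0 else absent_mean)
        for j, absent_mean, present_mean in stumps
    )


# Invented complaint texts; label 1 = brakes, label 0 = electronics.
texts = [
    "brake pedal feels soft",
    "brake noise when stopping",
    "radio display stays dark",
    "display flickers at night",
]
labels = [1, 1, 0, 0]

vocabulary = sorted({word for text in texts for word in text.lower().split()})
X = [bag_of_words(text, vocabulary) for text in texts]
model = fit_gradient_boosting(X, labels)

brake_score = predict_score(model, bag_of_words("soft brake pedal", vocabulary))
display_score = predict_score(model, bag_of_words("radio display dark", vocabulary))
print(brake_score > 0.5, display_score > 0.5)
```

Under these assumptions, updating the model for a new product amounts to appending the newly labeled complaints to the training texts, rebuilding the vocabulary and calling the fit function again, which is the kind of step that can be automated end to end.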
Trustworthiness of the AI
Third example: At an Italian bank, the aim was to detect anomalous behavior in very large volumes of financial transactions. Experts see this as a key benefit of artificial intelligence for digital banking. But the volumes of data involved make manual training of AI models impossible. By using ML-Ops, an automated system for training the models could be established. And since it makes every analysis model it generates, and every prediction based on that model, reproducible, it also fulfills the most important requirement for AI, and not only in the financial industry: trustworthiness.
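What reproducibility means in practice can be sketched in a few lines of Python. Again, this is an illustration under stated assumptions, not the bank's system: the toy anomaly model (mean plus k standard deviations of a random sample), the transaction amounts, and all function and field names are invented for the example. The principle is that each training run records a fingerprint of its data, its parameters including the random seed, and a fingerprint of the resulting model, so that the run can later be repeated and checked.

```python
import hashlib
import json
import random
import statistics


def fingerprint(obj):
    """Stable SHA-256 fingerprint of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train_threshold_model(amounts, seed, sample_size, k):
    """Toy anomaly model: flag amounts above mean + k * std of a seeded sample."""
    rng = random.Random(seed)  # seeded generator makes the sampling repeatable
    sample = rng.sample(amounts, sample_size)
    return {"threshold": statistics.mean(sample) + k * statistics.pstdev(sample)}


def run_training(amounts, seed, sample_size, k):
    """Train the model and return it together with a record of the run."""
    model = train_threshold_model(amounts, seed, sample_size, k)
    record = {
        "data_fingerprint": fingerprint(amounts),
        "params": {"seed": seed, "sample_size": sample_size, "k": k},
        "model_fingerprint": fingerprint(model),
    }
    return model, record


def reproduce(amounts, record):
    """Retrain from a stored record and verify that the same model comes out."""
    if fingerprint(amounts) != record["data_fingerprint"]:
        raise ValueError("training data differs from the recorded run")
    model, new_record = run_training(amounts, **record["params"])
    if new_record["model_fingerprint"] != record["model_fingerprint"]:
        raise RuntimeError("retraining did not reproduce the recorded model")
    return model


# Invented transaction amounts for the example.
amounts = [12.5, 40.0, 18.2, 25.0, 31.7, 22.4, 9800.0, 15.9]

model, record = run_training(amounts, seed=42, sample_size=5, k=3.0)
same_model = reproduce(amounts, record)
print(same_model == model)
```

An auditor who is handed the record and the data can rerun the training and confirm that the model behind a given prediction is the one that was documented, which is the basis for the trustworthiness discussed above.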