Introduction to MLOps and Model Monitoring
MLOps
Machine Learning Operations (MLOps) is an emerging discipline that encompasses the practices, tools, and methodologies for managing and scaling machine learning models in production environments. In MLOps, the focus extends beyond building and training models to a broader set of activities, including data preparation, model training, testing, deployment, monitoring, and continuous improvement.
Model Monitoring
Model monitoring is a crucial component of MLOps. It is the ongoing observation and evaluation of machine learning models deployed in production environments to ensure their continued performance, reliability, and relevance. In practice, this means tracking prediction quality, data relevance, model accuracy, and bias so that issues that could harm the business are detected early.
Importance of Model Monitoring
Over time, ML models naturally degrade in performance due to factors like data inconsistencies, skew, and drift, which can leave them inaccurate or irrelevant. Effective model monitoring helps identify when model performance starts to decline. This proactive approach allows for timely actions such as retraining or replacing models. Ultimately, ML monitoring fosters trust in ML systems by ensuring their continued effectiveness and reliability.
Monitoring in the Banking Industry
In the banking sector, where precision and consistency are crucial, the practice of model monitoring assumes an even greater significance. By customizing and refining model monitoring practices for the banking industry, financial institutions can proactively detect and address issues such as data drift, model degradation, and emerging risks. This proactive approach enables banks to uphold the integrity of their predictive models, thereby enhancing decision-making processes and safeguarding against potential financial losses and reputational damage.
Moreover, effective model monitoring helps banks to optimize operational efficiency by identifying inefficiencies or anomalies in their processes and systems. By mitigating financial risks and ensuring regulatory compliance, banks can build trust and confidence among their customers and stakeholders, thereby enhancing their reputation and competitiveness in the market.
As Complidata is a proud product partner of Google Cloud, this blog post will delve into some of the key features of Vertex AI – Google Cloud’s own AI platform – as applied to model monitoring in the banking sector.
From Complexity to Clarity: How Vertex AI Streamlines Machine Learning Development
Introduction to Vertex AI
The world of machine learning (ML) holds immense potential, but its complexities can often be daunting. Google Vertex AI, a unified platform on Google Cloud Platform (GCP), cuts through this complexity, offering a streamlined approach to the entire ML lifecycle. From data preparation, model training, and model evaluation to deployment and monitoring, Vertex AI helps users of all skill levels to develop, manage, and leverage powerful ML models. Its intuitive interface and comprehensive suite of tools accelerate the development process, enabling you to bring powerful ML solutions to life faster. A typical user journey can be broken down into the following stages:
Data Preparation: Vertex AI recognizes the importance of clean data for effective models. It offers a suite of tools to help users clean, transform, and prepare their data for optimal machine-learning performance.
Model Development: Vertex AI caters to both beginners and experienced users. You can choose from pre-built models for common tasks, perfect for jumpstarting projects. Alternatively, for more specific needs, you can build custom models using popular frameworks. Vertex AI even offers AutoML, a feature that automates model selection, making high-quality model development accessible to everyone.
Training and Deployment: Once you have your model, Vertex AI leverages Google's robust infrastructure to train it efficiently, often through a technique called distributed training that significantly reduces training time. Once trained, Vertex AI provides deployment options through API endpoints, allowing applications and services to receive real-time predictions from your model. For large-scale predictions, batch processing is also available.
Monitoring and Management: Vertex AI doesn't stop at deployment. It equips users with tools to monitor model performance and detect issues like data drift, which can affect accuracy over time. This proactive approach allows you to retrain and maintain your models for optimal results.
The simplicity of the user’s progression through these stages streamlines the ML journey, making ML accessible to a wider range of users.
Vertex AI Pipeline
One of the most crucial aspects of MLOps is robust experimentation. Users need a tool that lets them track, compare, and reproduce experiments, and save the resulting data. Here's where Kubeflow Pipelines (KFP) comes in. KFP is a toolkit designed specifically for running machine learning workflows on Kubernetes, a container orchestration platform. KFP simplifies this process by allowing users to define their workflow as a series of Python functions that pass results and data (known as artifacts) from one step to the next.
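To make the idea of "steps as Python functions that pass artifacts" concrete, here is a minimal sketch in plain Python. It deliberately does not use the KFP SDK: the `Dataset` and `Model` classes, the function names, and the sample data are all hypothetical and exist only for illustration. In a real KFP workflow, each function would be declared as a component and the composition as a pipeline using the SDK's own decorators.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Dataset:
    """Hypothetical artifact: (feature, label) rows."""
    rows: List[Tuple[float, int]]


@dataclass
class Model:
    """Hypothetical artifact: a one-parameter threshold classifier."""
    threshold: float


def prepare_data(raw: List[Tuple[Optional[float], int]]) -> Dataset:
    # Step 1: drop rows whose feature value is missing.
    return Dataset(rows=[(x, y) for x, y in raw if x is not None])


def train(dataset: Dataset) -> Model:
    # Step 2: place the threshold midway between the two class means.
    pos = [x for x, y in dataset.rows if y == 1]
    neg = [x for x, y in dataset.rows if y == 0]
    return Model(threshold=(sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


def evaluate(model: Model, dataset: Dataset) -> float:
    # Step 3: fraction of rows the model classifies correctly.
    correct = sum((x > model.threshold) == bool(y) for x, y in dataset.rows)
    return correct / len(dataset.rows)


def pipeline(raw: List[Tuple[Optional[float], int]]) -> float:
    # The "pipeline" wires the steps together; each output artifact
    # becomes the input of the next step.
    dataset = prepare_data(raw)
    model = train(dataset)
    return evaluate(model, dataset)


RAW = [(0.2, 0), (0.4, 0), (None, 1), (1.6, 1), (1.8, 1)]
accuracy = pipeline(RAW)
print(f"accuracy: {accuracy:.2f}")
```

Because every step only communicates through its inputs and outputs, each one can be rerun, cached, or swapped independently, which is the property an orchestrator relies on.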
Google Cloud Platform developed Vertex AI Pipelines as a managed service built on top of KFP. Vertex AI Pipelines streamlines the workflow orchestration process by handling infrastructure management. This frees up data scientists to focus on what truly matters: building and running effective ML pipelines.
With Vertex AI Pipelines, you can leverage a serverless approach to orchestrate your ML workflows. Before orchestration can begin, you define your workflow as a pipeline. These pipelines are designed to be portable and scalable, using containers and various GCP services for optimal performance. Once you've run your pipeline, you can inspect it, step by step, in an intuitive UI.
Vertex AI Monitoring
Vertex AI also lets us monitor deployed models. Vertex AI Model Monitoring monitors models for training-serving skew and prediction drift, and sends you alerts when the incoming prediction data skews too far from the baseline.
Key Features of Vertex AI Monitoring
Feature distribution training-serving skew and prediction drift
Vertex AI Model Monitoring monitors the model's prediction input data for feature skew and drift as follows:
Training-serving skew occurs when the feature data distribution in production deviates from the feature data distribution used to train the model. To enable skew detection, users first need to upload the original training dataset, which serves as the baseline.
Prediction drift occurs when the feature data distribution in production changes significantly over time. If the original training data isn't available, users can enable drift detection instead to monitor the input data for changes over time.
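To illustrate what "deviates from the baseline" can mean numerically, the sketch below compares a baseline and a serving distribution for one categorical and one numerical feature. To our understanding of Google's documentation, Vertex AI Model Monitoring uses L-infinity distance for categorical features and Jensen-Shannon divergence for numerical features; the standalone implementations, bin edges, threshold, and sample data here are our own assumptions and are not the service's code.

```python
import math
from collections import Counter


def categorical_distribution(values):
    """Share of instances per category."""
    counts = Counter(values)
    return {k: c / len(values) for k, c in counts.items()}


def l_infinity_distance(p, q):
    """Largest absolute difference in share across all categories."""
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def histogram(values, edges):
    """Normalized histogram; out-of-range values fall into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[sum(v >= e for e in edges[1:-1])] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical banking features: transaction channel and amount.
train_channel = ["branch"] * 50 + ["online"] * 50
serving_channel = ["branch"] * 20 + ["online"] * 80
train_amount = [10, 20, 30, 40] * 25
serving_amount = [30, 40, 50, 60] * 25
EDGES = [0, 25, 50, 75]   # assumed bin edges
ALERT_THRESHOLD = 0.1     # assumed alerting threshold

channel_distance = l_infinity_distance(
    categorical_distribution(train_channel),
    categorical_distribution(serving_channel),
)
amount_distance = js_divergence(
    histogram(train_amount, EDGES), histogram(serving_amount, EDGES)
)

for name, distance in [("channel", channel_distance), ("amount", amount_distance)]:
    status = "ALERT" if distance > ALERT_THRESHOLD else "ok"
    print(f"{name}: distance={distance:.2f} {status}")
```

For skew detection, the baseline is the training data, as here. For drift detection, the same comparison would be made between two windows of serving data, for example last week's traffic against this week's.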
Feature attribution training-serving skew and prediction drift
With Vertex Explainable AI enabled, Vertex AI Model Monitoring helps users detect skew and drift in the feature attributions of categorical and numerical input features:
Training-serving skew occurs when a feature's attribution score in production deviates from the feature's attribution score in the original training data.
Prediction drift occurs when a feature's attribution score in production changes significantly over time.
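As a rough intuition for attribution-based monitoring, the sketch below compares the average absolute attribution score of each feature between a baseline window and a production window. This is a simplified illustration under our own assumptions, not the method Vertex AI uses internally: the feature names, scores, and threshold are hypothetical, and in practice the attribution scores would come from an explainability tool rather than being typed in by hand.

```python
from statistics import mean


def mean_abs_attribution(rows):
    """Average absolute attribution per feature over {feature: score} rows."""
    return {f: mean(abs(r[f]) for r in rows) for f in rows[0]}


def attribution_shift(baseline_rows, production_rows):
    """Absolute change in mean attribution per feature between two windows."""
    base = mean_abs_attribution(baseline_rows)
    prod = mean_abs_attribution(production_rows)
    return {f: abs(prod[f] - base[f]) for f in base}


# Hypothetical attribution scores for a credit model.
baseline = [
    {"income": 0.40, "account_age": 0.10},
    {"income": 0.50, "account_age": 0.10},
]
production = [
    {"income": 0.10, "account_age": 0.45},
    {"income": 0.20, "account_age": 0.35},
]
HYPOTHETICAL_THRESHOLD = 0.2  # assumed alerting threshold

shift = attribution_shift(baseline, production)
flagged = sorted(f for f, s in shift.items() if s > HYPOTHETICAL_THRESHOLD)
print("features whose influence changed:", flagged)
```

A shift like this can surface even when the input distributions look stable, because it tracks how much the model relies on each feature rather than the feature values themselves.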
In the second part of this article, we'll explore Vertex AI Monitoring capabilities and provide a practical guide on applying model monitoring in real-world scenarios, accompanied by insights on interpreting the monitoring results.