In the first part of this article, we introduced the field of MLOps and explained the significance of model monitoring, especially within the dynamic landscape of the banking sector. We presented Google's Vertex AI and its transformative potential, along with two pivotal components: Vertex AI Pipeline and Vertex AI Monitoring. That overview laid the foundation for a deeper exploration of the practical applications and functionalities of these tools. In this section, we walk through the implementation of Vertex AI Monitoring, providing insights into its integration and use in real-world operations.
Vertex AI Monitoring
Vertex AI Monitoring Features
Vertex AI Monitoring offers comprehensive features to track and manage model performance, including monitoring for training-serving skew and prediction drift. It detects deviations in feature data distributions in production compared to training and monitors changes in input data over time. Additionally, with Vertex Explainable AI, users can assess feature attribution skew and drift, ensuring model integrity and reliability.
Implementation Process – Demonstration
Preparation of ML Model
Before implementing model monitoring, a trained ML model must exist in the Model Registry, either as a tabular AutoML model or as an imported tabular custom-trained model. In this demonstration, we upload a custom-trained model, trained with a Vertex AI pipeline and saved in Cloud Storage. In the Model Settings field, details such as the model framework, version, and artifact path are provided.
If users intend to monitor feature attribution as well, they must also populate the Explainability field when importing the model. In our scenario, however, we focus solely on monitoring feature distributions.
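As a minimal sketch, uploading such a custom-trained model with the Vertex AI Python SDK (google-cloud-aiplatform) might look like the following; the project ID, region, artifact path, and serving container below are placeholders to adapt to your environment:

```python
from google.cloud import aiplatform

# Placeholder project and region; adapt to your environment.
aiplatform.init(project="your-project-id", location="europe-west1")

model = aiplatform.Model.upload(
    display_name="fcrr-custom-model",
    # Cloud Storage folder holding the artifacts produced by the training pipeline.
    artifact_uri="gs://your-bucket/model-artifacts/",
    # Prebuilt serving image matching the model framework (scikit-learn here).
    serving_container_image_uri="europe-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest",
)
print(model.resource_name)
```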
Endpoint Creation with Monitoring
Once the model has been imported, the next step is to deploy it to an endpoint to facilitate online prediction services. The user should proceed as follows:
1. Begin by defining the endpoint and configuring parameters such as the traffic split, number of compute nodes, and machine type. During endpoint creation, Vertex AI will prompt you to decide whether to enable model monitoring.
2. Enable monitoring and provide the necessary details, including the monitoring window length, which determines the monitoring frequency, and the notification emails that will receive alerts when thresholds are breached.
3. Specify the sampling rate, i.e., the percentage of prediction requests to sample within each monitoring window. (This is not required for AutoML models, because Vertex AI always populates this field for them.) For custom-trained models like ours, it is also necessary to upload a prediction input schema YAML file, following the OpenAPI 3.0.2 schema standard, so that the input payload is parsed correctly. In this example, the YAML file was created and saved in Cloud Storage; a sample schema is sketched after this list.
4. Once model monitoring is configured, select the monitoring objective. In this demonstration, training-serving skew detection was chosen to monitor deviations of the production data distribution from the training data. By default, an alert is triggered when the distance computed for a feature exceeds a threshold of 0.3; the Options section of the model monitoring configuration view allows you to change which features are monitored and their threshold values. For this demo, the threshold was set to 0.2.
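For step 3, the prediction input schema is a small OpenAPI 3.0.2 document describing the payload. A minimal illustrative example, with hypothetical feature names, might look like:

```yaml
type: object
properties:
  transaction_amount:
    type: number
  country:
    type: string
required:
- transaction_amount
- country
```

Steps 1-4 can also be scripted with the Vertex AI Python SDK rather than through the console. The sketch below assumes the `model` object from the upload step, as well as hypothetical bucket paths, feature names, and alert addresses:

```python
from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

# Step 1: deploy the imported model to a new endpoint.
endpoint = model.deploy(
    machine_type="n1-standard-2",
    min_replica_count=1,
    traffic_percentage=100,
)

# Steps 2-4: attach a monitoring job to the endpoint.
skew_config = model_monitoring.SkewDetectionConfig(
    data_source="gs://your-bucket/training-data.csv",  # baseline training data
    data_format="csv",
    target_field="label",
    skew_thresholds={"transaction_amount": 0.2},       # per-feature threshold (demo value)
)
job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="fcrr-skew-monitoring",
    endpoint=endpoint,
    # Sample 80% of prediction requests in each monitoring window.
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.8),
    # Run the monitoring job every hour.
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=1),
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    objective_configs=model_monitoring.ObjectiveConfig(skew_detection_config=skew_config),
    # Required for custom-trained models: the schema file shown above.
    analysis_instance_schema_uri="gs://your-bucket/schema.yaml",
)
```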
Upon completion, the monitoring job is initiated, and a confirmation email is received from the Vertex AI platform.
Using the Endpoint for Prediction
To obtain predictions from the active endpoint, users can send prediction requests for new batches of data using Python scripts; the Vertex AI platform documentation provides reference scripts for this purpose.
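A minimal request sketch with the Python SDK, assuming a placeholder endpoint resource name and illustrative feature values, could look like:

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="europe-west1")

# Reference the deployed endpoint by its resource name or numeric ID (placeholder here).
endpoint = aiplatform.Endpoint(
    "projects/123456789/locations/europe-west1/endpoints/987654321"
)

# Each instance must match the prediction input schema; feature names are illustrative.
instances = [
    {"transaction_amount": 1250.0, "country": "BE"},
    {"transaction_amount": 87.5, "country": "NL"},
]
response = endpoint.predict(instances=instances)
print(response.predictions)
```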
If skew is detected, alerts will be sent via email from the Vertex AI platform.
Interpreting Monitoring Results
Plot interpretation
To access the feature distribution histograms in the Google Cloud console, begin by navigating to the Endpoints page. Select the desired endpoint and model for analysis. From the Model monitoring view, users can find information about potential feature distribution skews or drifts detected by the model monitoring jobs.
For instance, clicking on a feature displays its distribution as captured by the monitoring jobs executed so far (only the last 50 job executions are displayed). To demonstrate this functionality, several small batches of prediction requests whose distributions differ from the training set were sent in order to trigger alerts.
The first plot shows the feature value distribution computed from the data collected by the selected monitoring job. Continuous numeric feature values are grouped into buckets, each covering a range of values. The second plot shows the feature value distribution computed from the training data.
Skew Calculation
The observed distribution deviation value is 0.228, while the threshold for this monitoring job is 0.2.
In this demo, the monitoring objective is skew detection, and the baseline is the statistical distribution of the feature's values in the training data. The baseline is calculated when the model monitoring job is created and is recalculated only if the training dataset for the job is updated.
For categorical features, the computed distribution represents the number or percentage of instances of each possible value. Meanwhile, for numerical features, Model Monitoring divides the range of possible feature values into equal intervals and computes the number or percentage of feature values that fall within each interval.
Following the calculation of the latest feature value distribution, a distance score is computed to compare it against the baseline distribution. In this case, because the feature is numerical, the distance score is calculated using the Jensen-Shannon divergence; the resulting score of 0.228 exceeds the threshold of 0.2, triggering an alert. For categorical features, the distance score is instead calculated using the L-infinity distance.
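To make the two distance measures concrete, the sketch below recomputes them from scratch on illustrative bucketed histograms (not real monitoring output); Vertex AI performs the equivalent computation internally:

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """JS divergence (base-2 logs) between two discrete distributions; used for numerical features."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0  # skip empty buckets to avoid log(0)
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def l_infinity_distance(p, q):
    """Largest per-category probability gap; used for categorical features."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return np.max(np.abs(p - q))

# Illustrative bucket counts: training baseline vs. latest production window.
baseline = [120, 300, 250, 80, 50]
latest = [40, 90, 160, 110, 100]
score = jensen_shannon_divergence(baseline, latest)
print(f"distance score: {score:.3f}")  # an alert fires if the score exceeds the threshold (0.2 here)
```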
Continuous improvement
With insights derived from Vertex AI Model Monitoring, developers can refine machine learning model performance. By identifying areas for improvement, they can fine-tune model parameters, ultimately enhancing accuracy, fairness, and security.
Limitations of Vertex AI Monitoring
Pricing Concerns
While Vertex AI offers robust infrastructure for efficient machine learning model training, evaluation, and deployment, its pricing can pose a challenge, particularly for smaller businesses or startups with constrained budgets.
Users of Vertex AI Model Monitoring incur charges for various aspects, including:
$3.50 per GB for all analyzed data, encompassing both training data and prediction data logged in a BigQuery table.
Possible additional charges for using other Google Cloud products alongside Model Monitoring, such as BigQuery storage or Batch Explain when attribution monitoring is enabled.
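For instance, a monitoring job that analyzes 2 GB of training data and 8 GB of logged prediction data in a given month would incur roughly 10 GB × $3.50 = $35 in analysis charges alone, before any additional BigQuery storage or Batch Explain costs.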
Limited References
Although the Vertex AI platform offers guides and tutorials, these resources cannot address every issue, and few third-party resources online discuss Vertex AI Monitoring. Consequently, this deficit may hinder users in effectively resolving the complex problems they encounter.
Absence of ML Monitoring Features
Vertex AI Monitoring currently lacks model performance monitoring features, focusing instead on skew and drift detection for feature distributions and attributions. While these features provide valuable insights, metrics such as AUC or accuracy (for datasets with a target) or predicted positive rates for classification are not evaluated within the service. As organizations prioritize machine learning model reliability, there is a growing need for comprehensive monitoring solutions covering both feature insights and performance metrics during the production stage.
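As an illustration of what such performance monitoring involves, the sketch below computes these metrics with scikit-learn on a toy batch of logged prediction scores joined with hypothetical ground-truth labels:

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical ground-truth labels joined back onto logged prediction scores.
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_score = [0.10, 0.80, 0.65, 0.30, 0.90, 0.45, 0.20, 0.70]
y_pred = [int(s >= 0.5) for s in y_score]  # 0.5 decision threshold

print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")
print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")
print(f"Predicted positive rate: {sum(y_pred) / len(y_pred):.3f}")
```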
Flexible Model Deployment and Monitoring Solutions at Complidata
At Complidata, our FCRR model-building pipeline is tailored to operate seamlessly both on-premises, using DeployKF, and in the cloud, through Vertex AI. In the production stage, we not only integrate Vertex AI endpoints for making prediction requests and leveraging its monitoring capabilities, but also offer the flexibility of using our own faster and more cost-effective API for model serving. Notably, our monitoring solution includes machine learning performance monitoring in addition to skew and drift detection, ensuring comprehensive oversight and performance optimization.