Monitoring

Describes monitoring in HPE Ezmeral Unified Analytics Software.

Monitoring and alerting play an integral role in the observability framework. They track the health, performance, and resource utilization of a Kubernetes cluster and its components. Administrators receive alerts about potential issues, which helps maintain optimal cluster and application operations and enables prompt responses to critical events.

NOTE

You cannot configure or turn off notifications. Alerts and notifications are viewable only in HPE Ezmeral Unified Analytics Software.

Model Monitoring

Model monitoring is the process of continuously observing and analyzing the performance and behavior of machine learning models deployed in production environments. It is a critical aspect of the machine learning lifecycle that ensures models remain reliable, accurate, and aligned with the intended objectives.

Model monitoring involves the collection, analysis, and visualization of various metrics and data related to the model's performance and data characteristics. It is an iterative process that helps ensure model reliability and enables timely adjustments or updates to maintain optimal performance. Model monitoring plays a crucial role in building trust in machine learning systems and making informed decisions based on model outputs.

Model monitoring metrics are essential to track and measure the performance of the deployed models.

In HPE Ezmeral Unified Analytics Software, you can use KServe or MLflow for monitoring operational performance and whylogs for functional performance.

Collected Metrics

Knative metrics

Knative Serving does not have built-in support for model monitoring metrics. You can integrate KServe with other monitoring and observability tools (such as Prometheus, Grafana, Kiali, and ESK) to collect and analyze metrics related to the performance and behavior of your deployed models.

To learn more, see Importing dashboards to Grafana.

The following metrics are collected via KServe:

Knative Serving: Revision HTTP Requests
Knative Serving: Scaling Debugging
Knative Serving: Revision CPU and Memory Usage
Knative: Reconciler
Knative Serving: Control Plane Efficiency

MLflow metrics

Use OpenTelemetry (OTel) to collect and export telemetry data from MLflow applications, including metrics and traces, to third-party or external monitoring systems such as Prometheus, Jaeger, or Grafana for analysis and visualization. To learn more, see Configuring Endpoints.

The following metrics are collected via MLflow:

mlflow_http_request_total: Total number of incoming HTTP requests.
mlflow_http_request_duration_seconds_sum: Total duration in seconds of all incoming HTTP requests.
mlflow_http_request_duration_seconds_count: Total count of all incoming HTTP requests.
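
The _sum and _count metrics above are typically combined to derive an average request duration (sum divided by count). The following is a minimal sketch of that arithmetic, assuming the metrics are available in the Prometheus text exposition format without timestamps. The sample values, the method and status labels, and the helper names parse_metrics and mean_request_duration are hypothetical and used only for illustration.

```python
# Hypothetical scrape output; values and labels are made up for illustration.
SAMPLE = """
# HELP lines and other comments are ignored by the parser below.
mlflow_http_request_total{method="GET",status="200"} 40
mlflow_http_request_total{method="POST",status="200"} 10
mlflow_http_request_duration_seconds_sum 12.5
mlflow_http_request_duration_seconds_count 50
"""


def parse_metrics(text):
    """Return {metric_name: value}, summing samples across label sets.

    Assumes each sample line ends with the value (no trailing timestamp).
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def mean_request_duration(totals):
    """Average seconds per request, or None if no requests were counted."""
    count = totals.get("mlflow_http_request_duration_seconds_count", 0.0)
    if count == 0:
        return None
    total_seconds = totals.get("mlflow_http_request_duration_seconds_sum", 0.0)
    return total_seconds / count


totals = parse_metrics(SAMPLE)
print("requests:", totals["mlflow_http_request_total"])
print("mean duration (s):", mean_request_duration(totals))
```

In a monitoring system such as Prometheus, the same ratio is usually computed over a time window rather than over lifetime totals, so that recent latency changes are not masked by older traffic.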

Model Monitoring with whylogs

NOTE

This feature is presented as a developer preview. Developer previews are not tested for production environments and should be used with caution.

whylogs is an open-source library for logging any kind of data. With whylogs, you can generate summaries of your datasets (data profiles) that you can use to:

Track changes in the dataset and detect data drifts in the model input features.
Create data constraints to validate data quality in model inputs or in a data pipeline.
Detect training-serving skew, concept drift, and model performance degradation.
Perform exploratory data analysis of massive datasets.
Track data distributions and data quality for ML experiments.
Standardize data documentation practices across the organization.
Visualize the key summary statistics about the datasets in HTML and JSON file formats.

To learn more about whylogs, see whylogs documentation.
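
To make the idea of a data profile concrete, the following is a conceptual sketch written with the Python standard library only. It does not use the whylogs API. It summarizes a column into a small set of statistics and then compares a reference (training) profile with a current (serving) profile to flag a possible drift. The function names, sample values, and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
import statistics


def profile_column(values):
    """Summarize one column: row count, missing count, and basic statistics."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present) if present else None,
        "stddev": statistics.pstdev(present) if len(present) > 1 else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def mean_shift(reference, current):
    """Distance between the two means, in units of the reference stddev.

    Assumes both profiles were built from at least one non-missing value.
    """
    delta = abs(current["mean"] - reference["mean"])
    if reference["stddev"] == 0:
        return 0.0 if delta == 0 else float("inf")
    return delta / reference["stddev"]


# Hypothetical feature values; None marks a missing entry.
training = [10, 12, 11, 13, 9, None]
serving = [15, 16, 14, 17, 18]

reference_profile = profile_column(training)
current_profile = profile_column(serving)

DRIFT_THRESHOLD = 3.0  # assumed cutoff, tune per feature
shift = mean_shift(reference_profile, current_profile)
print("reference profile:", reference_profile)
print("possible drift:", shift > DRIFT_THRESHOLD)
```

Because only the compact profiles are compared, the raw rows do not need to be retained or moved, which is the property that makes profile-based monitoring practical for large datasets.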

HPE Ezmeral Unified Analytics Software lets you use whylogs in the preview environment. whylogs is integrated into the Notebook as a third-party package. You can access data from an external S3 object store when using whylogs for monitoring. To learn more about accessing data, see Accessing Data in External S3 Object Stores.

The following applications and frameworks support whylogs in HPE Ezmeral Unified Analytics Software:

Airflow. See Using whylogs with Airflow.
MLflow. See Using whylogs with MLflow.
NOTE
HPE Ezmeral Unified Analytics Software supports external data sources such as AWS and MinIO for whylogs with MLflow. You cannot use S3 proxy as a data source.
Ray. See Using whylogs with Ray.
Spark. See Using whylogs with Spark.

HPE Ezmeral Unified Analytics Software 1.5 Documentation
Abstract	HPE Ezmeral Unified Analytics Software is a usage-based Software-as-a-Service (SaaS) model that operationalizes hybrid and multi-cloud modern analytical workloads through a simple user interface, easily installed and deployed in minutes. HPE Ezmeral Unified Analytics Software separates compute and storage for flexible, cost-efficient scalability to securely access data stored in multiple data platforms, enabling you to run traditional and advanced analytics workloads with open-source tools.
Published	July 2025
Edition	1.5.0
Topic last updated	2024-09-10