Release Notes (1.5.0)

This document provides a comprehensive overview of the latest updates and enhancements in HPE Ezmeral Unified Analytics Software (version 1.5.0), including new features, improvements, bug fixes, and known issues.

HPE Ezmeral Unified Analytics Software provides software foundations for enterprises to develop and deploy end-to-end data and advanced analytics solutions from data engineering to data science and machine learning across hybrid cloud infrastructures – delivered as a software-as-a-service model.

New Features

This release includes the following new features:

Support for External Storage Platforms
HPE Ezmeral Unified Analytics Software now integrates with external storage platforms, eliminating the need for the internal data fabric as primary storage. This integration leverages existing storage solutions for scalable data management while reducing the resources required to deploy an HPE Ezmeral Unified Analytics Software cluster. It also improves high availability (HA) so that the cluster is fully operational after recovery from a power outage or reboot. HPE Ezmeral Unified Analytics Software currently supports HPE Ezmeral Data Fabric as primary storage, with support for additional storage solutions coming in subsequent releases. For details, see Primary Storage, Preparing HPE Ezmeral Data Fabric to be Primary Storage for HPE Ezmeral Unified Analytics Software, and Installing on User-Provided Hosts (Connected and Air-gapped Environments).

MAPRSASL Authentication for Hive Metastore
You can now configure a Hive data source in HPE Ezmeral Unified Analytics Software to use MAPRSASL for authentication with the Hive Metastore on HPE Ezmeral Data Fabric. This enhancement ensures secure access and integration, providing an added layer of security for data management. For additional details, see Using MAPRSASL to Authenticate to Hive Metastore on HPE Ezmeral Data Fabric.

Enhancements

This release includes the following enhancements:

Flexibility in Tools and Frameworks Installation
You now have the option to deploy a subset of tools and frameworks during installation, and the flexibility to install the other tools and frameworks later. You can exclude the following tools and frameworks from the initial installation of HPE Ezmeral Unified Analytics Software:
  • Superset
  • EzPresto
  • Livy
  • MLDE
  • Feast
The tools and frameworks that you choose not to install initially can be installed at any time. For additional details, see Installing Included Frameworks Post Unified Analytics Installation.

UI for Adding Volumes
A new user interface is now available for connecting to external storage platforms, allowing you to use them as data sources for applications and frameworks in your HPE Ezmeral Unified Analytics Software cluster. The UI supports integration with HPE Ezmeral Data Fabric and HPE GreenLake for File Storage. Note that with this change, the Data Fabrics option previously under Administration in the left navigation panel has been moved to the Data Volumes tab. For additional details, see Connecting to HPE Ezmeral Data Fabric and Connecting to HPE GreenLake for File Storage.

Revoke User Access on Data Sources
Administrators can revoke user access to data sources within the Data Engineering section of the UI. This functionality allows for easy management of user privileges, ensuring secure access to both structured and object store data. For additional details, see Revoking Member Access to Data.

Run CTAS Queries with Hive Discovery Metastore
The Hive Discovery Metastore now supports running CTAS (CREATE TABLE AS SELECT) queries on CSV and Parquet files stored in the HPE Ezmeral Data Fabric file system or S3 object storage, including HPE Ezmeral Data Fabric S3, MinIO S3, and AWS S3. You can also insert data into the created tables. To use this feature, set up a Hive data source connection with the specified parameters, as described in Hive Discovery Metastore Connection Parameters. Use schema discovery for CSV files, delta discovery for Delta files, and include the format in the query for Parquet files.
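As a rough illustration of the Parquet case, the sketch below assembles a CTAS statement that names the storage format in a WITH clause, which is the form the open-source Presto Hive connector accepts. This is one plausible reading of "include the format in the query", not a confirmed product requirement. The helper function and the catalog, schema, and table names are hypothetical placeholders.

```python
from typing import Optional


def build_ctas(target: str, source: str, file_format: Optional[str] = None) -> str:
    """Return a Presto-style CTAS statement, optionally naming the storage format."""
    # Assumption: the format is declared as WITH (format = '...'), as in the
    # open-source Presto Hive connector.
    with_clause = f" WITH (format = '{file_format.upper()}')" if file_format else ""
    return f"CREATE TABLE {target}{with_clause} AS SELECT * FROM {source}"


# Hypothetical catalog.schema.table names, for illustration only.
query = build_ctas("hivedisc.sales.orders_copy", "hivedisc.sales.orders", "parquet")
print(query)
```

The resulting string can be pasted into the query editor or passed to whatever SQL client you already use. Verify the statement against your own data source before relying on it.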

Installation Configuration Review
Before finalizing the installation of HPE Ezmeral Unified Analytics Software on your cluster, you can review and adjust the installation configuration details on the Review screen. This feature ensures accuracy and customization of the setup process.

Seamless Deletion of Imported Tools and Frameworks
A chart is now automatically deleted from ChartMuseum when its ezappconfig custom resource (CR) is deleted. This simplifies the management of imported tools and frameworks by ensuring that the associated configurations and resources are removed along with the CR.

Resolved Issues

This release includes numerous fixes that enhance system security, stability, and performance, including the following resolutions:

Permission denied error when submitting the Kubeflow pipeline while using the Kubeflow notebook images
Submitting a Kubeflow pipeline using the KFP SDK V2 Kubeflow notebook images no longer returns a permission denied error.

The driver pod of the cloned Spark job remains in the container creating state
When you use the Clone option to create a new Spark application with a similar configuration as an existing Spark application, the driver pod of the cloned Spark job no longer remains in the container creating state.

Permission denied error when installing packages while using the Kubeflow notebook images
Installing packages while using the Kubeflow notebook images (with KFP SDK V2) provided by HPE Ezmeral Unified Analytics Software no longer returns a permission denied error.

Replace Fluent Bit with OTEL for log collection and parsing
Log collection and parsing now use OpenTelemetry (OTEL) instead of Fluent Bit, which reduces memory consumption.

Unable to download infrastructure and application services logs
You can download the infrastructure and application services logs without issue.

Unable to delete Data Fabric connection due to "Secret not found" error
You can delete Data Fabric connections by deleting the Data Volume source.

Uploading a term license
Uploading a term license no longer results in an ezlicense controller pod CrashLoopBackOff error.

Activation code change no longer results in a CrashLoopBackOff error
The activation code change that caused a CrashLoopBackOff error when a capacity license was applied before upgrading is resolved.

Known Issues

The following sections describe known issues with workarounds where applicable:

EzPresto installation fails due to mysql pod entering CrashLoopBackOff state
During EzPresto deployment, the HPE Ezmeral Unified Analytics Software installation fails due to slow disk I/O, which leads to the mysql pod in EzPresto entering a CrashLoopBackOff state.

Workaround: To resolve this issue, see EzPresto installation fails due to mysql pod entering CrashLoopBackOff state.

Installation pre-check fails if the SSH key does not have a passphrase
If you use an SSH key file, the SSH key must have a passphrase; otherwise, the installation pre-check fails and installation cannot occur. You can set the passphrase to any value, even a dummy value.

Running CTAS against a Hive data source fails with ORC file error
Running a CTAS query against a Hive data source that is configured to use MAPRSASL authentication fails with the following error:
Error creating ORC file. Error getting user info for current user, presto.
This issue occurs if the HPE Ezmeral Data Fabric ticket was generated with impersonated UIDs, but impersonation was not enabled when the Hive data source connection was configured in HPE Ezmeral Unified Analytics Software. For example, the ticket was created as shown:
maprlogin generateticket -user pa -type servicewithimpersonationandticket \
-impersonateduids 112374829 -out pa.out
Workaround: To resolve this issue, delete the Hive data source connection and create a new Hive data source connection, making sure to include the following options in addition to the other required options:
  • Select the Hive HDFS Impersonation Enabled option.
  • Enter the principal/username that Presto will use when connecting to HPE Ezmeral Data Fabric in the Hive Hdfs Presto Principal field. If this field is not visible, perform a search for it in the Hive Advanced Settings search field.
For additional information, see Using MAPRSASL to Authenticate to Hive Metastore on HPE Ezmeral Data Fabric.

CTAS query on Hive Metastore in HPE Ezmeral Data Fabric fails
For Hive connections that authenticate to HPE Ezmeral Data Fabric via MAPRSASL, running a CTAS query against HPE Ezmeral Data Fabric returns the following error:
Database 'pa' location does not exist:<file_path>

Workaround: To resolve this issue, create and upload a configuration file that points to the HPE Ezmeral Data Fabric cluster, as described in Using MAPRSASL to Authenticate to Hive Metastore on HPE Ezmeral Data Fabric.

The Hive connection to HPE Ezmeral Data Fabric persists after deleting files
Deleting the cluster details and tickets from the mapr-clusters.conf and maprtickets files does not terminate the Hive connection to HPE Ezmeral Data Fabric. Users can still create new Hive connections to HPE Ezmeral Data Fabric and run queries against HPE Ezmeral Data Fabric. This issue occurs because HPE Ezmeral Unified Analytics Software caches the HPE Ezmeral Data Fabric files.

Workaround: After you delete the cluster details and tickets from the mapr-clusters.conf and maprtickets files, restart the EzPresto pods. To restart the pods, run:
kubectl rollout restart statefulset -n ezpresto ezpresto-sts-mst
kubectl rollout restart statefulset -n ezpresto ezpresto-sts-wrk
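If you prefer to script the restart, the following sketch wraps the same two commands and adds a kubectl rollout status call after each one so that the script waits until the pods are back. The wait step, the 300-second timeout, and the helper names are assumptions added for illustration. The script only prints the commands unless dry_run is set to False.

```python
import subprocess

NAMESPACE = "ezpresto"
STATEFULSETS = ["ezpresto-sts-mst", "ezpresto-sts-wrk"]


def rollout_commands(namespace, statefulsets, timeout="300s"):
    """Build kubectl commands: restart each StatefulSet, then wait for it."""
    commands = []
    for name in statefulsets:
        commands.append(
            ["kubectl", "rollout", "restart", "statefulset", "-n", namespace, name]
        )
        # Assumption: waiting for the rollout is optional; the timeout is arbitrary.
        commands.append(
            ["kubectl", "rollout", "status", "statefulset", "-n", namespace, name,
             f"--timeout={timeout}"]
        )
    return commands


def restart_ezpresto(dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for cmd in rollout_commands(NAMESPACE, STATEFULSETS):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stops at the first failing command


if __name__ == "__main__":
    restart_ezpresto(dry_run=True)
```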
Optional fields display by default when connecting an Iceberg data source
When adding Iceberg as a data source, the UI lists all possible connection fields (mandatory and optional) instead of listing the mandatory connection fields only.

EzPresto does not release memory when a query completes
EzPresto retains allocated memory after a query completes and reuses it for subsequent queries because of an open-source issue (https://github.com/prestodb/presto/issues/15637). For example, if a query uses 10 GB of memory, EzPresto does not release the memory when the query completes and instead uses it for the next query. If the next query requires more memory, for instance 12 GB, EzPresto allocates an additional 2 GB and does not release it after the query completes. For assistance, contact HPE support.

Configuration changes to long-running pods are not applied in Ray

Configuration changes or upgrades to long-running pods in Ray, such as adjusting resource capacities or expanding persistent volume (PV) storage, are not applied.

Workaround: To ensure successful configuration changes or upgrades, manually delete the relevant pods after the reconfiguration or upgrade process. For details, see https://github.com/ray-project/kuberay/issues/527.
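One way to delete the pods is by label selector. The sketch below builds such a command on the assumption that the Ray pods carry the KubeRay label ray.io/cluster set to the cluster name. Confirm the labels in your environment first, for example with kubectl get pods --show-labels. The namespace and cluster name shown are hypothetical.

```python
import subprocess


def delete_ray_pods_command(namespace, cluster_name):
    """Build a kubectl command that deletes every pod of one Ray cluster."""
    # Assumption: pods are labeled ray.io/cluster=<cluster name> by KubeRay.
    return ["kubectl", "delete", "pod", "-n", namespace,
            "-l", f"ray.io/cluster={cluster_name}"]


# Hypothetical namespace and cluster name, for illustration only.
command = delete_ray_pods_command("my-namespace", "my-raycluster")
print(" ".join(command))
# To execute instead of printing: subprocess.run(command, check=True)
```

This selector matches the head pod as well as the workers, so running jobs are interrupted. Narrow the selector if only some pods need to pick up the change.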

Worker nodes do not automatically spawn with JobSubmissionClient in the Ray cluster

When submitting jobs to the Ray cluster using JobSubmissionClient, worker nodes do not spawn automatically.

Workaround: To ensure proper functionality when submitting Ray jobs using JobSubmissionClient, you must manually specify entry point resources as follows:
  • For CPU, set entrypoint_num_cpus to 1
  • For GPU, set entrypoint_num_gpus to 1
For details, see Using JobSubmissionClient to Submit Ray Jobs.
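The settings above might be passed to the Ray Jobs Python SDK roughly as follows. This is a minimal sketch: the helper names, the dashboard address, and the entry point command are hypothetical, and the Ray import is deferred so that the resource helper can be inspected without a Ray installation.

```python
from typing import Any, Dict


def entrypoint_resources(use_gpu: bool = False) -> Dict[str, Any]:
    """Return the entry point resource argument named in the workaround."""
    # CPU jobs: entrypoint_num_cpus=1. GPU jobs: entrypoint_num_gpus=1.
    return {"entrypoint_num_gpus": 1} if use_gpu else {"entrypoint_num_cpus": 1}


def submit(address: str, entrypoint: str, use_gpu: bool = False) -> str:
    """Submit a job with explicit entry point resources and return its ID."""
    # Imported here so the helper above works without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return client.submit_job(entrypoint=entrypoint, **entrypoint_resources(use_gpu))


if __name__ == "__main__":
    print(entrypoint_resources())              # arguments for a CPU job
    print(entrypoint_resources(use_gpu=True))  # arguments for a GPU job
    # Hypothetical address and script; uncomment to submit against a real cluster.
    # submit("http://127.0.0.1:8265", "python my_script.py")
```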

HPE is actively engaging with the community to address this open-source issue (https://github.com/ray-project/ray/issues/42436).

NVIDIA GPU cannot enforce SELinux
Due to a known NVIDIA GPU issue (https://github.com/NVIDIA/gpu-operator/issues/553), SELinux cannot be enforced for GPU deployments.
Workaround: Set SELinux on GPU hosts to either disabled or permissive mode until this issue is resolved.

Ray Dashboard UI does not display GPU worker group details correctly
A known Ray issue prevents the Ray Dashboard UI from displaying the GPU worker group details correctly. For updates and details, see https://github.com/ray-project/ray/issues/14664.

Upgrade on OpenShift cluster
If you want to perform an in-place upgrade of HPE Ezmeral Unified Analytics Software on an OpenShift cluster, contact HPE support for assistance with any complexities that may arise during the upgrade process.

Installation

Before you install or upgrade, HPE recommends that you back up your data. If you encounter any issues during or after the installation or upgrade process, please contact HPE Support. We appreciate your feedback and strive to continually enhance your product experience.

Additional Resources

Thank you for choosing HPE Ezmeral Unified Analytics Software. Enjoy the new features and improvements introduced in this release.