Release Notes (1.5.2)

This document describes the updates in HPE Ezmeral Unified Analytics Software version 1.5.2, including enhancements, resolved issues, and known issues.

HPE Ezmeral Unified Analytics Software provides the software foundation for enterprises to develop and deploy end-to-end data and advanced analytics solutions, from data engineering to data science and machine learning, across hybrid cloud infrastructures, delivered through a software-as-a-service model.

Enhancements

This release includes several security and stability fixes.

Resolved Issues

This release includes the following resolutions:
Slow disk I/O no longer causes the EzPresto installation to fail due to the mysql pod entering a CrashLoopBackOff state
During EzPresto deployment, the HPE Ezmeral Unified Analytics Software installation no longer fails due to slow disk I/O, which previously caused the mysql pod in EzPresto to enter a CrashLoopBackOff state.

Using a Capacity License for KFP SDK V2 no longer causes Katib jobs to fail
Katib jobs can successfully launch in environments that use a capacity license for KFP SDK V2.

Installation pre-check no longer fails if the SSH key does not have a passphrase
If you use an SSH key file, you do not have to provide a dummy passphrase to pass the installation pre-check.

Configuration changes to long-running pods are now applied in Ray
Configuration changes or upgrades to long-running pods in Ray, such as adjusting resource capacities or expanding persistent volume (PV) storage, are now applied.

Known Issues

The following sections describe known issues with workarounds where applicable:
Delayed metrics data when the HPE Ezmeral Coordinator node reboots
When the HPE Ezmeral Coordinator node reboots, it can take up to 20 minutes for the system to reestablish the on-premises to cloud connection. Once the connection is established, all metrics data is sent.

The system allows you to create object storage connections with a bucket name in the endpoint URL
Users cannot access object storage when the data source connection is created with a bucket name in the endpoint URL, for example https://s3.us-test-2.amazonaws.com/bucket1. To resolve this issue, delete the data source connection and create a new connection with an endpoint URL that does not include a bucket name, for example https://s3.us-test-2.amazonaws.com.
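For example, with a Python S3 client such as boto3 (a hedged sketch; boto3 and the values shown are illustrative, not necessarily what your applications use), keep the endpoint URL bucket-free and name the bucket in each request instead:

import boto3

# The endpoint URL identifies only the S3 service, never a bucket.
s3 = boto3.client("s3", endpoint_url="https://s3.us-test-2.amazonaws.com")

# The bucket is passed per request, not embedded in the endpoint.
response = s3.list_objects_v2(Bucket="bucket1")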

Katib jobs fail when launched through Kale
If you launch a Katib job through Kale from a notebook, the Katib job fails because resource limits are not provided. Pods get stuck in a pending state and the system returns a warning message stating that resource limits must be defined.

Workaround: To resolve this issue:

  1. Download the following file and put it in the /mnt/user directory:
    kale-katib.patch
  2. Open a notebook terminal and run the following command:
    cd /opt/conda/lib/python3.11/site-packages
  3. From the notebook terminal, run the following command:
    git apply /mnt/user/kale-katib.patch
  4. Close all the open notebook tabs and shut down all the kernels running in notebooks.
  5. In the top menu bar, select File > Log Out.
  6. Log in again.

Packages created with %createKernel are not available on the new kernel
When you run the %createKernel magic function, installed packages may not appear in the new kernel; however, you can see the installed packages by running conda list in the terminal. Some default packages installed while creating a new kernel, for example pandas, may not be available.

Spark application does not run without vCPU driver and executor values set
If you do not set the vCPU driver and executor values when you create a Spark application, the application cannot run and remains in a pending state. These values specify the amount of capacity that the Spark application can consume from the license.
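If you create the application programmatically rather than through the UI, the vCPU values correspond to the standard Spark core settings (a hedged sketch; the mapping to the UI fields and the values shown are illustrative):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("licensed-app")                  # illustrative name
    .config("spark.driver.cores", "1")        # vCPUs for the driver
    .config("spark.executor.cores", "2")      # vCPUs per executor
    .config("spark.executor.instances", "2")  # number of executors
    .getOrCreate()
)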

Application status does not change after configuration update
When you modify an application (through the Tools & Frameworks page) and click Configure, the application status remains in the Ready state when it should change to Updating.

Workaround: To resolve this issue, refresh the Tools & Frameworks page.

Installation fails during OpenTelemetry phase
Slow disk speed can cause an intermittent timing issue that prevents the certificate from being propagated to a webhook before the timeout duration. As a result, installation fails during the OpenTelemetry phase.
Workaround: To resolve this issue, manually trigger a fresh installation through the installation orchestrator pod. The installation orchestrator updates the failed addon and reinstalls it.
  1. Get the HPE Ezmeral Coordinator kubeconfig or sign in to the HPE Ezmeral Coordinator node.
  2. Run the following command to find the installation orchestrator pod in the ${cluster_name} namespace:
    kubectl get pod -n ${cluster_name}
    
    //Example
    kubectl get pod -n ezua-demo
    NAME                                            READY   STATUS             RESTARTS   AGE
    ezua-demo-controller-manager-68474c6b97-dxprh   2/2     Running            0          2d12h
    op-clustercreate-ezua-demo                      1/1     Running            0          3d
    upgrade-ezua-upgrader-1-6-0-dc761-2-ezua-demo   1/1     Running            0          2d11h
    upgrade-ezua-upgrader-1-6-0-dc761-4-ezua-demo   1/1     Running            0          36h
    upgrade-ezua-upgrader-1-6-0-dc761-6-ezua-demo   1/1     Running            0          25h
    w-op-workload-deploy-ezua-demo                  1/1     Running            0          36h
    The name of the installation orchestrator pod is w-op-workload-deploy-${cluster_name}.
  3. Run the following command to sign in to the orchestrator pod:
    kubectl exec -it w-op-workload-deploy-${cluster_name} -n ${cluster_name} -- bash
  4. To trigger a fresh installation, cd to /root/ezaf/orchestrator and then run:
     ./orchestrator.sh
    
    //Example
    cd /root/ezaf/orchestrator
    ./orchestrator.sh  

Submitting an MLflow job from a notebook intermittently returns a ValueError
Submitting an MLflow job from a notebook can intermittently return the following ValueError:
ValueError: numpy.dtype size changed, may indicate binary incompatibility.
Expected 96 from C header, got 88 from PyObject
command terminated with exit code 1

Workaround: To resolve this issue, restart the notebook and submit the MLflow job again.

Cannot access HPE GreenLake for File Storage S3 buckets from Livy
HPE Ezmeral Unified Analytics Software users with the member role cannot access buckets in HPE GreenLake for File Storage S3 object storage from Livy, even when read and write permissions are granted on the buckets.

Tiles do not appear for imported tools and frameworks
Tiles for imported tools and frameworks do not appear on the Tools & Frameworks page immediately after you import a tool or framework. Refresh the page to see the tiles.

Cannot create Iceberg connections with hadoop catalog type from the UI
You must create Iceberg connections with hadoop catalog type from the command line using a curl command that posts the configuration in JSON format. For details, see EzPresto.

SQL Client and Query Editor return incorrect results for bigint data type
The SQL Client and the Query Editor return incorrect results for the bigint data type by rounding up the last few digits of large numbers. For example, if you run the following query:
SELECT 714341252076979033 LIMIT 1

The SQL Client and the Query Editor return 714341252076979100 when they should return 714341252076979033.

Workaround: To resolve this issue, use the CAST() function to cast the number, column, or expression to VARCHAR, for example:
SELECT CAST(714341252076979033 AS VARCHAR) LIMIT 1
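The rounding is consistent with the clients representing the value as a 64-bit floating-point number, which cannot hold every 18-digit integer exactly; casting to VARCHAR preserves the digits as a string. A quick Python illustration of the underlying precision loss (the rounded value a given client displays may differ):

print(int(float(714341252076979033)))
# 714341252076979072 -- the nearest 64-bit float, not the exact value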

Running CTAS against a Hive data source fails with ORC file error
Running a CTAS query against a Hive data source that is configured to use MAPRSASL authentication fails with the following error:
Error creating ORC file. Error getting user info for current user, presto.
This issue occurs if the HPE Ezmeral Data Fabric ticket was generated with impersonated UIDs enabled, but impersonation was not enabled when the Hive data source connection was configured in HPE Ezmeral Unified Analytics Software. For example, the ticket was created as follows:
maprlogin generateticket -user pa -type servicewithimpersonationandticket \
-impersonateduids 112374829 -out pa.out
Workaround: To resolve this issue, delete the Hive data source connection and create a new Hive data source connection, making sure to include the following options in addition to the other required options:
  • Select the Hive HDFS Impersonation Enabled option.
  • Enter the principal/username that Presto uses when connecting to HPE Ezmeral Data Fabric in the Hive Hdfs Presto Principal field. If this field is not visible, search for it in the Hive Advanced Settings search field.
For additional information, see Using MAPRSASL to Authenticate to Hive Metastore on HPE Ezmeral Data Fabric.

CTAS query on Hive Metastore in HPE Ezmeral Data Fabric fails
For Hive connections that authenticate to HPE Ezmeral Data Fabric via MAPRSASL, running a CTAS query against HPE Ezmeral Data Fabric returns the following error:
Database 'pa' location does not exist:<file_path>

Workaround: To resolve this issue, create and upload a configuration file that points to the HPE Ezmeral Data Fabric cluster, as described in Using MAPRSASL to Authenticate to Hive Metastore on HPE Ezmeral Data Fabric.

The Hive connection to HPE Ezmeral Data Fabric persists after you delete files
Deleting the cluster details and tickets from the mapr-clusters.conf and maprtickets files does not terminate the Hive connection to HPE Ezmeral Data Fabric. Users can still create new Hive connections to HPE Ezmeral Data Fabric and run queries against HPE Ezmeral Data Fabric. This issue occurs because HPE Ezmeral Unified Analytics Software caches the HPE Ezmeral Data Fabric files.
Workaround: After you delete the cluster details and tickets from the mapr-clusters.conf and maprtickets files, restart the EzPresto pods. To restart the pods, run:
kubectl rollout restart statefulset -n ezpresto ezpresto-sts-mst
kubectl rollout restart statefulset -n ezpresto ezpresto-sts-wrk

Optional Fields display by default when connecting an Iceberg data source
When adding Iceberg as a data source, the UI lists all possible connection fields (mandatory and optional) instead of listing only the mandatory connection fields.

EzPresto does not release memory when a query completes
EzPresto retains allocated memory after query completion for subsequent queries because of an open-source issue (https://github.com/prestodb/presto/issues/15637). For example, if a query uses 10GB of memory, EzPresto does not release the memory when the query completes and then uses it for the next query. If the next query requires additional memory, for instance, 12GB, EzPresto accumulates an extra 2GB and does not release it after query completion. For assistance, contact HPE support.

Worker nodes do not automatically spawn with JobSubmissionClient in the Ray cluster
When submitting jobs to the Ray cluster using JobSubmissionClient, worker nodes do not spawn automatically.

Workaround: To ensure proper functionality when submitting Ray jobs using JobSubmissionClient, you must manually specify entry point resources as follows (see the sketch after this list):
  • For CPU, set entrypoint_num_cpus to 1
  • For GPU, set entrypoint_num_gpus to 1
For details, see Using JobSubmissionClient to Submit Ray Jobs.
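A minimal Python sketch of the workaround (the cluster address and entrypoint script are illustrative):

from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://raycluster-head:8265")  # illustrative address
client.submit_job(
    entrypoint="python my_ray_job.py",  # illustrative entrypoint
    entrypoint_num_cpus=1,  # ensures CPU worker nodes spawn
    entrypoint_num_gpus=1,  # ensures GPU worker nodes spawn
)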

HPE is actively engaging with the community to address this open-source issue (https://github.com/ray-project/ray/issues/42436).

NVIDIA GPU cannot enforce SELinux
Due to a known NVIDIA GPU issue (https://github.com/NVIDIA/gpu-operator/issues/553), SELinux cannot be enforced for GPU deployments.
Workaround: Set SELinux on GPU hosts to either disabled or permissive mode until this issue is resolved.

Ray Dashboard UI does not display GPU worker group details
A known Ray issue prevents the Ray Dashboard UI from correctly displaying GPU worker group details. To learn more and track the resolution, see https://github.com/ray-project/ray/issues/14664.

Upgrade on OpenShift cluster
If you want to perform an in-place upgrade of HPE Ezmeral Unified Analytics Software on an OpenShift cluster, contact HPE Support for assistance to ensure a smooth transition and to address any complexities that can arise during the upgrade.

Installation

Before you install or upgrade, HPE recommends that you back up your data. If you encounter any issues during or after the installation or upgrade process, contact HPE Support. We appreciate your feedback and strive to continually enhance your product experience.

Additional Resources

Thank you for choosing HPE Ezmeral Unified Analytics Software. Enjoy the new features and improvements introduced in this release.