EzPresto
Describes how to identify and debug issues for EzPresto.
Cannot create Iceberg connections with hadoop catalog type from the UI
HPE Ezmeral Unified Analytics Software supports Iceberg connections with the hadoop catalog type. However, you cannot create the Iceberg connection through the HPE Ezmeral Unified Analytics Software UI. You must create the connection from the command line by running a curl command with a JSON configuration that calls the EzPresto backend API to create the data source connection.

You can create Iceberg connections with the hadoop catalog type for the following storage types:
- HPE Ezmeral Data Fabric Object Store
- HPE Ezmeral Data Fabric File Store
- Local or mounted file system that is locally accessible
- Create a JSON configuration for your storage type, replacing all values in angle brackets (<>) with values for your environment. The following sections provide JSON configurations for each storage type:

  HPE Ezmeral Data Fabric Object Store
  {
    "catalogName": "<catalog_name>",
    "connectorName": "iceberg",
    "properties": {
      "iceberg.catalog.type": "hadoop",
      "iceberg.catalog.warehouse": "<S3 Warehouse Location>",
      "iceberg.catalog.cached-catalog-num": "10",
      "hive.s3.aws-access-key": "<S3 Access Key>",
      "hive.s3.aws-secret-key": "<S3 Secret Key>",
      "hive.s3.endpoint": "<S3 End Point>",
      "hive.s3.path-style-access": true,
      "hive.s3.ssl.enabled": false
    }
  }
  HPE Ezmeral Data Fabric File Store

  {
    "catalogName": "<catalog_name>",
    "connectorName": "iceberg",
    "properties": {
      "iceberg.catalog.type": "hadoop",
      "hive.hdfs.authentication.type": "MAPRSASL",
      "df.cluster.details": "<DF Cluster Details>",
      "hive.hdfs.df.ticket": "<DF Cluster Ticket Details>",
      "iceberg.catalog.warehouse": "<MAPR FS Warehouse Location>"
    },
    "fileProperties": {
      "iceberg.hadoop.config.resources": [
        "PGNvbmZpZ3VyYXRpb24+CiAgICA8cHJvcGVydHk+CiAgICAgICAgPG5hbWU+ZnMubWFwcmZzLmltcGw8L25hbWU+CiAgICAgICAgPHZhbHVlPmNvbS5tYXByLmZzLk1hcFJGaWxlU3lzdGVtPC92YWx1ZT4KICAgIDwvcHJvcGVydHk+CjwvY29uZmlndXJhdGlvbj4="
      ]
    }
  }

  The base64 value in iceberg.hadoop.config.resources is decoded in the sketch after these configurations.
  Local or mounted file system that is locally accessible

  {
    "catalogName": "<catalog_name>",
    "connectorName": "iceberg",
    "properties": {
      "iceberg.catalog.type": "hadoop",
      "iceberg.catalog.warehouse": "<Locally Mounted Warehouse Location>"
    }
  }
- Run the following curl command from any machine that can access the Unified Analytics cluster endpoint (https://<your-ua-cluster-domain>.com/v1/catalog):

  curl -u <username>:<password> --location '<EzPresto End Point>/v1/catalog' --header 'Content-Type: application/json' --insecure --data '<JSON DATA>'

  - <username>:<password> - Replace with your Unified Analytics username and password.
  - <EzPresto End Point> - Go to Tools & Frameworks > Data Engineering > EzPresto to get the endpoint URL.
  - <JSON DATA> - Use the JSON configuration from step 1.
IMPORTANT: On a Unified Analytics 1.5.2 cluster, you must include a refresh token instead of a password. For example:

  curl -u <username>:<refresh_token> --location '<EzPresto End Point>/v1/catalog' --header 'Content-Type: application/json' --insecure --data '<JSON DATA>'

To generate a refresh token, go to the following URL in an incognito browser:

  https://token-service.<your-ua-cluster-domain>/refresh-token-download

When prompted, enter your Unified Analytics credentials. A refresh-token.txt file automatically downloads. This file contains the refresh token that you use when you run the curl command.

You should now see the Iceberg connection in Unified Analytics by going to Data Engineering > Data Sources and clicking the tab that correlates with the data source type, such as Structured Data.
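For reference, here is a minimal end-to-end sketch for the locally mounted storage type. All values are hypothetical placeholders (cluster domain, credentials, catalog name, and warehouse path), not defaults:

  # Hypothetical values; substitute your own endpoint, credentials, and paths
  curl -u admin:MyPassword123 \
    --location 'https://my-ua-cluster.example.com/v1/catalog' \
    --header 'Content-Type: application/json' \
    --insecure \
    --data '{
      "catalogName": "mylakehouse",
      "connectorName": "iceberg",
      "properties": {
        "iceberg.catalog.type": "hadoop",
        "iceberg.catalog.warehouse": "/mnt/shared/iceberg-warehouse"
      }
    }'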
EzPresto installation fails due to mysql pod entering CrashLoopBackOff state
During EzPresto deployment, the HPE Ezmeral Unified Analytics Software installation fails due to slow disk I/O, which causes the mysql pod in EzPresto to enter a CrashLoopBackOff state.
When the mysql pod is deployed, a lifecycle hook expects the pod to be ready within thirty seconds. If the pod is not ready within thirty seconds, Kubernetes continuously restarts the pod, which leaves it in a CrashLoopBackOff state.
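To confirm this failure mode before applying the workaround, you can inspect the pod's state and events; a sketch (the pod name shown is illustrative):

  # Find the mysql pod and check its status and restart count
  kubectl get pods -n ezpresto | grep mysql
  # Inspect recent events, including the failing postStart hook (pod name is hypothetical)
  kubectl describe pod ezpresto-dep-mysql-6f9c7d4b5-x2k8q -n ezpresto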
To resolve the issue, complete the following steps:
- Stop the mysql pod:

  kubectl scale deployment ezpresto-dep-mysql --replicas=0 -n ezpresto
- Edit the mysql deployment:

  kubectl edit deployment ezpresto-dep-mysql -n ezpresto
- Remove the following lifecycle hook:

  lifecycle:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            sleep 30 ;
            mysql -u root -p$MYSQL_ROOT_PASSWORD -e
            "GRANT ALL PRIVILEGES ON *.* TO '$MYSQL_USER'@'%' WITH GRANT OPTION";
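  If you prefer a non-interactive edit over kubectl edit, the same hook can be removed with a JSON patch; a sketch, assuming the mysql container is the first container in the pod spec:

  kubectl patch deployment ezpresto-dep-mysql -n ezpresto --type=json \
    -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/lifecycle"}]'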
- Delete the mysql PVC:

  kubectl delete pvc ezpresto-pvc-mysql -n ezpresto
- Create a file named mysql.pvc and copy the following content into the file:

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      meta.helm.sh/release-name: ezpresto
      meta.helm.sh/release-namespace: ezpresto
      volume.beta.kubernetes.io/storage-provisioner: com.mapr.csi-kdf
      volume.kubernetes.io/storage-provisioner: com.mapr.csi-kdf
    labels:
      app.kubernetes.io/managed-by: Helm
    name: ezpresto-pvc-mysql
    namespace: ezpresto
  spec:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 5Gi
    storageClassName: edf
    volumeMode: Filesystem
- Create the mysql PVC:

  kubectl apply -f mysql.pvc -n ezpresto
- Start the mysql pod:

  kubectl scale deployment ezpresto-dep-mysql --replicas=1 -n ezpresto
- Restart the web service pods:

  kubectl rollout restart deployment ezpresto-dep-web -n ezpresto
Verify that all of the pods in the ezpresto namespace are running.
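A quick way to confirm, watching until all pods report Running:

  kubectl get pods -n ezpresto --watch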
Trying to Access a Hive Directory Results in an Access Denied Error
If you get an access denied error when trying to access a Hive warehouse directory, change the ownership or permissions of the directory with the following commands:

  hadoop fs [-chown [-R] [OWNER][:[GROUP]] PATH...]
  hadoop fs [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
For example, SSH in to the HPE Ezmeral Data Fabric cluster node. If the mapr user ticket was used for Hive impersonation, use it for the following operations:

  export MAPR_TICKETFILE_LOCATION=/home/bob123/mapruserticket
  hadoop fs -chown bob123:ldap maprfs://user01/user/hive/warehouse/foo.db
  hadoop fs -chmod 775 maprfs://user01/user/hive/warehouse/foo.db
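To confirm that the new owner and mode took effect, you can list the parent directory with the same ticket; a quick check:

  hadoop fs -ls maprfs://user01/user/hive/warehouse/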
Cannot Add Iceberg as a Data Source when Catalog Type is Hadoop
- Workaround for New Installation
- To connect Unified Analytics to an Iceberg data source with Catalog Type set as Hadoop, complete the following steps:
  - To update the EzPresto images, run the following kubectl commands (a rollout verification sketch follows these workarounds):

    kubectl set image statefulset/ezpresto-sts-mst presto-coordinator=marketplace.us1.greenlake-hpe.com/ezua/gcr.io/mapr-252711/ezsql-test/presto-0.285-fy24-q2:0.0.61 --namespace=ezpresto
    kubectl set image statefulset/ezpresto-sts-wrk presto-worker=marketplace.us1.greenlake-hpe.com/ezua/gcr.io/mapr-252711/ezsql-test/presto-0.285-fy24-q2:0.0.61 --namespace=ezpresto

  - Sign in to Unified Analytics and add the Iceberg data source with the Catalog Type set as Hadoop.
- Workaround for Upgrade
- If you want to upgrade Unified Analytics from version 1.3 to 1.4, and you have an Iceberg data source in place with Catalog Type set as Hadoop, complete the following steps:
  - Sign in to Unified Analytics.
  - Delete the Iceberg connection.
  - Upgrade to Unified Analytics version 1.4.
  - To update the EzPresto images, run the following kubectl commands:

    kubectl set image statefulset/ezpresto-sts-mst presto-coordinator=marketplace.us1.greenlake-hpe.com/ezua/gcr.io/mapr-252711/ezsql-test/presto-0.285-fy24-q2:0.0.61 --namespace=ezpresto
    kubectl set image statefulset/ezpresto-sts-wrk presto-worker=marketplace.us1.greenlake-hpe.com/ezua/gcr.io/mapr-252711/ezsql-test/presto-0.285-fy24-q2:0.0.61 --namespace=ezpresto

  - Sign in to Unified Analytics and add the Iceberg data source with the Catalog Type set as Hadoop.
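After running the kubectl set image commands in either workaround, you can confirm that both StatefulSets finished rolling out before you sign in; a sketch:

  kubectl rollout status statefulset/ezpresto-sts-mst -n ezpresto
  kubectl rollout status statefulset/ezpresto-sts-wrk -n ezpresto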
Insufficient Memory
If queries fail with insufficient memory errors, update the EzPresto memory configuration as follows:
- In the left navigation bar, go to Tools & Frameworks > Data Engineering > EzPresto.
- Click on the three dots and select Configure.
- In the window that appears, remove the entire cmnConfigMaps section and replace it with the following:

  cmnConfigMaps:
    # Configmaps common to both Presto Master and Worker
    logConfig:
      log.properties: |
        # Enable verbose logging from Presto
        #com.facebook.presto=DEBUG
    # Configmaps specific to Presto Master
    prestoMst:
      cmnPrestoCoordinatorConfig:
        config.properties: |
          http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
          discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
          coordinator=true
          node-scheduler.include-coordinator=false
          discovery-server.enabled=true
          catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
          catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
          generic-cache-enabled=true
          transparent-cache-enabled=false
          generic-cache-catalog-name=cache
          generic-cache-change-detection-interval=300
          catalog.config-dir.shared=true
          node.environment=production
          plugin.dir=/usr/lib/presto/plugin
          log.output-file=/data/presto/server.log
          log.levels-file=/usr/lib/presto/etc/log.properties
          query.max-history=1000
          query.max-stage-count=1000
          query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
          query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
          # query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
          # query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
          # memory.heap-headroom-per-node={{ mulf 0.3 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
          experimental.spill-enabled=false
          experimental.spiller-spill-path=/tmp
          orm-database-url=jdbc:sqlite:/data/cache/metadata.db
          plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
          log.max-size=100MB
          log.max-history=10
          discovery.http-client.max-requests-queued-per-destination=10000
          dynamic.http-client.max-requests-queued-per-destination=10000
          event.http-client.max-requests-queued-per-destination=10000
          exchange.http-client.max-requests-queued-per-destination=10000
          failure-detector.http-client.max-requests-queued-per-destination=10000
          memoryManager.http-client.max-requests-queued-per-destination=10000
          node-manager.http-client.max-requests-queued-per-destination=10000
          scheduler.http-client.max-requests-queued-per-destination=10000
          workerInfo.http-client.max-requests-queued-per-destination=10000
    # Configmaps specific to Presto Worker
    prestoWrk:
      prestoWorkerConfig:
        config.properties: |
          coordinator=false
          http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
          discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
          catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
          catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
          generic-cache-enabled=true
          transparent-cache-enabled=false
          generic-cache-catalog-name=cache
          catalog.config-dir.shared=true
          node.environment=production
          plugin.dir=/usr/lib/presto/plugin
          log.output-file=/data/presto/server.log
          log.levels-file=/usr/lib/presto/etc/log.properties
          query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
          query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
          query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
          query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
          memory.heap-headroom-per-node={{ mulf 0.2 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
          experimental.spill-enabled=false
          experimental.spiller-spill-path=/tmp
          orm-database-url=jdbc:sqlite:/data/cache/metadata.db
          plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
          log.max-size=100MB
          log.max-history=10
          discovery.http-client.max-requests-queued-per-destination=10000
          event.http-client.max-requests-queued-per-destination=10000
          exchange.http-client.max-requests-queued-per-destination=10000
          node-manager.http-client.max-requests-queued-per-destination=10000
          workerInfo.http-client.max-requests-queued-per-destination=10000
  ### values_cmn_configmap.yaml contents END
- Click Configure to update the configuration on each of the Presto pods and restart the pods. This operation takes a few minutes.
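To see how the templated memory settings resolve, here is a worked example under assumed values (hypothetical: a worker JVM max heap of 8192 MB and 2 worker replicas; these are not defaults):

  # Rendered worker config.properties under the assumed values:
  query.max-memory=9830MB                 # floor(0.6 * 8192 * 2)
  query.max-total-memory=11468MB          # floor(0.7 * 8192 * 2)
  query.max-memory-per-node=4096MB        # floor(0.5 * 8192)
  query.max-total-memory-per-node=4915MB  # floor(0.6 * 8192)
  memory.heap-headroom-per-node=1638MB    # floor(0.2 * 8192)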
Failed Queries
If queries fail, go to the Presto UI and view the stack trace for the queries. You can also view the EzPresto log files.
You can access the Presto UI from the HPE Ezmeral Unified Analytics Software UI.
- In the left navigation bar, select Tools & Frameworks.
- Select the Data Engineering tab.
- In the EzPresto tile, click on the Endpoint URL.
- In the Presto UI, select the Failed state.
- Locate the query and click on the Query ID.
- Scroll down to the Error Information section to view the stack trace.
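If you prefer the command line, the coordinator also exposes query details through the standard Presto REST API; a sketch, assuming the EzPresto endpoint URL proxies that API and accepts the same credentials as above:

  # Fetch the full JSON for one query, including its failure information
  curl -u <username>:<password> --insecure '<EzPresto End Point>/v1/query/<query_id>'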
You can also view the logs in the shared directory.
- In the left navigation bar, select Data Engineering > Data Sources.
- On the Data Sources screen, click Browse.
- Select the following directories in the order shown:
- shared/
- logs/
- apps/
- app-core/
- ezpresto/
- Select the log directory for which you want to view EzPresto logs.
Hive Data Source Connection Failure (S3-Based External Data Source)
If a Hive data source connection to an S3-based external data source fails, check for the following known causes:
- Files have 0 length
- The folder that contains the CSV or Parquet files has files with 0 length. For example, the files are empty, or they are marker files generated by Spark jobs (_SUCCESS).
  Workaround: Remove the empty files.
- CSV file with an empty line
- A CSV file has an empty line either in the data or in the last line of the
file.
Workaround: Remove the empty lines in the file.
- S3 folder with incorrect MIME type
- The S3 folder that contains the CSV and Parquet files was created through the HPE Ezmeral Data Fabric Object Store UI. In pre-1.3 versions of HPE Ezmeral Unified Analytics Software, EzPresto does not recognize folders created through the HPE Ezmeral Data Fabric Object Store UI because the S3 folder MIME type differs from the type set by AWS s3cmd.
  Workaround: Use AWS s3cmd to create a folder and upload files to a bucket in HPE Ezmeral Data Fabric Object Store, for example, s3://<bucket>/<folder1>/<folder2>/data.csv.
  NOTE: You cannot put files directly in the Data Dir path that you specified when you created the Hive connection. You must create a folder within the Data Dir path and put the files there. For example, if you entered s3://mytestbucket/ as the Data Dir, you must create a folder within that directory, such as s3://mytestbucket/data/, and put the files there.
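A sketch for checking the first two causes, assuming s3cmd is configured for your Object Store endpoint (the bucket and paths are hypothetical):

  # List objects with sizes to spot zero-length files such as _SUCCESS markers
  s3cmd ls --recursive s3://mytestbucket/data/
  # On a local copy of a CSV, print the line numbers of any empty lines
  grep -n '^$' data.csv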
Data Source Connection Failure (File-Based)
If a file system-based data connection fails, verify that the storage or file location
starts with the appropriate scheme, for example maprfs://
,
hdfs://
, or file:/
.
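For example, hypothetical location values with correct scheme prefixes (cluster names and paths are illustrative):

  maprfs:///mapr/my-cluster/user/hive/warehouse
  hdfs://namenode.example.com:8020/user/hive/warehouse
  file:/mnt/shared/warehouse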