EzPresto
Describes how to identify and debug issues for EzPresto.
Cannot create Iceberg connections with hadoop catalog type from the UI
HPE Ezmeral Unified Analytics Software supports Iceberg connections with the hadoop catalog type. However, you cannot create the Iceberg connection through the HPE Ezmeral Unified Analytics Software UI. You must create the connection from the command line by running a curl command with a JSON configuration that calls the EzPresto backend API to create the data source connection.

You can create Iceberg connections with the hadoop catalog type for the following storage types:
- HPE Ezmeral Data Fabric Object Store
- HPE Ezmeral Data Fabric File Store
- Local or mounted file system that is locally accessible
- Create a JSON configuration for your storage type, replacing all values in angle brackets (<>) with values for your environment. The following sections provide JSON configurations for each storage type:

  HPE Ezmeral Data Fabric Object Store
  {
    "catalogName": "<catalog_name>",
    "connectorName": "iceberg",
    "properties": {
      "iceberg.catalog.type": "hadoop",
      "iceberg.catalog.warehouse": "<S3 Warehouse Location>",
      "iceberg.catalog.cached-catalog-num": "10",
      "hive.s3.aws-access-key": "<S3 Access Key>",
      "hive.s3.aws-secret-key": "<S3 Secret Key>",
      "hive.s3.endpoint": "<S3 End Point>",
      "hive.s3.path-style-access": true,
      "hive.s3.ssl.enabled": false
    }
  }
  HPE Ezmeral Data Fabric File Store

  {
    "catalogName": "<catalog_name>",
    "connectorName": "iceberg",
    "properties": {
      "iceberg.catalog.type": "hadoop",
      "hive.hdfs.authentication.type": "MAPRSASL",
      "df.cluster.details": "<DF Cluster Details>",
      "hive.hdfs.df.ticket": "<DF Cluster Ticket Details>",
      "iceberg.catalog.warehouse": "<MAPR FS Warehouse Location>"
    },
    "fileProperties": {
      "iceberg.hadoop.config.resources": [
        "PGNvbmZpZ3VyYXRpb24+CiAgICA8cHJvcGVydHk+CiAgICAgICAgPG5hbWU+ZnMubWFwcmZzLmltcGw8L25hbWU+CiAgICAgICAgPHZhbHVlPmNvbS5tYXByLmZzLk1hcFJGaWxlU3lzdGVtPC92YWx1ZT4KICAgIDwvcHJvcGVydHk+CjwvY29uZmlndXJhdGlvbj4="
      ]
    }
  }

  The base64 value in iceberg.hadoop.config.resources is decoded in the sketch after these configurations.
  Local or mounted file system that is locally accessible

  {
    "catalogName": "<catalog_name>",
    "connectorName": "iceberg",
    "properties": {
      "iceberg.catalog.type": "hadoop",
      "iceberg.catalog.warehouse": "<Locally Mounted Warehouse Location>"
    }
  }
- Run the following curl command from any machine that can access the Unified Analytics cluster endpoint (https://<your-ua-cluster-domain>.com/v1/catalog):

  curl -u <username>:<password> --location '<EzPresto End Point>/v1/catalog' --header 'Content-Type: application/json' --insecure --data '<JSON DATA>'

  - <username>:<password> - Replace with your Unified Analytics username and password.
  - <EzPresto End Point> - Go to Tools & Frameworks > Data Engineering > EzPresto to get the endpoint URL.
  - <JSON DATA> - Use the JSON configuration from step 1.
IMPORTANT: On a Unified Analytics 1.5.2 cluster, you must include a refresh token instead of a password. For example:

  curl -u <username>:<refresh_token> --location '<EzPresto End Point>/v1/catalog' --header 'Content-Type: application/json' --insecure --data '<JSON DATA>'

To generate a refresh token, go to the following URL in an incognito browser:

  https://token-service.<your-ua-cluster-domain>/refresh-token-download

When prompted, enter your Unified Analytics credentials. A refresh-token.txt file automatically downloads. This file contains the refresh token that you use when you run the curl command.

You should now see the Iceberg connection in Unified Analytics by going to Data Engineering > Data Sources and clicking the tab that correlates with the data source type, such as Structured Data.
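For reference, here is a minimal end-to-end sketch for the locally mounted storage type. All values are hypothetical placeholders (cluster domain, credentials, catalog name, and warehouse path), not defaults:

  # Hypothetical values; substitute your own endpoint, credentials, and paths
  curl -u admin:MyPassword123 \
    --location 'https://my-ua-cluster.example.com/v1/catalog' \
    --header 'Content-Type: application/json' \
    --insecure \
    --data '{
      "catalogName": "mylakehouse",
      "connectorName": "iceberg",
      "properties": {
        "iceberg.catalog.type": "hadoop",
        "iceberg.catalog.warehouse": "/mnt/shared/iceberg-warehouse"
      }
    }'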
EzPresto installation fails due to mysql pod entering CrashLoopBackOff state
During EzPresto deployment, the HPE Ezmeral Unified Analytics Software installation fails due to slow disk I/O, which causes the mysql pod in EzPresto to enter a CrashLoopBackOff state.
When the mysql pod is deployed, a lifecycle hook expects the pod to be ready within thirty seconds. If the pod is not ready within thirty seconds, Kubernetes continuously restarts the pod, which leaves it in a CrashLoopBackOff state.
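To confirm this failure mode before applying the workaround, you can inspect the pod's state and events; a sketch (the pod name shown is illustrative):

  # Find the mysql pod and check its status and restart count
  kubectl get pods -n ezpresto | grep mysql
  # Inspect recent events, including the failing postStart hook (pod name is hypothetical)
  kubectl describe pod ezpresto-dep-mysql-6f9c7d4b5-x2k8q -n ezpresto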
To resolve the issue, complete the following steps:
- Stop the mysql pod:

  kubectl scale deployment ezpresto-dep-mysql --replicas=0 -n ezpresto
- Edit the mysql deployment:

  kubectl edit deployment ezpresto-dep-mysql -n ezpresto
- Remove the following lifecycle hook:

  lifecycle:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            sleep 30 ;
            mysql -u root -p$MYSQL_ROOT_PASSWORD -e
            "GRANT ALL PRIVILEGES ON *.* TO '$MYSQL_USER'@'%' WITH GRANT OPTION";
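  If you prefer a non-interactive edit over kubectl edit, the same hook can be removed with a JSON patch; a sketch, assuming the mysql container is the first container in the pod spec:

  kubectl patch deployment ezpresto-dep-mysql -n ezpresto --type=json \
    -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/lifecycle"}]'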
- Delete the mysql PVC:

  kubectl delete pvc ezpresto-pvc-mysql -n ezpresto
- Create a file named mysql.pvc and copy the following content into the file:

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      meta.helm.sh/release-name: ezpresto
      meta.helm.sh/release-namespace: ezpresto
      volume.beta.kubernetes.io/storage-provisioner: com.mapr.csi-kdf
      volume.kubernetes.io/storage-provisioner: com.mapr.csi-kdf
    labels:
      app.kubernetes.io/managed-by: Helm
    name: ezpresto-pvc-mysql
    namespace: ezpresto
  spec:
    accessModes:
      - ReadWriteMany
    resources:
      requests:
        storage: 5Gi
    storageClassName: edf
    volumeMode: Filesystem
- Create the mysql PVC:

  kubectl apply -f mysql.pvc -n ezpresto
- Start the mysql pod:

  kubectl scale deployment ezpresto-dep-mysql --replicas=1 -n ezpresto
- Restart the web service pods:

  kubectl rollout restart deployment ezpresto-dep-web -n ezpresto
Verify that all of the pods in the ezpresto namespace are running.
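A quick way to confirm, watching until all pods report Running:

  kubectl get pods -n ezpresto --watch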
Trying to Access a Hive Directory Results in an Access Denied Error
If you get an access denied error when trying to access a Hive warehouse directory, change the ownership or permissions of the directory with the following commands:

  hadoop fs [-chown [-R] [OWNER][:[GROUP]] PATH...]
  hadoop fs [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
For example, SSH in to the HPE Ezmeral Data Fabric cluster node. If the mapr user ticket was used for Hive impersonation, use it for the following operations:

  export MAPR_TICKETFILE_LOCATION=/home/bob123/mapruserticket
  hadoop fs -chown bob123:ldap maprfs://user01/user/hive/warehouse/foo.db
  hadoop fs -chmod 775 maprfs://user01/user/hive/warehouse/foo.db
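To confirm that the new owner and mode took effect, you can list the parent directory with the same ticket; a quick check:

  hadoop fs -ls maprfs://user01/user/hive/warehouse/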
Cannot Add Iceberg as a Data Source when Catalog Type is Hadoop
- Workaround for New Installation
- To connect Unified Analytics to an Iceberg data source with Catalog Type set as Hadoop, complete the following steps:
  - To update the EzPresto images, run the following kubectl commands (a rollout verification sketch follows these workarounds):

    kubectl set image statefulset/ezpresto-sts-mst presto-coordinator=marketplace.us1.greenlake-hpe.com/ezua/gcr.io/mapr-252711/ezsql-test/presto-0.285-fy24-q2:0.0.61 --namespace=ezpresto
    kubectl set image statefulset/ezpresto-sts-wrk presto-worker=marketplace.us1.greenlake-hpe.com/ezua/gcr.io/mapr-252711/ezsql-test/presto-0.285-fy24-q2:0.0.61 --namespace=ezpresto

  - Sign in to Unified Analytics and add the Iceberg data source with the Catalog Type set as Hadoop.
- Workaround for Upgrade
- If you want to upgrade Unified Analytics from version 1.3 to 1.4, and you have an Iceberg data source in place with Catalog Type set as Hadoop, complete the following steps:
  - Sign in to Unified Analytics.
  - Delete the Iceberg connection.
  - Upgrade to Unified Analytics version 1.4.
  - To update the EzPresto images, run the following kubectl commands:

    kubectl set image statefulset/ezpresto-sts-mst presto-coordinator=marketplace.us1.greenlake-hpe.com/ezua/gcr.io/mapr-252711/ezsql-test/presto-0.285-fy24-q2:0.0.61 --namespace=ezpresto
    kubectl set image statefulset/ezpresto-sts-wrk presto-worker=marketplace.us1.greenlake-hpe.com/ezua/gcr.io/mapr-252711/ezsql-test/presto-0.285-fy24-q2:0.0.61 --namespace=ezpresto

  - Sign in to Unified Analytics and add the Iceberg data source with the Catalog Type set as Hadoop.
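After running the kubectl set image commands in either workaround, you can confirm that both StatefulSets finished rolling out before you sign in; a sketch:

  kubectl rollout status statefulset/ezpresto-sts-mst -n ezpresto
  kubectl rollout status statefulset/ezpresto-sts-wrk -n ezpresto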
Insufficient Memory
If queries fail with insufficient memory errors, update the EzPresto memory configuration as follows:
- In the left navigation bar, go to Tools & Frameworks > Data Engineering > EzPresto.
- Click on the three dots and select Configure.
- In the window that appears, remove the entire cmnConfigMaps section and replace it with the following:

  cmnConfigMaps:
    # Configmaps common to both Presto Master and Worker
    logConfig:
      log.properties: |
        # Enable verbose logging from Presto
        #com.facebook.presto=DEBUG
    # Configmaps specific to Presto Master
    prestoMst:
      cmnPrestoCoordinatorConfig:
        config.properties: |
          http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
          discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
          coordinator=true
          node-scheduler.include-coordinator=false
          discovery-server.enabled=true
          catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
          catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
          generic-cache-enabled=true
          transparent-cache-enabled=false
          generic-cache-catalog-name=cache
          generic-cache-change-detection-interval=300
          catalog.config-dir.shared=true
          node.environment=production
          plugin.dir=/usr/lib/presto/plugin
          log.output-file=/data/presto/server.log
          log.levels-file=/usr/lib/presto/etc/log.properties
          query.max-history=1000
          query.max-stage-count=1000
          query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
          query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
          # query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
          # query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
          # memory.heap-headroom-per-node={{ mulf 0.3 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
          experimental.spill-enabled=false
          experimental.spiller-spill-path=/tmp
          orm-database-url=jdbc:sqlite:/data/cache/metadata.db
          plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
          log.max-size=100MB
          log.max-history=10
          discovery.http-client.max-requests-queued-per-destination=10000
          dynamic.http-client.max-requests-queued-per-destination=10000
          event.http-client.max-requests-queued-per-destination=10000
          exchange.http-client.max-requests-queued-per-destination=10000
          failure-detector.http-client.max-requests-queued-per-destination=10000
          memoryManager.http-client.max-requests-queued-per-destination=10000
          node-manager.http-client.max-requests-queued-per-destination=10000
          scheduler.http-client.max-requests-queued-per-destination=10000
          workerInfo.http-client.max-requests-queued-per-destination=10000
    # Configmaps specific to Presto Worker
    prestoWrk:
      prestoWorkerConfig:
        config.properties: |
          coordinator=false
          http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
          discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
          catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
          catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
          generic-cache-enabled=true
          transparent-cache-enabled=false
          generic-cache-catalog-name=cache
          catalog.config-dir.shared=true
          node.environment=production
          plugin.dir=/usr/lib/presto/plugin
          log.output-file=/data/presto/server.log
          log.levels-file=/usr/lib/presto/etc/log.properties
          query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
          query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
          query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
          query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
          memory.heap-headroom-per-node={{ mulf 0.2 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
          experimental.spill-enabled=false
          experimental.spiller-spill-path=/tmp
          orm-database-url=jdbc:sqlite:/data/cache/metadata.db
          plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
          log.max-size=100MB
          log.max-history=10
          discovery.http-client.max-requests-queued-per-destination=10000
          event.http-client.max-requests-queued-per-destination=10000
          exchange.http-client.max-requests-queued-per-destination=10000
          node-manager.http-client.max-requests-queued-per-destination=10000
          workerInfo.http-client.max-requests-queued-per-destination=10000
  ### values_cmn_configmap.yaml contents END
- Click Configure to update the configuration on each of the Presto pods and restart the pods. This operation takes a few minutes.
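To see how the templated memory settings resolve, here is a worked example under assumed values (hypothetical: a worker JVM max heap of 8192 MB and 2 worker replicas; these are not defaults):

  # Rendered worker config.properties under the assumed values:
  query.max-memory=9830MB                 # floor(0.6 * 8192 * 2)
  query.max-total-memory=11468MB          # floor(0.7 * 8192 * 2)
  query.max-memory-per-node=4096MB        # floor(0.5 * 8192)
  query.max-total-memory-per-node=4915MB  # floor(0.6 * 8192)
  memory.heap-headroom-per-node=1638MB    # floor(0.2 * 8192)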
Failed Queries
If queries fail, go to the Presto UI and view the stack trace for the queries. You can also view the EzPresto log files.
You can access the Presto UI from the HPE Ezmeral Unified Analytics Software UI.
- In the left navigation bar, select Tools & Frameworks.
- Select the Data Engineering tab.
- In the EzPresto tile, click on the Endpoint URL.
- In the Presto UI, select the Failed state.
- Locate the query and click on the Query ID.
- Scroll down to the Error Information section to view the stack trace.
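If you prefer the command line, the coordinator also exposes query details through the standard Presto REST API; a sketch, assuming the EzPresto endpoint URL proxies that API and accepts the same credentials as above:

  # Fetch the full JSON for one query, including its failure information
  curl -u <username>:<password> --insecure '<EzPresto End Point>/v1/query/<query_id>'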
You can also view the logs in the shared directory.
- In the left navigation bar, select Data Engineering > Data Sources.
- On the Data Sources screen, click Browse.
- Select the following directories in the order shown:
- shared/
- logs/
- apps/
- app-core/
- ezpresto/
- Select the log directory for which you want to view EzPresto logs.
Hive Data Source Connection Failure (S3-Based External Data Source)
If a Hive data source connection to an S3-based external data source fails, check for the following known causes:
- Files have 0 length
- The folder that contains the CSV or Parquet files has files with 0 length. For example, the files are empty, or they are marker files generated by Spark jobs (_SUCCESS).
  Workaround: Remove the empty files.
- CSV file with an empty line
- A CSV file has an empty line either in the data or in the last line of the
file.
Workaround: Remove the empty lines in the file.
- S3 folder with incorrect MIME type
- The S3 folder that contains the CSV and Parquet files was created through the HPE Ezmeral Data Fabric Object Store UI. In pre-1.3 versions of HPE Ezmeral Unified Analytics Software, EzPresto does not recognize folders created through the HPE Ezmeral Data Fabric Object Store UI because the S3 folder MIME type differs from the type set by AWS s3cmd.
  Workaround: Use AWS s3cmd to create a folder and upload files to a bucket in HPE Ezmeral Data Fabric Object Store, for example, s3://<bucket>/<folder1>/<folder2>/data.csv.
  NOTE: You cannot put files directly in the Data Dir path that you specified when you created the Hive connection. You must create a folder within the Data Dir path and put the files there. For example, if you entered s3://mytestbucket/ as the Data Dir, you must create a folder within that directory, such as s3://mytestbucket/data/, and put the files there.
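A sketch for checking the first two causes, assuming s3cmd is configured for your Object Store endpoint (the bucket and paths are hypothetical):

  # List objects with sizes to spot zero-length files such as _SUCCESS markers
  s3cmd ls --recursive s3://mytestbucket/data/
  # On a local copy of a CSV, print the line numbers of any empty lines
  grep -n '^$' data.csv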
Data Source Connection Failure (File-Based)
If a file system-based data connection fails, verify that the storage or file location
starts with the appropriate scheme, for example maprfs://
,
hdfs://
, or file:/
.
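For example, hypothetical location values with correct scheme prefixes (cluster names and paths are illustrative):

  maprfs:///mapr/my-cluster/user/hive/warehouse
  hdfs://namenode.example.com:8020/user/hive/warehouse
  file:/mnt/shared/warehouse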