Configuring a Hive Data Source with Kerberos Authentication

Describes the required prerequisite steps to complete before you connect HPE Ezmeral Unified Analytics Software to a Hive data source that uses Kerberos authentication.

You can connect HPE Ezmeral Unified Analytics Software to a Hive data source that uses a Hive metastore and Kerberos for authentication. However, before you create the connection, manually complete the following steps:

Step 1 - Upload a krb5 configuration file to the shared location

Step 2 - Configure EzPresto to use the krb5.conf file

Step 3 - Connect HPE Ezmeral Unified Analytics Software to the Hive data source

Step 1 - Upload a krb5 configuration file to the shared location

The krb5.conf file contains Kerberos configuration information, including the locations of the KDCs and admin servers for the Kerberos realms used in the Hive configuration. To upload the krb5.conf file to a shared location, complete the following steps:
  1. Sign in to HPE Ezmeral Unified Analytics Software.
  2. In the left navigation bar, go to Data Engineering > Data Sources > Data Volumes.
  3. Select the shared directory.
  4. Upload a krb5.conf file to the shared directory.
    TIP
    The name of the file must be krb5.conf.
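The exact contents of krb5.conf depend on your Kerberos deployment, but a minimal file has the following shape. The realm matches the connection example later in this topic; the kdc and admin_server hostnames shown here are placeholders that you must replace with the values for your environment:

```ini
[libdefaults]
    default_realm = MYCORP.NET

[realms]
    MYCORP.NET = {
        kdc = kdc.mycorp.net
        admin_server = kdc.mycorp.net
    }

[domain_realm]
    .mycorp.net = MYCORP.NET
    mycorp.net = MYCORP.NET
```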

Step 2 - Configure EzPresto to use the krb5.conf file

  1. In the left navigation bar, go to Tools & Frameworks > Data Engineering > EzPresto.
  2. Click the three-dot menu and select Configure.
  3. In the window that appears, replace the entire cmnConfigMaps section with the following configuration, which adds the -Djava.security.krb5.conf=/data/shared/krb5.conf JVM property to both the coordinator (prestoMst) and worker (prestoWrk) jvm.config sections:
    cmnConfigMaps:
      # Configmaps common to both Presto Master and Worker
      logConfig:
        log.properties: |
          # Enable verbose logging from Presto
          #com.facebook.presto=DEBUG
      prestoMst:
        cmnPrestoCoordinatorConfig:
          config.properties: |
            http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
            discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
            coordinator=true
            node-scheduler.include-coordinator=false
            discovery-server.enabled=true
            catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
            catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
            generic-cache-enabled=true
            transparent-cache-enabled=false
            generic-cache-catalog-name=cache
            generic-cache-change-detection-interval=300
            catalog.config-dir.shared=true
            node.environment=production
            plugin.dir=/usr/lib/presto/plugin
            log.output-file=/data/presto/server.log
            log.levels-file=/usr/lib/presto/etc/log.properties
            query.max-history=1000
            query.max-stage-count=1000
            query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
            query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
            # query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
            # query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
            # memory.heap-headroom-per-node={{ mulf 0.3 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
            experimental.spill-enabled=false
            experimental.spiller-spill-path=/tmp
            orm-database-url=jdbc:sqlite:/data/cache/metadata.db
            plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
            log.max-size=100MB
            log.max-history=10
            discovery.http-client.max-requests-queued-per-destination=10000
            dynamic.http-client.max-requests-queued-per-destination=10000
            event.http-client.max-requests-queued-per-destination=10000
            exchange.http-client.max-requests-queued-per-destination=10000
            failure-detector.http-client.max-requests-queued-per-destination=10000
            memoryManager.http-client.max-requests-queued-per-destination=10000
            node-manager.http-client.max-requests-queued-per-destination=10000
            scheduler.http-client.max-requests-queued-per-destination=10000
            workerInfo.http-client.max-requests-queued-per-destination=10000
        jvmConfig:
          jvm.config: |
            -server
            -Xms{{ tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.minHeapSize . | floor }}M
            -Xmx{{ tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . | floor }}M
            -XX:-UseBiasedLocking
            -XX:+UseG1GC
            -XX:G1HeapRegionSize={{ .Values.ezsqlPresto.configMapProp.mst.jvmProp.G1HeapRegionSize }}
            -XX:+ExplicitGCInvokesConcurrent
            -XX:+HeapDumpOnOutOfMemoryError
            -XX:+UseGCOverheadLimit
            -XX:+ExitOnOutOfMemoryError
            -XX:ReservedCodeCacheSize={{ .Values.ezsqlPresto.configMapProp.mst.jvmProp.ReservedCodeCacheSize }}
            -XX:PerMethodRecompilationCutoff=10000
            -XX:PerBytecodeRecompilationCutoff=10000
            -Djdk.attach.allowAttachSelf=true
            -Djdk.nio.maxCachedBufferSize={{ .Values.ezsqlPresto.configMapProp.jvmProp.maxCachedBufferSize }}
            -Dcom.amazonaws.sdk.disableCertChecking=true
            -Djava.security.krb5.conf=/data/shared/krb5.conf
      prestoWrk:
        prestoWorkerConfig:
          config.properties: |
            coordinator=false
            http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
            discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
            catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
            catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
            generic-cache-enabled=true
            transparent-cache-enabled=false
            generic-cache-catalog-name=cache
            catalog.config-dir.shared=true
            node.environment=production
            plugin.dir=/usr/lib/presto/plugin
            log.output-file=/data/presto/server.log
            log.levels-file=/usr/lib/presto/etc/log.properties
            query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
            query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
            query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
            query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
            memory.heap-headroom-per-node={{ mulf 0.2 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
            experimental.spill-enabled=false
            experimental.spiller-spill-path=/tmp
            orm-database-url=jdbc:sqlite:/data/cache/metadata.db
            plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
            log.max-size=100MB
            log.max-history=10
            discovery.http-client.max-requests-queued-per-destination=10000
            event.http-client.max-requests-queued-per-destination=10000
            exchange.http-client.max-requests-queued-per-destination=10000
            node-manager.http-client.max-requests-queued-per-destination=10000
            workerInfo.http-client.max-requests-queued-per-destination=10000
        jvmConfig:
          jvm.config: |
            -server
            -Xms{{ tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.minHeapSize . | floor }}M
            -Xmx{{ tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . | floor }}M
            -XX:-UseBiasedLocking
            -XX:+UseG1GC
            -XX:G1HeapRegionSize={{ .Values.ezsqlPresto.configMapProp.wrk.jvmProp.G1HeapRegionSize }}
            -XX:+ExplicitGCInvokesConcurrent
            -XX:+HeapDumpOnOutOfMemoryError
            -XX:+UseGCOverheadLimit
            -XX:+ExitOnOutOfMemoryError
            -XX:ReservedCodeCacheSize={{ .Values.ezsqlPresto.configMapProp.wrk.jvmProp.ReservedCodeCacheSize }}
            -XX:PerMethodRecompilationCutoff=10000
            -XX:PerBytecodeRecompilationCutoff=10000
            -Djdk.attach.allowAttachSelf=true
            -Djdk.nio.maxCachedBufferSize={{ .Values.ezsqlPresto.configMapProp.jvmProp.maxCachedBufferSize }}
            -Dcom.amazonaws.sdk.disableCertChecking=true
            -Djava.security.krb5.conf=/data/shared/krb5.conf
  4. Click Configure. This updates the configuration on each of the Presto pods and restarts them; the operation can take a few minutes.

Step 3 - Connect HPE Ezmeral Unified Analytics Software to the Hive data source

  1. In the left navigation bar, go to Data Engineering > Data Sources.
  2. Click Add New Data Source.
  3. In the Hive tile, click Create Connection.
  4. Using the following connection properties as an example, add the connection properties for your environment and then click Connect.
    Name=kdchive
    Hive Metastore=Thrift
    Hive Metastore Uri=thrift://m2-dev.mip.storage.mycorp.net:9083
    Hive Metastore Authentication Type=KERBEROS
    Hive Metastore Service Principal=hive/_HOST@MYCORP.NET
    Hive Metastore Client Principal=supergroup@MYCORP.NET
    Hive Metastore Client Keytab=<Upload the keytab file for the supergroup user>
    Hive Hdfs Authentication Type=KERBEROS
    Hive Hdfs Presto Principal=supergroup@MYCORP.NET
    Hive Hdfs Presto Keytab=<Upload the keytab file for the supergroup user>
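In the service principal above, the _HOST placeholder follows the Hadoop convention: it is resolved at connection time to the fully qualified hostname of the service actually being contacted. A rough sketch of that substitution (the resolve_principal helper is hypothetical, for illustration only; it is not part of the product):

```python
def resolve_principal(principal: str, host: str) -> str:
    """Illustrate Hadoop-style _HOST substitution: the instance part of a
    service principal (service/_HOST@REALM) is replaced by the lowercase
    FQDN of the host being contacted."""
    service, rest = principal.split("/", 1)
    instance, realm = rest.split("@", 1)
    if instance == "_HOST":
        instance = host.lower()
    return f"{service}/{instance}@{realm}"

# Using the metastore URI host from the example connection above:
print(resolve_principal("hive/_HOST@MYCORP.NET", "m2-dev.mip.storage.mycorp.net"))
# hive/m2-dev.mip.storage.mycorp.net@MYCORP.NET
```

A principal with a fixed instance part (no _HOST) is left unchanged, which is why a single krb5.conf and keytab can serve all metastore hosts in a realm.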