Configuring a Hive Data Source with Kerberos Authentication

Describes the required prerequisite steps to complete before you connect HPE Ezmeral Unified Analytics Software to a Hive data source that uses Kerberos authentication.

You can connect HPE Ezmeral Unified Analytics Software to a Hive data source that uses a Hive metastore and Kerberos for authentication. However, before you create the connection, manually complete the following steps:

Step 1 - Upload a krb5 configuration file to the shared location

Step 2 - Configure EzPresto to use the krb5.conf file

Step 3 - Connect HPE Ezmeral Unified Analytics Software to the Hive data source

Step 1 - Upload a krb5 configuration file to the shared location

The krb5.conf file contains Kerberos configuration information, including the locations of the KDCs and admin servers for the Kerberos realms used in the Hive configuration. To upload the krb5.conf file to a shared location, complete the following steps:
  1. Sign in to HPE Ezmeral Unified Analytics Software.
  2. In the left navigation bar, go to Data Engineering > Data Sources > Data Volumes.
  3. Select the shared directory.
  4. Upload a krb5.conf file to the shared directory.
    TIP
    The name of the file must be krb5.conf.
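The exact contents of krb5.conf depend on your Kerberos deployment, but a minimal file has the following shape. The realm matches the connection example later in this topic; the kdc and admin_server hostnames shown here are placeholders that you must replace with the values for your environment:

```ini
[libdefaults]
    default_realm = MYCORP.NET

[realms]
    MYCORP.NET = {
        kdc = kdc.mycorp.net
        admin_server = kdc.mycorp.net
    }

[domain_realm]
    .mycorp.net = MYCORP.NET
    mycorp.net = MYCORP.NET
```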

Step 2 - Configure EzPresto to use the krb5.conf file

  1. In the left navigation bar, go to Tools & Frameworks > Data Engineering > EzPresto.
  2. Click the three-dot menu and select Configure.
  3. In the window that appears, replace the entire cmnConfigMaps section with the following configuration, which adds the -Djava.security.krb5.conf=/data/shared/krb5.conf JVM property to both the coordinator (prestoMst) and worker (prestoWrk) jvm.config sections:
    cmnConfigMaps:
      # Configmaps common to both Presto Master and Worker
      logConfig:
        log.properties: |
          # Enable verbose logging from Presto
          #com.facebook.presto=DEBUG
      prestoMst:
        cmnPrestoCoordinatorConfig:
          config.properties: |
            http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
            discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
            coordinator=true
            node-scheduler.include-coordinator=false
            discovery-server.enabled=true
            catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
            catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
            generic-cache-enabled=true
            transparent-cache-enabled=false
            generic-cache-catalog-name=cache
            generic-cache-change-detection-interval=300
            catalog.config-dir.shared=true
            node.environment=production
            plugin.dir=/usr/lib/presto/plugin
            log.output-file=/data/presto/server.log
            log.levels-file=/usr/lib/presto/etc/log.properties
            query.max-history=1000
            query.max-stage-count=1000
            query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
            query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
            # query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
            # query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
            # memory.heap-headroom-per-node={{ mulf 0.3 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
            experimental.spill-enabled=false
            experimental.spiller-spill-path=/tmp
            orm-database-url=jdbc:sqlite:/data/cache/metadata.db
            plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
            log.max-size=100MB
            log.max-history=10
            discovery.http-client.max-requests-queued-per-destination=10000
            dynamic.http-client.max-requests-queued-per-destination=10000
            event.http-client.max-requests-queued-per-destination=10000
            exchange.http-client.max-requests-queued-per-destination=10000
            failure-detector.http-client.max-requests-queued-per-destination=10000
            memoryManager.http-client.max-requests-queued-per-destination=10000
            node-manager.http-client.max-requests-queued-per-destination=10000
            scheduler.http-client.max-requests-queued-per-destination=10000
            workerInfo.http-client.max-requests-queued-per-destination=10000
        jvmConfig:
          jvm.config: |
            -server
            -Xms{{ tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.minHeapSize . | floor }}M
            -Xmx{{ tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . | floor }}M
            -XX:-UseBiasedLocking
            -XX:+UseG1GC
            -XX:G1HeapRegionSize={{ .Values.ezsqlPresto.configMapProp.mst.jvmProp.G1HeapRegionSize }}
            -XX:+ExplicitGCInvokesConcurrent
            -XX:+HeapDumpOnOutOfMemoryError
            -XX:+UseGCOverheadLimit
            -XX:+ExitOnOutOfMemoryError
            -XX:ReservedCodeCacheSize={{ .Values.ezsqlPresto.configMapProp.mst.jvmProp.ReservedCodeCacheSize }}
            -XX:PerMethodRecompilationCutoff=10000
            -XX:PerBytecodeRecompilationCutoff=10000
            -Djdk.attach.allowAttachSelf=true
            -Djdk.nio.maxCachedBufferSize={{ .Values.ezsqlPresto.configMapProp.jvmProp.maxCachedBufferSize }}
            -Dcom.amazonaws.sdk.disableCertChecking=true
            -Djava.security.krb5.conf=/data/shared/krb5.conf
      prestoWrk:
        prestoWorkerConfig:
          config.properties: |
            coordinator=false
            http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
            discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
            catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
            catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
            generic-cache-enabled=true
            transparent-cache-enabled=false
            generic-cache-catalog-name=cache
            catalog.config-dir.shared=true
            node.environment=production
            plugin.dir=/usr/lib/presto/plugin
            log.output-file=/data/presto/server.log
            log.levels-file=/usr/lib/presto/etc/log.properties
            query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
            query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
            query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
            query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
            memory.heap-headroom-per-node={{ mulf 0.2 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
            experimental.spill-enabled=false
            experimental.spiller-spill-path=/tmp
            orm-database-url=jdbc:sqlite:/data/cache/metadata.db
            plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
            log.max-size=100MB
            log.max-history=10
            discovery.http-client.max-requests-queued-per-destination=10000
            event.http-client.max-requests-queued-per-destination=10000
            exchange.http-client.max-requests-queued-per-destination=10000
            node-manager.http-client.max-requests-queued-per-destination=10000
            workerInfo.http-client.max-requests-queued-per-destination=10000
        jvmConfig:
          jvm.config: |
            -server
            -Xms{{ tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.minHeapSize . | floor }}M
            -Xmx{{ tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . | floor }}M
            -XX:-UseBiasedLocking
            -XX:+UseG1GC
            -XX:G1HeapRegionSize={{ .Values.ezsqlPresto.configMapProp.wrk.jvmProp.G1HeapRegionSize }}
            -XX:+ExplicitGCInvokesConcurrent
            -XX:+HeapDumpOnOutOfMemoryError
            -XX:+UseGCOverheadLimit
            -XX:+ExitOnOutOfMemoryError
            -XX:ReservedCodeCacheSize={{ .Values.ezsqlPresto.configMapProp.wrk.jvmProp.ReservedCodeCacheSize }}
            -XX:PerMethodRecompilationCutoff=10000
            -XX:PerBytecodeRecompilationCutoff=10000
            -Djdk.attach.allowAttachSelf=true
            -Djdk.nio.maxCachedBufferSize={{ .Values.ezsqlPresto.configMapProp.jvmProp.maxCachedBufferSize }}
            -Dcom.amazonaws.sdk.disableCertChecking=true
            -Djava.security.krb5.conf=/data/shared/krb5.conf
  4. Click Configure. This updates the configuration on each of the Presto pods and restarts them; the operation can take a few minutes.

Step 3 - Connect HPE Ezmeral Unified Analytics Software to the Hive data source

  1. In the left navigation bar, go to Data Engineering > Data Sources.
  2. Click Add New Data Source.
  3. In the Hive tile, click Create Connection.
  4. Using the following connection properties as an example, add the connection properties for your environment and then click Connect.
    Name=kdchive
    Hive Metastore=Thrift
    Hive Metastore Uri=thrift://m2-dev.mip.storage.mycorp.net:9083
    Hive Metastore Authentication Type=KERBEROS
    Hive Metastore Service Principal=hive/_HOST@MYCORP.NET
    Hive Metastore Client Principal=supergroup@MYCORP.NET
    Hive Metastore Client Keytab=<Upload the keytab file for the supergroup user>
    Hive Hdfs Authentication Type=KERBEROS
    Hive Hdfs Presto Principal=supergroup@MYCORP.NET
    Hive Hdfs Presto Keytab=<Upload the keytab file for the supergroup user>
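In the service principal above, the _HOST placeholder follows the Hadoop convention: it is resolved at connection time to the fully qualified hostname of the service actually being contacted. A rough sketch of that substitution (the resolve_principal helper is hypothetical, for illustration only; it is not part of the product):

```python
def resolve_principal(principal: str, host: str) -> str:
    """Illustrate Hadoop-style _HOST substitution: the instance part of a
    service principal (service/_HOST@REALM) is replaced by the lowercase
    FQDN of the host being contacted."""
    service, rest = principal.split("/", 1)
    instance, realm = rest.split("@", 1)
    if instance == "_HOST":
        instance = host.lower()
    return f"{service}/{instance}@{realm}"

# Using the metastore URI host from the example connection above:
print(resolve_principal("hive/_HOST@MYCORP.NET", "m2-dev.mip.storage.mycorp.net"))
# hive/m2-dev.mip.storage.mycorp.net@MYCORP.NET
```

A principal with a fixed instance part (no _HOST) is left unchanged, which is why a single krb5.conf and keytab can serve all metastore hosts in a realm.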