Configuring a Hive Data Source with Kerberos Authentication
Describes the prerequisite steps to complete before you connect HPE Ezmeral Unified Analytics Software to a Hive data source that uses Kerberos authentication.
You can connect HPE Ezmeral Unified Analytics Software to a Hive data source that uses a Hive metastore and Kerberos for authentication. However, before you create the connection, manually complete the following steps:
Step 1 - Upload a krb5 configuration file to the shared location
Step 2 - Configure EzPresto to use the krb5.conf file
Step 3 - Connect HPE Ezmeral Unified Analytics Software to the Hive data source
Step 1 - Upload a krb5 configuration file to the shared location
The krb5.conf file contains Kerberos configuration information, including the locations of the KDCs and admin servers for the Kerberos realms used in the Hive configuration. To upload the krb5.conf file to a shared location, complete the following steps (a sample krb5.conf is shown after the steps):
- Sign in to HPE Ezmeral Unified Analytics Software.
- In the left navigation bar, go to Data Engineering > Data Sources > Data Volumes.
- Select the shared directory.
- Upload a krb5.conf file to the shared directory.
TIP: The name of the file must be krb5.conf.
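For reference, a minimal krb5.conf might look like the following sketch. The realm MYCORP.NET matches the example connection properties in Step 3; the KDC and admin server hostnames are placeholders for your environment:

[libdefaults]
    default_realm = MYCORP.NET

[realms]
    MYCORP.NET = {
        kdc = kdc.mycorp.net
        admin_server = kdc.mycorp.net
    }

[domain_realm]
    .mycorp.net = MYCORP.NET
    mycorp.net = MYCORP.NET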
Step 2 - Configure EzPresto to use the krb5.conf file
- In the left navigation bar, go to Tools & Frameworks > Data Engineering > EzPresto.
- Click the three-dot menu and select Configure.
- In the window that appears, remove the entire cmnConfigMaps section and replace it with the following configuration, which adds the -Djava.security.krb5.conf=/data/shared/krb5.conf option to the coordinator and worker JVM properties:

cmnConfigMaps: # Configmaps common to both Presto Master and Worker
  logConfig:
    log.properties: |
      # Enable verbose logging from Presto
      #com.facebook.presto=DEBUG
  prestoMst:
    cmnPrestoCoordinatorConfig:
      config.properties: |
        http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
        discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
        coordinator=true
        node-scheduler.include-coordinator=false
        discovery-server.enabled=true
        catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
        catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
        generic-cache-enabled=true
        transparent-cache-enabled=false
        generic-cache-catalog-name=cache
        generic-cache-change-detection-interval=300
        catalog.config-dir.shared=true
        node.environment=production
        plugin.dir=/usr/lib/presto/plugin
        log.output-file=/data/presto/server.log
        log.levels-file=/usr/lib/presto/etc/log.properties
        query.max-history=1000
        query.max-stage-count=1000
        query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
        query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
        # query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
        # query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
        # memory.heap-headroom-per-node={{ mulf 0.3 ( tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . ) | floor }}MB
        experimental.spill-enabled=false
        experimental.spiller-spill-path=/tmp
        orm-database-url=jdbc:sqlite:/data/cache/metadata.db
        plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
        log.max-size=100MB
        log.max-history=10
        discovery.http-client.max-requests-queued-per-destination=10000
        dynamic.http-client.max-requests-queued-per-destination=10000
        event.http-client.max-requests-queued-per-destination=10000
        exchange.http-client.max-requests-queued-per-destination=10000
        failure-detector.http-client.max-requests-queued-per-destination=10000
        memoryManager.http-client.max-requests-queued-per-destination=10000
        node-manager.http-client.max-requests-queued-per-destination=10000
        scheduler.http-client.max-requests-queued-per-destination=10000
        workerInfo.http-client.max-requests-queued-per-destination=10000
    jvmConfig:
      jvm.config: |
        -server
        -Xms{{ tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.minHeapSize . | floor }}M
        -Xmx{{ tpl .Values.ezsqlPresto.configMapProp.mst.jvmProp.maxHeapSize . | floor }}M
        -XX:-UseBiasedLocking
        -XX:+UseG1GC
        -XX:G1HeapRegionSize={{ .Values.ezsqlPresto.configMapProp.mst.jvmProp.G1HeapRegionSize }}
        -XX:+ExplicitGCInvokesConcurrent
        -XX:+HeapDumpOnOutOfMemoryError
        -XX:+UseGCOverheadLimit
        -XX:+ExitOnOutOfMemoryError
        -XX:ReservedCodeCacheSize={{ .Values.ezsqlPresto.configMapProp.mst.jvmProp.ReservedCodeCacheSize }}
        -XX:PerMethodRecompilationCutoff=10000
        -XX:PerBytecodeRecompilationCutoff=10000
        -Djdk.attach.allowAttachSelf=true
        -Djdk.nio.maxCachedBufferSize={{ .Values.ezsqlPresto.configMapProp.jvmProp.maxCachedBufferSize }}
        -Dcom.amazonaws.sdk.disableCertChecking=true
        -Djava.security.krb5.conf=/data/shared/krb5.conf
  prestoWrk:
    prestoWorkerConfig:
      config.properties: |
        coordinator=false
        http-server.http.port={{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
        discovery.uri=http://{{ tpl .Values.ezsqlPresto.locatorService.fullname $ }}:{{ tpl .Values.ezsqlPresto.locatorService.locatorSvcPort $ }}
        catalog.config-dir = {{ .Values.ezsqlPresto.stsDeployment.volumeMount.mountPathCatalog }}
        catalog.disabled-connectors-for-dynamic-operation=drill,parquet,csv,salesforce,sharepoint,prestodb,raptor,kudu,redis,accumulo,elasticsearch,redshift,localfile,bigquery,prometheus,mongodb,pinot,druid,cassandra,kafka,atop,presto-thrift,ampool,hive-cache,memory,blackhole,tpch,tpcds,system,example-http,jmx
        generic-cache-enabled=true
        transparent-cache-enabled=false
        generic-cache-catalog-name=cache
        catalog.config-dir.shared=true
        node.environment=production
        plugin.dir=/usr/lib/presto/plugin
        log.output-file=/data/presto/server.log
        log.levels-file=/usr/lib/presto/etc/log.properties
        query.max-memory={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
        query.max-total-memory={{ mulf 0.7 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) ( .Values.ezsqlPresto.stsDeployment.wrk.replicaCount ) | floor }}MB
        query.max-memory-per-node={{ mulf 0.5 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
        query.max-total-memory-per-node={{ mulf 0.6 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
        memory.heap-headroom-per-node={{ mulf 0.2 ( tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . ) | floor }}MB
        experimental.spill-enabled=false
        experimental.spiller-spill-path=/tmp
        orm-database-url=jdbc:sqlite:/data/cache/metadata.db
        plugin.disabled-connectors=accumulo,atop,cassandra,example-http,kafka,kudu,localfile,memory,mongodb,pinot,presto-bigquery,prestodb,presto-druid,presto-elasticsearch,prometheus,raptor,redis,redshift
        log.max-size=100MB
        log.max-history=10
        discovery.http-client.max-requests-queued-per-destination=10000
        event.http-client.max-requests-queued-per-destination=10000
        exchange.http-client.max-requests-queued-per-destination=10000
        node-manager.http-client.max-requests-queued-per-destination=10000
        workerInfo.http-client.max-requests-queued-per-destination=10000
    jvmConfig:
      jvm.config: |
        -server
        -Xms{{ tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.minHeapSize . | floor }}M
        -Xmx{{ tpl .Values.ezsqlPresto.configMapProp.wrk.jvmProp.maxHeapSize . | floor }}M
        -XX:-UseBiasedLocking
        -XX:+UseG1GC
        -XX:G1HeapRegionSize={{ .Values.ezsqlPresto.configMapProp.wrk.jvmProp.G1HeapRegionSize }}
        -XX:+ExplicitGCInvokesConcurrent
        -XX:+HeapDumpOnOutOfMemoryError
        -XX:+UseGCOverheadLimit
        -XX:+ExitOnOutOfMemoryError
        -XX:ReservedCodeCacheSize={{ .Values.ezsqlPresto.configMapProp.wrk.jvmProp.ReservedCodeCacheSize }}
        -XX:PerMethodRecompilationCutoff=10000
        -XX:PerBytecodeRecompilationCutoff=10000
        -Djdk.attach.allowAttachSelf=true
        -Djdk.nio.maxCachedBufferSize={{ .Values.ezsqlPresto.configMapProp.jvmProp.maxCachedBufferSize }}
        -Dcom.amazonaws.sdk.disableCertChecking=true
        -Djava.security.krb5.conf=/data/shared/krb5.conf
### values_cmn_configmap.yaml contents END
- Click Configure. This updates the configuration on each of the Presto pods and restarts them, which can take a few minutes. You can verify the result as shown in the sketch after this step.
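To confirm that the pods restarted with the new JVM option, you can inspect them from a terminal with access to the cluster. This is a minimal sketch; the ezpresto namespace and the pod name are assumptions to replace with the values from your deployment:

# List the Presto pods and wait for them to return to the Running state
# (the namespace is an assumption)
kubectl get pods -n ezpresto

# Check that the coordinator JVM was started with -Djava.security.krb5.conf
kubectl exec -n ezpresto <presto-coordinator-pod> -- sh -c 'ps -ef | grep krb5.conf'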
Step 3 - Connect HPE Ezmeral Unified Analytics Software to the Hive data source
- In the left navigation bar, go to Data Engineering > Data Sources.
- Click Add New Data Source.
- In the Hive tile, click Create Connection.
- Using the following connection properties as an example, add the connection properties for your environment, and then click Connect.
Name = kdchive
Hive Metastore = Thrift
Hive Metastore Uri = thrift://m2-dev.mip.storage.mycorp.net:9083
Hive Metastore Authentication Type = KERBEROS
Hive Metastore Service Principal = hive/_HOST@MYCORP.NET
Hive Metastore Client Principal = supergroup@MYCORP.NET
Hive Metastore Client Keytab = <Uploaded keytab file for the supergroup user>
Hive Hdfs Authentication Type = KERBEROS
Hive Hdfs Presto Principal = supergroup@MYCORP.NET
Hive Hdfs Presto Keytab = <Uploaded keytab file for the supergroup user>
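After the connection is created, you can sanity-check the new catalog with a query. This minimal example assumes the catalog name kdchive from the connection properties above; the schema and table names are placeholders for objects in your metastore:

SHOW SCHEMAS FROM kdchive;
SELECT * FROM kdchive.<schema>.<table> LIMIT 10;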