HDFS DataTap Cross-Realm Kerberos Authentication
Cross-realm Kerberos authentication allows the users of one Kerberos realm to access services that reside in a different Kerberos realm. To do this, both realms must share a key for the same principal, and both copies of the key must have the same key version number. For example, to allow a user in REALM_A to access REALM_B, both realms must share a key for a principal named krbtgt/REALM_B@REALM_A. This key is unidirectional; for a user in REALM_B to access REALM_A, both realms must instead share a key for krbtgt/REALM_A@REALM_B.
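As a minimal sketch of how this key sharing is established with MIT Kerberos kadmin (the password here is illustrative and must be entered identically on both KDCs so that the keys and key version numbers match):

# Run on the KDC for REALM_A, and again on the KDC for REALM_B,
# creating the cross-realm principal with the same password:
kadmin.local -q "addprinc -pw SharedCrossRealmSecret krbtgt/REALM_B@REALM_A"
# Verify that the key version number (vno) matches on both KDCs:
kadmin.local -q "getprinc krbtgt/REALM_B@REALM_A"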
Most of the responsibilities of the remote KDC server can be offloaded to a local KDC that Kerberizes the compute clusters within a tenant, while the DataTap uses a KDC server specific to the enterprise datalake. The users of the cluster come from the existing enterprise central KDC. Assuming that the enterprise has a network DNS name of ENTERPRISE.COM, the three Kerberos realms could be named as follows:
- KDC Realm CORP.ENTERPRISE.COM: This central KDC realm manages the users who run jobs in the Hadoop compute clusters; for example, user@CORP.ENTERPRISE.COM.
- KDC Realm CP.ENTERPRISE.COM: This local KDC realm Kerberizes the Hadoop compute clusters.
- KDC Realm DATALAKE.ENTERPRISE.COM: This KDC Kerberizes the remote HDFS file system accessed via DataTap; for example, dtap://remotedata.
In this example, the user user@CORP.ENTERPRISE.COM can run jobs in the compute cluster that belongs to the CP.ENTERPRISE.COM realm, and those jobs can access data residing in dtap://remotedata in the DATALAKE.ENTERPRISE.COM realm. This scenario requires one-way Kerberos trust relationships in which both CP.ENTERPRISE.COM and DATALAKE.ENTERPRISE.COM trust CORP.ENTERPRISE.COM, as well as a one-way trust relationship in which DATALAKE.ENTERPRISE.COM trusts CP.ENTERPRISE.COM. More specifically:
- CP.ENTERPRISE.COM trusts CORP.ENTERPRISE.COM: The user user@CORP.ENTERPRISE.COM needs to be able to access services within the compute cluster in order to perform tasks such as submitting jobs to the compute cluster YARN Resource Manager and writing the job history to the local HDFS.
- DATALAKE.ENTERPRISE.COM trusts CORP.ENTERPRISE.COM: The user user@CORP.ENTERPRISE.COM needs to be able to access dtap://remotedata/ from the compute cluster.
- DATALAKE.ENTERPRISE.COM trusts CP.ENTERPRISE.COM: When the user user@CORP.ENTERPRISE.COM accesses dtap://remotedata/ from the compute cluster to run jobs, the YARN rm user of the compute cluster (realm CP.ENTERPRISE.COM) also needs to be able to access dtap://remotedata/ to get partition information and to renew the HDFS delegation token.
This article describes the following:
- Access: See Accessing a Passthrough DataTap with Cross-Realm Authentication.
- Authentication: See One-Way Cross-Realm Authentication.
- Using Ambari to configure cross-realm authentication: See Using Ambari to Configure /etc/krb5.conf.
- Troubleshooting: See Debugging.
Accessing a Passthrough DataTap with Cross-Realm Authentication
This diagram displays the high-level authentication flow for accessing a passthrough DataTap with cross-realm authentication:
In this example, the user user@CORP.ENTERPRISE.COM submits a job to the cluster that is Kerberized by the KDC realm CP.ENTERPRISE.COM and accesses data stored on a remote HDFS file system Kerberized by the KDC realm DATALAKE.ENTERPRISE.COM, using the following flow (numbers correspond to the callouts in the preceding diagram):
1. The user user@CORP.ENTERPRISE.COM wants to send a job to a service in KDC realm CP.ENTERPRISE.COM. This realm is different from the one that the user belongs to, which is normally not allowed. However, since realm CP.ENTERPRISE.COM trusts realm CORP.ENTERPRISE.COM, the user user@CORP.ENTERPRISE.COM is able to request a temporary service ticket from its home realm (CORP.ENTERPRISE.COM) that will be valid when submitted to the Ticket Granting Service (TGS) of the foreign realm, CP.ENTERPRISE.COM.
2. The user user@CORP.ENTERPRISE.COM submits the temporary service ticket issued by the CORP.ENTERPRISE.COM realm to the TGS of the CP.ENTERPRISE.COM realm.
3. The user user@CORP.ENTERPRISE.COM then submits this service ticket to the YARN service in the HDP compute cluster in order to run the job.
4. When the job that user@CORP.ENTERPRISE.COM submitted needs to get data from the remote HDFS, the DataTap forwards the user's TGT to the deployment CNODE service. The CNODE service finds that the realm of the TGT for user@CORP.ENTERPRISE.COM is CORP.ENTERPRISE.COM, which is not the same as KDC realm DATALAKE.ENTERPRISE.COM, the realm used by the HDFS file system configured for the DataTap.
5. The CNODE service obtains a temporary service ticket from the CORP.ENTERPRISE.COM KDC server.
6. Since realm DATALAKE.ENTERPRISE.COM trusts realm CORP.ENTERPRISE.COM, the CNODE service then obtains a service ticket from realm DATALAKE.ENTERPRISE.COM, the KDC protecting access to the remote HDFS file system, based on the temporary service ticket that was issued by the realm CORP.ENTERPRISE.COM server.
7. The CNODE service uses the DATALAKE.ENTERPRISE.COM service ticket to authenticate with the NameNode service of the remote HDFS file system and access the data as user user@CORP.ENTERPRISE.COM.
8. While running the job submitted by user@CORP.ENTERPRISE.COM, the YARN Resource Manager service needs to access the remote HDFS file system in order to get partition information. The Resource Manager service runs with principal rm@CP.ENTERPRISE.COM. In order to access the remote HDFS, it must first obtain a Ticket-Granting Ticket (TGT) from the local CP.ENTERPRISE.COM KDC server.
9. The user rm@CP.ENTERPRISE.COM then accesses the DataTap. The DataTap forwards the user's TGT to the deployment CNODE service. The CNODE service finds that the realm of the TGT for rm@CP.ENTERPRISE.COM is not the same as KDC realm DATALAKE.ENTERPRISE.COM, the realm used by the HDFS service configured for the DataTap.
10. The CNODE service obtains a temporary service ticket from the CP.ENTERPRISE.COM KDC server.
11. Since realm DATALAKE.ENTERPRISE.COM trusts realm CP.ENTERPRISE.COM, the CNODE service then obtains a service ticket from realm DATALAKE.ENTERPRISE.COM, the KDC protecting access to the remote HDFS file system, based on the temporary service ticket that was issued by the realm CP.ENTERPRISE.COM server.
12. The CNODE service uses the service ticket to authenticate with the NameNode service of the remote HDFS file system and access the data as user rm@CP.ENTERPRISE.COM.
13. The Resource Manager does not use the CNODE service when it needs to renew an HDFS delegation token. Instead, the rm@CP.ENTERPRISE.COM user requests a temporary service ticket for the DATALAKE.ENTERPRISE.COM realm from the local (CP.ENTERPRISE.COM) KDC server. Since there is a trust relationship between realms CP.ENTERPRISE.COM and DATALAKE.ENTERPRISE.COM, the CP.ENTERPRISE.COM KDC server is able to issue the temporary service ticket.
14. The rm@CP.ENTERPRISE.COM user submits the temporary service ticket to the KDC server of the DATALAKE.ENTERPRISE.COM realm and gets a service ticket for the NameNode service of the remote HDFS file system.
15. The rm@CP.ENTERPRISE.COM user can then use the service ticket to renew the HDFS delegation token with the NameNode service of the remote HDFS.
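The first few steps of this flow can be observed from any cluster node with the standard klist tool: after a cross-realm job submission, the user's credential cache holds the home-realm TGT, the cross-realm ticket, and the foreign service ticket. The listing below is only illustrative; host names and timestamps are invented for the example:

$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: user@CORP.ENTERPRISE.COM

Valid starting       Expires              Service principal
05/01/25 09:00:00    05/02/25 09:00:00    krbtgt/CORP.ENTERPRISE.COM@CORP.ENTERPRISE.COM
05/01/25 09:00:02    05/02/25 09:00:00    krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM
05/01/25 09:00:02    05/02/25 09:00:00    rm/node1.cp.enterprise.com@CP.ENTERPRISE.COM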
One-Way Cross-Realm Authentication
Allowing the user user@CORP.ENTERPRISE.COM to run jobs on the Hadoop compute cluster requires configuring one-way cross-realm authentication between the realms CORP.ENTERPRISE.COM and CP.ENTERPRISE.COM. Further, allowing the user user@CORP.ENTERPRISE.COM to use the DataTap dtap://remotedata within the Hadoop compute cluster requires configuring one-way cross-realm authentication between the realms CORP.ENTERPRISE.COM and DATALAKE.ENTERPRISE.COM, and between the realms CP.ENTERPRISE.COM and DATALAKE.ENTERPRISE.COM. In other words, the realm DATALAKE.ENTERPRISE.COM trusts the realms CP.ENTERPRISE.COM and CORP.ENTERPRISE.COM, and the realm CP.ENTERPRISE.COM trusts the realm CORP.ENTERPRISE.COM.
To enable these one-way cross-realm trust relationships, you will need to configure the following:
- KDC: See Step 1: KDC Configuration.
- Host: See Step 2: Host Configuration.
- DataTap: See Step 3: Remote DataTap Configuration.
- Cluster: See Step 4: Hadoop Compute Cluster Configuration.
See Using Ambari to Configure /etc/krb5.conf for information on using the Ambari interface to configure cross-realm authentication.
Step 1: KDC Configuration
Configure the KDCs as follows:
- On the KDC server for realm DATALAKE.ENTERPRISE.COM, add the following two principals:
  krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM
  krbtgt/DATALAKE.ENTERPRISE.COM@CP.ENTERPRISE.COM
- On the KDC server for realm CP.ENTERPRISE.COM, add the following two principals:
  krbtgt/DATALAKE.ENTERPRISE.COM@CP.ENTERPRISE.COM
  krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM
- On the KDC server for realm CORP.ENTERPRISE.COM, add the following two principals:
  krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM
  krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM
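For example, on the DATALAKE.ENTERPRISE.COM KDC these principals could be created with kadmin.local as sketched below (assuming MIT Kerberos; each placeholder password must be reused verbatim when the same principal is created on the other KDC that shares it, so that the keys and key version numbers match):

# On the DATALAKE.ENTERPRISE.COM KDC:
kadmin.local -q "addprinc -pw <corp-datalake-secret> krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM"
kadmin.local -q "addprinc -pw <cp-datalake-secret> krbtgt/DATALAKE.ENTERPRISE.COM@CP.ENTERPRISE.COM"
# Repeat on the CP.ENTERPRISE.COM and CORP.ENTERPRISE.COM KDCs
# with the principals listed above for each of those realms.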
Step 2: Host Configuration
On the host(s) where the CNODE service is running, modify the [realms] and [domain_realm] sections of the /etc/bluedata/krb5.conf file to add the CORP.ENTERPRISE.COM, CP.ENTERPRISE.COM, and DATALAKE.ENTERPRISE.COM realms. For example:
[root@yav-028 ~]# cat /etc/bluedata/krb5.conf
[logging]
 default = FILE:/var/log/krb5/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = CP.ENTERPRISE.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true

[realms]
 CP.ENTERPRISE.COM = {
  kdc = kerberos.cp.enterprise.com
 }
 CORP.ENTERPRISE.COM = {
  kdc = kerberos.corp.enterprise.com
 }
 DATALAKE.ENTERPRISE.COM = {
  kdc = kerberos.datalake.enterprise.com
 }

[domain_realm]
 .cp.enterprise.com = CP.ENTERPRISE.COM
 .datalake.enterprise.com = DATALAKE.ENTERPRISE.COM
 .corp.enterprise.com = CORP.ENTERPRISE.COM
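With this file in place, you can verify the trust path from the CNODE host before involving Hadoop at all. This is a quick check assuming the MIT Kerberos client tools are installed and a test user exists in the corporate realm:

# Authenticate against the corporate realm using the DataTap krb5 configuration:
KRB5_CONFIG=/etc/bluedata/krb5.conf kinit user@CORP.ENTERPRISE.COM
# Request the cross-realm ticket; success confirms the trust toward the data lake realm:
KRB5_CONFIG=/etc/bluedata/krb5.conf kvno krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM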
Step 3: Remote DataTap Configuration
On the remote HDFS NameNode pointed to by the DataTap dtap://remotedata/, append the following rules to the hadoop.security.auth_to_local property of the Hadoop cluster:
RULE:[1:$1@$0](ambari-qa-hdp@epic.ENTERPRISE.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hbase-hdp@epic.ENTERPRISE.COM)s/.*/hbase/
RULE:[1:$1@$0](hdfs-hdp@epic.ENTERPRISE.COM)s/.*/hdfs/
RULE:[1:$1@$0](.*@CP.ENTERPRISE.COM)s/@.*//
RULE:[2:$1@$0](dn@CP.ENTERPRISE.COM)s/.*/hdfs/
RULE:[2:$1@$0](hbase@CP.ENTERPRISE.COM)s/.*/hbase/
RULE:[2:$1@$0](hive@CP.ENTERPRISE.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@CP.ENTERPRISE.COM)s/.*/mapred/
RULE:[2:$1@$0](nm@CP.ENTERPRISE.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@CP.ENTERPRISE.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@CP.ENTERPRISE.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@CP.ENTERPRISE.COM)s/.*/yarn/
RULE:[1:$1@$0](.*@CORP.ENTERPRISE.COM)s/@.*//
DEFAULT
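You can check how these rules will map a given principal without running a job, using the mapping utility that ships with Hadoop (a sketch, assuming the hadoop client is available on the NameNode host; the host name in the principal is illustrative):

hadoop org.apache.hadoop.security.HadoopKerberosName rm/node1.cp.enterprise.com@CP.ENTERPRISE.COM
# Per the rules above, this principal should map to the short name: yarn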
Step 4: Hadoop Compute Cluster Configuration
To configure the Hadoop compute cluster:
- Modify the /etc/krb5.conf file on each of the virtual nodes, as follows:

  [realms]
   CP.ENTERPRISE.COM = {
    kdc = kerberos.cp.enterprise.com
   }
   CORP.ENTERPRISE.COM = {
    kdc = kerberos.corp.enterprise.com
   }
   DATALAKE.ENTERPRISE.COM = {
    kdc = kerberos.datalake.enterprise.com
   }
  [domain_realm]
   .cp.enterprise.com = CP.ENTERPRISE.COM
   .datalake.enterprise.com = DATALAKE.ENTERPRISE.COM
   .corp.enterprise.com = CORP.ENTERPRISE.COM
- Users in the realm CORP.ENTERPRISE.COM also need access to the HDFS file system in the Hadoop compute cluster. Enable this by adding the following rule to the hadoop.security.auth_to_local property:

  RULE:[1:$1@$0](.*@CORP.ENTERPRISE.COM)s/@.*//
- Restart the Hadoop services once you have finished making these changes. Do not restart the Kerberos service, because Ambari will overwrite the modified /etc/krb5.conf file with the original version when it finds a mismatch.
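After the restart, you can confirm that the added rule is active on a cluster node by querying the live configuration (assuming the hdfs client is on the PATH):

hdfs getconf -confKey hadoop.security.auth_to_local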
Using Ambari to Configure /etc/krb5.conf
You may modify the /etc/krb5.conf file using the Ambari interface by selecting Admin > Kerberos > Configs. The advantage of using Ambari to modify the /etc/krb5.conf file is that you can freely restart all services.
In this interface, the Domains field maps server host names to the name of the Kerberos realm, and the krb5-conf template field allows you to append additional host-name-to-realm mappings as well as additional realm/KDC declarations.
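For example, to make the data lake realm resolvable from the cluster nodes, declarations like the following (matching the example configuration earlier in this article) could be appended in the krb5-conf template field:

[realms]
 DATALAKE.ENTERPRISE.COM = {
  kdc = kerberos.datalake.enterprise.com
 }
[domain_realm]
 .datalake.enterprise.com = DATALAKE.ENTERPRISE.COM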
Debugging
If a failure occurs while trying to access the remote HDFS storage resource using the DataTap, you may try accessing the namenode of the remote HDFS storage resource directly.
You may view DataTap configuration using the Edit DataTap screen, as described in Editing an Existing DataTap.
This image shows a sample central dtap://remotedata/ configuration.
To test the configuration, log into any node in the Hadoop compute cluster and execute the kinit command to create a KDC session. The user you are logged in as must be able to authenticate against either the CORP.ENTERPRISE.COM or the CP.ENTERPRISE.COM KDC realm.
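For example, assuming a test user in the corporate realm:

bluedata-1 ~]$ kinit user@CORP.ENTERPRISE.COM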
Once the kinit completes successfully, you should be able to access the namenode of the remote HDFS storage resource directly, without involving the deployment CNODE service, by executing the following command:

bluedata-1 ~]$ hdfs dfs -ls hdfs://hdfs.datalake.enterprise.com/
- If this command completes successfully, then test accessing the namenode of the remote HDFS file system via the deployment CNODE service and DataTap by executing the following command:

  bluedata-1 ~]$ hdfs dfs -ls dtap://remotedata/

- If either of these commands fails, then there is an error in the KDC/HDP/HDFS configuration that must be resolved.
The following commands enable HDFS client debugging. Execute these commands before executing the hdfs dfs -ls command in order to log additional output:

export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_OPTS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl"