HDFS DataTap Cross-Realm Kerberos Authentication

NOTE This article only applies to HDFS DataTaps.

Cross-realm Kerberos authentication allows the users of one Kerberos realm to access services that reside in a different Kerberos realm. To do this, both realms must share a key for the same principal, and both copies of the key must have the same key version number. For example, to allow a user in REALM_A to access REALM_B, both realms must share a key for a principal named krbtgt/REALM_B@REALM_A. This trust is unidirectional; for a user in REALM_B to access REALM_A, both realms must also share a key for krbtgt/REALM_A@REALM_B.
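
For example, on MIT Kerberos KDCs the shared key is typically created by adding the same principal, with the same password, on both KDCs. This is a minimal sketch; the password shown is a placeholder:

# On the REALM_A KDC:
kadmin.local -q "addprinc -pw <shared-password> krbtgt/REALM_B@REALM_A"

# On the REALM_B KDC, create the identical principal with the identical password
# so that the keys and their key version numbers match:
kadmin.local -q "addprinc -pw <shared-password> krbtgt/REALM_B@REALM_A"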

Most of the responsibilities of the remote KDC server can be offloaded to a local KDC that Kerberizes the compute clusters within a tenant, while the DataTap uses a KDC server specific to the enterprise datalake. The users of the cluster come from the existing enterprise central KDC. Assuming that the enterprise has a network DNS name of ENTERPRISE.COM, the three Kerberos realms could be named as follows:

  • KDC Realm CORP.ENTERPRISE.COM: This central KDC realm manages the users who run jobs in the Hadoop compute clusters. For example, user@CORP.ENTERPRISE.COM.
  • KDC Realm CP.ENTERPRISE.COM: This local KDC realm Kerberizes the Hadoop compute clusters.
  • KDC Realm DATALAKE.ENTERPRISE.COM: This KDC realm Kerberizes the remote HDFS file system accessed via DataTap. For example, dtap://remotedata.

In this example, the user user@CORP.ENTERPRISE.COM can run jobs in the compute cluster that belongs to the CP.ENTERPRISE.COM realm, and those jobs can access data residing in dtap://remotedata in the DATALAKE.ENTERPRISE.COM realm. This scenario requires one-way Kerberos trust relationships in which both CP.ENTERPRISE.COM and DATALAKE.ENTERPRISE.COM trust CORP.ENTERPRISE.COM, as well as a one-way trust relationship in which DATALAKE.ENTERPRISE.COM trusts CP.ENTERPRISE.COM. More specifically:

  • CP.ENTERPRISE.COM trusts CORP.ENTERPRISE.COM: The user user@CORP.ENTERPRISE.COM needs to be able to access services within the compute cluster in order to perform tasks such as submitting jobs to the compute cluster YARN Resource Manager and writing the job history to the local HDFS.
  • DATALAKE.ENTERPRISE.COM trusts CORP.ENTERPRISE.COM: The user user@CORP.ENTERPRISE.COM needs to be able to access dtap://remotedata/ from the compute cluster.
  • DATALAKE.ENTERPRISE.COM trusts CP.ENTERPRISE.COM: When the user user@CORP.ENTERPRISE.COM accesses dtap://remotedata/ from the compute cluster to run jobs, the YARN/rm user of the compute cluster (CP.ENTERPRISE.COM) also needs to be able to access dtap://remotedata/ to get partition information and to renew the HDFS delegation token.

This article describes the following:

  • Accessing a Passthrough DataTap with Cross-Realm Authentication
  • One-Way Cross-Realm Authentication
  • Using Ambari to Configure /etc/krb5.conf
  • Debugging

Accessing a Passthrough DataTap with Cross-Realm Authentication

This diagram displays the high-level authentication flow for accessing a passthrough DataTap with cross-realm authentication:



In this example, the user user@CORP.ENTERPRISE.COM submits a job to the cluster that is Kerberized by the KDC realm CP.ENTERPRISE.COM and accesses data stored on a remote HDFS file system Kerberized by the KDC realm DATALAKE.ENTERPRISE.COM using the following flow (numbers correspond to the callouts in the preceding diagram):

  1. The user user@CORP.ENTERPRISE.COM wants to send a job to a service in KDC realm CP.ENTERPRISE.COM. This realm is different from the one that the user belongs to. Normally, this is not allowed. However, since there is a trust relationship where realm CP.ENTERPRISE.COM trusts realm CORP.ENTERPRISE.COM, the user user@CORP.ENTERPRISE.COM is able to request a temporary service ticket from its home realm (CORP.ENTERPRISE.COM) that will be valid when submitted to the TGS (Ticket Granting Service) of the foreign realm, CP.ENTERPRISE.COM.
  2. The user user@CORP.ENTERPRISE.COM submits the temporary service ticket issued by the CORP.ENTERPRISE.COM realm to the Ticket Granting Service (TGS) of the CP.ENTERPRISE.COM realm.
  3. The user user@CORP.ENTERPRISE.COM then submits this service ticket to the YARN service in the HDP compute cluster in order to run the job.
  4. When the job that user user@CORP.ENTERPRISE.COM submitted needs to get data from the remote HDFS, the DataTap forwards the user's TGT to the deployment CNODE service. The CNODE service finds that the realm of the TGT for user@CORP.ENTERPRISE.COM is CORP.ENTERPRISE.COM, which is not the same as the KDC realm DATALAKE.ENTERPRISE.COM used by the HDFS file system configured for the DataTap.
  5. The CNODE service obtains a temporary service ticket from the CORP.ENTERPRISE.COM KDC server.
  6. Since there is a trust relationship where realm DATALAKE.ENTERPRISE.COM trusts realm CORP.ENTERPRISE.COM, the CNODE service then obtains a service ticket from the realm DATALAKE.ENTERPRISE.COM KDC server, based on the temporary service ticket that was issued by the CORP.ENTERPRISE.COM server.
  7. The CNODE service uses the DATALAKE.ENTERPRISE.COM service ticket to authenticate with the NameNode service of the remote HDFS file system and access the data as user user@CORP.ENTERPRISE.COM.
  8. While running the job submitted by user@CORP.ENTERPRISE.COM to the cluster, the YARN Resource Manager service will need to access the remote HDFS file system in order to get partition information. The Resource Manager service runs with principal rm@CP.ENTERPRISE.COM. In order to access the remote HDFS, it will need to obtain a Ticket-Granting Ticket (TGT) from the local CP.ENTERPRISE.COM KDC server.
  9. The user rm@CP.ENTERPRISE.COM then accesses the DataTap. The DataTap forwards the user's TGT to the deployment CNODE service. The CNODE service finds that the realm of the TGT for rm@CP.ENTERPRISE.COM is not the same as the KDC realm DATALAKE.ENTERPRISE.COM used by the HDFS service configured for the DataTap.
  10. The CNODE service obtains a temporary service ticket from the CP.ENTERPRISE.COM KDC server.
  11. Since there is a trust relationship where realm DATALAKE.ENTERPRISE.COM trusts realm CP.ENTERPRISE.COM, the CNODE service then obtains a service ticket from realm DATALAKE.ENTERPRISE.COM, the KDC protecting access to the remote HDFS file system, based on the temporary service ticket that was issued by the realm CP.ENTERPRISE.COM server.
  12. The CNODE service uses the service ticket to authenticate with the NameNode Service of the remote HDFS file system and access the data as user rm@CP.ENTERPRISE.COM.
  13. The Resource Manager does not use the CNODE service when it needs to renew an HDFS delegation token. Instead, the rm@CP.ENTERPRISE.COM user requests a temporary service ticket for the DATALAKE.ENTERPRISE.COM realm from the local (CP.ENTERPRISE.COM) KDC server. Since there is a trust relationship between realms CP.ENTERPRISE.COM and DATALAKE.ENTERPRISE.COM, the CP.ENTERPRISE.COM KDC server is able to issue the temporary service ticket.
  14. The rm@CP.ENTERPRISE.COM user submits the temporary service ticket to the KDC server of the DATALAKE.ENTERPRISE.COM realm and gets a service ticket for the NameNode service of the remote HDFS file system.
  15. The rm@CP.ENTERPRISE.COM user can then use the service ticket to renew the HDFS delegation token with the NameNode service of the remote HDFS.
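
Once the trusts described in the remainder of this article are in place, one way to observe part of this flow from the client side is to request a cross-realm ticket explicitly and inspect the credential cache. This is a minimal sketch that assumes a Linux compute node with the MIT Kerberos client tools installed and the example realms used above:

kinit user@CORP.ENTERPRISE.COM
# Requesting the cross-realm principal as a service ticket exercises the trust;
# a successful kvno plus a krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM entry in
# the klist output shows that CP.ENTERPRISE.COM trusts CORP.ENTERPRISE.COM.
kvno krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM
klist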

One-Way Cross-Realm Authentication

Allowing the user user@CORP.ENTERPRISE.COM to run jobs on the Hadoop compute cluster requires configuring one-way cross-realm authentication between the realms CORP.ENTERPRISE.COM and CP.ENTERPRISE.COM. Further, allowing the user user@CORP.ENTERPRISE.COM to use the DataTap dtap://remotedata within the Hadoop compute cluster requires configuring one-way cross-realm authentication between the realms CORP.ENTERPRISE.COM and DATALAKE.ENTERPRISE.COM, and between the realms CP.ENTERPRISE.COM and DATALAKE.ENTERPRISE.COM. In other words, the realm DATALAKE.ENTERPRISE.COM trusts the realms CP.ENTERPRISE.COM and CORP.ENTERPRISE.COM, and the realm CP.ENTERPRISE.COM trusts the realm CORP.ENTERPRISE.COM.

To enable these one-way cross-realm trust relationships, complete the four configuration steps described below.

See Using Ambari to Configure /etc/krb5.conf for information on using the Ambari interface to configure cross-realm authentication.

Step 1: KDC Configuration

Configure the KDCs as follows:

  1. On the KDC server for realm DATALAKE.ENTERPRISE.COM, add the following two principals:

     krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM
     krbtgt/DATALAKE.ENTERPRISE.COM@CP.ENTERPRISE.COM
  2. On the KDC server for realm CP.ENTERPRISE.COM, add the following two principals:

     krbtgt/DATALAKE.ENTERPRISE.COM@CP.ENTERPRISE.COM
     krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM
  3. On the KDC server for realm CORP.ENTERPRISE.COM, add the following two principals:

     krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM
     krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM
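
The exact commands depend on your KDC implementation. The following is a hedged sketch for MIT Kerberos KDCs using kadmin.local; the passwords are placeholders, and each principal must be created with the same password everywhere it appears so that the keys and their version numbers match:

# On the DATALAKE.ENTERPRISE.COM KDC:
kadmin.local -q "addprinc -pw <trust-pw-1> krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM"
kadmin.local -q "addprinc -pw <trust-pw-2> krbtgt/DATALAKE.ENTERPRISE.COM@CP.ENTERPRISE.COM"

# On the CP.ENTERPRISE.COM KDC:
kadmin.local -q "addprinc -pw <trust-pw-2> krbtgt/DATALAKE.ENTERPRISE.COM@CP.ENTERPRISE.COM"
kadmin.local -q "addprinc -pw <trust-pw-3> krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM"

# On the CORP.ENTERPRISE.COM KDC:
kadmin.local -q "addprinc -pw <trust-pw-1> krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM"
kadmin.local -q "addprinc -pw <trust-pw-3> krbtgt/CP.ENTERPRISE.COM@CORP.ENTERPRISE.COM"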

Step 2: Host Configuration

On the host(s) where the CNODE service is running, modify the [realms] and [domain_realm] sections of the /etc/bluedata/krb5.conf file to add the CORP.ENTERPRISE.COM, CP.ENTERPRISE.COM, and DATALAKE.ENTERPRISE.COM realms. For example:

[logging]
  default = FILE:/var/log/krb5/krb5libs.log
  kdc = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log

[libdefaults]
  default_realm = CP.ENTERPRISE.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  forwardable = true

[realms]
  CP.ENTERPRISE.COM = {
    kdc = kerberos.cp.enterprise.com
  }
  CORP.ENTERPRISE.COM = {
    kdc = kerberos.corp.enterprise.com
  }
  DATALAKE.ENTERPRISE.COM = {
    kdc = kerberos.datalake.enterprise.com
  }

[domain_realm]
  .cp.enterprise.com = CP.ENTERPRISE.COM
  .datalake.enterprise.com = DATALAKE.ENTERPRISE.COM
  .corp.enterprise.com = CORP.ENTERPRISE.COM
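
To sanity-check the host configuration, you can request a cross-realm ticket directly from the CNODE host. This is a minimal sketch assuming the MIT Kerberos client tools are installed; KRB5_CONFIG points the tools at the CNODE-specific configuration file:

export KRB5_CONFIG=/etc/bluedata/krb5.conf
kinit user@CORP.ENTERPRISE.COM
# A successful kvno confirms that the KDC entries above are reachable and that
# the DATALAKE.ENTERPRISE.COM realm trusts CORP.ENTERPRISE.COM:
kvno krbtgt/DATALAKE.ENTERPRISE.COM@CORP.ENTERPRISE.COM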

Step 3: Remote DataTap Configuration

On the remote HDFS NameNode service pointed to by the DataTap dtap://remotedata/, append the following rules to the hadoop.security.auth_to_local property of the Hadoop cluster:

RULE:[1:$1@$0](ambari-qa-hdp@epic.ENTERPRISE.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hbase-hdp@epic.ENTERPRISE.COM)s/.*/hbase/
RULE:[1:$1@$0](hdfs-hdp@epic.ENTERPRISE.COM)s/.*/hdfs/
RULE:[1:$1@$0](.*@CP.ENTERPRISE.COM)s/@.*//
RULE:[2:$1@$0](dn@CP.ENTERPRISE.COM)s/.*/hdfs/
RULE:[2:$1@$0](hbase@CP.ENTERPRISE.COM)s/.*/hbase/
RULE:[2:$1@$0](hive@CP.ENTERPRISE.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@CP.ENTERPRISE.COM)s/.*/mapred/
RULE:[2:$1@$0](nm@CP.ENTERPRISE.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@CP.ENTERPRISE.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@CP.ENTERPRISE.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@CP.ENTERPRISE.COM)s/.*/yarn/
RULE:[1:$1@$0](.*@CORP.ENTERPRISE.COM)s/@.*//
DEFAULT
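
The hadoop.security.auth_to_local property typically lives in core-site.xml on the remote cluster. As a hedged check (the host name in the first command is a hypothetical example), you can use Hadoop's built-in principal-mapping utility to confirm how principals from the compute and corporate realms will be translated by the rules above:

# rm/<host>@CP.ENTERPRISE.COM should map to the local user "yarn":
hadoop org.apache.hadoop.security.HadoopKerberosName rm/somehost.cp.enterprise.com@CP.ENTERPRISE.COM
# user@CORP.ENTERPRISE.COM should map to the local user "user":
hadoop org.apache.hadoop.security.HadoopKerberosName user@CORP.ENTERPRISE.COM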

Step 4: Hadoop Compute Cluster Configuration

To configure the Hadoop compute cluster:

  1. Modify the /etc/krb5.conf file on each of the virtual nodes, as follows:

    [realms]
      CP.ENTERPRISE.COM = {
        kdc = kerberos.cp.enterprise.com
      }
      CORP.ENTERPRISE.COM = {
        kdc = kerberos.corp.enterprise.com
      }
      DATALAKE.ENTERPRISE.COM = {
        kdc = kerberos.datalake.enterprise.com
      }

    [domain_realm]
      .cp.enterprise.com = CP.ENTERPRISE.COM
      .datalake.enterprise.com = DATALAKE.ENTERPRISE.COM
      .corp.enterprise.com = CORP.ENTERPRISE.COM
  2. Users in the realm CORP.ENTERPRISE.COM also need access to the HDFS file system in the Hadoop compute cluster. Enable this by adding the following rule to the hadoop.security.auth_to_local property:

    RULE:[1:$1@$0](.*@CORP.ENTERPRISE.COM)s/@.*//
  3. Restart the Hadoop services once you have finished making these changes. Do not restart the Kerberos service, because Ambari will overwrite the modified /etc/krb5.conf file with the original version when it detects a mismatch. A quick verification is sketched after this list.
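
After the services restart, a simple hedged check that the new rule and realm configuration took effect is to authenticate as a corporate user on any virtual node and list the compute cluster's own HDFS:

kinit user@CORP.ENTERPRISE.COM
# Should succeed, confirming that user@CORP.ENTERPRISE.COM is mapped to the
# short name "user" and can reach the local cluster HDFS:
hdfs dfs -ls /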

Using Ambari to Configure /etc/krb5.conf

You may modify the /etc/krb5.conf file using the Ambari interface by selecting Admin>Kerberos>Configs. The advantage of using Ambari to modify the /etc/krb5.conf file is that you can freely restart all services.

Here, the Domains field is used to map server host names to the name of the Kerberos realm:



The krb5-conf template field allows you to append additional server host names to the realm name mapping:



You may also append additional realm/KDC declarations in the krb5-conf template field.



Debugging

If a failure occurs while trying to access the remote HDFS storage resource using the DataTap, you may try accessing the namenode of the remote HDFS storage resource directly.

You may view DataTap configuration using the Edit DataTap screen, as described in Editing an Existing DataTap.

This image shows a sample central dtap://remotedata/ configuration:



To test the configuration, log into any node in the Hadoop compute cluster and execute the kinit command to create a KDC session. The user you are logged in as must be able to authenticate against either the CORP.ENTERPRISE.COM or the CP.ENTERPRISE.COM KDC realm. Once the kinit completes successfully, you should be able to access the NameNode of the remote HDFS storage resource directly, without involving the deployment CNODE service, by executing the command hdfs dfs -ls hdfs://hdfs.datalake.enterprise.com/.

  • If this command completes successfully, then test accessing the NameNode of the remote HDFS file system via the deployment CNODE service and DataTap by executing the command hdfs dfs -ls dtap://remotedata/.
  • If either of these commands fails, then there is an error in the KDC/HDP/HDFS configuration that must be resolved.

The following commands enable HDFS client debugging. Execute these commands before executing the hdfs dfs -ls command in order to log additional output:

export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_OPTS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl"