HDFS DataTap TDE Configuration

NOTE
This article only applies to HDFS DataTaps.

Transparent Data Encryption (TDE) provides end-to-end data encryption between virtual clusters and HDFS storage resources. Encryption and decryption are transparent because no changes are required to the application code. Only the virtual cluster can encrypt and decrypt this data; the storage resource neither stores nor accesses unencrypted data or the keys required to decrypt that data. This means that data is encrypted both at rest (residing on storage media such as a disk) and in transit (being transmitted across a network). Because encrypting and decrypting data is computationally intensive, the DataTap handles TDE so that the overhead only affects the container that accesses the TDE-enabled DataTap.

A virtual cluster that will use TDE-enabled DataTaps must be Kerberized, because the DataTap uses Kerberos authentication when communicating with the Key Management Service (KMS).

Enabling TDE requires several configuration changes to the remote HDFS storage resource, including:

  • Installing and configuring a KMS, including the Access Control List (ACL) and SSL.
  • Configuring the remote HDFS storage resource to use the KMS.
  • Creating an encryption key and encryption zone on the remote HDFS storage resource.
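
The last item in the list above can be sketched with the standard Hadoop command-line tools. This is a minimal illustration rather than a required procedure: the function name create_encryption_zone is hypothetical, the key name and zone path are placeholders you supply, and the commands assume a user who is permitted by the KMS ACL to create keys and who has HDFS superuser privileges.

```shell
#!/bin/sh
# Sketch: create an encryption key and an encryption zone on the remote HDFS.
# create_encryption_zone is a hypothetical helper; both arguments are placeholders.
create_encryption_zone() {
    key_name="$1"     # name of the key to create in the KMS
    zone_path="$2"    # HDFS directory that will become the encryption zone

    # Create the key in the KMS.
    hadoop key create "$key_name" || return 1
    # The zone directory must exist (and be empty) before it can become a zone.
    hdfs dfs -mkdir -p "$zone_path" || return 1
    # Bind the directory to the key.
    hdfs crypto -createZone -keyName "$key_name" -path "$zone_path" || return 1
    # List all zones so the new one can be confirmed.
    hdfs crypto -listZones
}
```

For example, create_encryption_zone mykey1 /secure would create a key named mykey1 and turn the (hypothetical) /secure directory into an encryption zone.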

The instructions in this article assume that the remote HDFS storage resource and KMS have been correctly configured before proceeding to create and configure the DataTap. Please see Configuration Example, below, for a sample CDH-based HDFS and KMS configuration.

On the HPE Ezmeral Runtime Enterprise side, enabling and supporting TDE requires the following configuration updates to the virtual cluster itself:

  • KMS URL: The DataTap and HDFS client use this information to locate the KMS.
  • Truststore: The DataTap and HDFS client use this information to authenticate with the KMS server because the protocol is based on HTTPS.
NOTE
The DataTap must be configured in passthrough mode (see HDFS DataTap Kerberos Security) in order to enable TDE.

Please see the appropriate section below for instructions on configuring a virtual cluster:

Configuring Cloudera Clusters for TDE

Configuring Cloudera clusters for TDE is a two-phase process:

Phase 1: Configuring the KMS URL (CDH)

To configure the KMS URL for a Cloudera virtual cluster:

  1. In the remote HDFS storage resource, add the dfs.encryption.key.provider.uri property to the following:
    • HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml.

    • HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml.

  2. In the remote HDFS storage resource, add the hadoop.security.key.provider.path property to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml.
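
The property values themselves are not shown in the steps above. The sketch below prints candidate snippets for both safety valves, assuming the KMS is reached over HTTPS and that the provider URI takes the common kms://https@host:port/kms form. The helper names are hypothetical, and the host and port in the usage note are taken from the Validation example later in this article; substitute your own values.

```shell
#!/bin/sh
# Sketch: build the KMS provider URI and print the matching property snippets.
# All function names here are hypothetical helpers, not part of Hadoop.

# Build a KMS provider URI from a host name and port (HTTPS assumed).
kms_provider_uri() {
    printf 'kms://https@%s:%s/kms\n' "$1" "$2"
}

# Print one Hadoop-style <property> element.
print_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Print the snippets for the hdfs-site.xml and core-site.xml safety valves.
kms_safety_valve_snippets() {
    uri=$(kms_provider_uri "$1" "$2")
    echo "<!-- hdfs-site.xml safety valves (service and client) -->"
    print_property dfs.encryption.key.provider.uri "$uri"
    echo "<!-- core-site.xml safety valve (cluster-wide) -->"
    print_property hadoop.security.key.provider.path "$uri"
}
```

Running kms_safety_valve_snippets bluedata-4.encryption 16000 prints both properties with the value kms://https@bluedata-4.encryption:16000/kms, ready to paste into the corresponding snippets.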

Phase 2: Configuring the Truststore (CDH)

To configure the Truststore for a Cloudera virtual cluster:

  1. Verify that the certificate file for the KMS server is ready. This example assumes that the certificate file is named selfsigned.cer.
  2. Execute the following commands to import the certificate into the truststore:
    cp /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/cacerts /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts
    /usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool -import -alias kmshost -file /opt/cloudera/security/jks/selfsigned.cer \
        -keystore /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts -storepass changeit
  3. Copy the truststore file (named jssecacerts in this example) to all of the virtual nodes/containers in the HPE Ezmeral Runtime Enterprise virtual cluster. The path to the truststore file must be identical on all nodes.
  4. In the remote HDFS storage resource, select HDFS>Configs>Advanced.

    The Advanced tab appears.

  5. Modify the ssl.client.truststore.location and ssl.client.truststore.password properties.
NOTE
If the ssl.client.truststore.location property is not configured for a Cloudera virtual cluster, then the Oracle JDK will search for the /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts file by default. This means that you can ignore the ssl.client.truststore.location and ssl.client.truststore.password properties if you are using this default configuration.
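
Copying the truststore to every node at an identical path (step 3) can be scripted. The following is only a sketch: the function names are hypothetical, it assumes that scp access to each node is already set up and that keytool is on the PATH, and the node names are whatever your virtual cluster uses.

```shell
#!/bin/sh
# Sketch: distribute a truststore and confirm that it contains the KMS alias.
# Both functions are hypothetical helpers.

# Copy the truststore to the same absolute path on every listed node.
distribute_truststore() {
    truststore="$1"
    shift
    for node in "$@"; do
        # -p preserves the modification time and mode of the source file.
        scp -p "$truststore" "${node}:${truststore}" || return 1
    done
}

# Succeed only if the truststore holds an entry under the given alias.
# Arguments: truststore path, alias, store password.
truststore_has_alias() {
    keytool -list -alias "$2" -keystore "$1" -storepass "$3" >/dev/null 2>&1
}
```

For example, truststore_has_alias /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts kmshost changeit checks the import from step 2 before the file is distributed.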

Configuring Hadoop Clusters for TDE

Configuring Hadoop clusters for TDE is a two-phase process:

Phase 1: Configuring the KMS URL (HDP)

To configure the KMS URL for a Hadoop virtual cluster:

  1. In the remote HDFS storage resource, select HDFS>Configs>Advanced.

    The Advanced tab appears.

  2. Modify the hadoop.security.key.provider.path property in the Advanced core-site section.
  3. Modify the dfs.encryption.key.provider.uri property in the Advanced hdfs-site section.
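
After changing these properties, it can be useful to confirm what the client configuration actually resolves to. The sketch below uses hdfs getconf -confKey for that purpose; the function name check_kms_config is hypothetical, and the expected URI is whatever value you entered in the steps above.

```shell
#!/bin/sh
# Sketch: compare the effective KMS settings against an expected provider URI.
# check_kms_config is a hypothetical helper.
check_kms_config() {
    expected="$1"
    status=0
    for key in dfs.encryption.key.provider.uri hadoop.security.key.provider.path; do
        # Ask the HDFS client for the value it resolves for this key.
        actual=$(hdfs getconf -confKey "$key" 2>/dev/null)
        if [ "$actual" = "$expected" ]; then
            echo "OK       $key=$actual"
        else
            echo "MISMATCH $key=${actual:-<unset>} (expected $expected)"
            status=1
        fi
    done
    return "$status"
}
```

The function returns a non-zero status if either property is unset or differs from the expected value, so it can be used in a script before the DataTap is created.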

Phase 2: Configuring the Truststore (HDP)

To configure the Truststore for a Hadoop virtual cluster:

  1. Verify that the certificate file for the KMS server is ready. This example assumes that the certificate file is named selfsigned.cer.
  2. Execute the following commands to import the certificate into the truststore:
    cp /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/cacerts /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts
    /usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool -import -alias kmshost -file /opt/cloudera/security/jks/selfsigned.cer \
        -keystore /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts -storepass changeit
  3. Copy the truststore file (named jssecacerts in this example) to all of the virtual nodes/containers in the HPE Ezmeral Runtime Enterprise virtual cluster. The path to the truststore file must be identical on all nodes.
  4. In the remote HDFS storage resource, select HDFS>Configs>Advanced.

    The Advanced tab appears.

  5. Modify the ssl.client.truststore.location and ssl.client.truststore.password properties.
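
The two truststore properties in step 5 can be written out as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and the path and password in the usage note are the ones used by the keytool commands in step 2, which you should replace if your truststore differs.

```shell
#!/bin/sh
# Sketch: print the client truststore properties for a given path and password.
# Both functions are hypothetical helpers.

# Print one Hadoop-style <property> element.
print_property() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Arguments: truststore path, truststore password.
truststore_client_properties() {
    print_property ssl.client.truststore.location "$1"
    print_property ssl.client.truststore.password "$2"
}
```

For example, truststore_client_properties /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts changeit prints both properties with the values from this article's example.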

Configuration Example

This example demonstrates how to configure a sample remote HDFS storage resource and KMS for use with a Cloudera virtual cluster. To do this:

  1. Kerberize all of the virtual nodes/containers in the Cloudera virtual cluster.
  2. In the remote HDFS storage resource, select HDFS>Actions>Set up HDFS Data At Rest Encryption.
  3. Follow the listed steps to enable TDE. The key point is to generate the keystore for the KMS server to enable HTTPS. This example uses a self-signed certificate for simplicity.

  4. Execute the following command on the node that hosts the Java KeyStore KMS service. (This example uses a cn of bluedata-4.encryption that should be replaced by the actual FQDN):

    /usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool -genkeypair -alias kmshost -keyalg RSA -keysize 2048 \
        -dname "cn=bluedata-4.encryption, ou=EN, o=BD, l=SC, st=CA, c=US" -keypass password -keystore kmshost-keystore.jks -storepass password
    NOTE
    The keypass and storepass must be the same.
  5. Copy the generated keystore file to /opt/cloudera/security/jks/kmshost-keystore.jks.
    NOTE
    You can execute the following command to export the KMS certificate and then use that certificate to generate the KMS client truststore:

    /usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool -export -alias kmshost -keystore kmshost-keystore.jks -rfc -file selfsigned.cer
  6. Based on the generated keystore file, configure TLS/SSL as shown here:

This procedure configures the KMS ACL, which will appear similar to the following:

<property>
  <name>hadoop.kms.acl.CREATE</name>
  <value>xou,kishore xou,kishore</value>
</property>
<property>
  <name>hadoop.kms.acl.DELETE</name>
  <value>xou,kishore xou,kishore</value>
</property>
<property>
  <name>hadoop.kms.acl.ROLLOVER</name>
  <value>xou,kishore xou,kishore</value>
</property>
<property>
  <name>hadoop.kms.acl.GET</name>
  <value></value>
</property>
<property>
  <name>hadoop.kms.acl.GET_KEYS</name>
  <value>xou,kishore xou,kishore</value>
</property>
<property>
  <name>hadoop.kms.acl.GET_METADATA</name>
  <value>hdfs supergroup</value>
</property>
<property>
  <name>hadoop.kms.acl.SET_KEY_MATERIAL</name>
  <value></value>
</property>
<property>
  <name>hadoop.kms.acl.GENERATE_EEK</name>
  <value>hdfs supergroup</value>
</property>
<property>
  <name>hadoop.kms.acl.DECRYPT_EEK</name>
  <value></value>
</property>
<property>
  <name>hadoop.kms.blacklist.CREATE</name>
  <value>hdfs supergroup</value>
</property>
<property>
  <name>hadoop.kms.blacklist.DELETE</name>
  <value>hdfs supergroup</value>
</property>
<property>
  <name>hadoop.kms.blacklist.ROLLOVER</name>
  <value>hdfs supergroup</value>
</property>
<property>
  <name>hadoop.kms.blacklist.GET</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.blacklist.GET_KEYS</name>
  <value></value>
</property>
<property>
  <name>hadoop.kms.blacklist.SET_KEY_MATERIAL</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.blacklist.DECRYPT_EEK</name>
  <value>hdfs supergroup</value>
</property>
<property>
  <name>default.key.acl.MANAGEMENT</name>
  <value></value>
</property>
<property>
  <name>default.key.acl.GENERATE_EEK</name>
  <value></value>
</property>
<property>
  <name>default.key.acl.DECRYPT_EEK</name>
  <value></value>
</property>
<property>
  <name>default.key.acl.READ</name>
  <value></value>
</property>
<property>
  <name>default.key.acl.MIGRATE</name>
  <value></value>
</property>
<property>
  <name>whitelist.key.acl.MANAGEMENT</name>
  <value>xou,kishore xou,kishore</value>
</property>
<property>
  <name>whitelist.key.acl.READ</name>
  <value>hdfs supergroup</value>
</property>
<property>
  <name>whitelist.key.acl.GENERATE_EEK</name>
  <value>hdfs supergroup</value>
</property>
<property>
  <name>whitelist.key.acl.DECRYPT_EEK</name>
  <value>xou,kishore,yarn,nm xou,kishore,yarn,nm</value>
</property>
NOTE
To run a job whose source or destination files are located in an encryption zone, the YARN user of a Cloudera cluster and the NM user of a Hadoop cluster must have the DECRYPT_EEK privilege so that they can access files in that zone.
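
Each ACL value above follows the Hadoop convention of a comma-separated user list, a space, and then a comma-separated group list, with * meaning everyone and an empty value meaning no one. The sketch below checks whether a user is named in the user list of such a value. It is deliberately simplified: acl_allows_user is a hypothetical helper, it does not resolve group membership, and it does not take the blacklist properties into account.

```shell
#!/bin/sh
# Sketch: test whether a user appears in the user part of a Hadoop ACL value.
# acl_allows_user is a hypothetical helper; groups and blacklists are ignored.
# Arguments: ACL value, user name.
acl_allows_user() {
    acl="$1"
    user="$2"
    [ -n "$user" ] || return 1
    # A single asterisk grants access to everyone.
    [ "$acl" = "*" ] && return 0
    # Keep only the text before the first space (the user list).
    users=${acl%% *}
    # Wrap both sides in commas so only whole names match.
    case ",$users," in
        *",$user,"*) return 0 ;;
    esac
    return 1
}
```

With the whitelist.key.acl.DECRYPT_EEK value shown above, acl_allows_user "xou,kishore,yarn,nm xou,kishore,yarn,nm" yarn succeeds, whereas the same check against "hdfs supergroup" fails for yarn.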

Validation

To validate the TDE configuration:

  1. In the virtual cluster, execute the command hadoop key list to verify the communication between the cluster and KMS, as shown here:
                            [bluedata@bluedata-1 ~]$ hadoop key list
                            Listing keys for KeyProvider: KMSClientProvider[https://bluedata-4.encryption:16000/kms/v1/]
                            mykey1
  2. In the virtual cluster, execute the command openssl s_client -connect host.fqdn.name:port to check TLS/SSL negotiation. The output will appear similar to the following if the test is successful:
                            [bluedata@bluedata-1 ~]$ openssl s_client -connect bluedata-4.encryption:16000
                            CONNECTED(00000003)
                            depth=0 C = US, ST = CA, L = SC, O = BD, OU = EN, CN = bluedata-4.encryption
                            verify error:num=18:self signed certificate
                            verify return:1
                            depth=0 C = US, ST = CA, L = SC, O = BD, OU = EN, CN = bluedata-4.encryption
                            verify return:1
                            ---
                            Certificate chain
                            0 s:/C=US/ST=CA/L=SC/O=BD/OU=EN/CN=bluedata-4.encryption
                               i:/C=US/ST=CA/L=SC/O=BD/OU=EN/CN=bluedata-4.encryption
                            ---
                            Server certificate
                            -----BEGIN CERTIFICATE-----
                            MIIDYTCCAkmgAwIBAgIEQDnyMzANBgkqhkiG9w0BAQsFADBhMQswCQYDVQQGEwJV
                            UzELMAkGA1UECBMCQ0ExCzAJBgNVBAcTAlNDMQswCQYDVQQKEwJCRDELMAkGA1UE
                            CxMCRU4xHjAcBgNVBAMTFWJsdWVkYXRhLTQuZW5jcnlwdGlvbjAeFw0xNzA3MTgx
                            NzExMDRaFw0xNzEwMTYxNzExMDRaMGExCzAJBgNVBAYTAlVTMQswCQYDVQQIEwJD
                            QTELMAkGA1UEBxMCU0MxCzAJBgNVBAoTAkJEMQswCQYDVQQLEwJFTjEeMBwGA1UE
                            AxMVYmx1ZWRhdGEtNC5lbmNyeXB0aW9uMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
                            MIIBCgKCAQEAivCHpkQfYy88Fg8dLnA5E3JJ4i9R7FRhi6zmNx9k+SI/QZLEERZ2
                            DJPUtvfvABHsSM9eSUSMGay6yYdwAjvrBaBI1Nwvcl+Sq2q+1kbcFf80F09b3oe0
                            2Ac3TyOlDVYWkXYquQFjsExMWJD32cgohrmhHzjU/zomxDO1Yltko4s7Bq+2jR9D
                            w6PLMhno4qgtItqTeUqCqQg/iUdGVbdxWnXIFCZtMxIMZBub6vXsi8s2rnRi8PU5
                            IgmfO4HCqw84VNgKU5Z5i71wm7ZPJXM6Atb+fd/3TKvuY76dcz+YjSBOmBqn2Brm
                            IkMYwOtOtXFQs4BHPZPlsPfLHeTBQy+LMwIDAQABoyEwHzAdBgNVHQ4EFgQUK92j
                            s0W3FVtiB6G2MpKnmVI6mK0wDQYJKoZIhvcNAQELBQADggEBACavBuJ8n033GGjv
                            oElJ+2FEjEItfci0dY50TCkKTlSJilLpVGOaWgqNAS6sD5qnodOQ5XhQ+smawNF4
                            XZ1zjhlN/AzwEInATvIgIICDgxKg30TWI5cJZ+Rr2fErr3SO1EPh8azsVy38UbjB
                            /TtzrN4VWK+NeYZddGfo5SMyxSMAN2vf6Sn3Cll/spmDQCR9fXqQrNt/McDfm1rK
                            BASWCAnMe0OQafXR9eYgy1mtSnP5KQc1A2rqK6oZC7tv+qiZtk0jfh4bAlWHgLOt
                            X/yZRF4f49bdP7NioR9KsMnxc20JjwaDpYdyXK3b4U36/lphksllM4jCiGUvlcXI
                            B/g+k1E=
                            -----END CERTIFICATE-----
                            subject=/C=US/ST=CA/L=SC/O=BD/OU=EN/CN=bluedata-4.encryption
                            issuer=/C=US/ST=CA/L=SC/O=BD/OU=EN/CN=bluedata-4.encryption
                            ---
                            No client certificate CA names sent
                            Server Temp Key: ECDH, secp521r1, 521 bits
                            ---
                            SSL handshake has read 1477 bytes and written 497 bytes
                            ---
                            New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA384
                            Server public key is 2048 bit
                            Secure Renegotiation IS supported
                            Compression: NONE
                            Expansion: NONE
                            SSL-Session:
                                Protocol  : TLSv1.2
                                Cipher    : ECDHE-RSA-AES256-SHA384
                                Session-ID: 598B79F355B64A6106A82E735689E44F570DD6926B41082DDCD9E89B0E8CC49E
                                Session-ID-ctx:
                                Master-Key: 70000BD0F41E60933EACB912446AFD4C2F7A83E43444FEE1D989DB6D446A57B9D860BDAE6CE31BBAA4A498847C437FDD
                                Key-Arg   : None
                                Krb5 Principal: None
                                PSK identity: None
                                PSK identity hint: None
                                Start Time: 1502312947
                                Timeout   : 300 (sec)
                                Verify return code: 18 (self signed certificate)
                            ---
                            ^C
                            [bluedata@bluedata-1 ~]$
  3. In the virtual cluster, execute the following commands to output debugging information:
    export HADOOP_ROOT_LOGGER=DEBUG,console
    export HADOOP_OPTS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl"
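
The TLS check in step 2 can also be reduced to a single pass/fail result by reading the "Verify return code" line of the openssl output. The sketch below is one way to do that: both function names are hypothetical, and it assumes you pass the exported KMS certificate (selfsigned.cer in this article) as the trusted CA file, so that a self-signed certificate verifies with code 0 instead of code 18.

```shell
#!/bin/sh
# Sketch: extract the verify return code from openssl s_client output.
# Both functions are hypothetical helpers.

# Read s_client output on stdin and print the first numeric verify return code.
tls_verify_code() {
    sed -n 's/^[[:space:]]*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed only if the KMS certificate verifies against the given CA file.
# Arguments: host:port, path to the trusted certificate file.
check_kms_tls() {
    hostport="$1"
    cafile="$2"
    # Redirect stdin so s_client exits after the handshake instead of waiting.
    code=$(openssl s_client -connect "$hostport" -CAfile "$cafile" </dev/null 2>/dev/null | tls_verify_code)
    [ "$code" = "0" ]
}
```

For example, check_kms_tls bluedata-4.encryption:16000 /opt/cloudera/security/jks/selfsigned.cer returns success only when the handshake completes and the certificate chain verifies.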