HDFS DataTap TDE Configuration
Transparent Data Encryption (TDE) provides end-to-end data encryption between virtual clusters and HDFS storage resources. The encryption and decryption are transparent because no changes to the application code are required. Only the virtual cluster can encrypt and decrypt this data; the storage resource neither stores nor accesses unencrypted data or the keys required to decrypt that data. This means that data is encrypted both at rest (residing on storage media such as a disk) and in transit (being transmitted across a network). DataTaps handle TDE because encrypting and decrypting data is computationally intensive; handling it in the DataTap means that this overhead affects only the container that accesses the TDE-enabled DataTap.
A virtual cluster that will use TDE-enabled DataTaps must be Kerberized, because the DataTap uses Kerberos authentication when communicating with the Key Management Service (KMS).
Enabling TDE requires several configuration changes to the remote HDFS storage resource, including:
- Installing and configuring a KMS, including the Access Control List (ACL) and SSL.
- Configuring the remote HDFS storage resource to use the KMS.
- Creating an encryption key and encryption zone on the remote HDFS storage resource.
The instructions in this article assume that the remote HDFS storage resource and KMS have been correctly configured before you create and configure the DataTap. See Configuration Example, below, for a sample CDH-based HDFS and KMS configuration.
On the HPE Ezmeral Runtime Enterprise side, enabling and supporting TDE requires the following configuration updates to the virtual cluster itself:
- KMS URL: The DataTap and HDFS client use this information to locate the KMS.
- Truststore: The DataTap and HDFS client use the truststore to authenticate the KMS server, because the KMS protocol is based on HTTPS.
See the appropriate section below for instructions on configuring a virtual cluster:
- CDH clusters: See Configuring Cloudera Clusters for TDE.
- HDP clusters: See Configuring Hadoop Clusters for TDE.
Configuring Cloudera Clusters for TDE
Configuring Cloudera clusters for TDE is a two-phase process:
- KMS URL: See Phase 1: Configuring the KMS URL (CDH).
- Truststore: See Phase 2: Configuring the Truststore (CDH).
Phase 1: Configuring the KMS URL (CDH)
To configure the KMS URL for a Cloudera virtual cluster:
- In the remote HDFS storage resource, add the dfs.encryption.key.provider.uri property to both of the following:
  - HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml
  - HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml
- In the remote HDFS storage resource, add the hadoop.security.key.provider.path property to Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml.
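Both properties take the same KMS provider URI. As a minimal sketch, assuming the sample KMS host bluedata-4.encryption and port 16000 that appear in the Validation output of this article (replace both with your own values), the following shell script prints the property elements to paste into the safety valves. The kms://https@host:port/kms form is the standard Hadoop KMS provider URI; the variable names and the kms_property helper are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the KMS provider properties for the CDH safety valves.
# Assumption: the defaults below are the sample KMS host and port from this
# article's Validation output; override KMS_HOST/KMS_PORT for your site.
KMS_HOST="${KMS_HOST:-bluedata-4.encryption}"
KMS_PORT="${KMS_PORT:-16000}"

# Hadoop KMS provider URIs take the form kms://<scheme>@<host>:<port>/kms
KMS_URI="kms://https@${KMS_HOST}:${KMS_PORT}/kms"

# Hypothetical helper: emit one Hadoop <property> element whose value is KMS_URI.
kms_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' \
    "$1" "$KMS_URI"
}

echo "<!-- hdfs-site.xml: HDFS Service and HDFS Client safety valves -->"
kms_property dfs.encryption.key.provider.uri
echo "<!-- core-site.xml: cluster-wide safety valve -->"
kms_property hadoop.security.key.provider.path
```

With the defaults, both properties resolve to kms://https@bluedata-4.encryption:16000/kms, which corresponds to the https://bluedata-4.encryption:16000/kms/v1/ endpoint reported by hadoop key list.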
Phase 2: Configuring the Truststore (CDH)
To configure the Truststore for a Cloudera virtual cluster:
- Verify that the certificate file for the KMS server is ready. This example assumes that the certificate file is named selfsigned.cer.
- Execute the following commands to import the certificate into the truststore:
cp /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/cacerts /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts
/usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool -import -alias kmshost -file /opt/cloudera/security/jks/selfsigned.cer -keystore /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts -storepass changeit
- Copy the truststore file (named jssecacerts in this example) to all of the virtual nodes/containers in the HPE Ezmeral Runtime Enterprise virtual cluster. The path to the truststore file must be identical on all nodes.
- In the remote HDFS storage resource, select HDFS>Configs>Advanced. The Advanced tab appears.
- Modify the ssl.client.truststore.location and ssl.client.truststore.password properties.
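For reference, here is a minimal sketch of the two truststore property values, assuming the jssecacerts path and the changeit password used in the import command above. The script only prints the property elements (the ssl_client_property helper is hypothetical) and warns if the truststore file is not present at that path on the node where it runs.

```shell
#!/usr/bin/env bash
# Sketch: print the ssl-client truststore properties for the virtual cluster.
# Assumptions: path and password match the keytool import example above;
# replace "changeit" with your own truststore password.
TRUSTSTORE_PATH="/usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts"
TRUSTSTORE_PASSWORD="changeit"

# Hypothetical helper: emit one Hadoop <property> element (name, value).
ssl_client_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# The truststore must exist at this same path on every node.
if [ ! -f "$TRUSTSTORE_PATH" ]; then
  echo "warning: $TRUSTSTORE_PATH not found on ${HOSTNAME:-this node}" >&2
fi

ssl_client_property ssl.client.truststore.location "$TRUSTSTORE_PATH"
ssl_client_property ssl.client.truststore.password "$TRUSTSTORE_PASSWORD"
```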
NOTE: If the ssl.client.truststore.location property is not configured for a Cloudera virtual cluster, then the Oracle JDK searches for the /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts file by default. This means that you can ignore the ssl.client.truststore.location and ssl.client.truststore.password properties if you are using this default configuration.
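You can script the step of copying the truststore to every virtual node. The following is a minimal sketch, assuming passwordless SSH from the node that holds the truststore and write access to the JDK security directory on each target. The NODES hostnames and both function names are hypothetical placeholders. Copying to the same absolute path on every node satisfies the requirement that the truststore path be identical on all nodes, and the second function confirms that the file is present on each node.

```shell
#!/usr/bin/env bash
# Sketch: copy the truststore to every virtual node at the identical path.
# Assumptions: passwordless ssh/scp to each node as a user that can write
# the JDK security directory; NODES holds hypothetical hostnames.
TRUSTSTORE="/usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts"
NODES=(vnode-1.example.com vnode-2.example.com)

distribute_truststore() {
  local node
  for node in "${NODES[@]}"; do
    # Same source and destination path, since all nodes must match.
    scp "$TRUSTSTORE" "${node}:${TRUSTSTORE}" || {
      echo "copy to $node failed" >&2
      return 1
    }
  done
}

check_truststore_present() {
  local node missing=0
  for node in "${NODES[@]}"; do
    # "test -f" runs on the remote node; non-zero status means the file is absent.
    if ! ssh "$node" test -f "$TRUSTSTORE"; then
      echo "missing on $node" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run distribute_truststore first, then check_truststore_present; a non-zero return from the check lists each node that still lacks the file.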
Configuring Hadoop Clusters for TDE
Configuring Hadoop clusters for TDE is a two-phase process:
- KMS URL: See Phase 1: Configuring the KMS URL (HDP).
- Truststore: See Phase 2: Configuring the Truststore (HDP).
Phase 1: Configuring the KMS URL (HDP)
To configure the KMS URL for a Hadoop virtual cluster:
- In the remote HDFS storage resource, select HDFS>Configs>Advanced. The Advanced tab appears.
- Modify the hadoop.security.key.provider.path property in the Advanced core-site section.
- Modify the dfs.encryption.key.provider.uri property in the Advanced hdfs-site section.
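After the configuration is pushed to the virtual nodes, one way to confirm that the client resolves both properties is hdfs getconf -confKey, which prints the effective value of a configuration key. The following is a minimal sketch, assuming it runs on a virtual node where the Hadoop client configuration is deployed. The check_kms_properties function name is hypothetical, and the function only checks that each value looks like a kms:// provider URI.

```shell
#!/usr/bin/env bash
# Sketch: confirm the deployed client config resolves both KMS properties.
# Assumption: run on a virtual node where the hdfs client and its config exist.
check_kms_properties() {
  local key value status=0
  for key in hadoop.security.key.provider.path dfs.encryption.key.provider.uri; do
    # hdfs getconf -confKey prints the effective value of a configuration key.
    value="$(hdfs getconf -confKey "$key" 2>/dev/null)"
    if [[ "$value" == kms://* ]]; then
      echo "$key=$value"
    else
      echo "unexpected or missing value for $key: '$value'" >&2
      status=1
    fi
  done
  return "$status"
}
```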
Phase 2: Configuring the Truststore (HDP)
To configure the Truststore for a Hadoop virtual cluster:
- Verify that the certificate file for the KMS server is ready. This example assumes that the certificate file is named selfsigned.cer.
- Execute the following commands to import the certificate into the truststore:
cp /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/cacerts /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts
/usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool -import -alias kmshost -file /opt/cloudera/security/jks/selfsigned.cer -keystore /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts -storepass changeit
- Copy the truststore file (named jssecacerts in this example) to all of the virtual nodes/containers in the HPE Ezmeral Runtime Enterprise virtual cluster. The path to the truststore file must be identical on all nodes.
- In the remote HDFS storage resource, select HDFS>Configs>Advanced. The Advanced tab appears.
- Modify the ssl.client.truststore.location and ssl.client.truststore.password properties.
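Before copying the truststore, you can confirm that the KMS certificate was imported. The following is a minimal sketch, assuming the kmshost alias, jssecacerts path, and changeit password from the import command above. Here, keytool -list -alias prints the entry when it exists and exits with an error when the alias is absent. The KEYTOOL and STOREPASS variables and the verify_kms_cert function are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Sketch: confirm the kmshost entry exists in the truststore after import.
# Assumptions: alias, path, and password match the import example above.
KEYTOOL="${KEYTOOL:-/usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool}"
TRUSTSTORE="/usr/java/jdk1.7.0_67-cloudera/jre/lib/security/jssecacerts"
STOREPASS="${STOREPASS:-changeit}"

verify_kms_cert() {
  # keytool -list exits non-zero when the alias is not in the keystore.
  if "$KEYTOOL" -list -alias kmshost -keystore "$TRUSTSTORE" -storepass "$STOREPASS"; then
    echo "kmshost certificate found in $TRUSTSTORE"
  else
    echo "kmshost certificate NOT found in $TRUSTSTORE" >&2
    return 1
  fi
}
```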
Configuration Example
This example demonstrates how to configure a sample remote HDFS storage device and KMS for use with a Cloudera virtual cluster. To do this:
- Kerberize all of the virtual nodes/containers in the Cloudera virtual cluster.
- In the remote HDFS storage resource, select HDFS>Actions>Set up HDFS Data At Rest Encryption.
- Follow the listed steps to enable TDE. The key point is to generate the keystore for the KMS server to enable HTTPS. This example uses a self-signed certificate for simplicity.
- Execute the following command on the node that hosts the Java KeyStore KMS service. (This example uses a cn of bluedata-4.encryption, which should be replaced by the actual FQDN.)
/usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool -genkeypair -alias kmshost -keyalg RSA -keysize 2048 -dname "cn=bluedata-4.encryption, ou=EN, o=BD, l=SC, st=CA, c=US" -keypass password -keystore kmshost-keystore.jks -storepass password
NOTE: The keypass and storepass must be the same.
- Copy the generated keystore file to /opt/cloudera/security/jks/kmshost-keystore.jks.
NOTE: You can execute the following command to export the KMS certificate, and then use that certificate to generate the KMS client truststore:
/usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool -export -alias kmshost -keystore kmshost-keystore.jks -rfc -file selfsigned.cer
- Based on the generated keystore file, configure TLS/SSL for the KMS. This procedure also configures the KMS ACL, which will appear similar to the following:
<property>
<name>hadoop.kms.acl.CREATE</name>
<value>xou,kishore xou,kishore</value>
</property>
<property>
<name>hadoop.kms.acl.DELETE</name>
<value>xou,kishore xou,kishore</value>
</property>
<property>
<name>hadoop.kms.acl.ROLLOVER</name>
<value>xou,kishore xou,kishore</value>
</property>
<property>
<name>hadoop.kms.acl.GET</name>
<value></value>
</property>
<property>
<name>hadoop.kms.acl.GET_KEYS</name>
<value>xou,kishore xou,kishore</value>
</property>
<property>
<name>hadoop.kms.acl.GET_METADATA</name>
<value>hdfs supergroup</value>
</property>
<property>
<name>hadoop.kms.acl.SET_KEY_MATERIAL</name>
<value></value>
</property>
<property>
<name>hadoop.kms.acl.GENERATE_EEK</name>
<value>hdfs supergroup</value>
</property>
<property>
<name>hadoop.kms.acl.DECRYPT_EEK</name>
<value></value>
</property>
<property>
<name>hadoop.kms.blacklist.CREATE</name>
<value>hdfs supergroup</value>
</property>
<property>
<name>hadoop.kms.blacklist.DELETE</name>
<value>hdfs supergroup</value>
</property>
<property>
<name>hadoop.kms.blacklist.ROLLOVER</name>
<value>hdfs supergroup</value>
</property>
<property>
<name>hadoop.kms.blacklist.GET</name>
<value>*</value>
</property>
<property>
<name>hadoop.kms.blacklist.GET_KEYS</name>
<value></value>
</property>
<property>
<name>hadoop.kms.blacklist.SET_KEY_MATERIAL</name>
<value>*</value>
</property>
<property>
<name>hadoop.kms.blacklist.DECRYPT_EEK</name>
<value>hdfs supergroup</value>
</property>
<property>
<name>default.key.acl.MANAGEMENT</name>
<value></value>
</property>
<property>
<name>default.key.acl.GENERATE_EEK</name>
<value></value>
</property>
<property>
<name>default.key.acl.DECRYPT_EEK</name>
<value></value>
</property>
<property>
<name>default.key.acl.READ</name>
<value></value>
</property>
<property>
<name>default.key.acl.MIGRATE</name>
<value></value>
</property>
<property>
<name>whitelist.key.acl.MANAGEMENT</name>
<value>xou,kishore xou,kishore</value>
</property>
<property>
<name>whitelist.key.acl.READ</name>
<value>hdfs supergroup</value>
</property>
<property>
<name>whitelist.key.acl.GENERATE_EEK</name>
<value>hdfs supergroup</value>
</property>
<property>
<name>whitelist.key.acl.DECRYPT_EEK</name>
<value>xou,kishore,yarn,nm xou,kishore,yarn,nm</value>
</property>
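You can collect the keystore steps in this example into one script. The following is a minimal sketch, assuming the same keytool options as the commands above. KMS_FQDN, KEYSTORE_PASS, KEYTOOL, and the make_kms_keystore function are hypothetical placeholders. Because the keypass and storepass must be the same, the script passes one password variable to both options. It then exports the self-signed certificate used to build the client truststore.

```shell
#!/usr/bin/env bash
# Sketch: generate the KMS keystore and export its self-signed certificate.
# Assumptions: KMS_FQDN is the FQDN of the node hosting the Java KeyStore KMS;
# KEYSTORE_PASS is a placeholder password (keypass and storepass must match).
KEYTOOL="${KEYTOOL:-/usr/java/jdk1.7.0_67-cloudera/jre/bin/keytool}"
KMS_FQDN="${KMS_FQDN:-bluedata-4.encryption}"
KEYSTORE_PASS="${KEYSTORE_PASS:-password}"
KEYSTORE="kmshost-keystore.jks"
CERT_FILE="selfsigned.cer"

make_kms_keystore() {
  # One password variable for both options keeps keypass == storepass.
  "$KEYTOOL" -genkeypair -alias kmshost -keyalg RSA -keysize 2048 \
    -dname "cn=${KMS_FQDN}, ou=EN, o=BD, l=SC, st=CA, c=US" \
    -keypass "$KEYSTORE_PASS" -keystore "$KEYSTORE" -storepass "$KEYSTORE_PASS" \
    || return 1
  # Export the certificate in PEM (-rfc) form for the client truststore.
  "$KEYTOOL" -export -alias kmshost -keystore "$KEYSTORE" \
    -storepass "$KEYSTORE_PASS" -rfc -file "$CERT_FILE"
}
```

With the defaults shown, make_kms_keystore runs the same two keytool commands as the steps above. You still need to copy kmshost-keystore.jks to /opt/cloudera/security/jks/ afterward.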
Validation
To validate the TDE configuration:
- In the virtual cluster, execute the command hadoop key list to verify the communication between the cluster and KMS, as shown here:
[bluedata@bluedata-1 ~]$ hadoop key list
Listing keys for KeyProvider: KMSClientProvider[https://bluedata-4.encryption:16000/kms/v1/]
mykey1
- In the virtual cluster, execute the command openssl s_client -connect host.fqdn.name:port to check TLS/SSL negotiation. If the test is successful, the output will appear similar to the following:
[bluedata@bluedata-1 ~]$ openssl s_client -connect bluedata-4.encryption:16000
CONNECTED(00000003)
depth=0 C = US, ST = CA, L = SC, O = BD, OU = EN, CN = bluedata-4.encryption
verify error:num=18:self signed certificate
verify return:1
depth=0 C = US, ST = CA, L = SC, O = BD, OU = EN, CN = bluedata-4.encryption
verify return:1
---
Certificate chain
 0 s:/C=US/ST=CA/L=SC/O=BD/OU=EN/CN=bluedata-4.encryption
   i:/C=US/ST=CA/L=SC/O=BD/OU=EN/CN=bluedata-4.encryption
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDYTCCAkmgAwIBAgIEQDnyMzANBgkqhkiG9w0BAQsFADBhMQswCQYDVQQGEwJV
UzELMAkGA1UECBMCQ0ExCzAJBgNVBAcTAlNDMQswCQYDVQQKEwJCRDELMAkGA1UE
CxMCRU4xHjAcBgNVBAMTFWJsdWVkYXRhLTQuZW5jcnlwdGlvbjAeFw0xNzA3MTgx
NzExMDRaFw0xNzEwMTYxNzExMDRaMGExCzAJBgNVBAYTAlVTMQswCQYDVQQIEwJD
QTELMAkGA1UEBxMCU0MxCzAJBgNVBAoTAkJEMQswCQYDVQQLEwJFTjEeMBwGA1UE
AxMVYmx1ZWRhdGEtNC5lbmNyeXB0aW9uMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A
MIIBCgKCAQEAivCHpkQfYy88Fg8dLnA5E3JJ4i9R7FRhi6zmNx9k+SI/QZLEERZ2
DJPUtvfvABHsSM9eSUSMGay6yYdwAjvrBaBI1Nwvcl+Sq2q+1kbcFf80F09b3oe0
2Ac3TyOlDVYWkXYquQFjsExMWJD32cgohrmhHzjU/zomxDO1Yltko4s7Bq+2jR9D
w6PLMhno4qgtItqTeUqCqQg/iUdGVbdxWnXIFCZtMxIMZBub6vXsi8s2rnRi8PU5
IgmfO4HCqw84VNgKU5Z5i71wm7ZPJXM6Atb+fd/3TKvuY76dcz+YjSBOmBqn2Brm
IkMYwOtOtXFQs4BHPZPlsPfLHeTBQy+LMwIDAQABoyEwHzAdBgNVHQ4EFgQUK92j
s0W3FVtiB6G2MpKnmVI6mK0wDQYJKoZIhvcNAQELBQADggEBACavBuJ8n033GGjv
oElJ+2FEjEItfci0dY50TCkKTlSJilLpVGOaWgqNAS6sD5qnodOQ5XhQ+smawNF4
XZ1zjhlN/AzwEInATvIgIICDgxKg30TWI5cJZ+Rr2fErr3SO1EPh8azsVy38UbjB
/TtzrN4VWK+NeYZddGfo5SMyxSMAN2vf6Sn3Cll/spmDQCR9fXqQrNt/McDfm1rK
BASWCAnMe0OQafXR9eYgy1mtSnP5KQc1A2rqK6oZC7tv+qiZtk0jfh4bAlWHgLOt
X/yZRF4f49bdP7NioR9KsMnxc20JjwaDpYdyXK3b4U36/lphksllM4jCiGUvlcXI
B/g+k1E=
-----END CERTIFICATE-----
subject=/C=US/ST=CA/L=SC/O=BD/OU=EN/CN=bluedata-4.encryption
issuer=/C=US/ST=CA/L=SC/O=BD/OU=EN/CN=bluedata-4.encryption
---
No client certificate CA names sent
Server Temp Key: ECDH, secp521r1, 521 bits
---
SSL handshake has read 1477 bytes and written 497 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol : TLSv1.2
    Cipher : ECDHE-RSA-AES256-SHA384
    Session-ID: 598B79F355B64A6106A82E735689E44F570DD6926B41082DDCD9E89B0E8CC49E
    Session-ID-ctx:
    Master-Key: 70000BD0F41E60933EACB912446AFD4C2F7A83E43444FEE1D989DB6D446A57B9D860BDAE6CE31BBAA4A498847C437FDD
    Key-Arg : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 1502312947
    Timeout : 300 (sec)
    Verify return code: 18 (self signed certificate)
---
^C
[bluedata@bluedata-1 ~]$
- In the virtual cluster, execute the following commands to output debugging information:
export HADOOP_ROOT_LOGGER=DEBUG,console
export HADOOP_OPTS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl"
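You can run these validation checks together. The following is a minimal sketch, assuming the sample KMS host and port from the output above and an existing Kerberos ticket for the hadoop command. The function names are hypothetical. Redirecting stdin from /dev/null makes openssl s_client exit after the handshake instead of waiting for input, so no Ctrl-C is needed. openssl x509 then prints only the certificate subject, issuer, and expiry date. The debug variables are applied only to the single hadoop key list invocation instead of being exported into the whole shell.

```shell
#!/usr/bin/env bash
# Sketch: run the KMS validation checks from a virtual node.
# Assumptions: sample KMS host/port from this article; a Kerberos ticket
# (kinit) is already in place for the hadoop command.
KMS_HOST="${KMS_HOST:-bluedata-4.encryption}"
KMS_PORT="${KMS_PORT:-16000}"

check_key_list() {
  # Lists keys through the configured KMS key provider.
  hadoop key list
}

check_tls() {
  # </dev/null closes stdin so s_client returns after the handshake.
  openssl s_client -connect "${KMS_HOST}:${KMS_PORT}" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -enddate
}

debug_key_list() {
  # Same debug settings as above, scoped to this one command; output goes to a log file.
  HADOOP_ROOT_LOGGER=DEBUG,console \
  HADOOP_OPTS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl" \
    hadoop key list > kms-debug.log 2>&1
}
```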