Spark Security

This topic describes the Spark security concepts in HPE Ezmeral Runtime Enterprise.

Authentication for Spark on Kubernetes

Kubernetes authentication and authorization rules apply to Spark applications of kind SparkApplication or ScheduledSparkApplication.

For example, you can create, edit, delete, and submit Spark applications in a tenant namespace according to the RBAC configuration of that namespace.
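As a quick check, you can verify what the RBAC configuration allows you to do with these resource kinds by using kubectl auth can-i. This is a sketch against a live cluster; the namespace spark-tenant is a placeholder:

```shell
# Check whether the current user may create SparkApplication resources
# in the tenant namespace (replace spark-tenant with your namespace).
kubectl auth can-i create sparkapplications.sparkoperator.k8s.io -n spark-tenant

# Check delete permission for ScheduledSparkApplication resources as well.
kubectl auth can-i delete scheduledsparkapplications.sparkoperator.k8s.io -n spark-tenant
```

Each command prints yes or no, so you can confirm your permissions before submitting an application.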

User Secrets

By default, Spark application images run as the root user. Instead, you must start each Spark application as the user who submits it.

HPE Ezmeral Data Fabric is configured with Data Fabric SASL security. When you create Spark applications in an HPE Ezmeral Data Fabric on Kubernetes tenant or an HPE Ezmeral Data Fabric on Bare Metal tenant, you must authenticate the Spark driver pods against HPE Ezmeral Data Fabric.

To start a Spark application as the user who submits it, and to authenticate the Spark driver pods against Data Fabric, you must create a secret. The secret contains user information such as the user ID, user name, primary group ID and group name, and the user's MapR ticket.

Creating User Secrets

You can create a user secret in one of three ways:

Automatically creating secrets:

The autoticketgenerator webhook intercepts all Spark application creation requests.

When AD/LDAP integration is enabled on the Data Fabric and Kubernetes clusters, the webhook automatically generates a ticket and a secret. The ticket has a default expiration time of 14 days.

You cannot change or renew the expiration time of the ticket.

The generated secrets are deleted when you delete the Spark application.

Manually creating secrets with the ticketcreator utility:
Data Fabric tenants contain a tenantcli pod. You can manually create user secrets by using the ticketcreator.sh script in the tenantcli pod.

This ticket also has a default expiration time of 14 days. Because you provide the ticket to Spark applications through the secret, you cannot change or renew its expiration time.

Perform the following steps to use the ticketcreator.sh script from the tenantcli pod:
  1. Run the following command to open a shell in the tenantcli pod in the tenant namespace:
    kubectl exec -it tenantcli-0 -n <namespace> -- bash
  2. Run the ticketcreator.sh utility by using the following command:
     /opt/mapr/kubernetes/ticketcreator.sh
  3. Enter the following information at the prompts:
    1. The username and password of the user for whom to create the secret.
    2. The name of the user secret. The default name is randomized for security.

Add the secret name to the spark.mapr.user.secret field in your Spark application YAML file.
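As a minimal sketch of where the field goes, the snippet below shows spark.mapr.user.secret set in a SparkApplication spec. The application name, namespace, secret name, and the placement under spec.sparkConf are illustrative assumptions; consult your application template for the exact layout:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-wordcount        # illustrative application name
  namespace: spark-tenant      # illustrative tenant namespace
spec:
  sparkConf:
    # Name of the user secret created by ticketcreator.sh (illustrative)
    spark.mapr.user.secret: mapr-user-secret-12345
```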

Manually creating a secret without the ticketcreator utility:

Some Spark applications have a long runtime, for example, Spark streaming applications. In such cases, the application loses access to HPE Ezmeral Data Fabric services, such as the HPE Ezmeral Data Fabric filesystem, after 14 days.

For Spark applications that must run longer than 14 days, you can create ticket secrets with a longer expiration time by using the -duration option of the maprlogin utility. The maprlogin utility is available on the Kubernetes cluster, and in the tenantcli-0 and admincli-0 pods on an HPE Ezmeral Data Fabric on Kubernetes cluster. See Tickets and mapr Command Examples.
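For instance, assuming the standard maprlogin syntax, a ticket valid for 30 days could be generated as follows before you create the secret. The user name, UID, and file paths are placeholders:

```shell
# From the tenantcli-0 pod, generate a ticket that expires after 30 days
# (the -duration format is days:hours:minutes).
maprlogin password -user myuser -duration 30:0:0

# maprlogin prints the ticket file location (typically /tmp/maprticket_<uid>);
# copy the ticket to the location you will reference when creating the secret.
cp /tmp/maprticket_10001 /home/user/maprticket
```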

For example, if you have a ticket saved in the /home/user/maprticket file, you can run the following command to manually create a ticket secret with a long expiration time:

kubectl -n <namespace> create secret generic <secret-name> \
--from-file=CONTAINER_TICKET=/home/user/maprticket \
--from-literal=MAPR_SPARK_USER="[username]" \
--from-literal=MAPR_SPARK_GROUP="[usergroup]" \
--from-literal=MAPR_SPARK_UID="[uid]" \
--from-literal=MAPR_SPARK_GID="[main_gid]"

Add the secret name to the spark.mapr.user.secret field in your Spark application YAML file.