Submitting Spark Applications

This section describes how to submit Spark applications using the Spark Operator on HPE Ezmeral Runtime Enterprise.

About this task

Spark resources are created in the tenant namespace managed by Kubernetes. When you submit a Spark application, the Spark Operator creates a Kubernetes spark-submit job. The spark-submit job spawns the driver pod, and the driver creates the executor pods. After the application completes or fails, the executor pods terminate and the driver pod remains in the Completed or Error state. A driver pod in the Completed state does not consume any Kubernetes resources, and you can view its logs to see execution details and results.
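
For example, once an application finishes, you can confirm that the driver pod has reached the Completed state and review its logs (a minimal illustration; the driver pod name is derived from your application name):

  kubectl get pods -n <tenant-namespace>
  kubectl logs <application-name>-driver -n <tenant-namespace>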

To create and submit Spark applications using the HPE Ezmeral Runtime Enterprise new UI, see Submitting and Managing Spark Applications Using HPE Ezmeral Runtime Enterprise new UI.

To manually create and submit Spark applications using the Spark Operator in HPE Ezmeral Runtime Enterprise, perform the following steps:
  1. Log in to HPE Ezmeral Runtime Enterprise as a Kubernetes Tenant Administrator or a Kubernetes Tenant Member. See Assigning/Revoking User Roles (Local) for local users or Assigning/Revoking User Roles (LDAP/AD) for LDAP/AD users.
  2. If you are a local user or if you have not enabled LDAP/AD, you must use the ticketcreator.sh script from the tenantcli pod to create the ticket secret. Add the secret name to the spark.mapr.user.secret field in your Spark application YAML file (for example, spark-wc.yaml). See Spark Security.
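    For example, a minimal sketch of where the secret name goes in the specification, assuming the field is set under sparkConf and using the placeholder secret name spark-user-secret:
      spec:
        sparkConf:
          spark.mapr.user.secret: spark-user-secret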
  3. Create a specification in YAML format to store all the configurations required for the application.

    For example: Spark 3.3.1 Wordcount Example for HPE Ezmeral Runtime Enterprise 5.6.

    The Spark application specification is defined as kind SparkApplication or ScheduledSparkApplication; see Spark application CRDs.
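    The following is a trimmed sketch of a SparkApplication specification. Treat the apiVersion, image, and file paths as illustrative assumptions, and start from the version-matched example in the Spark on K8s GitHub repository:
      apiVersion: sparkoperator.hpe.com/v1beta2
      kind: SparkApplication
      metadata:
        name: spark-wordcount
        namespace: <tenant-namespace>
      spec:
        type: Java
        mode: cluster
        image: <spark-3.3.1-image-for-your-release>   # illustrative placeholder
        mainClass: org.apache.spark.examples.JavaWordCount
        mainApplicationFile: maprfs:///hcp/tenant-<tenant_id>/fsmount/apps/spark-examples.jar
        arguments:
          - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt
        sparkVersion: 3.3.1
        driver:
          cores: 1
          memory: 512m
        executor:
          cores: 1
          instances: 2
          memory: 512m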
  4. Upload the application file (for example, an application JAR, Python, or R file) to an FsMounts, DataTaps, or S3 location in the cluster.
  5. To run kubectl commands, access the Kubernetes Web Terminal in the HPE Ezmeral Runtime Enterprise GUI or configure kubectl on your local machine; see Using the HPE Kubectl Plugin. If you are running kubectl from your local machine, you can store the YAML file locally.
  6. Create the Spark application from the YAML file by running the following kubectl command:
    kubectl apply -f /<path-to-spark-job-yaml-file> -n <tenant-namespace>
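    To verify that the resource was created, you can list the SparkApplication resources in the tenant namespace:
      kubectl get sparkapplication -n <tenant-namespace>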

Results

The Spark Operator receives the configured Spark application and submits it to run on the Kubernetes cluster.

Example

To run a Spark application that counts the words in a file, using FsMounts as the file system storage, perform the following steps:
  1. Log in to HPE Ezmeral Runtime Enterprise as a Kubernetes Tenant Administrator or a Kubernetes Tenant Member.
  2. In the FsMounts screen, click the TenantShare link in the Name column of the table to open the Data Source Browser screen.
  3. Create the data and apps subdirectories in the TenantShare filesystem mount.
  4. Create a text file or download the wordcount.txt example file from the wordcount GitHub location.
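    For example, a quick way to create a small input file of your own (any plain-text content works):
      echo "the quick brown fox jumps over the lazy dog" > wordcount.txt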
  5. Upload the wordcount.txt file to the data subdirectory. In HPE Ezmeral Runtime Enterprise, the file is then available at /hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt.
  6. Download the spark-wc.yaml file. To locate Spark examples for other versions of HPE Ezmeral Runtime Enterprise, navigate to the release branch of your choice at the Spark on K8s GitHub location and find the examples in the examples folder.
  7. In the YAML file, update the namespace to <tenant-namespace>, set the application name to spark-wordcount, and add the path to wordcount.txt to the arguments field as - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt.
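    After these edits, the relevant parts of spark-wc.yaml look similar to the following sketch (exact field placement can vary between releases):
      metadata:
        name: spark-wordcount
        namespace: <tenant-namespace>
      spec:
        arguments:
          - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt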
  8. Upload the wordcount YAML file as spark-wc.yaml to the /bd-fs-mnt/TenantShare/apps/ location in the HPE Ezmeral Runtime Enterprise GUI.
  9. To run kubectl commands, access the Kubernetes Web Terminal in the HPE Ezmeral Runtime Enterprise GUI or configure kubectl on your local machine; see Using the HPE Kubectl Plugin.
  10. To run the Spark wordcount (wordcount.txt) example, execute:
    kubectl apply -f /bd-fs-mnt/TenantShare/apps/spark-wc.yaml -n <tenant-namespace>
    You will get the following output:
    sparkapplication.sparkoperator.hpe.com/spark-wordcount created
  11. To check the pods running within the tenant namespace, run:
    kubectl get pods -n <tenant-namespace>
    You will get the following output:
    
    NAME                       READY   STATUS    RESTARTS   AGE
    hivemeta-9b4c8cfb5-hgbjf   1/1     Running   7          23h
    spark-wordcount-driver     1/1     Running   0          13m
    sparkhs-7bfb88bc4-m6b54    1/1     Running   6          23h
    tenantcli-0                0/1     Running   0          23h
    
    After the job completes, the status of the driver pod changes to Completed.
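    You can also check the state reported by the SparkApplication resource itself:
      kubectl get sparkapplication spark-wordcount -n <tenant-namespace>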
  12. To view the logs of the driver pod for the submitted Spark application, run:
    kubectl logs spark-wordcount-driver --follow -n <tenant-namespace>
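  13. Optionally, when you no longer need the application, you can delete the SparkApplication resource, which also removes its completed driver pod:
    kubectl delete sparkapplication spark-wordcount -n <tenant-namespace>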