Submitting Spark Applications
This section describes how to submit Spark applications using the Spark Operator on HPE Ezmeral Runtime Enterprise.
About this task
Spark resources are created in the tenant namespace managed by Kubernetes. When you submit a Spark application, the Spark Operator creates a Kubernetes spark-submit job. The spark-submit job spawns the driver pod, and the driver creates the executor pods. After the application completes or fails, the executor pods terminate and the driver pod remains in the Completed or Error state. A driver pod in the Completed state does not consume any Kubernetes resources, and you can view its logs to see execution details and results.
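For example, after an application finishes, you can confirm the driver pod's state and then read its logs. This is a minimal sketch, assuming an application named spark-wordcount as in the example later in this section:
    # Check the driver pod's state (Completed after a successful run)
    kubectl get pod spark-wordcount-driver -n <tenant-namespace>
    # Read execution details and results from the driver log
    kubectl logs spark-wordcount-driver -n <tenant-namespace>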
To create and submit Spark applications using the HPE Ezmeral Runtime Enterprise new UI, see Submitting and Managing Spark Applications Using HPE Ezmeral Runtime Enterprise new UI.
- Log in to HPE Ezmeral Runtime Enterprise as the Kubernetes Tenant Administrator or a Kubernetes Tenant Member. See Assigning/Revoking User Roles (Local) for local users or Assigning/Revoking User Roles (LDAP/AD) for LDAP/AD users.
- If you are a local user or if you have not enabled LDAP/AD, you must use the ticketcreator.sh script from the tenantcli pod to create the ticket secrets. Add the secret name to the spark.mapr.user.secret field in your Spark application YAML file (for example, spark-wc.yaml). See Spark Security. A hedged command sketch follows this list.
- Create a specification in YAML format to store all the configurations required for the application. For example: Spark 3.3.1 Wordcount Example for HPE Ezmeral Runtime Enterprise 5.6. The Spark application specification is defined with kind SparkApplication or ScheduledSparkApplication; see Spark application CRDs. A minimal YAML sketch follows this list.
- Upload the application file (for example, an application JAR, Python, or R file) to an FsMounts, DataTaps, or S3 location in the cluster.
- To run the kubectl commands, access the Kubernetes Web Terminal in the HPE Ezmeral Runtime Enterprise GUI or configure kubectl on your local machine; see Using the HPE Kubectl Plugin. If you are running kubectl from your local machine, you can store the YAML file there as well.
- Create a Spark application from the YAML file by running the following kubectl command:
    kubectl apply -f /<path-to-spark-job-yaml-file> -n <tenant_namespace>
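For local users, the ticket secret is created from inside the tenantcli pod. The following is a hedged sketch, assuming the pod is named tenantcli-0 as in the pod listing later in this section; the script prompts for your credentials and the exact prompts may differ by release:
    # Run the ticket creator script inside the tenantcli pod
    kubectl exec -it tenantcli-0 -n <tenant-namespace> -- ticketcreator.sh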
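For reference, the following is a minimal SparkApplication sketch. The field names follow the Spark Operator CRD, and the CRD group sparkoperator.hpe.com matches the output shown in the example below; the v1beta2 API version, image, main class, paths, resource values, and the placement of spark.mapr.user.secret under sparkConf are illustrative assumptions, so prefer the HPE-provided example YAML for your release:
    apiVersion: sparkoperator.hpe.com/v1beta2    # version is an assumption; check your release
    kind: SparkApplication
    metadata:
      name: spark-wordcount
      namespace: <tenant-namespace>
    spec:
      type: Java                                 # Java, Scala, Python, or R
      mode: cluster
      image: <spark-image>                       # use the image from the HPE example YAML
      mainClass: org.apache.spark.examples.JavaWordCount   # illustrative class
      mainApplicationFile: <path-to-application-jar>
      arguments:
        - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt
      sparkConf:
        spark.mapr.user.secret: <ticket-secret-name>   # secret created by ticketcreator.sh; placement is an assumption
      restartPolicy:
        type: Never
      driver:
        cores: 1
        memory: "512m"
      executor:
        instances: 2
        cores: 1
        memory: "512m"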
Results
The Spark Operator receives the configured Spark application and submits it to run on the Kubernetes cluster.
Example
- Log in to HPE Ezmeral Runtime Enterprise as the Kubernetes Tenant Administrator or a Kubernetes Tenant Member.
- In the FsMounts screen, click the TenantShare link in the Name column of the table to open the Data Source Browser screen.
- Create the data and apps subdirectories in the TenantShare filesystem mount.
- Create a text file or download the wordcount.txt example file from the wordcount GitHub repository.
- Upload the wordcount.txt file to the data subdirectory. You can navigate to the file in HPE Ezmeral Runtime Enterprise at /hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt.
- Download the spark-wc.yaml file from the examples folder. For example:
  - Spark 2.4.7 Wordcount Example for HPE Ezmeral Runtime Enterprise 5.6
  - Spark 3.3.1 Wordcount Example for HPE Ezmeral Runtime Enterprise 5.6
- Update the namespace in the YAML file to <tenant-namespace>, set the application name to spark-wordcount, and add the path to wordcount.txt to the arguments field as - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt (see the excerpt after these steps).
- Upload the wordcount YAML to the /bd-fs-mnt/TenantShare/apps/ location in the HPE Ezmeral Runtime Enterprise GUI as the spark-wc.yaml file.
- To run the kubectl commands, access the Kubernetes Web Terminal in the HPE Ezmeral Runtime Enterprise GUI or configure kubectl on your local machine; see Using the HPE Kubectl Plugin.
- To run the Spark wordcount (wordcount.txt) example, execute:
    kubectl apply -f /bd-fs-mnt/TenantShare/apps/spark-wc.yaml -n <tenant_namespace>
  You will get the following output:
    sparkapplication.sparkoperator.hpe.com/spark-wordcount created
- To check the pods running within the tenant namespace, run:
    kubectl get pods -n <tenant-namespace>
  You will get the following output:
    NAME                       READY   STATUS    RESTARTS   AGE
    hivemeta-9b4c8cfb5-hgbjf   1/1     Running   7          23h
    spark-wordcount-driver     1/1     Running   0          13m
    sparkhs-7bfb88bc4-m6b54    1/1     Running   6          23h
    tenantcli-0                0/1     Running   0          23h
  After the job completes, the status of the spark-wordcount-driver pod changes to Completed.
- To show the logs of the driver pod for the submitted Spark application, run:
    kubectl logs spark-wordcount-driver --follow -n <tenant-namespace>
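The edits described in the steps above touch only a few fields of the downloaded YAML. The following is a hedged excerpt with placeholders where tenant-specific values go; all other fields in the downloaded example file stay unchanged:
    metadata:
      name: spark-wordcount
      namespace: <tenant-namespace>
    spec:
      # (other fields from the downloaded example remain unchanged)
      arguments:
        - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt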