Submitting Spark Applications
This section describes how to submit Spark applications using the Spark Operator on HPE Ezmeral Runtime Enterprise.
About this task
Spark resources are created in the tenant namespace managed by Kubernetes. When you submit a Spark application, the Spark Operator creates a Kubernetes spark-submit job. The spark-submit job spawns the driver pod, and the driver creates the executor pods. After the Spark application completes or fails, the executor pods terminate, and the driver pod remains in the Completed or Error state. The driver pod does not consume any Kubernetes resources in the Completed state, and you can view its logs to see execution details and results.
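To illustrate, the Spark Operator records this life cycle in the application status. The following stanza is a minimal sketch whose field names are assumptions carried over from the open-source Spark Operator on which the HPE Spark Operator is based; run kubectl get sparkapplication <app-name> -o yaml -n <tenant-namespace> to see the authoritative layout on your cluster:

  status:
    applicationState:
      state: COMPLETED                    # FAILED when the application ends in error
    driverInfo:
      podName: spark-wordcount-driver     # the pod whose logs hold the results
    executorState:
      spark-wordcount-exec-1: COMPLETED   # executor pods terminate after the run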
To create and submit Spark applications using the HPE Ezmeral Runtime Enterprise new UI, see Submitting and Managing Spark Applications Using HPE Ezmeral Runtime Enterprise new UI.
- Log in to HPE Ezmeral Runtime Enterprise as the Kubernetes Tenant Administrator or a Kubernetes Tenant Member. See Assigning/Revoking User Roles (Local) for local users or Assigning/Revoking User Roles (LDAP/AD) for LDAP/AD users.
- If you are a local user or if you have not enabled LDAP/AD, you must use the ticketcreator.sh script from the tenantcli pod to create the ticket secrets. Add the secret name to the spark.mapr.user.secret field in your Spark application YAML file (for example, spark-wc.yaml). See Spark Security.
- Create a specification in YAML format to store all the necessary configurations required for the application. For example, see Spark 3.3.1 Wordcount Example for HPE Ezmeral Runtime Enterprise 5.6. The Spark application specification is defined as kind SparkApplication or ScheduledSparkApplication; see Spark application CRDs. Minimal sketches of both kinds follow this list.
- Upload the application file, for example application JAR, Python, or R files, to an FsMounts, DataTaps, or S3 location in the cluster.
- To run the kubectl commands, access the Kubernetes Web Terminal in the HPE Ezmeral Runtime Enterprise GUI or configure kubectl on your local machine; see Using the HPE Kubectl Plugin. If you are running kubectl from your local machine, you can store the YAML file on your local machine.
- Create a Spark application from the YAML file by running the following kubectl command:
  kubectl apply -f /<path-to-spark-job-yaml-file> -n <tenant_namespace>
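For orientation, the following is a minimal sketch of a SparkApplication specification. Apart from the names cited in this topic (the sparkoperator.hpe.com group, the spark.mapr.user.secret field, and the maprfs:// paths), the apiVersion value, image reference, main class, resource sizing, and the placement of the secret under sparkConf are assumptions modeled on the open-source Spark Operator; use the downloadable wordcount examples as the authoritative template.

  apiVersion: sparkoperator.hpe.com/v1beta2     # version suffix is an assumption
  kind: SparkApplication
  metadata:
    name: spark-wordcount
    namespace: <tenant-namespace>               # your tenant namespace
  spec:
    type: Scala
    mode: cluster
    image: <spark-image>                        # hypothetical Spark image reference
    mainClass: <fully-qualified-main-class>     # hypothetical; your application class
    mainApplicationFile: maprfs:///hcp/tenant-<tenant_id>/fsmount/apps/<application-jar>
    arguments:
      - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt
    sparkConf:
      # Secret created with ticketcreator.sh; shown under sparkConf as an
      # assumption -- see Spark Security for the exact field placement.
      spark.mapr.user.secret: <ticket-secret-name>
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: "1g"
    executor:
      instances: 2
      cores: 1
      memory: "1g"

A ScheduledSparkApplication wraps the same specification in a template and adds a schedule; again a sketch under the same assumptions:

  apiVersion: sparkoperator.hpe.com/v1beta2
  kind: ScheduledSparkApplication
  metadata:
    name: spark-wordcount-hourly
    namespace: <tenant-namespace>
  spec:
    schedule: "@every 1h"          # standard cron expressions also work upstream
    concurrencyPolicy: Forbid      # do not start a new run while one is in flight
    template:
      # Body identical to the SparkApplication spec above (type, mode, image,
      # mainApplicationFile, arguments, sparkConf, driver, executor).
      type: Scala
      mode: cluster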
Results
The Spark Operator receives the configured Spark application and submits it to run on the Kubernetes cluster.
Example
- Log in to HPE Ezmeral Runtime Enterprise as the Kubernetes Tenant Administrator or a Kubernetes Tenant Member.
- In the FsMounts screen, click the TenantShare link in the Name column of the table to open the Data Source Browser screen.
- Create the data and apps subdirectories in the TenantShare filesystem mount.
- Create a text file or download the wordcount.txt example file from the wordcount GitHub.
- Upload the wordcount.txt file to the data subdirectory. You can navigate to the file in HPE Ezmeral Runtime Enterprise at /hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt.
- Download the spark-wc.yaml file from the examples folder. For example:
  - Spark 2.4.7 Wordcount Example for HPE Ezmeral Runtime Enterprise 5.6
  - Spark 3.3.1 Wordcount Example for HPE Ezmeral Runtime Enterprise 5.6
- Update the namespace in the YAML file to <tenant-namespace>, set the application name to spark-wordcount, and add the path to wordcount.txt to the arguments field as - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt (see the fragment after this example).
- Upload the wordcount YAML to the /bd-fs-mnt/TenantShare/apps/ location in the HPE Ezmeral Runtime Enterprise GUI as the spark-wc.yaml file.
- To run the kubectl commands, access the Kubernetes Web Terminal in the HPE Ezmeral Runtime Enterprise GUI or configure kubectl on your local machine; see Using the HPE Kubectl Plugin.
- To run the Spark wordcount (wordcount.txt) example, execute:
  kubectl apply -f /bd-fs-mnt/TenantShare/apps/spark-wc.yaml -n <tenant_namespace>
  You will get the following output:
  sparkapplication.sparkoperator.hpe.com/spark-wordcount created
- To check the pods running within the tenant namespace, run:
  kubectl get pods -n <tenant-namespace>
  You will get the following output:
  NAME                       READY   STATUS    RESTARTS   AGE
  hivemeta-9b4c8cfb5-hgbjf   1/1     Running   7          23h
  spark-wordcount-driver     1/1     Running   0          13m
  sparkhs-7bfb88bc4-m6b54    1/1     Running   6          23h
  tenantcli-0                0/1     Running   0          23h
  After the job completes, the status changes to Completed.
- To show the logs of the driver pod for the submitted Spark application, run:
  kubectl logs spark-wordcount-driver --follow -n <tenant-namespace>
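For reference, the edits described in the update step land in the metadata and arguments sections of spark-wc.yaml. A minimal fragment, with the layout assumed from the downloadable examples:

  metadata:
    name: spark-wordcount
    namespace: <tenant-namespace>      # replace with your tenant namespace
  spec:
    arguments:
      - maprfs:///hcp/tenant-<tenant_id>/fsmount/data/wordcount.txt   # the file uploaded to the data subdirectory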