Submitting Spark Applications Using Livy
This section guides you through starting an Apache Livy session and executing code in a Livy session. It also shows examples of Livy sessions that support multiple APIs and of Livy batches.
To find out which Livy images to use for PySpark (with the Python packages installed), for SparkR (with the R packages installed), and for basic Spark sessions with Scala, see Spark Images.
Start Livy Session
If you are an LDAP/AD user, you can navigate to Kubernetes > Tenants > Applications > Service Endpoints in HPE Ezmeral Runtime Enterprise to find the livy-http URL or Access Point and the corresponding port.
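Alternatively, you can look up the endpoint from the command line. The following sketch assumes the sampletenant namespace used later in this section; the actual service name and port depend on your deployment:
# List the services in the tenant namespace to locate the Livy endpoint
kubectl -n sampletenant get svc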
Run the following command to submit a REST API call that starts a Livy session:
curl -k -v \
-X POST \
-H "Content-Type: application/json" \
-d '{}' \
-u "username:password" \
https://<livy-url>/sessions
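The response body contains the session id and state. Before submitting statements, you can poll the session until its state is idle. The following sketch assumes jq is installed and that <session-number> is the id returned by the call above:
curl -k \
-u "username:password" \
https://<livy-url>/sessions/<session-number> | jq '.state'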
Code Execution in a Livy Session
- Run the following command to copy a sample text file into the HPE Ezmeral Data Fabric file system:
kubectl -n sampletenant exec -it tenantcli-0 -- hadoop fs -put /etc/passwd
- Execute the following command to run a Spark job in the Livy session:
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"kind": "spark", "code": "var a = spark.read.csv(\"/user/mapr/passwd\"); a.show();"}' \
-u "username:password" \
https://<livy-url>/sessions/<session-number>/statements
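Statements run asynchronously, so the POST returns a statement id rather than the result itself. A sketch for fetching the statement output once it completes, again assuming jq is installed:
curl -k \
-u "username:password" \
https://<livy-url>/sessions/<session-number>/statements/<statement-number> | jq '.output'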
Delete Livy Session
Run the following command to delete the Livy session:
curl -k -X DELETE -u "username:password" "https://<livy-url>/sessions/<session-number>"; echo
When you delete a Livy session, the Livy server stops the execution of the Spark job created for that session, and both the driver and executor pods remain in a Completed state until they are removed by the Kubernetes API.
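If you do not want to wait for Kubernetes to clean up the Completed pods, you can delete them manually. The pod name below is hypothetical; list the pods in your tenant namespace to find the real one:
kubectl -n sampletenant get pods
kubectl -n sampletenant delete pod <spark-driver-pod-name>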
Livy Sessions Support Multiple APIs
- Livy Session (PySpark)
Run the following command to submit a REST API call that starts a Livy session for PySpark:
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"conf":{"spark.kubernetes.container.image":"gcr.io/mapr-252711/<livy-image-for-PySpark>"},"kind":"pyspark"}' \
-u "username:password" \
https://<livy-url>/sessions
Execute the following command to run a Spark job using PySpark:
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"code": "sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect();"}' \
-u "username:password" \
https://<livy-url>/sessions/<session-number>/statements
- Livy Session (R)
Run the following command to submit a REST API call that starts a Livy session for SparkR:
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"conf":{"spark.kubernetes.container.image":"gcr.io/mapr-252711/<livy-image-for-SparkR>"},"kind":"sparkr"}' \
-u "username:password" \
https://<livy-url>/sessions
Execute the following command to run a Spark job using SparkR:
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"code": "summary(data.frame( emp_id = c(1:5), emp_name = c(\"Rick\",\"Dan\",\"Michelle\",\"Ryan\",\"Gary\"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = as.Date(c(\"2012-01-01\",\"2013-09-23\",\"2014-11-15\",\"2014-05-11\",\"2015-03-27\")), stringsAsFactors = TRUE));"}' \
-u "username:password" \
https://<livy-url>/sessions/<session-number>/statements
- Livy Session (Shared)
The Livy server supports multiple APIs in the same Livy session. After creating a Livy session, you can set the kind option for each statement to use Scala, Python, and R in a single Livy session. The following example shows the use of the Scala and Python APIs in a single Livy session; a sketch that adds an R statement follows this list:
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"conf":{"spark.kubernetes.container.image":"gcr.io/mapr-252711/<livy-image-for-PySpark>"}}' \
-u "username:password" \
https://<livy-url>/sessions
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"kind": "spark", "code": "var a = spark.read.csv(\"/user/mapr/passwd\"); a.show();"}' \
-u "username:password" \
https://<livy-url>/sessions/<session-number>/statements
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"kind": "pyspark", "code": "sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect();"}' \
-u "username:password" \
https://<livy-url>/sessions/<session-number>/statements
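To round out the shared-session example with the third API mentioned above, a sparkr statement can be posted to the same session. This sketch reuses a simple summary() call and assumes the session's container image also includes the SparkR packages:
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"kind": "sparkr", "code": "summary(c(1, 2, 3, 4, 5));"}' \
-u "username:password" \
https://<livy-url>/sessions/<session-number>/statements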
Livy Supports Batch Applications
You can submit batch applications in Livy through REST APIs.
Some ready-to-use sample Spark applications are built into the container image. These applications are located at /opt/mapr/spark/spark-[version]/jars/spark-examples_[full-version].jar and should be referenced using the local:// scheme. You can also build your own applications and make them available in /opt/mapr/.
If the Spark application is located elsewhere, modify the file field to point to that storage and interface. For example, the Livy server supports the https://, maprfs://, dtap://, and s3:// interfaces.
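For instance, a batch request for an application stored on the Data Fabric file system could reference it through maprfs://. The class name and JAR path in this sketch are hypothetical:
curl -k \
-X POST \
-H "Content-Type: application/json" \
-d '{"className": "com.example.MyApp", "file": "maprfs:///user/mapr/myapp.jar"}' \
-u "username:password" \
https://<livy-url>/batches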
Run the following command to submit Spark applications using Livy batches:
curl -k \
-X POST \
-H "Content-Type:application/json" \
-d '{"className": "org.apache.spark.examples.SparkPi", "file": "local:///opt/mapr/spark/spark-<version>/examples/jars/<spark-examples.jar>"}' \
-u "username:password" \
https://<livy-url>/batches
Run the following commands to view the logs of the batch application, either from the driver pod or through the Livy REST API:
kubectl logs -f org-apache-spark-examples-sparkpi-1605535907482-driver -n livytenant
curl -k https://<livy-url>/batches/0/log | jq
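You can also poll the state of the batch through the REST API; the batch id 0 below matches the log call above:
curl -k \
-u "username:password" \
https://<livy-url>/batches/0/state | jq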
Do not use the jars option to set the dependencies for Livy batch applications. Set the DataTap JAR using the spark.driver.extraClassPath and spark.executor.extraClassPath options in the conf section of the Spark application, as shown in the following example:
curl \
-k \
-s \
-u <user1>:<password> \
-H "Content-Type: application/json" \
-d '{
"file": "dtap://TenantStorage/wordcount.py"
, "args": [
"dtap://TenantStorage/passwd"
]
, "conf":{
"spark.ssl.enabled":"false"
, "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs"
, "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS"
, "spark.hadoop.fs.dtap.impl.disable.cache": "false"
, "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job"
, "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
, "spark.driver.extraClassPath": "local:///opt/bdfs/bluedata-dtap.jar"
, "spark.executor.extraClassPath": "local:///opt/bdfs/bluedata-dtap.jar"
}
}' \
"https://$NODE_IP:$NODE_PORT/batches" | jq