Submitting Spark Applications Using Livy

This section guides you through starting an Apache Livy session and executing code in that session. It also shows examples of Livy supporting multiple APIs and of Livy batches.

To find out which Livy images to use (images with Python packages installed for PySpark, with R packages installed for SparkR, or for basic Spark sessions with Scala), see Spark Images.

Start Livy Session

If you are an LDAP/AD user, navigate to Kubernetes > Tenants > Applications > Service Endpoints on HPE Ezmeral Runtime Enterprise to find the livy-http URL or access point and the corresponding port.
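
If you prefer the command line, you can also discover the endpoint by listing the services in the tenant namespace. This is a minimal sketch; sampletenant is the example namespace used on this page, and the exact Livy service name can vary by deployment:

# List services in the tenant namespace and look for the livy-http entry and its port
kubectl -n sampletenant get svc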

Run the following command to submit a REST API call that starts a Livy session:

curl -k -v \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{}' \
    -u "username:password" \
    https://<livy-url>/sessions
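
Livy responds with a JSON description of the new session, including its id and state. Before submitting statements, you can poll the session with Livy's GET /sessions/<session-number> endpoint until the state is idle; a minimal sketch:

# Check the session state; wait until it reports "idle" before submitting code
curl -k \
    -u "username:password" \
    https://<livy-url>/sessions/<session-number> | jq '.state'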

Code Execution in a Livy Session

Perform the following steps to execute code in a Livy session:
  1. Run the following command to put a text file into the HPE Ezmeral Data Fabric file system:
    kubectl -n sampletenant exec -it tenantcli-0 -- hadoop fs -put /etc/passwd
  2. Execute the following command to run a Spark job in the Livy session (to retrieve the result, see the polling example after these steps):
curl -k \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"kind": "spark", "code": "var a = spark.read.csv(\"/user/mapr/passwd\"); a.show();"}' \
    -u "username:password" \
    https://<livy-url>/sessions/<session-number>/statements
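
Statement execution is asynchronous: the POST returns a statement id rather than the result. You can poll Livy's GET statements endpoint until the output becomes available; a minimal sketch:

# Poll the statement until its state is "available", then read the output
curl -k \
    -u "username:password" \
    https://<livy-url>/sessions/<session-number>/statements/<statement-number> | jq '.output'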

Delete Livy Session

Run the following command to delete the Livy session:

curl -k -u "username:password" -X DELETE "https://<livy-url>/sessions/<session-number>"; echo

When you delete a Livy session, the Livy server stops the execution of the Spark job created for that session, and both the driver and executor pods remain in a Completed state until they are removed by the Kubernetes API.
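
To confirm that the session is gone, you can list the remaining sessions with Livy's GET /sessions endpoint; a minimal sketch:

# List the ids of all remaining Livy sessions
curl -k \
    -u "username:password" \
    https://<livy-url>/sessions | jq '.sessions[].id'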

Livy Sessions Support Multiple APIs

The following examples show that the Livy server supports multiple APIs (Scala, Python, and R) on HPE Ezmeral Runtime Enterprise:
  1. Livy Session (PySpark)

    Run the following command to submit a REST API call that starts a Livy session for PySpark:
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{"conf":{"spark.kubernetes.container.image":"gcr.io/mapr-252711/<livy-image-for-PySpark>"},"kind":"pyspark"}' \
        -u "username:password" \
        https://<livy-url>/sessions
    
    Execute the following command to run a Spark job using PySpark:
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{"code": "sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect();"}' \ 
        -u "username:password" \
        https://<livy-url>/sessions/<session-number>/statements
  2. Livy Session (R)

    Run the following command to submit a REST API call that starts a Livy session for SparkR:
    curl -k \
        -X POST \
        -H "Content-Type:application/json" \
        -d '{"conf":{"spark.kubernetes.container.image":"gcr.io/mapr-252711/<livy-image-for-SparkR"},"kind":"sparkr"}' \
        -u "username:password" \
        https://<livy-url>/sessions
    Execute the following command to run a Spark job using SparkR:
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{"code": "summary(data.frame( emp_id = c(1:5), emp_name = c(\"Rick\",\"Dan\",\"Michelle\",\"Ryan\",\"Gary\"), salary = c(623.3,515.2,611.0,729.0,843.25), start_date = as.Date(c(\"2012-01-01\",\"2013-09- 23\",\"2014-11-15\",\"2014-05-11\",\"2015-03-27\")), stringsAsFactors = TRUE));"}' \
        -u "username:password" \
        https://<livy-url>/sessions/<session-number>/statements
  3. Livy Session (Shared)

    The Livy server supports multiple APIs in the same Livy session. After creating a Livy session, you can set the kind option on each statement to use Scala, Python, and R in a single Livy session.

    The following example shows the use of the Scala and Python APIs in a single Livy session:
    curl -k \
        -X POST \
        -H "Content-Type:application/json" \
        -d '{"conf":{"spark.kubernetes.container.image":"gcr.io/mapr-252711/<livy-image-for-PySpark"}}' \
        -u "username:password" \
        https://<livy-url>/sessions
    
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{"kind": "spark", "code": "var a = spark.read.csv(\"/user/mapr/passwd\"); a.show();"}' \
        -u "username:password" \
        https://<livy-url>/sessions/<session-number>/statements
    
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{"kind": "pyspark", "code": "sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect();"}' \
        -u "username:password" \
        https://<livy-url>/sessions/<session-number>/statements
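
    To round out the trio, you can also submit an R statement to the same shared session by setting kind to sparkr. This is a hedged sketch: it assumes the session's container image also includes the SparkR packages, and the code snippet is only an illustration:
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{"kind": "sparkr", "code": "head(createDataFrame(mtcars));"}' \
        -u "username:password" \
        https://<livy-url>/sessions/<session-number>/statements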

Livy Supports Batch Applications

You can submit batch applications in Livy through REST APIs.

Some ready-to-use sample Spark applications are built into the container image. These applications are located at /opt/mapr/spark/spark-[version]/jars/spark-examples_[full-version].jar and should be referenced using the local:// scheme. You can also build your own applications and make them available in /opt/mapr/.

If the Spark application is located elsewhere, modify the file field to point to that storage location and interface.

For example, the Livy server supports the https://, maprfs://, dtap://, and s3:// interfaces.
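
For instance, a batch request that reads the application from the Data Fabric file system might look like the following. This is a hedged sketch; the maprfs:// path is hypothetical:

curl -k \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"className": "org.apache.spark.examples.SparkPi", "file": "maprfs:///user/mapr/spark-examples.jar"}' \
    -u "username:password" \
    https://<livy-url>/batches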

Run the following command to submit Spark applications using Livy batches:

curl -k \
    -X POST \
    -H "Content-Type:application/json" \
    -d '{"className": "org.apache.spark.examples.SparkPi", "file": "local:///opt/mapr/spark/spark-<version>/examples/jars/<spark-examples.jar>"}' \
    -u "username:password" \
    https://<livy-url>/batches

Run the following commands to follow the driver pod logs and to fetch the batch logs from the Livy server:

kubectl logs -f org-apache-spark-examples-sparkpi-1605535907482-driver -n livytenant

curl -k https://<livy-url>/batches/0/log | jq
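
You can also check the overall state of the batch application with Livy's GET /batches/<batch-id> endpoint; a minimal sketch:

# Report the batch state, for example "running", "success", or "dead"
curl -k \
    -u "username:password" \
    https://<livy-url>/batches/<batch-id> | jq '.state'
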
NOTE

Do not use the jars option to set dependencies for Livy batch applications. Instead, set the DataTap JAR using the spark.driver.extraClassPath and spark.executor.extraClassPath options in the conf section of the Spark application.

For example:
curl \
    -k \
    -s \
    -u <user1>:<password> \
    -H "Content-Type: application/json" \
    -d '{
        "file": "dtap://TenantStorage/wordcount.py"
        , "args": [
            "dtap://TenantStorage/passwd"
        ]
        , "conf":{
            "spark.ssl.enabled":"false"
            , "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs"
            , "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS"
            , "spark.hadoop.fs.dtap.impl.disable.cache": "false"
            , "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job"
            , "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
            , "spark.driver.extraClassPath": "local:///opt/bdfs/bluedata-dtap.jar"
            , "spark.executor.extraClassPath": "local:///opt/bdfs/bluedata-dtap.jar"
        }
    }' \
    "https://$NODE_IP:$NODE_PORT/batches" | jq