DataTap Integration on Livy

This topic describes how to use DataTap with Livy, both through the REST API and from a Jupyter Notebook, in HPE Ezmeral Runtime Enterprise.

Using DataTap with REST API

  1. Start a Livy session using a curl command.
    curl -k \
        -X POST \
        -H "Content-Type:application/json" \
        -d '{
                "conf": {
                    "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
                    "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
                    "spark.hadoop.fs.dtap.impl.disable.cache": "false",
                    "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
                    "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
                },
                "jars": [
                    "local:///opt/bdfs/bluedata-dtap.jar"
                ]
            }' \
        -u "username:password" \
        https://xx-xxx-xxx.xx.lab:10075/sessions
    
    NOTE Do not use the jars option to set dependencies for Livy batch applications. Instead, set the DataTap JAR using the spark.driver.extraClassPath and spark.executor.extraClassPath options in the conf section of the Livy application, as in the sketch below. For an end-to-end example, see Submitting Spark Application Using Livy.
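    For example, a batch submission might look like the following minimal sketch, where the application JAR path and class name (my-app.jar, com.example.MyApp) are hypothetical placeholders and the remaining conf entries mirror the session example above:
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{
                "file": "local:///opt/spark/jars/my-app.jar",
                "className": "com.example.MyApp",
                "conf": {
                    "spark.driver.extraClassPath": "/opt/bdfs/bluedata-dtap.jar",
                    "spark.executor.extraClassPath": "/opt/bdfs/bluedata-dtap.jar",
                    "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
                    "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
                    "spark.hadoop.fs.dtap.impl.disable.cache": "false",
                    "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
                    "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
                }
            }' \
        -u "username:password" \
        https://xx-xxx-xxx.xx.lab:10075/batches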
  2. Execute code in the Livy session; the result can be retrieved as shown after these steps. For example:
    NOTE You must have a .csv file in your DataTap storage before executing the curl command.
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{
                "kind": "spark",
                "code": "var a = spark.read.csv(\"dtap://TenantStorage/somefile.csv\"); a.show();"
            }' \
        -u "username:password"
        https://xx-xxx-xxx.xx.lab:10075/sessions/0/statements
    
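Statements run asynchronously. You can check that the session is ready (GET /sessions/0 reports its state, such as idle or busy) and then poll for the statement result through the Livy REST API. A minimal sketch, assuming session ID 0 and statement ID 0 as used above:

    curl -k \
        -u "username:password" \
        https://xx-xxx-xxx.xx.lab:10075/sessions/0/statements/0

The response contains a state field (waiting, running, or available) and, once the state is available, an output field with the result of the executed code.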

Using DataTap with Jupyter Notebook

  1. Start a Livy session in a Kubeflow Jupyter Notebook.
    1. Load the sparkmagic extension to configure the Livy endpoints in the Jupyter Notebook.
      %load_ext sparkmagic.magics
    2. Run the following magic to add the Livy endpoint and create a Livy session.
      %manage_spark
      Add the following configuration options to the properties field when creating the Livy session (or set them with the config magic shown after this step).
      {
          "conf": {	
              "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
              "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
              "spark.hadoop.fs.dtap.impl.disable.cache": "false",
              "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
              "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
          },
          "jars": [
              "local:///opt/bdfs/bluedata-dtap.jar"
          ]
      }
      
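      Alternatively, sparkmagic provides a config magic that sets the properties used for sessions created afterwards; a minimal sketch, assuming the same configuration as above:
      %%spark config
      {
          "conf": {
              "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
              "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
              "spark.hadoop.fs.dtap.impl.disable.cache": "false",
              "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
              "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
          },
          "jars": [
              "local:///opt/bdfs/bluedata-dtap.jar"
          ]
      }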
  2. Execute code in the Livy session. For example:
    NOTE You must have a .csv file in your DataTap storage before executing the code.
    %%spark
    
    var a = spark.read.csv("dtap://TenantStorage/somefile.csv");
    a.show();
    
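    Writes go through the same dtap:// scheme. For example, a minimal sketch that writes the DataFrame back to a hypothetical output path in the same DataTap:
    %%spark
    
    // write the result back to DataTap; the output path is a placeholder
    a.write.mode("overwrite").csv("dtap://TenantStorage/somefile_out")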

See About DataTaps for DataTap descriptions and configuration details.