DataTap Integration on Livy
This topic describes how to use DataTap with Livy through the REST API and through a Jupyter Notebook in HPE Ezmeral Runtime Enterprise.
Using DataTap with REST API
- Start a Livy session using a `curl` command:

  ```
  curl -k \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{
      "conf": {
        "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
        "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
        "spark.hadoop.fs.dtap.impl.disable.cache": "false",
        "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
        "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
      },
      "jars": [ "local:///opt/bdfs/bluedata-dtap.jar" ]
    }' \
    -u "username:password" \
    https://xx-xxx-xxx.xx.lab:10075/sessions
  ```
  NOTE: Do not use the `jars` option to set dependencies for Livy batch applications. Instead, set the DataTap JAR using the `spark.driver.extraClassPath` and `spark.executor.extraClassPath` options in the `conf` section of the Livy application. See Submitting Spark Application Using Livy.
- Execute code in the Livy session. For example:

  NOTE: You must have a `.csv` file in your DataTap storage before executing this `curl` command.

  ```
  curl -k \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{
      "kind": "spark",
      "code": "var a = spark.read.csv(\"dtap://TenantStorage/somefile.csv\"); a.show();"
    }' \
    -u "username:password" \
    https://xx-xxx-xxx.xx.lab:10075/sessions/0/statements
  ```
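The trickiest part of the statement request is escaping the embedded Scala quotes inside the JSON `-d` body. If you script these REST calls, letting a JSON library build the payload avoids the hand-escaping; a minimal Python sketch (the Scala snippet is the documented example, the host and session ID stay as the placeholders above):

```python
import json

# Build the Livy statement body programmatically. json.dumps escapes the
# inner double quotes, producing the same payload as the hand-escaped
# curl -d string above.
scala_code = 'var a = spark.read.csv("dtap://TenantStorage/somefile.csv"); a.show();'
statement_body = json.dumps({"kind": "spark", "code": scala_code})

print(statement_body)
```

The resulting string can then be POSTed to the `/sessions/0/statements` endpoint with curl or any HTTP client.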
Using DataTap with Jupyter Notebook
- Start a Livy session in the Kubeflow Jupyter Notebook.
- Load sparkmagic to configure the Livy endpoints in the Jupyter Notebook:

  ```
  %load_ext sparkmagic.magics
  ```
- Run the following magic to add the Livy endpoint and create a Livy session:

  ```
  %manage_spark
  ```

  Add the following configuration options to the properties when creating the Livy session:

  ```
  {
    "conf": {
      "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
      "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
      "spark.hadoop.fs.dtap.impl.disable.cache": "false",
      "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
      "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
    },
    "jars": [ "local:///opt/bdfs/bluedata-dtap.jar" ]
  }
  ```
- Execute code in the Livy session. For example:

  NOTE: You must have a `.csv` file in your DataTap storage before executing this code.

  ```
  %%spark
  var a = spark.read.csv("dtap://TenantStorage/somefile.csv");
  a.show();
  ```
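In both the REST and notebook examples, the path passed to `spark.read.csv` is a `dtap://` URI: the authority component names the DataTap (`TenantStorage` in the documented example) and the path component is the file location within that storage. A small Python sketch of the anatomy:

```python
from urllib.parse import urlparse

# Decompose the example DataTap URI from the documentation. The netloc
# is the DataTap name; the path is the file within that storage.
uri = urlparse("dtap://TenantStorage/somefile.csv")

print(uri.scheme)   # dtap
print(uri.netloc)   # TenantStorage
print(uri.path)     # /somefile.csv
```

This is only an illustration of the URI structure; the DataTap name and file are the placeholders used throughout this topic.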
See About DataTaps for DataTap descriptions and configuration details.