DataTap Integration on Livy

This topic describes how to use DataTap with Livy, both through the REST API and from a Jupyter Notebook, in HPE Ezmeral Runtime Enterprise.

Using DataTap with REST API

  1. Start a Livy session using a curl command.
    curl -k \
        -X POST \
        -H "Content-Type:application/json" \
        -d '{
                "conf": {
                    "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
                    "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
                    "spark.hadoop.fs.dtap.impl.disable.cache": "false",
                    "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
                    "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
                },
                "jars": [
                    "local:///opt/bdfs/bluedata-dtap.jar"
                ]
            }' \
        -u "username:password" \
        https://xx-xxx-xxx.xx.lab:10075/sessions
    
    NOTE Do not use the jars option to set dependencies for Livy batch applications. Instead, set the DataTap JAR using the spark.driver.extraClassPath and spark.executor.extraClassPath options in the conf section of the Livy application, as in the sketch below. For an end-to-end example, see Submitting Spark Application Using Livy.
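    For example, a batch submission might look like the following minimal sketch, where the application JAR path and class name (my-app.jar, com.example.MyApp) are hypothetical placeholders and the remaining conf entries mirror the session example above:
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{
                "file": "local:///opt/spark/jars/my-app.jar",
                "className": "com.example.MyApp",
                "conf": {
                    "spark.driver.extraClassPath": "/opt/bdfs/bluedata-dtap.jar",
                    "spark.executor.extraClassPath": "/opt/bdfs/bluedata-dtap.jar",
                    "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
                    "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
                    "spark.hadoop.fs.dtap.impl.disable.cache": "false",
                    "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
                    "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
                }
            }' \
        -u "username:password" \
        https://xx-xxx-xxx.xx.lab:10075/batches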
  2. Execute code in the Livy session; the result can be retrieved as shown after these steps. For example:
    NOTE You must have a .csv file in your DataTap storage before executing the curl command.
    curl -k \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{
                "kind": "spark",
                "code": "var a = spark.read.csv(\"dtap://TenantStorage/somefile.csv\"); a.show();"
            }' \
        -u "username:password"
        https://xx-xxx-xxx.xx.lab:10075/sessions/0/statements
    
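Statements run asynchronously. You can check that the session is ready (GET /sessions/0 reports its state, such as idle or busy) and then poll for the statement result through the Livy REST API. A minimal sketch, assuming session ID 0 and statement ID 0 as used above:

    curl -k \
        -u "username:password" \
        https://xx-xxx-xxx.xx.lab:10075/sessions/0/statements/0

The response contains a state field (waiting, running, or available) and, once the state is available, an output field with the result of the executed code.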

Using DataTap with Jupyter Notebook

  1. Start a Livy session in a Kubeflow Jupyter Notebook.
    1. Load the sparkmagic extension to configure the Livy endpoints in the Jupyter Notebook.
      %load_ext sparkmagic.magics
    2. Run the following magic to add the Livy endpoint and create a Livy session.
      %manage_spark
      Add the following configuration options to the properties field when creating the Livy session (or set them with the config magic shown after this step).
      {
          "conf": {	
              "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
              "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
              "spark.hadoop.fs.dtap.impl.disable.cache": "false",
              "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
              "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
          },
          "jars": [
              "local:///opt/bdfs/bluedata-dtap.jar"
          ]
      }
      
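      Alternatively, sparkmagic provides a config magic that sets the properties used for sessions created afterwards; a minimal sketch, assuming the same configuration as above:
      %%spark config
      {
          "conf": {
              "spark.hadoop.fs.dtap.impl": "com.bluedata.hadoop.bdfs.Bdfs",
              "spark.hadoop.fs.AbstractFileSystem.dtap.impl": "com.bluedata.hadoop.bdfs.BdAbstractFS",
              "spark.hadoop.fs.dtap.impl.disable.cache": "false",
              "spark.kubernetes.driver.label.hpecp.hpe.com/dtap": "hadoop2-job",
              "spark.kubernetes.executor.label.hpecp.hpe.com/dtap": "hadoop2-job"
          },
          "jars": [
              "local:///opt/bdfs/bluedata-dtap.jar"
          ]
      }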
  2. Execute code in the Livy session. For example:
    NOTE You must have a .csv file in your DataTap storage before executing the code.
    %%spark
    
    var a = spark.read.csv("dtap://TenantStorage/somefile.csv");
    a.show();
    
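    Writes go through the same dtap:// scheme. For example, a minimal sketch that writes the DataFrame back to a hypothetical output path in the same DataTap:
    %%spark
    
    // write the result back to DataTap; the output path is a placeholder
    a.write.mode("overwrite").csv("dtap://TenantStorage/somefile_out")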

See About DataTaps for DataTap descriptions and configuration details.