Integrate Hue with Spark (Experimental Only)

About this task

You can configure Hue to use the Spark Notebook UI. This allows users to submit Spark jobs from Hue.
NOTE
Spark Notebook is a feature that uses the Spark REST Job Server (Livy). The mapr-livy package must be installed on a node where the mapr-spark package is installed, or the Livy service will not start.

Procedure

  1. In the [spark] section of the hue.ini file, set the livy_server_url parameter to the URL of the host where the Livy server is running.
    [spark]
    # URL of the Livy server (scheme, IP or hostname, and port).
    livy_server_url=https://<host>:8998
    NOTE
    If the Livy server runs on the same node as the Hue UI, you do not need to set this property; the value defaults to the local host.
  2. Restart Hue.
    maprcli node services -name hue -action restart -nodes <hue node>
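
After the restart, one way to confirm that the URL configured in hue.ini points at a running Livy server is to query the Livy REST API directly. The following is a minimal sketch, not part of the documented procedure. It assumes that curl is installed and that the server may use a self-signed certificate. The LIVY_URL variable and both function names are hypothetical and exist only for illustration.

```shell
# Hypothetical placeholder: set this to the livy_server_url value from hue.ini.
LIVY_URL="${LIVY_URL:-https://localhost:8998}"

# Build the Livy sessions endpoint from a base URL,
# dropping one trailing slash if present.
livy_sessions_endpoint() {
  printf '%s/sessions\n' "${1%/}"
}

# Query GET /sessions, which lists the active Livy sessions.
# -s silences progress output; -k skips TLS certificate verification
# (assumption: the server uses a self-signed certificate).
check_livy() {
  curl -sk "$(livy_sessions_endpoint "$1")"
}
```

Running check_livy "$LIVY_URL" from the Hue node should return a JSON document describing the current sessions if Livy is reachable. On a secure cluster the request may need additional authentication options, which this sketch does not cover.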

Results

Additional Information
  • If needed, you can use the Control System or maprcli to start, stop, or restart the Livy Server. For more information, see Managing Services.
NOTE
Troubleshooting Tip
If you have more than one version of Python installed, you may see the following error when executing Python samples:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe...

Workaround:

Set the following environment variables in /opt/mapr/spark/spark-<version>/conf/spark-env.sh:

export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7
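
Before relying on the workaround, it can help to confirm that the interpreter path actually exists on the node. The following is a small sketch under the assumption that Python 2.7 is installed at the path used above. The is_executable helper and the PY_PATH variable are hypothetical names introduced for illustration.

```shell
# Return 0 if the given path is a regular file that is executable,
# non-zero otherwise.
is_executable() {
  [ -f "$1" ] && [ -x "$1" ]
}

# Use PYSPARK_PYTHON if it is already set; otherwise fall back to the
# path from the workaround (assumption: adjust if Python 2.7 lives elsewhere).
PY_PATH="${PYSPARK_PYTHON:-/usr/bin/python2.7}"

if is_executable "$PY_PATH"; then
  echo "OK: $PY_PATH is executable"
else
  echo "WARNING: $PY_PATH not found or not executable"
fi
```

If the check prints the warning, point both variables in spark-env.sh at an interpreter that does exist on every node that runs Spark.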