Integrate Hue with Spark (Experimental Only)
About this task
NOTE
Spark Notebook is a feature that utilizes the Spark REST Job Server (Livy).
The mapr-livy package must be installed on a node where the mapr-spark package is installed, or the Livy service will not start.
Procedure
- In the [spark] section of the hue.ini, set the livy_server_url parameter to the host where the Livy server is running.
  [spark]
  # IP or hostname of livy server.
  livy_server_url=https://<host>:8998
  NOTE: If the Livy server runs on the same node as the Hue UI, you are not required to set this property because the value defaults to the local host.
- Restart Hue. (An optional check that Livy is reachable is sketched after this procedure.)
  maprcli node services -name hue -action restart -nodes <hue node>
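After restarting, you can optionally confirm that the Hue node can reach the Livy server by querying the Livy REST API. This is a minimal sketch, assuming Livy listens on its default port 8998 over HTTPS as configured above; a JSON response listing sessions (even an empty list) indicates the service is responding:
  # Query the Livy sessions endpoint; -k skips certificate validation for self-signed HTTPS.
  curl -k https://<host>:8998/sessions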
Results
- If needed, you can use the Control System or maprcli to start, stop, or restart the Livy Server. For more information, see Managing Services.
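For example, this is a minimal sketch of restarting the Livy server with maprcli; the service name livy is an assumption here, so confirm the name shown for the Livy server in the Control System or in the maprcli service listing on your cluster:
  # Restart the Livy server on a specific node (service name "livy" assumed).
  maprcli node services -name livy -action restart -nodes <livy node>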
NOTE
Troubleshooting Tip: If you have more than one version of Python installed, you may see the following error when executing Python samples:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe...
Workaround:
Set the following environment variables in /opt/mapr/spark/spark-<version>/conf/spark-env.sh:
export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7
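After editing spark-env.sh, it can help to confirm that the interpreter exists at the configured path and to restart the Livy server so that new notebook sessions pick up the change. This is a minimal sketch; the livy service name and the need to restart are assumptions, so adjust for your cluster:
  # Confirm the Python 2.7 interpreter exists at the path set in spark-env.sh.
  /usr/bin/python2.7 --version
  # Restart Livy so that new sessions read the updated spark-env.sh (service name "livy" assumed).
  maprcli node services -name livy -action restart -nodes <livy node>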