Integrate Hue With Spark
About this task
IMPORTANT
Hue integration with Spark is an experimental feature.
Procedure
-
In the [spark] section of the hue.ini file, set the livy_server_url parameter to the host and port where the Livy server is running:
    [spark]
    # The Livy Server URL.
    livy_server_url=https://node10.cluster.com:8998
-
To configure the Spark mode that Hue uses through Livy, modify livy.conf (vim /opt/mapr/livy/livy-<version>/conf/livy.conf):
-
If Spark jobs run in local mode, set the livy.spark.master property:
    ...
    # What spark master Livy sessions should use.
    livy.spark.master = local[*]
    ...
-
If Spark jobs run in YARN mode, set the livy.spark.master and livy.spark.deployMode properties (the deploy mode is either client or cluster). For example:
    ...
    # What spark master Livy sessions should use.
    livy.spark.master = yarn
    # What spark deploy mode Livy sessions should use.
    livy.spark.deployMode = cluster
    ...
-
If Spark jobs run in Standalone mode, set the livy.spark.master property. For example:
    # What spark master Livy sessions should use.
    livy.spark.master = spark://ubuntu500:7077
-
If Spark jobs run in Mesos mode, set the livy.spark.master property. For example:
    # What spark master Livy sessions should use.
    livy.spark.master = mesos://<mesos-master-node-ip>:5050
NOTE
Integration of Spark on Mesos with Hue is not supported in cluster deploy mode.
-
If you want to access Hive through Spark in Hue, configure Spark with Hive, and set livy.repl.enableHiveContext to true in livy.conf. For example:
    ...
    # Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected
    # on user request and then livy server classpath automatically.
    livy.repl.enableHiveContext = true
    ...
-
If you plan to use PySpark, set the PYTHONPATH environment variable in livy-env.sh (/opt/mapr/livy/livy-<version>/conf/livy-env.sh):
    ...
    export PYTHONPATH=$SPARK_HOME/python/lib/py4j-<version>-src.zip:$SPARK_HOME/python/:$PYTHONPATH
For example:
    ...
    export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$SPARK_HOME/python/:$PYTHONPATH
-
Ensure that R is installed on the node if you plan to run SparkR jobs. To install R:
- On Ubuntu:
    sudo apt-get install r-base
- On Red Hat / Rocky:
    sudo yum install R
-
Restart the Spark REST Job Server (Livy):
    maprcli node services -name livy -action restart -nodes <livy node>
-
Restart Hue:
    maprcli node services -name hue -action restart -nodes <hue node>
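The first step above requires livy_server_url in the [spark] section of hue.ini to name both the host and the port of the Livy server. The following is a minimal sketch of a pre-restart sanity check, not part of Hue itself. The function name livy_endpoint is hypothetical, and the sketch assumes the text it receives parses with Python's configparser, which does not model the nested [[subsection]] syntax found elsewhere in a full hue.ini, so it is meant for the [spark] snippet only.

```python
import configparser
from urllib.parse import urlparse


def livy_endpoint(ini_text: str) -> tuple:
    """Return (scheme, host, port) for livy_server_url in the [spark] section.

    Hypothetical helper: assumes the text parses with configparser, which
    does not understand hue.ini's nested [[subsection]] syntax.
    """
    # strict=False tolerates repeated section names; interpolation=None keeps
    # any '%' characters in values literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    url = parser.get("spark", "livy_server_url")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unexpected scheme in {url!r}")
    # The Livy server must be identified by both host and port.
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"host and port are required in {url!r}")
    return parsed.scheme, parsed.hostname, parsed.port


snippet = """
[spark]
# The Livy Server URL.
livy_server_url=https://node10.cluster.com:8998
"""
print(livy_endpoint(snippet))  # ('https', 'node10.cluster.com', 8998)
```

A URL without an explicit port, such as http://node10.cluster.com, is rejected rather than silently falling back to a default.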
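The livy.conf step above maps each Spark mode to one or two properties. The sketch below encodes that mapping so the values can be generated and checked in one place. The helper names livy_master_settings and render, and the mode strings "local", "yarn", "standalone", and "mesos", are hypothetical; the property names and example values come from the step above, and the Mesos check reflects the note that cluster deploy mode is not supported with Hue.

```python
from typing import Dict, Optional


def livy_master_settings(mode: str, master_url: Optional[str] = None,
                         deploy_mode: Optional[str] = None) -> Dict[str, str]:
    """Map a Spark mode to livy.conf properties (hypothetical helper)."""
    if mode == "local":
        return {"livy.spark.master": "local[*]"}
    if mode == "yarn":
        # YARN needs both the master and a deploy mode of client or cluster.
        if deploy_mode not in ("client", "cluster"):
            raise ValueError("YARN requires deploy_mode 'client' or 'cluster'")
        return {"livy.spark.master": "yarn", "livy.spark.deployMode": deploy_mode}
    if mode == "standalone":
        if not (master_url or "").startswith("spark://"):
            raise ValueError("standalone mode expects a spark://host:port URL")
        return {"livy.spark.master": master_url}
    if mode == "mesos":
        if not (master_url or "").startswith("mesos://"):
            raise ValueError("Mesos mode expects a mesos://host:port URL")
        # Spark on Mesos with Hue is not supported in cluster deploy mode.
        if deploy_mode == "cluster":
            raise ValueError("Spark on Mesos with Hue does not support cluster deploy mode")
        return {"livy.spark.master": master_url}
    raise ValueError(f"unknown Spark mode: {mode!r}")


def render(settings: Dict[str, str]) -> str:
    """Format properties as livy.conf lines ("key = value")."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


print(render(livy_master_settings("yarn", deploy_mode="cluster")))
# livy.spark.master = yarn
# livy.spark.deployMode = cluster
```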
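The Hive step above, like the mode step, edits a single key in livy.conf. If that edit is scripted, one approach is to replace the first line that assigns the key, including a commented-out line, and otherwise append the setting. The function set_livy_property is hypothetical, and the template text in the example is an assumed sample rather than the exact contents of a shipped livy.conf.

```python
import re


def set_livy_property(conf_text: str, key: str, value: str) -> str:
    """Set "key = value" in livy.conf-style text (hypothetical helper).

    The first line that assigns the key, including a commented-out
    "# key = ..." line, is replaced; otherwise the setting is appended.
    """
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{key} = {value}"
    if pattern.search(conf_text):
        # A function replacement keeps any backslashes in value literal.
        return pattern.sub(lambda _match: new_line, conf_text, count=1)
    separator = "" if not conf_text or conf_text.endswith("\n") else "\n"
    return f"{conf_text}{separator}{new_line}\n"


# Assumed sample: the key is present but commented out.
template = (
    "# Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected\n"
    "# on user request and then livy server classpath automatically.\n"
    "# livy.repl.enableHiveContext =\n"
)
print(set_livy_property(template, "livy.repl.enableHiveContext", "true"))
```

Replacing in place, rather than always appending, avoids leaving two assignments of the same key in the file.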
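The PySpark step above leaves the py4j version as a placeholder. This sketch looks up the actual py4j-<version>-src.zip under a given Spark home and builds the PYTHONPATH value in the same order as the export line. The function name livy_pythonpath is hypothetical, and the sketch assumes exactly one matching zip exists under $SPARK_HOME/python/lib.

```python
import glob
import os
import tempfile


def livy_pythonpath(spark_home: str, existing: str = "") -> str:
    """Build the PYTHONPATH value for livy-env.sh (hypothetical helper).

    Assumes exactly one py4j-<version>-src.zip under $SPARK_HOME/python/lib.
    """
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected one match for {pattern}, found {len(matches)}")
    # The trailing "" adds a path separator, mirroring $SPARK_HOME/python/.
    parts = [matches[0], os.path.join(spark_home, "python", "")]
    if existing:
        parts.append(existing)
    return os.pathsep.join(parts)


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for $SPARK_HOME.
    with tempfile.TemporaryDirectory() as home:
        lib = os.path.join(home, "python", "lib")
        os.makedirs(lib)
        open(os.path.join(lib, "py4j-0.10.7-src.zip"), "w").close()
        print(f"export PYTHONPATH={livy_pythonpath(home, '$PYTHONPATH')}")
```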
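The last two steps restart Livy and then Hue with maprcli. A small wrapper can print these commands for review or run them in order, stopping if one fails. The function names restart_commands and restart_livy_and_hue are hypothetical; the maprcli arguments are exactly those shown in the steps above, and the node names in the example are the same placeholders.

```python
import subprocess
from typing import List


def restart_commands(livy_node: str, hue_node: str) -> List[List[str]]:
    """Return the maprcli restart commands, Livy first, then Hue."""
    return [
        ["maprcli", "node", "services", "-name", "livy", "-action", "restart", "-nodes", livy_node],
        ["maprcli", "node", "services", "-name", "hue", "-action", "restart", "-nodes", hue_node],
    ]


def restart_livy_and_hue(livy_node: str, hue_node: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    for argv in restart_commands(livy_node, hue_node):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if maprcli exits non-zero.
            subprocess.run(argv, check=True)


restart_livy_and_hue("<livy node>", "<hue node>")
```

Passing an argument list to subprocess.run, rather than a shell string, avoids quoting problems with node names.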