Integrate Hue With Spark

About this task

IMPORTANT
Hue integration with Spark is an experimental feature.

Procedure

  1. In the [spark] section of the hue.ini file, set the livy_server_url parameter to the host and port where the Livy server is running:
    [spark]
      # The Livy Server URL.
      livy_server_url=https://node10.cluster.com:8998
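    To confirm that this URL points at a running Livy server, you can query the Livy REST API from the Hue node. The host, port, and protocol below are taken from the example above; on a secure cluster, curl may need additional authentication options.
      # Lists the current Livy sessions; an empty session list confirms that the server is reachable.
      curl -k https://node10.cluster.com:8998/sessions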
  2. To configure the Spark mode that Hue uses, modify livy.conf (/opt/mapr/livy/livy-<version>/conf/livy.conf):
    1. If Spark jobs run in local mode, set the livy.spark.master property:
      ...
      # What spark master Livy sessions should use.
      livy.spark.master = local[*]
      ...
      
    2. If Spark jobs run in YARN mode, set the livy.spark.master and livy.spark.deployMode (client or cluster) properties. For example:
      ...
      # What spark master Livy sessions should use.
      livy.spark.master = yarn
      # What spark deploy mode Livy sessions should use.
      livy.spark.deployMode = cluster
      ...
      
    3. If Spark jobs run in standalone mode, set the livy.spark.master property. For example:
      # What spark master Livy sessions should use.
      livy.spark.master = spark://ubuntu500:7077
    4. If Spark jobs run in Mesos mode, set the livy.spark.master property. For example:
      # What spark master Livy sessions should use.
      livy.spark.master = mesos://<mesos-master-node-ip>:5050 
      NOTE
      Integration of Spark on Mesos with Hue is not supported in cluster deployment mode.
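    To confirm which mode is currently configured, you can print the relevant properties from livy.conf; the path below uses the same <version> placeholder as the rest of this procedure.
      # Shows the active master and deploy-mode settings in livy.conf.
      grep -E '^livy\.spark\.(master|deployMode)' /opt/mapr/livy/livy-<version>/conf/livy.conf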
  3. To access Hive through Spark in Hue, configure Spark to work with Hive, and set livy.repl.enableHiveContext to true in livy.conf. For example:
    ...
    # Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected
    # on user request and then livy server classpath automatically.
    livy.repl.enableHiveContext = true
    ...
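    After you restart Livy (step 6), you can check Hive access by running a Hive query through a Livy session. The host and port below repeat the example from step 1, and the session id 0 is an assumption for an already-created session; on a secure cluster, curl may need additional authentication options.
    # Runs a Hive query in Livy session 0; the result should list the Hive databases.
    curl -k -X POST -H 'Content-Type: application/json' \
         -d '{"code": "spark.sql(\"show databases\").show()"}' \
         https://node10.cluster.com:8998/sessions/0/statements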
  4. If you plan to use PySpark, you must set the PYTHONPATH environment variable in livy-env.sh (/opt/mapr/livy/livy-<version>/conf/livy-env.sh):
    ...
    export PYTHONPATH=$SPARK_HOME/python/lib/py4j-<version>-src.zip:$SPARK_HOME/python/:$PYTHONPATH
    For example:
    ...
    export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$SPARK_HOME/python/:$PYTHONPATH
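    To find the exact py4j version to substitute for <version>, you can list the archive that ships with your Spark installation; this assumes SPARK_HOME is set (it typically points at the Spark installation under /opt/mapr/spark).
    # Prints the py4j source archive bundled with Spark, including its version.
    ls "$SPARK_HOME"/python/lib/py4j-*-src.zip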
  5. If you plan to run SparkR, ensure that R is installed on the node. To install R:
    On Ubuntu
    sudo apt-get install r-base
    On Red Hat / Rocky
    sudo yum install R
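    To confirm the installation, check that the R runtime is available on the Livy node:
    # Prints the installed R version.
    R --version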
  6. Restart the Spark REST Job Server (Livy):
    maprcli node services -name livy -action restart -nodes <livy node>
  7. Restart Hue:
    maprcli node services -name hue -action restart -nodes <hue node>
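To verify the integration end to end, you can exercise the same Livy REST API that the Hue Notebook uses. The host and port below repeat the livy_server_url example from step 1; on a secure cluster, curl may need additional authentication options, and if CSRF protection is enabled in Livy, POST requests also require an X-Requested-By header.
  # Start a PySpark session through Livy.
  curl -k -X POST -H 'Content-Type: application/json' \
       -d '{"kind": "pyspark"}' \
       https://node10.cluster.com:8998/sessions
  # List the sessions; the new session should eventually reach the "idle" state.
  curl -k https://node10.cluster.com:8998/sessions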