Installing Spark on YARN
This topic describes how to use package managers to download and install Spark on YARN from the EEP repository.
Prerequisites
About this task
Spark is distributed as three separate packages:
Package | Description |
---|---|
mapr-spark | Install this package on any node where you want to install Spark. This package is dependent on the mapr-client, mapr-hadoop-client, mapr-hadoop-util, and mapr-librdkafka packages. |
mapr-spark-historyserver | Install this optional package on Spark History Server nodes. This package is dependent on the mapr-spark and mapr-core packages. |
mapr-spark-thriftserver | Install this optional package on Spark Thrift Server nodes. This package is available starting in the EEP 4.0 release. It is dependent on the |
To install Spark on YARN (Hadoop 2), execute the following commands as root or using sudo:
Procedure
- Verify that JDK 11 or later is installed on the node where you want to install Spark.
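The JDK check in this step can be scripted. The sketch below is an assumption, not part of the EEP tooling: the hypothetical jdk_major helper parses the first line of java -version output to extract the major version, and the check is skipped when no java binary is on the PATH.

```shell
# Sketch: verify that JDK 11 or later is installed on this node.
# Assumes `java -version` prints a first line such as:
#   openjdk version "11.0.19" 2023-04-18
# (pre-9 JDKs report a "1.x" version and will correctly fail the check).

# jdk_major: extract the major version number from a `java -version` first line.
jdk_major() {
  echo "$1" | sed -E 's/.*"([0-9]+)[^"]*".*/\1/'
}

if command -v java >/dev/null 2>&1; then
  line=$(java -version 2>&1 | head -n 1)
  major=$(jdk_major "$line")
  if [ "$major" -ge 11 ] 2>/dev/null; then
    echo "JDK $major found: OK"
  else
    echo "JDK 11 or later required, found: $line" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```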
- Create the /apps/spark directory on the cluster filesystem, and set the correct permissions on the directory:

  ```
  hadoop fs -mkdir /apps/spark
  hadoop fs -chmod 777 /apps/spark
  ```

  NOTE: Beginning with EEP 6.2.0, the configure.sh script creates the /apps/spark directory automatically when using the Installer. However, you must manually create this directory when performing a manual installation.
- Install the packages:

  On Ubuntu:

  ```
  apt-get install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
  ```

  On CentOS 8.x / Red Hat 8.x:

  ```
  dnf install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
  ```

  On SLES:

  ```
  zypper install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
  ```

  NOTE: The mapr-spark-historyserver and mapr-spark-thriftserver packages are optional.
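The three per-distribution install commands can be collapsed into a single script that detects which package manager is available. This is a hedged sketch, not documented product tooling: the detect_pkg_mgr helper is hypothetical, and the script only prints the command it would run rather than installing anything.

```shell
# Sketch: pick the install command based on the package manager present.
# detect_pkg_mgr is a hypothetical helper, not part of the EEP tooling.
detect_pkg_mgr() {
  if command -v apt-get >/dev/null 2>&1; then echo "apt-get"   # Ubuntu
  elif command -v dnf >/dev/null 2>&1; then echo "dnf"         # CentOS 8.x / Red Hat 8.x
  elif command -v zypper >/dev/null 2>&1; then echo "zypper"   # SLES
  else echo "none"
  fi
}

PKGS="mapr-spark mapr-spark-historyserver mapr-spark-thriftserver"
MGR=$(detect_pkg_mgr)
if [ "$MGR" = "none" ]; then
  echo "No supported package manager found" >&2
else
  # Print the command instead of running it; run the real command
  # as root or via sudo.
  echo "Would run: $MGR install $PKGS"
fi
```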
- If you want to integrate Spark with HPE Ezmeral Data Fabric Streams, install the Streams Client on each Spark node:

  On Ubuntu:

  ```
  apt-get install mapr-kafka
  ```

  On CentOS / Red Hat:

  ```
  yum install mapr-kafka
  ```
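After installing the Streams Client, you may want to confirm that the package actually landed on each node. The check below is an assumption based on standard dpkg/rpm queries, not a documented verification step; the check_pkg helper is hypothetical.

```shell
# Sketch: confirm a package (e.g. the mapr-kafka Streams client) is installed.
# check_pkg is a hypothetical helper; prints installed, missing, or unknown.
check_pkg() {
  if command -v dpkg >/dev/null 2>&1; then
    # Debian/Ubuntu: lines starting with 'ii' mark installed packages.
    dpkg -l "$1" 2>/dev/null | grep -q '^ii' && echo installed || echo missing
  elif command -v rpm >/dev/null 2>&1; then
    # CentOS / Red Hat / SLES: rpm -q exits 0 when the package is installed.
    rpm -q "$1" >/dev/null 2>&1 && echo installed || echo missing
  else
    echo unknown
  fi
}

check_pkg mapr-kafka
```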
- If you want to use a Streaming Producer, add the spark-streaming-kafka-producer_2.12.jar from the data-fabric Maven repository to the Spark classpath (/opt/mapr/spark/spark-<versions>/jars/).

  For repository-specific information, see Maven Artifacts for the HPE Ezmeral Data Fabric.

- After installing Spark on YARN but before running your Spark jobs, follow the steps outlined at Configuring Spark on YARN.
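For the Streaming Producer step above, a quick sanity check can confirm the jar is in place. This sketch is an assumption about your layout: the glob over /opt/mapr/spark/spark-*/jars stands in for the versioned path and matches nothing if Spark is installed elsewhere.

```shell
# Sketch: check that the streaming producer jar is on the Spark classpath.
# The glob replaces the versioned spark-<versions> path segment (assumption).
JAR=spark-streaming-kafka-producer_2.12.jar
found=0
for dir in /opt/mapr/spark/spark-*/jars; do
  if [ -f "$dir/$JAR" ]; then
    echo "Found $dir/$JAR"
    found=1
  fi
done
if [ "$found" -eq 0 ]; then
  echo "$JAR not found under /opt/mapr/spark" >&2
fi
```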