Describes how to use package managers to download and install Spark on YARN from the
EEP repository.
About this task
Spark is distributed as three separate packages:
| Package | Description |
|---|---|
| mapr-spark | Install this package on any node where you want to install Spark. This package is dependent on the following packages: mapr-client, mapr-hadoop-client, mapr-hadoop-util, mapr-librdkafka |
| mapr-spark-historyserver | Install this optional package on Spark History Server nodes. This package is dependent on the following packages: |
| mapr-spark-thriftserver | Install this optional package on Spark Thrift Server nodes. This package is available starting in the EEP 4.0 release. This package is dependent on the following packages: |
To install Spark on YARN (Hadoop 2), execute the following commands as
root or using sudo:
Procedure
- Verify that JDK 11 or later is installed on the node where you want to install Spark.
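One way to script the JDK check is sketched below. It is an illustration rather than part of the product: the function names java_major_version and jdk_is_supported are hypothetical, and the parsing assumes the usual version "x.y.z" line that java -version prints.

```shell
# Extract the major Java version from `java -version` output.
# Handles the legacy "1.8.0_381" scheme and the modern "11.0.20" scheme.
java_major_version() {
    # $1: output of `java -version`, e.g.: openjdk version "11.0.20" 2023-07-18
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;          # 1.8.0_381 -> 8
        *)   printf '%s\n' "$ver" | sed 's/[^0-9].*//' ;;   # 11.0.20   -> 11
    esac
}

# Succeed only when the reported major version is 11 or later.
jdk_is_supported() {
    major=$(java_major_version "$1")
    [ -n "$major" ] && [ "$major" -ge 11 ]
}

# Example call (java -version writes to stderr, hence 2>&1):
#   jdk_is_supported "$(java -version 2>&1)" || echo "JDK 11 or later is required" >&2
```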
- Create the /apps/spark directory on the cluster file system, and set the correct permissions on the directory:
  hadoop fs -mkdir /apps/spark
  hadoop fs -chmod 777 /apps/spark
  NOTE
  Beginning with EEP 6.2.0, the configure.sh script creates the /apps/spark directory automatically when you use the Installer. However, you must create this directory manually when you perform a manual installation.
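Because the directory may or may not already exist depending on how the cluster was installed, the creation step can be made safe to rerun. The following is a minimal sketch under that assumption. The function name ensure_spark_dir is hypothetical, and it uses -mkdir -p (a standard hadoop fs option) so that a missing parent directory does not cause a failure.

```shell
# Create the Spark directory only if it is missing, then apply the
# permissions used in the step above.
# $1: directory on the cluster file system (defaults to /apps/spark)
ensure_spark_dir() {
    dir="${1:-/apps/spark}"
    # `hadoop fs -test -d` exits 0 when the path exists and is a directory.
    if ! hadoop fs -test -d "$dir"; then
        hadoop fs -mkdir -p "$dir" || return 1
    fi
    hadoop fs -chmod 777 "$dir"
}

# Example call:
#   ensure_spark_dir /apps/spark
```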
- Install the packages:
  - On Ubuntu:
    apt-get install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
  - On Red Hat / Rocky:
    dnf install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
  - On SLES:
    zypper install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
  NOTE
  The mapr-spark-historyserver and mapr-spark-thriftserver packages are optional.
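If you install on a mix of operating systems, the choice of package manager can be scripted. The sketch below is illustrative only. The function name spark_install_cmd is hypothetical, and it assumes the ID field of /etc/os-release is ubuntu, rhel, rocky, or sles on the respective platforms. It prints the command instead of running it, and installs only mapr-spark unless you name the optional packages.

```shell
# Print the install command for a given /etc/os-release ID.
# $1: OS ID (ubuntu, rhel, rocky, or sles)
# remaining arguments: packages to install (defaults to mapr-spark only)
spark_install_cmd() {
    os_id="${1:-}"
    [ "$#" -gt 0 ] && shift
    pkgs="${*:-mapr-spark}"
    case "$os_id" in
        ubuntu)     echo "apt-get install $pkgs" ;;
        rhel|rocky) echo "dnf install $pkgs" ;;
        sles)       echo "zypper install $pkgs" ;;
        *)          echo "unsupported OS ID: $os_id" >&2; return 1 ;;
    esac
}

# Example call (run as root or prefix the result with sudo):
#   . /etc/os-release
#   cmd=$(spark_install_cmd "$ID" mapr-spark mapr-spark-historyserver) && $cmd
```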
- If you want to integrate Spark with HPE Data Fabric Streams, install the Streams Client on each Spark node:
- If you want to use a Streaming Producer, add the spark-streaming-kafka-producer_2.12.jar from the Data Fabric Maven repository to the Spark classpath (/opt/mapr/spark/spark-<versions>/jars/).
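After copying the JAR, you may want to confirm that it landed in a jars directory under the Spark installation. The following is a rough sketch, not a product utility. The function name find_producer_jars is hypothetical, and the file name pattern assumes the JAR may carry a version suffix after _2.12.

```shell
# List Streaming Producer JARs found in any jars/ directory under a
# Spark installation root.
# $1: installation root (defaults to /opt/mapr/spark)
find_producer_jars() {
    root="${1:-/opt/mapr/spark}"
    find "$root" -path '*/jars/*' \
        -name 'spark-streaming-kafka-producer_2.12*.jar' 2>/dev/null
}

# Example call: report when no producer JAR is on the Spark classpath.
#   [ -n "$(find_producer_jars)" ] || echo "Streaming Producer JAR not found" >&2
```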
- Run configure.sh -R:
  /opt/mapr/server/configure.sh -R
- After installing Spark on YARN but before running your Spark jobs, follow the steps outlined in Configuring Spark on YARN.