Installing Spark on YARN

Describes how to use package managers to download and install Spark on YARN from the EEP repository.

Prerequisites

Spark is distributed as three separate packages:

Package	Description
`mapr-spark`	Install this package on any node where you want to install Spark. It depends on the following packages: `mapr-client`, `mapr-hadoop-client`, `mapr-hadoop-util`, and `mapr-librdkafka`.
`mapr-spark-historyserver`	Install this optional package on Spark History Server nodes. It depends on the following packages: `mapr-spark` and `mapr-core`.
`mapr-spark-thriftserver`	Install this optional package on Spark Thrift Server nodes. This package is available starting with EEP 4.0. It depends on the following packages: `mapr-spark` and `mapr-core`.
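The table implies a simple rule: every Spark node gets `mapr-spark`, and the two server packages are added only on nodes that host those services. A minimal bash sketch of that rule follows; `spark_packages` and its role names (`historyserver`, `thriftserver`) are hypothetical and not part of any HPE tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Spark packages for a node, given the
# optional server roles it hosts. mapr-spark is always included because
# both server packages depend on it.
spark_packages() {
  local pkgs="mapr-spark"
  local role
  for role in "$@"; do
    case "$role" in
      historyserver) pkgs="$pkgs mapr-spark-historyserver" ;;
      thriftserver)  pkgs="$pkgs mapr-spark-thriftserver" ;;  # EEP 4.0 and later
      *) echo "unknown role: $role" >&2; return 1 ;;
    esac
  done
  echo "$pkgs"
}

# Example: spark_packages historyserver
# prints: mapr-spark mapr-spark-historyserver
```

On a node that only runs Spark jobs, `spark_packages` with no arguments yields just `mapr-spark`; the output can be passed to the package manager, for example `apt-get install $(spark_packages historyserver)`.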

To install Spark on YARN (Hadoop 2), complete the following steps as root or with sudo:

Verify that JDK 11 or later is installed on the node where you want to install Spark.
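The JDK check can be scripted. The following is a minimal bash sketch, assuming that the `java` on the PATH is the JDK Spark will use (if your cluster sets `JAVA_HOME` differently, check that binary instead). `java_major_version` is a hypothetical helper name. It handles both the legacy `1.8.0_x` version scheme and the newer `11.0.x` scheme.

```shell
#!/usr/bin/env bash
# Sketch: extract the Java major version from `java -version` output, e.g.
#   openjdk version "11.0.22" 2024-01-16   -> 11
#   java version "1.8.0_401"               -> 8 (legacy 1.x scheme)
java_major_version() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.} ;;      # 1.8.0_401 -> 8.0_401
  esac
  echo "${ver%%[!0-9]*}"       # keep only the leading digits
}

# `java -version` writes to stderr, so redirect it before parsing.
major=$(java_major_version "$(java -version 2>&1)")
if [ -n "$major" ] && [ "$major" -ge 11 ]; then
  echo "JDK $major found: OK"
else
  echo "JDK 11 or later is required (found: ${major:-none})" >&2
fi
```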
Create the /apps/spark directory on the cluster file system, and set the correct permissions on the directory:
```
hadoop fs -mkdir /apps/spark
hadoop fs -chmod 777 /apps/spark
```
NOTE
Beginning with EEP 6.2.0, the configure.sh script creates the /apps/spark directory automatically when using the Installer. However, you must manually create this directory when performing a manual installation.
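Because the directory may already exist (for example, when the Installer's configure.sh has created it), the manual step can be made safe to re-run. This is a minimal bash sketch using the standard `hadoop fs -test -d`, `-mkdir -p`, and `-chmod` shell commands; `ensure_spark_apps_dir` and the `SPARK_APPS_DIR` override are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create the Spark apps directory only if it is missing, then
# apply the permissions from the manual procedure.
# SPARK_APPS_DIR is a hypothetical override; it defaults to /apps/spark.
ensure_spark_apps_dir() {
  local dir="${SPARK_APPS_DIR:-/apps/spark}"
  if hadoop fs -test -d "$dir"; then
    echo "$dir already exists; not recreating it"
  else
    hadoop fs -mkdir -p "$dir" || return 1
  fi
  # Set the permissions in both cases, as the manual step does.
  hadoop fs -chmod 777 "$dir"
}
```

Run `ensure_spark_apps_dir` once, as a user that is allowed to create directories under `/apps` on the cluster file system.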

Install the packages:

- On Ubuntu:
```
apt-get install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
```
- On Red Hat / Rocky:
```
dnf install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
```
- On SLES:
```
zypper install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver
```

NOTE

The mapr-spark-historyserver and mapr-spark-thriftserver packages are optional.
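When the same script is used across nodes running different distributions, the package manager can be chosen from the `ID` field of `/etc/os-release`. The bash sketch below covers only the distributions listed in this procedure (`ubuntu`, `rhel`, `rocky`, `sles`); `pkg_manager_for` is a hypothetical helper, and the script prints the install command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: map an /etc/os-release ID to the package manager used above.
pkg_manager_for() {
  case "$1" in
    ubuntu)     echo "apt-get" ;;
    rhel|rocky) echo "dnf" ;;
    sles)       echo "zypper" ;;
    *)          return 1 ;;   # distribution not covered by this procedure
  esac
}

# Read ID in a subshell so os-release variables do not leak into this shell.
os_id=""
if [ -r /etc/os-release ]; then
  os_id=$( . /etc/os-release && echo "$ID" )
fi

if pm=$(pkg_manager_for "$os_id"); then
  # Printed, not executed; drop the optional server packages on nodes
  # that do not host those services.
  echo "$pm install mapr-spark mapr-spark-historyserver mapr-spark-thriftserver"
else
  echo "unsupported or unknown distribution: ${os_id:-unknown}" >&2
fi
```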

If you want to integrate Spark with HPE Ezmeral Data Fabric Streams, install the Streams Client on each Spark node:
- On Ubuntu:
```
apt-get install mapr-kafka
```
- On Red Hat / Rocky:
```
dnf install mapr-kafka
```
If you want to use a Streaming Producer, add the spark-streaming-kafka-producer_2.12.jar file from the Data Fabric Maven repository to the Spark classpath (/opt/mapr/spark/spark-<version>/jars/).
For repository-specific information, see Maven Artifacts for the HPE Ezmeral Data Fabric.
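Copying the JAR into the right place mostly means locating the `spark-<version>` directory. The following is a minimal bash sketch, assuming exactly one Spark version is installed under `/opt/mapr/spark`; the `spark-*` glob stands in for the version, and `copy_to_spark_jars` and its optional second argument are hypothetical names for illustration.

```shell
#!/usr/bin/env bash
# Sketch: copy a downloaded JAR into <root>/spark-<version>/jars/.
# Usage: copy_to_spark_jars JAR [ROOT]   (ROOT defaults to /opt/mapr/spark)
copy_to_spark_jars() {
  local jar="$1" root="${2:-/opt/mapr/spark}"
  local dirs=("$root"/spark-*/jars)   # unmatched glob stays literal
  [ -f "$jar" ] || { echo "no such file: $jar" >&2; return 1; }
  if [ "${#dirs[@]}" -ne 1 ] || [ ! -d "${dirs[0]}" ]; then
    echo "expected exactly one $root/spark-*/jars directory" >&2
    return 1
  fi
  cp "$jar" "${dirs[0]}/"
}
```

For example, `copy_to_spark_jars /path/to/downloaded/producer.jar` copies the file into the single installed Spark version's `jars/` directory and refuses to guess when several versions are present.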
Run configure.sh -R:
```
/opt/mapr/server/configure.sh -R
```
After installing Spark on YARN but before running your Spark jobs, follow the steps outlined in Configuring Spark on YARN.

HPE Ezmeral Data Fabric – Customer-Managed 7.9.0 Documentation
Abstract	This site contains documentation for the customer-managed platform of the HPE Ezmeral Data Fabric version 7.9.0 including installation, configuration, administration, and reference content, as well as content for the associated bundled ecosystem components and drivers.
Published	April 2025
Edition	7.9.0
Topic last updated	2024-07-02