Installing Spark Standalone
This topic describes how to use package managers to download and install Spark Standalone from the EEP repository.
Prerequisites
About this task
Package | Description
---|---
`mapr-spark` | Install this package on any node where you want to install Spark. This package is dependent on the `mapr-client`, `mapr-hadoop-client`, `mapr-hadoop-util`, and `mapr-librdkafka` packages.
`mapr-spark-master` | Install this package on Spark master nodes. Spark master nodes must be able to communicate with Spark worker nodes over SSH without using passwords. This package is dependent on the `mapr-spark` and `mapr-core` packages.
`mapr-spark-historyserver` | Install this optional package on Spark History Server nodes. This package is dependent on the `mapr-spark` and `mapr-core` packages.
`mapr-spark-thriftserver` | Install this optional package on Spark Thrift Server nodes. This package is available starting in the EEP 4.0 release. It is dependent on the `mapr-spark` and `mapr-core` packages.
Run the following commands as `root` or using `sudo`.

Procedure
- Create the `/apps/spark` directory on the cluster filesystem, and set the correct permissions on the directory:

  ```shell
  hadoop fs -mkdir /apps/spark
  hadoop fs -chmod 777 /apps/spark
  ```

  NOTE: Beginning with EEP 6.2.0, the `configure.sh` script creates the `/apps/spark` directory automatically.
- Install Spark using the appropriate commands for your operating system:

  On CentOS 8.x / Red Hat 8.x:

  ```shell
  dnf install mapr-spark mapr-spark-master mapr-spark-historyserver mapr-spark-thriftserver
  ```

  On Ubuntu:

  ```shell
  apt-get install mapr-spark mapr-spark-master mapr-spark-historyserver mapr-spark-thriftserver
  ```

  On SLES:

  ```shell
  zypper install mapr-spark mapr-spark-master mapr-spark-historyserver mapr-spark-thriftserver
  ```

  NOTE: The `mapr-spark-historyserver`, `mapr-spark-master`, and `mapr-spark-thriftserver` packages are optional.

  Spark is installed into the `/opt/mapr/spark` directory.
- For Spark 2.x, copy `/opt/mapr/spark/spark-<version>/conf/slaves.template` to `/opt/mapr/spark/spark-<version>/conf/slaves`, and add the hostnames of the Spark worker nodes. Put one worker node hostname on each line.

  For Spark 3.x, copy `/opt/mapr/spark/spark-<version>/conf/workers.template` to `/opt/mapr/spark/spark-<version>/conf/workers`, and add the hostnames of the Spark worker nodes. Put one worker node hostname on each line.

  For example:

  ```
  localhost
  worker-node-1
  worker-node-2
  ```
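The copy-and-append step above can be scripted. The sketch below uses a scratch directory in place of `/opt/mapr/spark/spark-<version>/conf` and the example hostnames from this topic, so it can run anywhere; on a real node, point `CONF_DIR` at the actual Spark `conf` directory (and use `slaves.template`/`slaves` for Spark 2.x):

```shell
# Scratch directory standing in for /opt/mapr/spark/spark-<version>/conf.
CONF_DIR=$(mktemp -d)
# Stand-in for the workers.template shipped with Spark (it contains "localhost").
printf 'localhost\n' > "$CONF_DIR/workers.template"

# Copy the template, then append one worker hostname per line.
cp "$CONF_DIR/workers.template" "$CONF_DIR/workers"
printf 'worker-node-1\nworker-node-2\n' >> "$CONF_DIR/workers"

cat "$CONF_DIR/workers"
# → localhost, worker-node-1, worker-node-2 (one per line)
```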
- Set up passwordless SSH for the `mapr` user so that the Spark master node has access to all secondary nodes defined in the `conf/slaves` file (Spark 2.x) or the `conf/workers` file (Spark 3.x).
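One common way to set this up is `ssh-keygen` plus `ssh-copy-id`, run on the master node as the `mapr` user. In this sketch the key path is a scratch file (the usual location is `~/.ssh/id_rsa`), the worker hostnames are the examples from this topic, and the `ssh-copy-id` commands are only printed rather than executed, since the example hosts do not exist:

```shell
# Generate a passphrase-less RSA key pair (scratch path; normally ~/.ssh/id_rsa).
KEY=$(mktemp -u)
ssh-keygen -q -t rsa -b 2048 -N "" -f "$KEY"

# Push the public key to every worker listed in conf/workers.
# Printed here instead of run, because the hostnames are illustrative:
for host in worker-node-1 worker-node-2; do
  echo "ssh-copy-id -i $KEY.pub mapr@$host"
done
```

After copying the key, `ssh worker-node-1` from the master as the `mapr` user should succeed without a password prompt.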
As the
mapr
user, start the worker nodes by running the following command in the master node. Since the Master daemon is managed by the Warden daemon, do not use thestart-all.sh
orstop-all.sh
command.For Spark 2.x:/opt/mapr/spark/spark-<version>/sbin/start-slaves.sh
For Spark 3.x:/opt/mapr/spark/spark-<version>/sbin/start-workers.sh
- If you want to integrate Spark with HPE Ezmeral Data Fabric Streams, install the Streams Client on each Spark node:

  On Ubuntu:

  ```shell
  apt-get install mapr-kafka
  ```

  On RedHat/CentOS:

  ```shell
  yum install mapr-kafka
  ```
- If you want to use a Streaming Producer, add the `spark-streaming-kafka-producer_2.12.jar` from the HPE Ezmeral Data Fabric Maven repository to the Spark classpath (`/opt/mapr/spark/spark-<version>/jars/`).
- After installing Spark Standalone but before running your Spark jobs, follow the steps outlined in Configuring Spark Standalone.