Configure Scratch Directory for Spark Standalone
By default, Spark uses the /tmp directory as scratch space. Map output files and RDDs are stored in the scratch directory. To use a different directory, or a comma-separated list of multiple directories, set SPARK_LOCAL_DIRS to the path of the new directory by adding the following line to the $SPARK_HOME/conf/spark-env.sh file:
export SPARK_LOCAL_DIRS=$SPARK_HOME/<path to scratch directory>
Make this change before starting the Spark services.
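For example, to spread scratch I/O across multiple disks, set SPARK_LOCAL_DIRS to a comma-separated list of paths (the directories below are illustrative):
export SPARK_LOCAL_DIRS=/data1/spark-scratch,/data2/spark-scratch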
Community Edition (Without NFS Support)
Reserve space on your local disk to use as the scratch directory for Spark.
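For example, if you reserved an illustrative directory named /spark-scratch on each node's local disk, point Spark at it in the spark-env.sh file:
export SPARK_LOCAL_DIRS=/spark-scratch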
Enterprise Edition and Enterprise Database Edition (With NFS Support)
Create a local volume on each node with the maprcli volume create command, or from the Control System. Mount that local volume with NFS to a directory, and set that directory as the scratch directory for Spark.
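As a sketch of these steps, the commands below create a local volume and mount it through the cluster's NFS gateway; the volume name, paths, and cluster name are illustrative, and the exact maprcli options can vary by release:
# Create a replication-1 volume pinned to this node (name and path are examples)
maprcli volume create -name spark.scratch.node1 -path /spark-scratch-node1 -replication 1 -localvolumehost $(hostname -f)
# Mount the volume over NFS (my.cluster.com is a placeholder cluster name)
mkdir -p /mnt/spark-scratch
mount -o hard,nolock localhost:/mapr/my.cluster.com/spark-scratch-node1 /mnt/spark-scratch
# Point Spark at the mounted directory in $SPARK_HOME/conf/spark-env.sh
export SPARK_LOCAL_DIRS=/mnt/spark-scratch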
NOTE: Due to https://issues.apache.org/jira/browse/SPARK-6313, make sure to set spark.files.useFetchCache=false in your spark-defaults.conf file.
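For example, add this line to the $SPARK_HOME/conf/spark-defaults.conf file:
spark.files.useFetchCache false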