Configure Spark JAR Location

About this task

By default, Spark on YARN uses Spark JAR files that are installed locally. The Spark JAR files can also be added to a world-readable location on file system. When you add the JAR files to a world-readable location, YARN can cache them on nodes to avoid distributing them each time an application runs. Complete the following steps to add the Spark JAR files to a world-readable locaton on file system:

Procedure

  1. Create a zip archive containing all the JARs from the SPARK_HOME/jars directory. For example:
    cd /opt/mapr/spark/spark-<version>/jars/
    zip /opt/mapr/spark/spark-<version>/spark-jars.zip ./*
  2. Copy the zip file from the local filesystem to a world-readable location on file system. You can upload it to the home of the current user:
    hadoop fs -put /opt/mapr/spark/spark-<version>/spark-jars.zip

    For example:

    hadoop fs -put /opt/mapr/spark/spark-3.2.0/spark-jars.zip /user/mapr/
  3. Set the spark.yarn.archive property in the spark-defaults.conf file located in /opt/mapr/spark/spark-<version>/conf/spark-defaults.conf to point to the world-readable location where you added the zip file. Apply this setting on the node where you will be submitting your Spark jobs.
    spark.yarn.archive maprfs:///<path to zip>

    For example:

    spark.yarn.archive maprfs:///user/mapr/spark-jars.zip