Deployment Modes

Spark is preconfigured for YARN and does not require any additional configuration to run.

Two deployment modes can be used to launch Spark applications on YARN:

  • In cluster mode, jobs are managed by the YARN cluster. The Spark driver runs inside an Application Master (AM) process that is managed by YARN, so the client can exit after it submits the application.
  • In client mode, the Spark driver runs in the client process, and the Application Master is used only to request resources from YARN.
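
As a rough illustration of how the two modes are selected at submission time, the following Python sketch builds the spark-submit argument list for either mode. The --master and --deploy-mode options are standard spark-submit options; the helper name build_spark_submit and the application file my_app.py are hypothetical and used only for illustration.

```python
from typing import List, Optional


def build_spark_submit(
    deploy_mode: str, app: str, app_args: Optional[List[str]] = None
) -> List[str]:
    """Return a spark-submit argument list for the given YARN deployment mode.

    build_spark_submit is a hypothetical helper, not part of Spark.
    """
    if deploy_mode not in ("cluster", "client"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", "yarn",            # run on YARN
        "--deploy-mode", deploy_mode,  # where the driver runs: AM (cluster) or client process
        app,
        *(app_args or []),
    ]


if __name__ == "__main__":
    # my_app.py is a placeholder application file (an assumption).
    print(" ".join(build_spark_submit("cluster", "my_app.py")))
    print(" ".join(build_spark_submit("client", "my_app.py")))
```

The resulting list can be passed to subprocess.run on a node where spark-submit is on the PATH. Only the --deploy-mode value differs between the two modes.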

Data Fabric recommends cluster deployment mode over client mode. In cluster mode, the job still runs to completion if the Spark client that submitted it exits after submission.

Note: In cluster deployment mode, the local directories used by the Spark executors and the Spark driver are the local directories that are configured for YARN (yarn.nodemanager.local-dirs).

Note: SPARK_LOCAL_DIRS is ignored when you run Spark on YARN.
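
To check which directories apply, one option is to read yarn.nodemanager.local-dirs from the node's yarn-site.xml. The sketch below assumes the standard Hadoop configuration layout (property elements containing name and value) and a comma-separated value; the function name yarn_local_dirs and the sample paths are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

PROPERTY_NAME = "yarn.nodemanager.local-dirs"


def yarn_local_dirs(yarn_site_xml: str) -> List[str]:
    """Return the directories listed under yarn.nodemanager.local-dirs.

    Takes the text of a yarn-site.xml file. Returns an empty list when the
    property is not set in that file (YARN's built-in default then applies).
    """
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == PROPERTY_NAME:
            value = prop.findtext("value", default="")
            # The value is treated as a comma-separated list of paths.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


# Sample configuration with made-up paths, for illustration only.
SAMPLE_YARN_SITE = """
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local, /data2/yarn/local</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for directory in yarn_local_dirs(SAMPLE_YARN_SITE):
        print(directory)
```

In practice the XML text would come from the yarn-site.xml file on a NodeManager host; its location depends on the installation and is not assumed here.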