Deployment Modes
Spark is preconfigured for YARN and does not require any additional configuration to run.
Two deployment modes can be used to launch Spark applications on YARN:
- In cluster mode, jobs are managed by the YARN cluster. The Spark driver runs inside an Application Master (AM) process that is managed by YARN. This means that the client can go away after initiating the application.
- In client mode, the Spark driver runs in the client process, and the Application Master is used only to request resources from YARN.
Data Fabric recommends using cluster deployment mode instead of client mode. In cluster mode, if the Spark client that submitted the job exits after submission, there is no impact on job completion.
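As an illustration of how the deployment mode is selected at submission time, the following minimal sketch assembles a spark-submit command line that uses the standard --master yarn and --deploy-mode options. The helper name build_spark_submit_args and the application file my_app.py are hypothetical and used only for this example; the sketch builds the argument list and does not launch anything.

```python
from typing import List, Sequence

# The two deployment modes described above.
VALID_DEPLOY_MODES = ("cluster", "client")


def build_spark_submit_args(
    app: str,
    deploy_mode: str = "cluster",
    app_args: Sequence[str] = (),
) -> List[str]:
    """Return a spark-submit argument list for running `app` on YARN.

    `build_spark_submit_args` is a hypothetical helper for illustration only.
    It defaults to cluster mode, following the recommendation above.
    """
    if deploy_mode not in VALID_DEPLOY_MODES:
        raise ValueError(
            f"deploy_mode must be one of {VALID_DEPLOY_MODES}, got {deploy_mode!r}"
        )
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        app,
        *app_args,
    ]


if __name__ == "__main__":
    # my_app.py is a placeholder application name.
    print(" ".join(build_spark_submit_args("my_app.py")))
    # prints: spark-submit --master yarn --deploy-mode cluster my_app.py
```

The resulting list could be passed to subprocess.run on a node where spark-submit is on the PATH. With deploy_mode set to "client", the driver would instead run in the submitting process, so that process would need to stay alive for the duration of the job.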
Note: In cluster deployment mode, the local directories used by the Spark executors and the Spark driver are the local directories that are configured for YARN (yarn.nodemanager.local-dirs).
Note: SPARK_LOCAL_DIRS is ignored when you run Spark on YARN.
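To check which local directories apply, one option is to read yarn.nodemanager.local-dirs from the YARN configuration. The sketch below parses Hadoop-style configuration XML and returns the comma-separated directory list. The function name yarn_local_dirs and the sample paths are hypothetical; on a real node you would read the yarn-site.xml file from your installation's configuration directory, whose location is not given here.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical sample content in the Hadoop configuration XML format.
# The directory paths are placeholders, not defaults of any product.
SAMPLE_YARN_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data1/yarn/local,/data2/yarn/local</value>
  </property>
</configuration>
"""


def yarn_local_dirs(yarn_site_xml: str) -> List[str]:
    """Return the directories listed under yarn.nodemanager.local-dirs.

    Returns an empty list when the property is not set in the given XML.
    """
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "yarn.nodemanager.local-dirs":
            value = prop.findtext("value") or ""
            # The value is a comma-separated list of directories.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []


if __name__ == "__main__":
    print(yarn_local_dirs(SAMPLE_YARN_SITE))
    # prints: ['/data1/yarn/local', '/data2/yarn/local']
```

An empty result only means the property is absent from the XML that was parsed; it does not show what value YARN falls back to.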