Managing Spark Application Dependencies

This topic describes how to pass dependencies to Spark applications in HPE Ezmeral Runtime Enterprise.

You can manage custom Spark dependencies in three different ways:
  • Build the dependencies into the main application JAR, for example, by using the maven-assembly-plugin. See maven-assembly-plugin.
  • Create a PersistentVolume for the dependencies and mount it into the driver and executor pods. You can then use the local schema to reference those dependencies, as shown in the sketch after this list. For example, see PySpark-with-dependencies.yaml.
  • Build custom images on top of the Spark images provided by HPE Ezmeral Runtime Enterprise, and copy or install the dependencies into your custom images. You can then use the local schema to reference those dependencies.
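The following is a minimal sketch of the PersistentVolume approach, assuming a SparkApplication custom resource as defined by the Kubernetes Operator for Apache Spark. The claim name spark-deps-pvc, the mount path /mnt/spark-deps, the image placeholder, and the file names are illustrative assumptions, not values taken from this documentation:

```yaml
# Sketch: mount a PersistentVolumeClaim into the driver and executor pods,
# then reference the dependencies with the local schema.
# Claim name, mount path, image, and file names are placeholders.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-with-dependencies
spec:
  type: Python
  mode: cluster
  sparkVersion: "3.1.2"                              # example version only
  image: "<spark-image-provided-by-hpe-ezmeral>"
  mainApplicationFile: "local:///mnt/spark-deps/main.py"
  deps:
    pyFiles:
      - "local:///mnt/spark-deps/helpers.zip"        # resolved inside the pods
  volumes:
    - name: spark-deps
      persistentVolumeClaim:
        claimName: spark-deps-pvc
  driver:
    cores: 1
    memory: "512m"
    volumeMounts:
      - name: spark-deps
        mountPath: /mnt/spark-deps
  executor:
    instances: 2
    cores: 1
    memory: "512m"
    volumeMounts:
      - name: spark-deps
        mountPath: /mnt/spark-deps
```

Because the same PersistentVolumeClaim is mounted at the same path in both the driver and the executors, every local:// URI resolves identically in all pods.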

Supported Schemas for Main Application File in Spark Applications

The following schemas are supported for the main application file in Spark applications, as shown in the sketch after this list:
  • local
  • dtap
  • s3a
  • maprfs
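As a sketch, any of these schemas can appear in the mainApplicationFile field of a SparkApplication spec. The bucket, DataTap, and path names below are illustrative placeholders:

```yaml
# Illustrative mainApplicationFile values, one per supported schema.
# All bucket and path names are placeholders.
mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples.jar"   # inside the image or a mounted volume
# mainApplicationFile: "dtap://TenantStorage/apps/spark-examples.jar"        # DataTap path
# mainApplicationFile: "s3a://my-bucket/apps/spark-examples.jar"             # S3-compatible object store
# mainApplicationFile: "maprfs:///apps/spark-examples.jar"                   # MapR file system path
```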

Supported Schemas for Passing Dependencies

The following schemas are supported for passing dependencies to Spark applications, as shown in the sketch following the list:
  • local
  • s3a (AWS)
  • dtap
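As a sketch, dependencies are listed under the deps section of the SparkApplication spec. The URIs below are placeholders, and s3a access typically also requires the appropriate object-store credentials to be configured for the application:

```yaml
# Illustrative deps section mixing the supported schemas.
# All URIs are placeholders.
spec:
  deps:
    jars:
      - "local:///opt/spark/extra-jars/extra-lib.jar"   # baked into a custom image or mounted volume
      - "s3a://my-bucket/jars/extra-lib.jar"            # S3-compatible object store
    files:
      - "dtap://TenantStorage/conf/app.conf"            # DataTap path
```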

Unsupported Schemas for Passing Dependencies

The following schemas are not supported for passing dependencies to Spark applications:
  • maprfs