Interface to Spark

Airflow provides an interface to Spark through provider packages.

For provider examples for Spark, see <airflow_home>/build/env/lib/python3.9/site-packages/airflow/providers/ezmeral/spark/example_dags/.

Hooks

EzSparkSubmitHook
Python path: airflow.providers.ezmeral.spark.hooks.ezspark_submit
Description: Launches Spark applications. This hook is a wrapper around the spark-submit binary to run a spark-submit job.
EzSparkSqlHook
Python path: airflow.providers.ezmeral.spark.hooks.ezspark_sql
Description: Enables interaction with binary tables through spark-sql. This hook is a wrapper around the spark-sql binary.
EzSparkJDBCHook
Python path: airflow.providers.ezmeral.spark.hooks.ezspark_jdbc
Description: Enables data transfers between JDBC databases and Apache Spark.
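Each of these hooks wraps a Spark command-line binary. As a rough illustration of what that wrapping involves (a sketch only; the function and parameter names below are hypothetical and are not the EzSparkSqlHook API), a spark-sql wrapper typically assembles the command line before executing the binary:

```python
def build_spark_sql_cmd(sql, conf=None, spark_sql_binary="spark-sql"):
    """Sketch: assemble a spark-sql invocation as an argument list.

    Hypothetical helper for illustration; -e and --conf are standard
    spark-sql flags, but this is not the hook's real implementation.
    """
    cmd = [spark_sql_binary, "-e", sql]
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    return cmd

# Example: a query with one Spark configuration property.
print(build_spark_sql_cmd("SELECT 1", {"spark.master": "yarn"}))
```

A hook built this way only needs the binary's fully qualified path (or the binary on PATH) to hand the list to a subprocess call.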

Operators

EzSparkSubmitOperator
Python path: airflow.providers.ezmeral.spark.operators.ezspark_submit
Description: This operator is a wrapper around the spark-submit binary to run a spark-submit job.
EzSparkSqlOperator
Python path: airflow.providers.ezmeral.spark.operators.ezspark_sql
Description: Executes a Spark SQL query.
EzSparkJDBCOperator
Python path: airflow.providers.ezmeral.spark.operators.ezspark_jdbc
Description: Extends the SparkSubmitOperator to enable data transfers between JDBC databases and Apache Spark.

Spark binary path requirement

As of EEP release 9.1.0, you must define the fully qualified path to the spark-submit binary when configuring the Airflow Spark connection. To do this, add the spark-binary key to the extra JSON field of the spark_default connection:

"spark-binary": "/opt/mapr/spark/spark-<spark_version>/bin/spark-submit"

If you need to override this path for a particular task, pass the spark_binary parameter to the operator in your DAG:

spark_submit = SparkSubmitOperator(
    task_id='run_spark_job',
    application='/PATH/spark_application.jar',
    conn_id='spark_default',
    spark_binary='/opt/mapr/spark/spark-<spark_version>/bin/spark-submit',
)

Note that whichever method you use (defining spark-binary in the extra JSON field or passing spark_binary as an operator parameter), the value must be the correct fully qualified path of the spark-submit binary on the Airflow node. The path can differ depending on which version of Spark is installed.
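Because the path embeds the Spark version, it is easy to get wrong after an upgrade. The sketch below (a hypothetical helper, not part of the provider; it only assumes the /opt/mapr/spark/spark-<version>/bin/spark-submit layout described above) shows one way to discover the installed binary on the Airflow node before wiring it into the connection:

```python
import os

def find_spark_submit(base="/opt/mapr/spark"):
    """Return the fully qualified spark-submit path found under *base*,
    or None if no spark-<version>/bin/spark-submit exists there.

    Hypothetical helper; assumes the <base>/spark-<version>/bin layout.
    """
    if not os.path.isdir(base):
        return None
    # Prefer the highest version directory if several are present.
    for entry in sorted(os.listdir(base), reverse=True):
        candidate = os.path.join(base, entry, "bin", "spark-submit")
        if entry.startswith("spark-") and os.path.isfile(candidate):
            return candidate
    return None
```

Running this on the Airflow node and pasting the result into the connection's extra field (or the operator's spark_binary parameter) avoids typing the version by hand.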