Apache Airflow
This topic provides an overview of Apache Airflow on HPE Ezmeral Data Fabric.
Starting from EEP 8.1.0, HPE Ezmeral Data Fabric supports Apache Airflow on core 6.2.x and core 7.0.0.
You can use Airflow to author, schedule, and monitor workflows or data pipelines.
The following image shows the Apache Airflow workflow:
A workflow is a Directed Acyclic Graph (DAG) of tasks used to handle big data processing pipelines. Workflows are started on a schedule or triggered by an event. A DAG defines the order in which tasks run and how to rerun them in case of failures. The tasks define the actions to be performed, such as ingest, monitor, and report.
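The ordering behavior described above can be sketched with plain Python. The following example uses `graphlib` from the Python standard library (not an Airflow API) to show how a DAG's dependencies determine a valid task execution order; the task names and dependencies are illustrative, not from this topic:

```python
# Minimal sketch: a DAG maps each task to the set of tasks it depends on.
# "ingest" must finish before "monitor", which must finish before "report".
from graphlib import TopologicalSorter

dag = {
    "ingest": set(),
    "monitor": {"ingest"},
    "report": {"monitor"},
}

# static_order() yields the tasks in a dependency-respecting order.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['ingest', 'monitor', 'report']
```

In Airflow, the Scheduler performs this kind of dependency resolution for you: it only submits a task once all of its upstream tasks have succeeded.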
Airflow Architecture
The following image shows the Apache Airflow Architecture:
Airflow Components
Airflow consists of the following components:
- Scheduler
- Triggers the scheduled workflows and submits their tasks to an executor to run.
- Executor
- Executes the tasks or delegates the tasks to workers for execution.
- Worker
- Executes the tasks.
- Web Server
- Provides a user interface to analyze, schedule, monitor, and visualize tasks and DAGs. The Web Server also enables you to manage users and roles and to set configuration options.
- DAG Directory
- Contains the DAG files read by the Scheduler, Executor, and Web Server.
- Metadata Database
- Stores metadata about DAG state, task runs, and Airflow configuration options.
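To connect these components to a concrete artifact: the Scheduler, Executor, and Web Server all read DAG files, which are ordinary Python scripts placed in the DAG directory. Below is a minimal sketch of such a file, assuming an Airflow 2.x installation; the `dag_id`, task IDs, and shell commands are illustrative, not part of this topic:

```python
# A minimal DAG file (illustrative sketch). Dropped into the DAG directory,
# it is picked up by the Scheduler, which triggers a run on the given schedule.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_pipeline",          # hypothetical DAG name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",         # run once per day
    catchup=False,
) as dag:
    ingest = BashOperator(task_id="ingest", bash_command="echo ingest")
    report = BashOperator(task_id="report", bash_command="echo report")

    # The >> operator declares the dependency: ingest runs before report.
    ingest >> report
```

This file is a deployment artifact rather than a standalone program: Airflow's Scheduler parses it, stores run state in the metadata database, and submits the tasks to the configured executor.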
To learn more about Airflow, see Airflow Concepts.