Notes about using Airflow

To execute Airflow jobs, you must be an AD/LDAP user that is a member of the tenant where Airflow is installed.

Health Checks

If a database failure occurs, the database pod persists the PersistentVolumeClaim (PVC) and cluster metadata. However, because Airflow connects to the database through SQLAlchemy, the Airflow Scheduler pod can lose its database connection during a database pod failure. One way to automate connection checks is to use the Scheduler health check; the cluster administrator or user can then restart the Scheduler or create a trigger when a connection failure occurs.

See Checking Airflow Health Status in the Apache Airflow documentation (link opens an external web page in a new browser tab or window).
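As a minimal sketch of such an automated check, the following Python snippet polls the Airflow webserver /health endpoint and inspects the scheduler heartbeat status. The base URL is a placeholder for the address of your Airflow web UI; the payload shape matches what the Airflow /health endpoint returns.

```python
# Minimal sketch: check the Airflow scheduler heartbeat via the /health endpoint.
# Substitute your own Airflow web UI address for the placeholder base URL.
import json
import urllib.request


def scheduler_is_healthy(payload: dict) -> bool:
    """Return True if the /health payload reports a healthy scheduler."""
    return payload.get("scheduler", {}).get("status") == "healthy"


def check_airflow_health(base_url: str) -> bool:
    """Fetch <base_url>/health and report whether the scheduler is healthy."""
    with urllib.request.urlopen(f"{base_url}/health") as resp:
        payload = json.load(resp)
    return scheduler_is_healthy(payload)


if __name__ == "__main__":
    # Example payload in the shape returned by the Airflow /health endpoint.
    sample = {
        "metadatabase": {"status": "healthy"},
        "scheduler": {
            "status": "healthy",
            "latest_scheduler_heartbeat": "2021-01-01T00:00:00+00:00",
        },
    }
    print(scheduler_is_healthy(sample))
```

A check like this can run on a schedule (for example, as a Kubernetes CronJob) and trigger a Scheduler restart when it fails.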

Accessing Logs after Removing Pods

In HPE Ezmeral Runtime Enterprise, completed pods can be removed automatically after a period of time. After a pod is deleted, its logs are no longer accessible, so save any logs you need (for example, with kubectl logs) before the pod is removed.

Accessing the Web UI

You can access the Airflow UI as follows:
  1. Access the HPE Ezmeral Runtime Enterprise new UI, as described in HPE Ezmeral Runtime Enterprise new UI.
  2. Select Workflow Engine:
    • For HPE Ezmeral ML Ops projects, Workflow Engine is located on the Training and Workflow panel under the Model Building section.
    • For non-HPE Ezmeral ML Ops projects, Workflow Engine is located on the Workflow panel under the Notebook Servers and Workflow section.

You can obtain the FQDN of the Airflow web UI as follows:

kubectl describe svc airflow-https-svc -n <cluster-namespace>

In the Annotations section of the output, find the address of the UI. For example:

mip-bd-ap05-n2-vm05.mip.storage.hpecorp.net:10007
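The annotation value is a host:port pair; a sketch of turning it into a browsable URL is shown below. The https scheme is an assumption based on the airflow-https-svc service name, and the hostname is the example address above.

```python
# Minimal sketch: build a browsable URL from the host:port annotation value.
# The https scheme is assumed from the airflow-https-svc service name.
def ui_url(annotation: str, scheme: str = "https") -> str:
    host, _, port = annotation.rpartition(":")
    return f"{scheme}://{host}:{port}"


print(ui_url("mip-bd-ap05-n2-vm05.mip.storage.hpecorp.net:10007"))
# https://mip-bd-ap05-n2-vm05.mip.storage.hpecorp.net:10007
```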

After you create a new Airflow cluster, the port numbers of other clusters can change. If any issues with the gateway occur, you can port-forward port 8080 of the af-cluster-airflowui-0 pod in the cluster namespace. For example:

kubectl port-forward af-cluster-airflowui-0 8080:8080 -n <cluster-namespace>

Accessing Data From Outside DAGs with DataTap

See Accessing Data From Outside Airflow DAGs with DataTap.

Running DAGs with SparkKubernetesOperator

See Using Airflow to Schedule Spark Applications.