Creating an Airflow Cluster Manually
This procedure describes an alternative method of creating an Airflow Kubernetes cluster. Use this method if you need to perform additional tuning of the Airflow cluster from the command line. If no additional tuning is required, use the recommended method described in Creating an Airflow Cluster Automatically.
Prerequisites
- For system, computation, and storage requirements, see Airflow Requirements.
- Required access rights: Platform Administrator or Tenant Administrator/Member
- Airflow is enabled on the Kubernetes cluster, as described in Installing Airflow.
Procedure
- If your environment has a web proxy, and your HPE Ezmeral Runtime Enterprise tenant or ML Ops project has Istio Service Mesh enabled, create an Istio ServiceEntry object with the details of your web proxy so that the git clone function in the Airflow git-sync container can work. Replace the example host name, address, and port below with those of your proxy:
cat << EOF | kubectl -n <tenant namespace> apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: proxy
spec:
  hosts:
  - web-proxy.corp.hpecorp.net # ignored
  addresses:
  - 16.85.88.10/32
  ports:
  - number: 8080
    name: tcp
    protocol: TCP
  location: MESH_EXTERNAL
EOF
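If the proxy details differ between environments, the same manifest can be generated from a few shell variables and reviewed before it is applied. This is a minimal POSIX shell sketch; the function name render_proxy_service_entry and its three positional arguments are hypothetical and not part of HPE Ezmeral Runtime Enterprise. The function only prints the manifest; it does not contact the cluster.

```shell
#!/bin/sh
# Hypothetical helper: prints an Istio ServiceEntry for a web proxy,
# mirroring the manifest in the step above.
# Arguments: proxy host name, proxy IPv4 address, proxy TCP port.
render_proxy_service_entry() {
    proxy_host=$1
    proxy_ip=$2
    proxy_port=$3
    # Unquoted EOF so that ${...} references are expanded into the YAML.
    cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: proxy
spec:
  hosts:
  - ${proxy_host} # ignored
  addresses:
  - ${proxy_ip}/32
  ports:
  - number: ${proxy_port}
    name: tcp
    protocol: TCP
  location: MESH_EXTERNAL
EOF
}
```

After checking the printed output, you could apply it with, for example: render_proxy_service_entry web-proxy.corp.hpecorp.net 16.85.88.10 8080 | kubectl -n <tenant namespace> apply -f -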
- On the Kubernetes master node, open the command line.
- Configure environment variables.
NOTE: These environment variables are set only for this shell, and are only needed during bootstrap script execution. It is not necessary to persist them.
Required environment variable:
AIRFLOW_GIT_REPO_URL
- URL of the Git repository for your Directed Acyclic Graphs (DAGs). For example: https://github.com/HPEEzmeral/airflow-on-k8s.git
Optional environment variables:
AIRFLOW_CLUSTER_NAMESPACE
- Name of the namespace for AirflowCluster. This namespace should exist on the cluster.
AIRGAP_REGISTRY
- If the environment is air-gapped, the address of the container registry; for example, localhost:5000/ (the trailing slash is required).
AIRFLOW_GIT_REPO_BRANCH
- The branch of the Git repository that is used to access DAGs. For example: ecp-5.5.0
AIRFLOW_GIT_REPO_SUBDIR
- Path to the directory where DAGs are placed in the Git repository.
GIT_PROXY_HTTP
- If the Git repository is located outside the internal network, the address of the HTTP proxy for the git-sync container.
GIT_PROXY_HTTPS
- If the Git repository is located outside the internal network, the address of the HTTPS proxy for the git-sync container.
Default values for the environment variables are as follows:
AIRGAP_REGISTRY=""
AIRFLOW_CLUSTER_NAMESPACE="default"
AIRFLOW_CLUSTER_IMAGE_TAG="ecp-5.5.0-rc1"
AIRFLOW_BASE_NAMESPACE="airflow-base"
AIRFLOW_GIT_REPO_BRANCH="" # empty string points to the main branch of the Git repository
AIRFLOW_GIT_REPO_SUBDIR=""
GIT_PROXY_HTTP=""
GIT_PROXY_HTTPS=""
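Because AIRFLOW_GIT_REPO_URL is the only required variable and a non-empty AIRGAP_REGISTRY needs a trailing slash, a quick pre-flight check can catch both mistakes before the bootstrap script runs. The following is a minimal POSIX shell sketch; the function name check_airflow_env is hypothetical, and install.sh may perform its own, different validation. The function only reads the variables; it does not change or export them.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the variables described in this step.
# Returns 0 if the settings look usable, 1 otherwise (with a message on stderr).
check_airflow_env() {
    # AIRFLOW_GIT_REPO_URL is the only required variable.
    if [ -z "${AIRFLOW_GIT_REPO_URL:-}" ]; then
        echo "error: AIRFLOW_GIT_REPO_URL is required" >&2
        return 1
    fi
    # AIRGAP_REGISTRY may be empty; if it is set, it must end with "/".
    case "${AIRGAP_REGISTRY:-}" in
        "" | */)
            ;;
        *)
            echo "error: AIRGAP_REGISTRY must end with '/', e.g. localhost:5000/" >&2
            return 1
            ;;
    esac
    return 0
}
```

For example, after setting the variables in your shell, run check_airflow_env && echo "environment looks OK" before starting the installation.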
- From the following location, clone the repository branch that corresponds to the release of HPE Ezmeral Runtime Enterprise that your environment is running:
- Install the Airflow cluster using one of the following options:
- Option 1: Public Git repository shell script
For example:
AIRFLOW_GIT_REPO_URL="https://github.com/HPEEzmeral/airflow-on-k8s.git" \
AIRFLOW_GIT_REPO_SUBDIR="example_dags/" AIRFLOW_GIT_REPO_BRANCH="ecp-5.5.0" \
/bin/sh airflow-on-k8s/bootstrap/airflow-cluster/install.sh
- Option 2: Private Git repository shell script
- If the password (or access token) for the Git repository is already stored in a secret under the key password in the AIRFLOW_CLUSTER_NAMESPACE namespace, also pass the name of that secret in the AIRFLOW_GIT_REPO_CRED_SECRET_NAME variable and the user name in the AIRFLOW_GIT_REPO_USER variable. For example:
AIRFLOW_GIT_REPO_URL="https://github.com/HPEEzmeral/airflow-on-k8s.git" \
AIRFLOW_GIT_REPO_SUBDIR="example_dags/" AIRFLOW_GIT_REPO_BRANCH="ecp-5.5.0" \
AIRFLOW_GIT_REPO_USER="mapr" \
AIRFLOW_GIT_REPO_CRED_SECRET_NAME="secret-with-git-creds" \
/bin/sh airflow-on-k8s/bootstrap/airflow-cluster/install.sh
- If the password (or access token) is not already stored in a secret, pass the user name in the AIRFLOW_GIT_REPO_USER variable and run the following command. When the script runs, enter the credentials at the prompt; the script generates an appropriate secret. For example:
AIRFLOW_GIT_REPO_URL="https://github.com/HPEEzmeral/airflow-on-k8s.git" \
AIRFLOW_GIT_REPO_SUBDIR="example_dags/" AIRFLOW_GIT_REPO_BRANCH="ecp-5.5.0" \
AIRFLOW_GIT_REPO_USER="mapr" \
/bin/sh airflow-on-k8s/bootstrap/airflow-cluster/install.sh
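For Option 2, if you prefer to store the credentials in advance instead of entering them at the prompt, the secret must be in the AIRFLOW_CLUSTER_NAMESPACE namespace and hold the password or access token under the key password. The sketch below prints such a Kubernetes Secret manifest. The function name render_git_cred_secret is hypothetical, the Opaque secret type is an assumption, and whether install.sh reads any keys other than password is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: prints a Kubernetes Secret that stores a Git password
# or access token under the key "password", as Option 2 expects.
# Arguments: secret name, namespace, password or token.
render_git_cred_secret() {
    secret_name=$1
    namespace=$2
    # Secret "data" values must be base64-encoded; tr strips any line wrapping
    # that some base64 implementations add to long values.
    encoded=$(printf '%s' "$3" | base64 | tr -d '\n')
    cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${secret_name}
  namespace: ${namespace}
type: Opaque
data:
  password: ${encoded}
EOF
}
```

The output can be piped to kubectl apply -f -, and its name passed in AIRFLOW_GIT_REPO_CRED_SECRET_NAME. An equivalent one-line alternative is kubectl -n <namespace> create secret generic secret-with-git-creds --from-literal=password=<token>. Either way, avoid leaving the token in your shell history.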