Creating an Airflow Cluster Manually

This procedure describes an alternative method of creating an Airflow cluster on Kubernetes. Use this method when you need to tune the Airflow cluster further from the command line. If no extra tuning is required, use the recommended method described in Creating an Airflow Cluster Automatically.

Prerequisites

  • For system, computation, and storage requirements, see Airflow Requirements.

  • Required access rights: Platform Administrator or Tenant Administrator/Member

  • Airflow is enabled on the Kubernetes cluster, as described in Installing Airflow.

About this task

Use this method when your Airflow cluster needs extra tuning; for example, when you use a proxy server that requires authentication.

Procedure

  1. If your environment uses a web proxy and your HPE Ezmeral Runtime Enterprise tenant or ML Ops project has Istio Service Mesh enabled, create an Istio ServiceEntry object for the web proxy.
    This allows the git clone function in the Airflow git-sync container to reach the proxy. Replace the host name, IP address, and port in the following example with the details of your web proxy:
    cat << EOF | kubectl -n <tenant namespace> apply -f -
    apiVersion: networking.istio.io/v1alpha3
    kind: ServiceEntry
    metadata:
      name: proxy
    spec:
      hosts:
      - web-proxy.corp.hpecorp.net # ignored for TCP; traffic is matched by the addresses field
      addresses:
      - 16.85.88.10/32
      ports:
      - number: 8080
        name: tcp
        protocol: TCP
      location: MESH_EXTERNAL
    EOF
    
  2. On the Kubernetes master node, open the command line.
  3. Configure environment variables.
    NOTE
    These environment variables apply only to the current shell and are needed only while the bootstrap script runs. You do not need to persist them.

    Required environment variable:

    AIRFLOW_GIT_REPO_URL

    URL of the Git repository for your Directed Acyclic Graphs (DAGs).

    For example:

    https://github.com/HPEEzmeral/airflow-on-k8s.git

    Optional environment variables:

    AIRFLOW_CLUSTER_NAMESPACE
    Name of the namespace for the AirflowCluster resource. This namespace must already exist on the cluster.
    AIRGAP_REGISTRY
    If the environment is air-gapped, the address of the container registry; for example, localhost:5000/ (the trailing slash is required).
    AIRFLOW_GIT_REPO_BRANCH
    The branch of the Git repository from which DAGs are read; for example, ecp-5.5.0.
    AIRFLOW_GIT_REPO_SUBDIR
    Path to the directory in the Git repository that contains the DAGs.
    GIT_PROXY_HTTP
    If the Git repository is located outside the internal network, the address of the HTTP proxy for the git-sync container.
    GIT_PROXY_HTTPS
    If the Git repository is located outside the internal network, the address of the HTTPS proxy for the git-sync container.

    Default values for environment variables are as follows.

    AIRGAP_REGISTRY=""
    AIRFLOW_CLUSTER_NAMESPACE="default"
    AIRFLOW_CLUSTER_IMAGE_TAG="ecp-5.5.0-rc1"
    AIRFLOW_BASE_NAMESPACE="airflow-base"
    AIRFLOW_GIT_REPO_BRANCH="" # an empty string selects the main branch of the Git repository
    AIRFLOW_GIT_REPO_SUBDIR=""
    GIT_PROXY_HTTP=""
    GIT_PROXY_HTTPS=""
  4. From the following location, clone the repository branch that corresponds to the release of HPE Ezmeral Runtime Enterprise that your environment is running:
  5. Install the Airflow cluster using one of the following options:
    • Option 1: Public Git repository shell script

      For example:

      AIRFLOW_GIT_REPO_URL="https://github.com/HPEEzmeral/airflow-on-k8s.git" \
      AIRFLOW_GIT_REPO_SUBDIR="example_dags/" AIRFLOW_GIT_REPO_BRANCH="ecp-5.5.0" \
      /bin/sh airflow-on-k8s/bootstrap/airflow-cluster/install.sh
    • Option 2: Private Git repository shell script

      • If the password (or access token) for the Git repository is already stored in a secret under the key password in the AIRFLOW_CLUSTER_NAMESPACE namespace, also pass the name of that secret in the AIRFLOW_GIT_REPO_CRED_SECRET_NAME variable and the user name in the AIRFLOW_GIT_REPO_USER variable.

        For example:

        AIRFLOW_GIT_REPO_URL="https://github.com/HPEEzmeral/airflow-on-k8s.git" \
        AIRFLOW_GIT_REPO_SUBDIR="example_dags/" AIRFLOW_GIT_REPO_BRANCH="ecp-5.5.0" \
        AIRFLOW_GIT_REPO_USER="mapr" \
        AIRFLOW_GIT_REPO_CRED_SECRET_NAME="secret-with-git-creds" \
        /bin/sh airflow-on-k8s/bootstrap/airflow-cluster/install.sh
      • If the password (or access token) is not already stored in a secret, pass only the user name in the AIRFLOW_GIT_REPO_USER variable and run the following command. While it runs, the script prompts you for the credentials and generates an appropriate secret from them.

        For example:

        AIRFLOW_GIT_REPO_URL="https://github.com/HPEEzmeral/airflow-on-k8s.git" \
        AIRFLOW_GIT_REPO_SUBDIR="example_dags/" AIRFLOW_GIT_REPO_BRANCH="ecp-5.5.0" \
        AIRFLOW_GIT_REPO_USER="mapr" \
        /bin/sh airflow-on-k8s/bootstrap/airflow-cluster/install.sh
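Option 2 with an existing secret expects the password (or access token) to be stored under the key password in the AIRFLOW_CLUSTER_NAMESPACE namespace. One way to create such a secret is kubectl -n <namespace> create secret generic <secret name> --from-literal=password=<token>. The following sketch instead prints an equivalent Secret manifest so that you can review it before applying it. The function name render_git_cred_secret is hypothetical and is not part of the airflow-on-k8s scripts.

```shell
# Hypothetical helper (not part of airflow-on-k8s): prints a Kubernetes Secret
# that stores a Git password or access token under the key "password".
# Arguments: secret name, namespace, password or access token.
render_git_cred_secret() {
  secret_name="$1"
  namespace="$2"
  token="$3"
  # Refuse to render a manifest with missing values.
  if [ -z "$secret_name" ] || [ -z "$namespace" ] || [ -z "$token" ]; then
    echo "usage: render_git_cred_secret NAME NAMESPACE TOKEN" >&2
    return 1
  fi
  # Secret data values are base64-encoded; tr removes any line wrapping
  # that base64 adds for long tokens.
  encoded="$(printf '%s' "$token" | base64 | tr -d '\n')"
  printf '%s\n' \
    "apiVersion: v1" \
    "kind: Secret" \
    "metadata:" \
    "  name: ${secret_name}" \
    "  namespace: ${namespace}" \
    "type: Opaque" \
    "data:" \
    "  password: ${encoded}"
}
```

For example, render_git_cred_secret secret-with-git-creds default "<token>" | kubectl apply -f - creates the secret used in the Option 2 example, assuming AIRFLOW_CLUSTER_NAMESPACE keeps its default value. Passing the token as an argument can leave it in your shell history, so clear the history entry afterward if that matters in your environment.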
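Before you run any of the install commands in step 5, a short pre-flight check can catch common mistakes in the variables from step 3: a missing AIRFLOW_GIT_REPO_URL, or an AIRGAP_REGISTRY value without its required trailing slash. The following sketch is a hypothetical helper named check_airflow_env; it is not part of install.sh. Its third check (a credential secret name must be accompanied by a user name) is an assumption drawn from the Option 2 description.

```shell
# Hypothetical pre-flight check (not part of install.sh) for the variables
# described in step 3. Returns non-zero and prints errors if a check fails.
check_airflow_env() {
  status=0
  # AIRFLOW_GIT_REPO_URL is the only required variable.
  if [ -z "${AIRFLOW_GIT_REPO_URL:-}" ]; then
    echo "error: AIRFLOW_GIT_REPO_URL is not set" >&2
    status=1
  fi
  # AIRGAP_REGISTRY is optional, but when set it must end with a slash.
  case "${AIRGAP_REGISTRY:-}" in
    "" | */) ;;
    *)
      echo "error: AIRGAP_REGISTRY must end with '/' (for example, localhost:5000/)" >&2
      status=1
      ;;
  esac
  # Assumption based on Option 2: a stored credential secret is used
  # together with a Git user name.
  if [ -n "${AIRFLOW_GIT_REPO_CRED_SECRET_NAME:-}" ] && [ -z "${AIRFLOW_GIT_REPO_USER:-}" ]; then
    echo "error: AIRFLOW_GIT_REPO_CRED_SECRET_NAME requires AIRFLOW_GIT_REPO_USER" >&2
    status=1
  fi
  return "$status"
}
```

For example, after export AIRFLOW_GIT_REPO_URL="https://github.com/HPEEzmeral/airflow-on-k8s.git", run check_airflow_env && /bin/sh airflow-on-k8s/bootstrap/airflow-cluster/install.sh so that the bootstrap script starts only when the checks pass.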
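The ServiceEntry in step 1 hard-codes example proxy values. If you apply it in several tenant namespaces, a small template function can take the proxy host name, IP address, and port as arguments instead. The following sketch is hypothetical (render_proxy_service_entry is not a product command) and prints the same manifest as step 1, with the IP address written as a single /32 address.

```shell
# Hypothetical helper: prints an Istio ServiceEntry for a web proxy.
# Arguments: proxy host name, proxy IPv4 address, proxy port.
render_proxy_service_entry() {
  proxy_host="$1"
  proxy_ip="$2"
  proxy_port="$3"
  # Refuse to render a manifest with missing values.
  if [ -z "$proxy_host" ] || [ -z "$proxy_ip" ] || [ -z "$proxy_port" ]; then
    echo "usage: render_proxy_service_entry HOST IP PORT" >&2
    return 1
  fi
  printf '%s\n' \
    "apiVersion: networking.istio.io/v1alpha3" \
    "kind: ServiceEntry" \
    "metadata:" \
    "  name: proxy" \
    "spec:" \
    "  hosts:" \
    "  - ${proxy_host} # ignored for TCP; traffic is matched by the addresses field" \
    "  addresses:" \
    "  - ${proxy_ip}/32" \
    "  ports:" \
    "  - number: ${proxy_port}" \
    "    name: tcp" \
    "    protocol: TCP" \
    "  location: MESH_EXTERNAL"
}
```

For example, render_proxy_service_entry web-proxy.corp.hpecorp.net 16.85.88.10 8080 | kubectl -n <tenant namespace> apply -f - applies the same object as the example in step 1.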