Creating an Airflow Cluster Automatically

Describes how to create an Airflow Kubernetes cluster from a Git repository through the HPE Ezmeral Runtime Enterprise UI. This is the recommended method of Airflow cluster creation.

Prerequisites

  • For system, computation, and storage requirements, see Airflow Requirements.

  • Required access rights: Platform Administrator or Tenant Administrator/Member

  • Airflow is enabled on the Kubernetes cluster, as described in Installing Airflow.

About this task

NOTE HPE Ezmeral Runtime Enterprise does not support creating source control configurations through proxy servers that require authentication. If your proxy requires authentication, install Airflow on the Kubernetes cluster with bootstrap scripts instead. For more information, see Installing Airflow.

Procedure

  1. Perform one of the following:
    • If you are creating an Airflow cluster in an HPE Ezmeral ML Ops project:

      Create a new tenant with the ML Ops Project check box selected. Alternatively, select the ML Ops Project check box on an existing tenant.

    • If you are creating an Airflow cluster for Spark in a non-HPE Ezmeral ML Ops project:

      Access the HPE Ezmeral Runtime Enterprise new UI, as described in Submitting and Managing Spark Applications Using HPE Ezmeral Runtime Enterprise new UI.

      On the Home page of the new UI, select View All on the Projects panel. The Projects screen opens. Select the name of your project.

  2. If your environment has a web proxy, and your HPE Ezmeral Runtime Enterprise tenant or ML Ops project has Istio Service Mesh enabled, perform the following:
    To allow the git clone function in the Airflow git-sync container, create an Istio ServiceEntry object with the following web proxy details:
    # Replace <tenant namespace> and the sample proxy values below with your own.
    cat << EOF | kubectl -n <tenant namespace> apply -f -
    apiVersion: networking.istio.io/v1alpha3
    kind: ServiceEntry
    metadata:
      name: proxy
    spec:
      hosts:
      - web-proxy.corp.hpecorp.net # ignored
      addresses:
      - 16.85.88.10/32   # IP address of the web proxy
      ports:
      - number: 8080     # port of the web proxy
        name: tcp
        protocol: TCP
      location: MESH_EXTERNAL
    EOF
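
To avoid hand-editing the manifest, the same ServiceEntry can be rendered from shell arguments and reviewed before it is applied. The following is a minimal sketch, not part of the product: the function name render_proxy_service_entry and the sample hostname, IP address, and port are hypothetical and must be replaced with your own proxy details.

```shell
#!/bin/sh
# Hypothetical helper: prints a ServiceEntry manifest for a web proxy.
# Arguments: <proxy hostname> <proxy IPv4 address> <proxy port>
render_proxy_service_entry() {
  proxy_host="$1"
  proxy_ip="$2"
  proxy_port="$3"
  cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: proxy
spec:
  hosts:
  - ${proxy_host}
  addresses:
  - ${proxy_ip}/32
  ports:
  - number: ${proxy_port}
    name: tcp
    protocol: TCP
  location: MESH_EXTERNAL
EOF
}

# Example: print the manifest for review (sample values, replace with your own).
render_proxy_service_entry web-proxy.example.com 192.0.2.10 8080
```

After reviewing the output, pipe it to kubectl -n <tenant namespace> apply -f - in the same way as the command in this step.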
    
  3. Log in to HPE Ezmeral Runtime Enterprise as a Tenant Administrator to create Source Control templates. If Source Control templates are already available, you can log in as a Project Member instead.
  4. Select the ML Workbench tab. The HPE Ezmeral Runtime Enterprise new UI opens on the Overview tab of the Project details screen in a new browser tab.
  5. On the Source Control Configurations pane, click the name of a tenant or click View All. The Source Control Configurations screen opens.
  6. Click the Add Source Control Configuration button. The Create Source Control Configuration form opens.
  7. In the form, fill in the required fields as follows:
    • Name: Enter the string airflow-cluster-dags-repo. This source control configuration creates a new Airflow cluster instance in this tenant.
    • Configuration Type:
      NOTE You must log in to HPE Ezmeral Runtime Enterprise as a Tenant Administrator to create Templates.

      If you are using a public Git repository, select Template.

      If you are using a private Git repository, create a Template with the name airflow-cluster-dags-repo-template. Then, create an Instance with the name airflow-cluster-dags-repo, and the airflow-cluster-dags-repo-template Source Control as its template.

    • Repository URL: Enter the URL of the public or private Git repository where your DAGs are stored.
    • Branch: Enter the name of the branch in the Git repository that you want to use.
    • Working Directory: Enter the path to the directory where DAGs are located in the Git repository.
  8. If the Git repository is accessible only through a proxy, select the Configure Proxy Settings check box, and fill in the following fields:
    • Proxy Protocol: The protocol of the proxy (http or https).
    • Proxy Host: The hostname (FQDN) of the proxy server.
    • Proxy Port: The port of the proxy server.
  9. If the Git repository is private and you selected Instance as the Configuration Type, fill in the following fields:
    • Username: The username of the user with access to the repository.
    • Email: The email address of the user with access to the repository.
    • Token/Password: The token or password of the user with access to the repository.
  10. After filling in all required fields, click Submit. Wait approximately 5 to 10 minutes for the Airflow cluster to be created.
  11. Reload the page and return to the Tenant details page. The Workflow Engine link appears in the Training and Workflow area.
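
The proxy and repository values entered in steps 7 and 8 can be sanity-checked from any workstation with Git installed before you submit the form. The following is a minimal sketch under that assumption. The helper names proxy_url and branch_reachable are hypothetical and are not part of HPE Ezmeral Runtime Enterprise, and the sample hostnames, repository URL, and branch are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: builds a proxy URL from the three proxy form values.
# Arguments: <proxy protocol> <proxy host> <proxy port>
# Returns non-zero if the protocol is not http or https.
proxy_url() {
  case "$1" in
    http|https) printf '%s://%s:%s\n' "$1" "$2" "$3" ;;
    *) echo "unsupported proxy protocol: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeeds if the branch exists in the repository,
# querying the repository through the given proxy URL.
# Arguments: <proxy URL> <repository URL> <branch>
branch_reachable() {
  git -c http.proxy="$1" ls-remote --exit-code --heads "$2" "$3" >/dev/null
}

# Example (sample values; replace with your own):
#   branch_reachable "$(proxy_url http web-proxy.example.com 8080)" \
#     https://github.com/example/dags.git main && echo "reachable"
```

A non-zero exit status from branch_reachable suggests that the proxy settings, the repository URL, or the branch name should be rechecked before the form is submitted.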