Configuring a Remote Database for Airflow

This topic describes how to configure a remote database for Airflow on the HPE Data Fabric.

About this task

Airflow uses SQLAlchemy to connect to the metadata database. The metadata database stores Airflow configurations, user information, roles and policies, and statistics for each DAG state, run, and task.

You can host Airflow metadata on any database that SQLAlchemy supports. The most common choices are:
  • PostgreSQL
  • MySQL
  • SQLite

Although SQLite is the default database for Apache Airflow, the Airflow community recommends PostgreSQL for most use cases. For more details, see Set up a Database Backend.

For a list of the supported databases, see Choosing database backend.
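
Because Airflow connects through SQLAlchemy, the database is identified by a connection URL of the form dialect+driver://user:password@host/db. The following minimal sketch shows one way to assemble such a URL so that special characters in the credentials are percent-encoded. The function name build_sqlalchemy_url and all sample values are hypothetical and used only for illustration.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_url(dialect_driver, user, password, host, db, port=None):
    """Return a SQLAlchemy-style URL with the credentials percent-encoded."""
    # Characters such as '@', ':' and '/' in credentials would otherwise be
    # read as URL delimiters, so encode them first.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    location = host if port is None else f"{host}:{port}"
    return f"{dialect_driver}://{credentials}@{location}/{db}"


# Hypothetical values for illustration only.
print(build_sqlalchemy_url("postgresql+psycopg2", "airflow", "p@ss/word",
                           "db.example.com", "airflow"))
# prints postgresql+psycopg2://airflow:p%40ss%2Fword@db.example.com/airflow
```

Depending on how the configuration file is parsed, a literal percent sign in the encoded value might need further escaping (for example, as %%). Treat that as an assumption to verify against the Airflow configuration documentation for your release.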

Procedure

  1. Configure the remote database for Airflow. See Set up a Database Backend.
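  2. In <AIRFLOW-HOME>/conf/airflow.cfg, set the sql_alchemy_conn option to your SQLAlchemy connection string. In Airflow 2.3 and later, this option is in the [database] section; in earlier releases, it is in the [core] section.
    • For PostgreSQL, use: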

      postgresql+psycopg2://<user>:<password>@<host>/<db>

    • For MySQL, use:

      mysql+mysqldb://<airflow-user>:<airflow-password>@<host>[:<port>]/<airflow-dbname>

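      To connect to MySQL from Airflow, install the mysqlclient package in the Airflow virtual environment:
      1. Activate the virtual environment:
        . <AIRFLOW-HOME>/build/env/bin/activate
      2. Install the package:
        pip install mysqlclient==2.2.0
      3. Deactivate the virtual environment:
        deactivate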
  3. Create the database schema using the steps that apply to the currently installed Ecosystem Pack. To identify the installed version, see Checking the EEP Version.
    • For DEP 10.0.0 and later, use the airflow-admin command.
      1. Run the airflow-admin db migrate command to initialize the database:
        airflow-admin db migrate
      2. Run the airflow-admin connections create-default-connections command to create default connections:
        airflow-admin connections create-default-connections
    • For EEP 9.2.x through EEP 9.4.x, use the airflow command.
      1. Run the airflow db migrate command to initialize the database:
        airflow db migrate
      2. Run the airflow connections create-default-connections command to create default connections:
        airflow connections create-default-connections
    • For EEP 9.1.x and earlier, use the airflow db init command to initialize the database:
      airflow db init
  4. After the database configuration completes, create a user by following the steps in the Command Line Interface and Environment Variables Reference. For example, for DEP 10.0.0 and later, use the airflow-admin command:
    airflow-admin users create --username mapr --firstname mapr --lastname mapr -p mapr --role Admin --email admin@example.org
    Or, for EEP 9.4.x and earlier, use the airflow command:
    airflow users create --username mapr --firstname mapr --lastname mapr -p mapr --role Admin --email admin@example.org
  5. Restart Airflow services as described in Starting, Stopping, and Restarting Airflow Services.
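
Before restarting the services, it can help to confirm that airflow.cfg actually contains the connection string you intended. The following is a minimal sketch, not part of the product tooling: the helper name find_sql_alchemy_conn is hypothetical, and it assumes the option is named sql_alchemy_conn and is located in either the [database] section (Airflow 2.3 and later) or the [core] section (earlier releases). The password is masked so that the result is safe to print or log.

```python
import configparser
import re


def find_sql_alchemy_conn(cfg_path):
    """Return (section, masked_url) for sql_alchemy_conn, or None if unset."""
    # interpolation=None keeps literal '%' characters in values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently ignored
    # Assumption: the option is in [database] (Airflow 2.3+) or [core] (older).
    for section in ("database", "core"):
        if parser.has_option(section, "sql_alchemy_conn"):
            url = parser.get(section, "sql_alchemy_conn")
            # Hide the password between "user:" and "@host" before returning.
            masked = re.sub(r"(://[^:/@]+:)[^@]*@", r"\1***@", url)
            return section, masked
    return None
```

To use the sketch, call find_sql_alchemy_conn with the path to <AIRFLOW-HOME>/conf/airflow.cfg. A return value of None means that neither section defines the option, which usually indicates that step 2 was not applied to this file.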