Using whylogs with Airflow

Describes how to use whylogs with Airflow DAGs.

Prerequisites

Sign in to HPE Ezmeral Unified Analytics Software as a member.

About this task

In HPE Ezmeral Unified Analytics Software, whylogs is integrated with Airflow. You can use whylogs in Airflow DAGs to profile and monitor data and to detect drift as data flows through your pipelines.

To use whylogs with Airflow DAGs, refer to the Airflow DAG example in GitHub. The basic steps are outlined as follows:
  1. Import the required libraries and modules from whylogs in your Airflow DAG script. You can use notebooks to create your Airflow DAG. To learn about notebooks, see Creating and Managing Notebook Servers.
  2. Define your Airflow DAG that can profile and monitor the data to detect drifts.
  3. Add your DAG to the Git repository.
    NOTE
    If you do not have a Git repository for storing Airflow DAGs, ask an administrator to configure one. For details, see Airflow DAGs Git Repository.
  4. Navigate to the Airflow screen using either of the following methods:
    • Click Data Engineering > Airflow Pipelines.
    • Click Tools & Frameworks, select the Data Engineering tab, and click Open in the Airflow tile.
  5. In Airflow, verify that you are on the DAGs screen and that your DAG is listed.
  6. To run your DAG, click the play button.
  7. Once your DAG run completes, navigate back to the HPE Ezmeral Unified Analytics Software home screen.
  8. In the left navigation bar, go to Data Engineering > Data Sources.
  9. Click Browse.
  10. Go to the /shared/<airflow-whylogs> folder, which is the path set in your DAG to store the output from whylogs. The data profiles and the drift summary report are stored in the shared volume in the .html and .bin formats.
  11. To download a summary report, select Download from the Actions menu.
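
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the DAG from the GitHub example. It assumes whylogs v1 with the viz extra, pandas, and Airflow 2.4 or later. The mount path, DAG ID, task ID, CSV locations, and the helper names output_paths, profile_and_report, and build_dag are hypothetical and exist only for this sketch.

```python
from datetime import datetime
from pathlib import Path

# Assumption: absolute path where the shared volume is mounted for Airflow tasks.
# Replace it with the folder your DAG actually writes to.
OUTPUT_DIR = Path("/mnt/shared/airflow-whylogs")


def output_paths(run_id, base_dir=OUTPUT_DIR):
    """Return the profile (.bin) and drift report (.html) paths for one run."""
    base = Path(base_dir)
    return {
        "profile": base / f"profile_{run_id}.bin",
        "report": base / f"drift_report_{run_id}.html",
    }


def profile_and_report(reference_csv, target_csv, run_id, base_dir=OUTPUT_DIR):
    """Profile two datasets with whylogs and write a summary drift report."""
    # Imported here so the DAG file parses even where these packages are absent.
    import pandas as pd
    import whylogs as why
    from whylogs.viz import NotebookProfileVisualizer

    paths = output_paths(run_id, base_dir)
    paths["profile"].parent.mkdir(parents=True, exist_ok=True)

    # Build one whylogs profile per dataset.
    reference_view = why.log(pd.read_csv(reference_csv)).view()
    target_view = why.log(pd.read_csv(target_csv)).view()

    # Store the target profile in the whylogs binary format.
    target_view.write(str(paths["profile"]))

    # Compare target against reference and save the HTML drift summary.
    viz = NotebookProfileVisualizer()
    viz.set_profiles(
        target_profile_view=target_view,
        reference_profile_view=reference_view,
    )
    # whylogs appends ".html" to html_file_name, so pass the path without it.
    viz.write(
        rendered_html=viz.summary_drift_report(),
        html_file_name=str(paths["report"].with_suffix("")),
    )
    return str(paths["report"])


def build_dag():
    """Wire the profiling step into a manually triggered DAG."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="whylogs_drift_example",  # hypothetical DAG ID
        start_date=datetime(2024, 1, 1),
        schedule=None,  # run only when triggered from the UI
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="profile_and_report",
            python_callable=profile_and_report,
            op_kwargs={
                # Hypothetical input files; point these at your own data.
                "reference_csv": "/mnt/shared/data/reference.csv",
                "target_csv": "/mnt/shared/data/target.csv",
                "run_id": "{{ ds_nodash }}",
            },
        )
    return dag
```

In an actual DAG file, assign the result at module level, for example dag = build_dag(), so that Airflow can discover the DAG. Keeping the path logic in output_paths makes it easy to confirm which files to look for when you browse the shared volume later.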

Results

You can analyze the summary report to detect drifts and monitor your data.