Data Engineering

Data engineers can design and build pipelines that transform and transport data into usable formats for data consumers.

HPE Ezmeral Unified Analytics Software includes connectors for several data sources that facilitate data virtualization by providing a single point of uniform, controlled access to distributed data, regardless of the compute engine. You can use open-source tools, such as Apache Spark and Apache Airflow, to extract data from disparate sources and create transformed data sets for data consumption.

For example, you can run a Spark job to move data from one data source (such as Snowflake) into another (such as HPE Ezmeral Data Fabric) and then connect HPE Ezmeral Unified Analytics Software to the HPE Ezmeral Data Fabric data source. After the connection is established, you can join and transform the data to create consumable models for users and applications.
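
The Spark job in this example might look like the following minimal sketch. It assumes that the Snowflake Spark connector is available to the job and that Data Fabric storage is reachable through a `maprfs://` path. The environment variable names, default values, table name, and target path are hypothetical placeholders, not values defined by HPE Ezmeral Unified Analytics Software.

```python
import os


def snowflake_options(table: str) -> dict:
    """Build reader options for the Snowflake Spark connector.

    The environment variable names and defaults are hypothetical placeholders.
    """
    return {
        "sfURL": os.environ.get("SNOWFLAKE_URL", "example.snowflakecomputing.com"),
        "sfUser": os.environ.get("SNOWFLAKE_USER", "user"),
        "sfPassword": os.environ.get("SNOWFLAKE_PASSWORD", ""),
        "sfDatabase": os.environ.get("SNOWFLAKE_DATABASE", "SALES_DB"),
        "sfSchema": os.environ.get("SNOWFLAKE_SCHEMA", "PUBLIC"),
        "sfWarehouse": os.environ.get("SNOWFLAKE_WAREHOUSE", "COMPUTE_WH"),
        "dbtable": table,
    }


def copy_table(table: str, target_path: str) -> None:
    """Read one Snowflake table and write it to target_path as Parquet."""
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("snowflake-to-data-fabric").getOrCreate()
    df = (
        spark.read.format("net.snowflake.spark.snowflake")
        .options(**snowflake_options(table))
        .load()
    )
    # Overwrite any earlier copy so the job can be rerun safely.
    df.write.mode("overwrite").parquet(target_path)
    spark.stop()
```

A job would then call something like `copy_table("ORDERS", "maprfs:///data/sales/orders")`. Writing Parquet files is only one option; the target format and path depend on how the Data Fabric data source is configured in your environment.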

Data consumers with appropriate permissions can use the data for analytical workloads, data science workflows, dashboards, or data modeling.

Working with Data

The Data Engineering space provides interfaces for working with data through EzPresto, the SQL query engine in HPE Ezmeral Unified Analytics Software.

The following list describes what you can do through each of the interfaces in the Data Engineering space:
Data Sources
Connect HPE Ezmeral Unified Analytics Software to external data sources. Each connected data source appears as a tile on the screen. From a data source tile, you can also remove the data source or open the Query Editor. See Connecting Data Sources.

After you connect HPE Ezmeral Unified Analytics Software to data sources, you can access the data in those data sources from Superset and visualize it.

Data Catalog
Select data sets (tables and views) from one or more data sources and run federated queries. You can also cache data sets. Caching stores the data in a distributed caching layer within the data fabric for faster access. See Caching Data.
Query Editor
Run queries against the selected data sets. You can also create views and new schemas.
Cached Assets
View the cached data sets (tables and views). See Caching Data.
Airflow Pipelines
Open the Airflow interface, where you can connect to data sets created in HPE Ezmeral Unified Analytics Software and use them in your data pipelines. See Airflow.
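
To illustrate the federated queries and views mentioned under Data Catalog and Query Editor, the following sketch builds the kind of SQL you might run. It assumes the `catalog.schema.table` naming used by Presto-style engines. The catalog, schema, table, and column names are hypothetical, and whether a view can be created in a given catalog depends on the connector behind it.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified catalog.schema.table name."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(orders: str, customers: str) -> str:
    """Build a SELECT that joins tables from two different catalogs."""
    return (
        "SELECT c.customer_id, c.region, SUM(o.amount) AS total_amount\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.customer_id\n"
        "GROUP BY c.customer_id, c.region"
    )


def create_view_sql(view: str, select_sql: str) -> str:
    """Wrap a SELECT statement in a CREATE VIEW statement."""
    return f"CREATE VIEW {view} AS\n{select_sql}"


# Hypothetical data sources: a MySQL catalog and a Hive catalog.
orders = qualified("mysql_sales", "shop", "orders")
customers = qualified("hive_crm", "crm", "customers")

query = federated_join_sql(orders, customers)
view_statement = create_view_sql(qualified("hive_crm", "crm", "customer_totals"), query)
print(view_statement)
```

The printed statement can be pasted into the Query Editor once the table and column names are adapted to your connected data sources.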