EzPresto

Describes the EzPresto SQL query engine and its features.

EzPresto in HPE Ezmeral Unified Analytics Software

EzPresto is an SQL query engine based on PrestoDB, the open-source massively parallel processing (MPP) query engine hosted under the Linux Foundation. EzPresto is optimized to run federated queries across various data sources. Enterprise BI applications, such as Tableau and Power BI, and data processing engines, such as Spark, can leverage EzPresto for rapid query performance and prompt insights through federated data access.

You can easily connect EzPresto to multiple types of data sources from the Data Engineering space in HPE Ezmeral Unified Analytics Software by going to Data Engineering > Data Sources. Connections require a JDBC connection URL and user credentials.
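As an illustration of the connection URL mentioned above, the following minimal sketch assembles a JDBC URL in the common `jdbc:<subprotocol>://<host>:<port>/<database>` layout. The `jdbc_url` helper, host name, and database name are hypothetical. Some drivers use a different layout, so check the documentation for the driver of your data source.

```python
def jdbc_url(subprotocol: str, host: str, port: int, database: str) -> str:
    """Build a JDBC URL in the common jdbc:<subprotocol>://host:port/database layout.

    Hypothetical helper for illustration only; not part of EzPresto.
    """
    return f"jdbc:{subprotocol}://{host}:{port}/{database}"


# Example values are placeholders; replace them with your own data source details.
url = jdbc_url("mysql", "db.example.com", 3306, "sales")
print(url)
```

Keep user credentials out of the URL where the connection form provides separate fields for them.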

Data sets available to the connected user are displayed in the Data Catalog, which is accessible by going to Data Engineering > Data Catalog. In the Data Catalog, you select the data sets you want to work with. You can query or cache the selected data sets.

When you opt to cache data sets, you can modify the data sets prior to caching them. For example, you can edit table and column names, remove columns, and create new schemas. Cached data sets (tables and views) are accessible in the Cached Assets space of HPE Ezmeral Unified Analytics Software. You can access cached assets by going to Data Engineering > Cached Assets.

When you opt to query data sets, you can run federated queries (queries that span data sets in multiple data sources) from the Query Editor. You can access the Query Editor by going to Data Engineering > Query Editor. Querying cached data sets instead of the original data sources can significantly improve query performance.
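To make the idea of a federated query concrete, the following sketch builds a SQL statement that joins tables from two different data sources. The catalog, schema, and table names are hypothetical. The three-part `catalog.schema.table` naming follows the Presto convention; the actual names depend on how the data sources are registered in your environment.

```python
# Hypothetical fully qualified table names (catalog.schema.table).
# Substitute the names shown in your Data Catalog.
ORDERS = "mysql_sales.shop.orders"
CUSTOMERS = "hive_lake.crm.customers"


def build_federated_query(orders_table: str, customers_table: str, min_total: float) -> str:
    """Return a SQL string that joins two tables living in different data sources."""
    # min_total is a number supplied by the caller; never interpolate
    # untrusted text into SQL this way.
    return (
        "SELECT c.customer_name, SUM(o.total) AS revenue\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {customers_table} AS c ON o.customer_id = c.customer_id\n"
        f"WHERE o.total >= {float(min_total)}\n"
        "GROUP BY c.customer_name\n"
        "ORDER BY revenue DESC"
    )


query = build_federated_query(ORDERS, CUSTOMERS, 100.0)
print(query)
```

You could paste the printed statement into the Query Editor after replacing the placeholder names with tables that exist in your connected data sources.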

You can access the data in connected data sources from Superset and visualize the data that results from complex, federated queries. Superset is accessible in HPE Ezmeral Unified Analytics Software by going to BI Reporting > Dashboards or Tools & Frameworks > Data Engineering tab and clicking Open in the Superset tile. See Superset. You can also monitor the state of queries and query details, including the query plan and resource usage, by going to Administration > EzSQL Cluster Monitoring.

Refer to the following tutorials to get started with EzPresto in HPE Ezmeral Unified Analytics Software:

EzPresto Key Features

EzPresto provides the following key benefits and features:
Data Source Connectivity
EzPresto includes connectors for several data sources, including:
  • HPE Ezmeral Data Fabric
  • HDFS
  • Data Lakes
  • Hive Metastore (including managed HMS services such as AWS Glue)
  • Object Stores
  • Relational Databases
  • NoSQL Databases
  • Streaming data platforms
  • Data warehouses
Built-In Data Catalog
The built-in data catalog provides dynamic registration of new data sources. Data administrators can add new data sources as they become available without restarting any services. When a data administrator adds a new data source, the data catalog automatically refreshes so users, such as data analysts, can browse the new data sets and perform downstream activities, such as reporting and dashboarding.
Role-Based Access Controls
Role-based access controls isolate queries such that members (non-admin users) can only view, access, and cancel their own queries. Admin users have full access to all queries. For example, if a member runs a query that takes too long to complete or uses too many resources, any admin in HPE Ezmeral Unified Analytics Software can stop the query.
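The rule described above can be sketched as a small model. This is not the EzPresto implementation; the `User`, `Query`, `can_manage`, and `visible_queries` names are hypothetical and only illustrate that members are limited to their own queries while admins can act on any query.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    name: str
    is_admin: bool = False


@dataclass(frozen=True)
class Query:
    query_id: str
    owner: str


def can_manage(user: User, query: Query) -> bool:
    """Admins can view and cancel any query; members only their own."""
    return user.is_admin or user.name == query.owner


def visible_queries(user: User, queries: List[Query]) -> List[Query]:
    """Return the queries the given user is allowed to see."""
    return [q for q in queries if can_manage(user, q)]


queries = [Query("q1", "alice"), Query("q2", "bob")]
alice = User("alice")
admin = User("root", is_admin=True)

print([q.query_id for q in visible_queries(alice, queries)])
print([q.query_id for q in visible_queries(admin, queries)])
```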
Optimized Federated Queries
Access data across disparate data sources in a single, optimized query. Query optimizations for accelerated performance include:
  • Predicate pushdown - EzPresto pushes filters in the WHERE clause down to the data source for processing to reduce the number of rows returned.
  • Projection pushdown - EzPresto pushes projections (scanning of selected columns only) down to the data source for processing to reduce the amount of data returned.
  • Dynamic filtering - EzPresto evaluates predicates on the right side of a join and pushes them to the left side of the join to reduce the number of rows scanned from the left table.
  • Cost-based optimization - EzPresto uses table statistics to calculate the cost (resource usage) of various query plans and chooses the optimal plan (plan that uses the least resources) to run the query.
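The optimizations listed above can be illustrated with a toy, in-memory model. This sketch does not use EzPresto or Presto APIs. The tables, the `scan` function, and the cell-count "cost" are assumptions made for illustration, and a real optimizer estimates cost from table statistics rather than measuring it after the fact.

```python
# Toy tables standing in for two remote data sources (hypothetical data).
ORDERS = [
    {"order_id": 1, "customer_id": 10, "total": 250.0, "note": "gift"},
    {"order_id": 2, "customer_id": 11, "total": 40.0, "note": ""},
    {"order_id": 3, "customer_id": 12, "total": 980.0, "note": "rush"},
    {"order_id": 4, "customer_id": 10, "total": 15.0, "note": ""},
]
CUSTOMERS = [
    {"customer_id": 10, "region": "EMEA"},
    {"customer_id": 11, "region": "APAC"},
    {"customer_id": 12, "region": "EMEA"},
]


def scan(table, columns=None, predicate=None):
    """Simulate a source-side scan: filter and project before returning rows."""
    rows = [r for r in table if predicate is None or predicate(r)]
    if columns is not None:
        rows = [{c: r[c] for c in columns} for r in rows]
    return rows


def cells(rows):
    """Count the values transferred, as a crude stand-in for data volume."""
    return sum(len(r) for r in rows)


# No pushdown: the source returns every row and column; the engine filters.
raw = scan(ORDERS)
engine_side = [
    {"order_id": r["order_id"], "total": r["total"]} for r in raw if r["total"] >= 100
]

# Predicate and projection pushdown: the source filters and projects.
pushed = scan(ORDERS, columns=["order_id", "total"],
              predicate=lambda r: r["total"] >= 100)

# Dynamic filtering: evaluate the right-side predicate first, then push the
# resulting join keys into the scan of the left table.
apac_ids = {c["customer_id"]
            for c in scan(CUSTOMERS, predicate=lambda r: r["region"] == "APAC")}
left_rows = scan(ORDERS, predicate=lambda r: r["customer_id"] in apac_ids)

# Toy cost-based choice: pick the plan that moves the fewest values.
plans = {"scan_then_filter": cells(raw), "pushdown": cells(pushed)}
best = min(plans, key=plans.get)

print(plans, best, len(left_rows))
```

Both plans produce the same result rows; they differ only in how much data leaves the data source, which is the quantity the pushdown optimizations aim to reduce.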
Distributed Caching
EzPresto accelerates federated queries through distributed caching of commonly used data sets. EzPresto currently supports explicit caching, in which you manually modify tables and select the data that you want cached for fast query access. You can use explicitly cached data for data modeling. EzPresto stores cached data in a data fabric volume. The cache expires based on the configured TTL (time-to-live). See Connecting Data Sources and Caching Data.
Explicit Caching
You manually modify tables and select the data in the tables that you want stored in the cache for fast query access. You can use explicitly cached data for data modeling. You can set a TTL (time to live) for the cache.
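The TTL behavior can be pictured with a small, generic sketch. This is not how EzPresto stores cached data (it uses a data fabric volume); the `TTLCache` class and its methods are hypothetical and show only the expiry rule: an entry is served until its time-to-live elapses, after which it is treated as absent.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # name -> (expires_at, data)

    def put(self, name, data):
        """Store data under name and start its TTL countdown."""
        self._entries[name] = (self._clock() + self._ttl, data)

    def get(self, name):
        """Return cached data, or None if the entry is missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # drop the expired entry
            return None
        return data


# Use a fake clock so the example runs instantly.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("sales_summary", [("EMEA", 1230.0)])

fresh = cache.get("sales_summary")   # still within the TTL
now[0] = 61.0
expired = cache.get("sales_summary") # TTL has elapsed
print(fresh, expired)
```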
Self-Service Data Access
End users can browse the data sets they have access to and select the relevant data for their queries, analytical applications, and workloads.
Run-Anywhere Architecture
EzPresto has a run-anywhere architecture; you can run EzPresto on-premises, at the edge, in the cloud, or in hybrid environments.

EzPresto Architecture

The EzPresto architecture consists of the following main components:
Presto
EzPresto uses a modified version of Presto as the query engine. Most of the modifications are in query planning and optimization. Other modifications add support for additional data sources, such as Teradata and Snowflake, and in-process caching based on Apache Geode that is tuned for OLTP and OLAP access. The cache provides a tuple store with specialized columnar formats.
WebService
Provides the API.
Web UI
Provides the ability to access EzPresto in applications.
Client Connections
Provides the ability to connect to BI tools and external data sources via the JDBC client.
Keycloak
Keycloak provides the authentication mechanism and supports different authentication options, such as LDAP and JWT.