Retail Store Analysis Dashboard (Superset)
Provides an end-to-end workflow example for a retail store analysis scenario in HPE Ezmeral Unified Analytics Software using EzPresto and Superset.
Scenario
A data analyst wants to visualize data sets from MySQL, SQL Server, and Hive data sources in Superset. The data analyst signs in to HPE Ezmeral Unified Analytics Software and connects Unified Analytics to MySQL, SQL Server, and Hive data sources. The data analyst runs a federated query against the data sets and then creates a view from the query. The analyst accesses the view from Superset and uses it to visualize the data in a bar chart and adds the chart to a dashboard.
HPE Ezmeral Unified Analytics Software includes the following components and applications to support an end-to-end workflow for this scenario:- EzPresto
- An MPP SQL query engine that runs accelerated queries against connected data sources and returns results to Superset for visualization. EzPresto connects to Superset through a database connection, enabling direct access to the data sources connected to Unified Analytics from Superset.
- Superset
- An analytical dashboarding application that communicates with EzPresto to send queries and receive the query results needed to visualize data from the selected data sets.

Steps
- A - Connect Data Sources
- B – Select Data Sets and Create a View
- C - Connect to the Presto Database
- D - Add the View to Superset and Create a Chart
- E - Specify Query Conditions to Visualize Results in the Chart
- F – Create a Superset Dashboard and Add the Chart (Visualized Data)
- G – Monitor Queries
A - Connect Data Sources
Connect HPE Ezmeral Unified Analytics Software to external data sources that contain the data sets (tables and views) you want to work with. This tutorial uses MySQL, SQL Server, and Hive as the connected data source examples.
- In the left navigation column, select Data Engineering > Data
Sources. The Data Sources screen appears.
- Click Add New Data Source.
- Complete the steps required to connect to the MySQL, SQL Server, and Hive data sources:
- Connecting to MySQL
-
- In the Add New Data Source screen, click Create Connection in the MySQL tile.
- In the drawer that opens, enter the following information in the respective
fields:
- Name: mysql
- Connection URL: jdbc:mysql://<ip-address>:<port>
- Connection User: myaccount
- Connection Password: moi123
- Enable Local Snapshot Table: Select the check boxTIPWhen Enable Local Snapshot Table is selected, the system caches remote table data to accelerate queries on the tables. The cache is active for the duration of the configured TTL or until the remote tables in the data source are altered.
- Click Connect. Upon successful connection, the system returns the
following
message:
Successfully added data source "mysql".
- Connecting to SQL Server
-
- In the Add New Data Source screen, click Create Connection in the SQL Server tile.
- In the drawer that opens, enter the following information in the respective
fields:
- Name: mssqlret2
- Connection URL: jdbc:sqlserver:<ip-address>:<port>;database=retailstore
- Connection User: myaccount
- Connection Password: moi123
- Enable Local Snapshot Table: Select the check boxTIPWhen Enable Local Snapshot Table is selected, the system caches remote table data to accelerate queries on the tables. The cache is active for the duration of the configured TTL or until the remote tables in the data source are altered.
- Click Connect. Upon successful connection, the system returns the
following
message:
Successfully added data source "mssqlret2".
- Connecting to Hive
-
- In the Add New Data Source screen, click Create Connection in the Hive tile.
- In the drawer that opens, enter the following information in the respective
fields:
- Name: hiveview
- Hive Metastore: file
- Hive Metastore Catalog Dir: file:///data/shared/tmpmetastore
- In Optional Fields, search for the following fields and add the
specified values:
- Hive Max Partitions Per Writers: 10000
- Hive Temporary Staging Directory Enabled: Unselect
- Hive Allow Drop Table: Select
- Enable Local Snapshot Table: Select the check boxTIPWhen Enable Local Snapshot Table is selected, the system caches remote table data to accelerate queries on the tables. The cache is active for the duration of the configured TTL or until the remote tables in the data source are altered.
- Click Connect. Upon successful connection, the system returns the
following
message:
Successfully added data source "hiveview".
B – Select Data Sets and Create a View
- Select datasets.
- In the left navigation bar, select Data Engineering > Data Catalog.
- On the Data Catalog page, click the dropdown next to the mysql and mssqlret2 data sources to expose the available schemas in those data sources.
- Select schemas for each of the data sources:
- For the mysql data source, select the retailstore schema.
- For the mssqlret2 data source, select the dbo schema.
- In the All Datasets section, click the filter icon to open the Filters drawer.
- Use the filter to identify and select the following data sets in the selected
schemas:
- For the dbo schema, filter for and select the following datasets:
- call_center
- catalog_sales
- data_dim
- item
- For the retailstore schema, filter for and select the following
datasets:
- customer
- customer_address
- customer_demographics
- For the dbo schema, filter for and select the following datasets:
- After you select all the data sets, click Apply.
- Click Selected Datasets (button that is displaying the number of selected data sets).
- In the drawer that opens, click Query Editor. Depending on the number of selected data sets, you may have to scroll down to the bottom of the drawer to see the Query Editor button.
- Query the datasets and create a view.
- In the Query Editor, click + to Add Worksheet.
- Run the following command to create a new schema, such as
hiveview.demoschema
, for example:create schema if not exists hiveview.demoschema;
- Run a query to create a new view from a federated query against the selected data
sets, for
example:
create view hiveview.demoschema.qf_retailstore_view as select * from mssqlret2.dbo.catalog_sales cs inner join mssqlret2.dbo.call_center cc on cs.cs_call_center_sk = cc.cc_call_center_sk inner join mssqlret2.dbo.date_dim d on cs.cs_sold_date_sk = d.d_date_sk inner join mssqlret2.dbo.item i on cs.cs_item_sk = i.i_item_sk inner join mysql.retailstore.customer c on cs.cs_bill_customer_sk = c.c_customer_sk inner join mysql.retailstore.customer_address ca on c.c_current_addr_sk = ca.ca_address_sk inner join mysql.retailstore.customer_demographics cd on c.c_current_cdemo_sk = cd.cd_demo_sk
- Click Run. When the query completes, the status, Finished, displays.
C - Connect to the Presto Database
- To open Superset, in the left navigation pane of HPE Ezmeral Unified Analytics Software, select BI Reporting > Dashboards. Superset opens in a new tab.
- In Superset, select Settings > Database Connections.
- Click +DATABASE.
- In the Connect a database window, select the Presto tile.
- Enter the SQLALCHEMY URI provided by your administrator.
- Test the connection.
- If the test was successful, click Connect.
D - Add the View to Superset and Create a Chart
- In the left navigation bar, select BI Reporting > Dashboards to open Superset.
- In Superset, click the Datasets tab.
- Click +DATASET.
- In the Add Dataset window, select the following options:
- DATABASE: Presto
- SCHEMA: <your_schema>
- SEE TABLE SCHEMA: <your_view>
This tutorial uses the retailstore schema and qf_retailstore_view.
- Click ADD DATASET AND CREATE CHART.
- In the Create a New Chart window, select Bar Chart.
- Click CREATE NEW CHART.
- Enter a name for the chart, such as Retail Store View.
E - Specify Query Conditions to Visualize Results in the Chart
In Superset, charts visualize data based on the query conditions that you specify. The charts created in Superset automatically generate queries that Superset passes to the SQL query engine. Superset visualizes the query results in the chart. Try applying query conditions to visualize your data. Save your chart when done.
The following steps demonstrate how query conditions were applied to visualize data in the resulting example bar chart (shown in step 2):
- Enter the specified query parameters in the following fields:
- METRICS
-
- Click into the METRICS field (located on the DATA tab). A metrics window opens.
- Select the Simple tab.
- Click the edit icon and enter a name for the metric, such as SUM(cs_net_paid).
- In the Column field, select cs_net_paid.
- In the Aggregate field, select SUM.
- Click Save.
- FILTERS
-
- Click into the FILTERS field (located on the DATA tab).
- In the window that opens, select the CUSTOM SQL tab.
- Select the WHERE filter and enter the
following:
NULLIF(ca_state, '') IS NOT NULL
- Click Save.
- DIMENSIONS
-
- Drag and drop the ca_state column into the DIMENSIONS field.
- Click into the BREAKDOWNS column.
- In the window that opens, select the SIMPLE tab and select the cc_name column.
- Click Save.
- SORT BY
-
- Click into the SORT BY field.
- In the window that opens, select the SIMPLE tab and enter cs_net_paid as the COLUMN and SUM as the AGGREGATE.
- Click Save.
- Click CREATE CHART. The bar chart displays results when the query finishes
processing.
- Click Save to save the chart. In the Save Chart window that opens, do not enter or select a dashboard.
- Click Save to continue.
F – Create a Superset Dashboard and Add the Chart (Visualized Data)
Complete the following steps to create a new dashboard and add your chart to the dashboard. This tutorial adds the Retail Store View chart to a dashboard named Retail Store Analysis Dashboard.
- In Superset, click on the Dashboards tab.
- Click + DASHBOARD.
- Enter a name for the dashboard, for example Retail Store Analysis Dashboard.
- Drag and drop your chart into the dashboard.
- Click Save to save the dashboard.
G – Monitor Queries
You can monitor queries generated through Superset from the EzPresto endpoint. You can access the EzPresto endpoint in the EzPresto tile in the Applications & Frameworks space in HPE Ezmeral Unified Analytics Software.
- Return to the HPE Ezmeral Unified Analytics Software UI.
- In the left navigation bar, select Applications & Frameworks.
- On the Data Engineering tab, click the EzPresto endpoint in the EzPresto tile. The EzPresto UI opens in a new tab.
- In the Query Details section, verify that Finished is selected. Selected
options have a visible checkmark.
You can see the query that ran to populate the Retail Store View bar chart in the Retail Store Analysis Dashboard. - Click on the Query ID to see the query details.
- To see a visualized query plan and metadata for the query, click Live Plan and
hover over different areas of the visualized plan. You can also click on various parts
of the visualized plan to zoom in on details.
End of Tutorial
You have completed this tutorial. This tutorial demonstrated the integration of the HPE Ezmeral Unified Analytics Software SQL query engine (EzPresto) with Superset to visualize the results of a query on data sets made available through the default Presto database connection. This tutorial also showed you how to monitor queries from the EzPresto Cluster Monitoring tool.