Retail Store Analysis Dashboard (Superset)

Provides an end-to-end workflow example for a retail store analysis scenario in HPE Ezmeral Unified Analytics Software using EzPresto and Superset.

Scenario

A data analyst wants to visualize data sets from MySQL, SQL Server, and Hive data sources in Superset. The data analyst signs in to HPE Ezmeral Unified Analytics Software and connects Unified Analytics to MySQL, SQL Server, and Hive data sources. The data analyst runs a federated query against the data sets and then creates a view from the query. The analyst accesses the view from Superset and uses it to visualize the data in a bar chart and adds the chart to a dashboard.

HPE Ezmeral Unified Analytics Software includes the following components and applications to support an end-to-end workflow for this scenario:
EzPresto
An MPP SQL query engine that runs accelerated queries against connected data sources and returns results to Superset for visualization. EzPresto connects to Superset through a database connection, enabling direct access to the data sources connected to Unified Analytics from Superset.
Superset
An analytical dashboarding application that communicates with EzPresto to send queries and receive the query results needed to visualize data from the selected data sets.
The following diagram shows the components and applications in the workflow:

Steps

Sign in to HPE Ezmeral Unified Analytics Software and perform the following steps:
IMPORTANT
This tutorial demonstrates how to perform a series of tasks in HPE Ezmeral Unified Analytics Software to complete an example workflow. The data and information used in this tutorial are for example purposes only. You must connect Unified Analytics to your own data sources and use the data sets available to you in your data sources.

A - Connect Data Sources

Connect HPE Ezmeral Unified Analytics Software to external data sources that contain the data sets (tables and views) you want to work with. This tutorial uses MySQL, SQL Server, and Hive as the connected data source examples.

To connect a data source:
  1. In the left navigation column, select Data Engineering > Data Sources. The Data Sources screen appears.

  2. Click Add New Data Source.

  3. Complete the steps required to connect to the MySQL, SQL Server, and Hive data sources:
    Connecting to MySQL
    1. In the Add New Data Source screen, click Create Connection in the MySQL tile.
    2. In the drawer that opens, enter the following information in the respective fields:
      • Name: mysql
      • Connection URL: jdbc:mysql://<ip-address>:<port>
      • Connection User: myaccount
      • Connection Password: moi123
      • Enable Local Snapshot Table: Select the check box
        TIP
        When Enable Local Snapshot Table is selected, the system caches remote table data to accelerate queries on the tables. The cache is active for the duration of the configured TTL or until the remote tables in the data source are altered.
      • Click Connect. Upon successful connection, the system returns the following message:
        Successfully added data source "mysql".
    Connecting to SQL Server
    1. In the Add New Data Source screen, click Create Connection in the SQL Server tile.
    2. In the drawer that opens, enter the following information in the respective fields:
      • Name: mssqlret2
      • Connection URL: jdbc:sqlserver:<ip-address>:<port>;database=retailstore
      • Connection User: myaccount
      • Connection Password: moi123
      • Enable Local Snapshot Table: Select the check box
        TIP
        When Enable Local Snapshot Table is selected, the system caches remote table data to accelerate queries on the tables. The cache is active for the duration of the configured TTL or until the remote tables in the data source are altered.
      • Click Connect. Upon successful connection, the system returns the following message:
        Successfully added data source "mssqlret2".
    Connecting to Hive
    1. In the Add New Data Source screen, click Create Connection in the Hive tile.
    2. In the drawer that opens, enter the following information in the respective fields:
      • Name: hiveview
      • Hive Metastore: file
      • Hive Metastore Catalog Dir: file:///data/shared/tmpmetastore
      • In Optional Fields, search for the following fields and add the specified values:
        • Hive Max Partitions Per Writers: 10000
        • Hive Temporary Staging Directory Enabled: Unselect
        • Hive Allow Drop Table: Select
      • Enable Local Snapshot Table: Select the check box
        TIP
        When Enable Local Snapshot Table is selected, the system caches remote table data to accelerate queries on the tables. The cache is active for the duration of the configured TTL or until the remote tables in the data source are altered.
      • Click Connect. Upon successful connection, the system returns the following message:
        Successfully added data source "hiveview".

B – Select Data Sets and Create a View

In HPE Ezmeral Unified Analytics Software, complete the following steps to create a view. First select data sources and data sets to work with. Then, run a federated query against the selected data sets and create a view from the query. This tutorial creates an example view named qf_retailstore_view.
  1. Select datasets.
    1. In the left navigation bar, select Data Engineering > Data Catalog.
    2. On the Data Catalog page, click the dropdown next to the mysql and mssqlret2 data sources to expose the available schemas in those data sources.
    3. Select schemas for each of the data sources:
      • For the mysql data source, select the retailstore schema.
      • For the mssqlret2 data source, select the dbo schema.
    4. In the All Datasets section, click the filter icon to open the Filters drawer.
    5. Use the filter to identify and select the following data sets in the selected schemas:
      • For the dbo schema, filter for and select the following datasets:
        • call_center
        • catalog_sales
        • data_dim
        • item
      • For the retailstore schema, filter for and select the following datasets:
        • customer
        • customer_address
        • customer_demographics
    6. After you select all the data sets, click Apply.
    7. Click Selected Datasets (button that is displaying the number of selected data sets).
    8. In the drawer that opens, click Query Editor. Depending on the number of selected data sets, you may have to scroll down to the bottom of the drawer to see the Query Editor button.
  2. Query the datasets and create a view.
    1. In the Query Editor, click + to Add Worksheet.
    2. Run the following command to create a new schema, such as hiveview.demoschema, for example:
      create schema if not exists hiveview.demoschema;
    3. Run a query to create a new view from a federated query against the selected data sets, for example:
      create view hiveview.demoschema.qf_retailstore_view as select * from mssqlret2.dbo.catalog_sales cs
      inner join mssqlret2.dbo.call_center cc on cs.cs_call_center_sk = cc.cc_call_center_sk
      inner join mssqlret2.dbo.date_dim d on cs.cs_sold_date_sk = d.d_date_sk
      inner join mssqlret2.dbo.item i on cs.cs_item_sk = i.i_item_sk
      inner join mysql.retailstore.customer c on cs.cs_bill_customer_sk = c.c_customer_sk
      inner join mysql.retailstore.customer_address ca on c.c_current_addr_sk = ca.ca_address_sk
      inner join mysql.retailstore.customer_demographics cd on c.c_current_cdemo_sk = cd.cd_demo_sk
      
    4. Click Run. When the query completes, the status, Finished, displays.

C - Connect to the Presto Database

Complete the following steps to connect Superset to the Presto database for access to your data sources and data sets in HPE Ezmeral Unified Analytics Software. Once connected to the Presto database, you can access the view you created in the previous step (step B). To connect to the Presto database, you need the connection URI. You can get the URI from your HPE Ezmeral Unified Analytics Software administrator.
  1. To open Superset, in the left navigation pane of HPE Ezmeral Unified Analytics Software, select BI Reporting > Dashboards. Superset opens in a new tab.
  2. In Superset, select Settings > Database Connections.
  3. Click +DATABASE.
  4. In the Connect a database window, select the Presto tile.
  5. Enter the SQLALCHEMY URI provided by your administrator.
  6. Test the connection.
  7. If the test was successful, click Connect.

D - Add the View to Superset and Create a Chart

Complete the following steps to import the view you created in HPE Ezmeral Unified Analytics Software and create a bar chart. This tutorial demonstrates how to import the view qf_retailstore_view.
  1. In the left navigation bar, select BI Reporting > Dashboards to open Superset.
  2. In Superset, click the Datasets tab.
  3. Click +DATASET.
  4. In the Add Dataset window, select the following options:
    • DATABASE: Presto
    • SCHEMA: <your_schema>
    • SEE TABLE SCHEMA: <your_view>

    This tutorial uses the retailstore schema and qf_retailstore_view.

  5. Click ADD DATASET AND CREATE CHART.
  6. In the Create a New Chart window, select Bar Chart.

  7. Click CREATE NEW CHART.
  8. Enter a name for the chart, such as Retail Store View.

E - Specify Query Conditions to Visualize Results in the Chart

In Superset, charts visualize data based on the query conditions that you specify. The charts created in Superset automatically generate queries that Superset passes to the SQL query engine. Superset visualizes the query results in the chart. Try applying query conditions to visualize your data. Save your chart when done.

The following steps demonstrate how query conditions were applied to visualize data in the resulting example bar chart (shown in step 2):

  1. Enter the specified query parameters in the following fields:
    METRICS
    1. Click into the METRICS field (located on the DATA tab). A metrics window opens.
    2. Select the Simple tab.
    3. Click the edit icon and enter a name for the metric, such as SUM(cs_net_paid).
    4. In the Column field, select cs_net_paid.
    5. In the Aggregate field, select SUM.
    6. Click Save.
    FILTERS
    1. Click into the FILTERS field (located on the DATA tab).
    2. In the window that opens, select the CUSTOM SQL tab.
    3. Select the WHERE filter and enter the following:
      NULLIF(ca_state, '') IS NOT NULL
    4. Click Save.
    DIMENSIONS
    1. Drag and drop the ca_state column into the DIMENSIONS field.
    2. Click into the BREAKDOWNS column.
    3. In the window that opens, select the SIMPLE tab and select the cc_name column.
    4. Click Save.
    SORT BY
    1. Click into the SORT BY field.
    2. In the window that opens, select the SIMPLE tab and enter cs_net_paid as the COLUMN and SUM as the AGGREGATE.
    3. Click Save.
  2. Click CREATE CHART. The bar chart displays results when the query finishes processing.

  3. Click Save to save the chart. In the Save Chart window that opens, do not enter or select a dashboard.
  4. Click Save to continue.

F – Create a Superset Dashboard and Add the Chart (Visualized Data)

Complete the following steps to create a new dashboard and add your chart to the dashboard. This tutorial adds the Retail Store View chart to a dashboard named Retail Store Analysis Dashboard.

To create a new dashboard and add your visualized data:
  1. In Superset, click on the Dashboards tab.
  2. Click + DASHBOARD.
  3. Enter a name for the dashboard, for example Retail Store Analysis Dashboard.
  4. Drag and drop your chart into the dashboard.

  5. Click Save to save the dashboard.
NOTE
Any time you open a chart or dashboard, Superset and the SQL query engine work together to visualize data. Loading a dashboard page triggers the queries against the database. As the queries run, buffering icons display until the data loads. When data is loaded, the visualizations display.

G – Monitor Queries

You can monitor queries generated through Superset from the EzPresto endpoint. You can access the EzPresto endpoint in the EzPresto tile in the Applications & Frameworks space in HPE Ezmeral Unified Analytics Software.

Complete the following steps to monitor the query that the chart generates:
  1. Return to the HPE Ezmeral Unified Analytics Software UI.
  2. In the left navigation bar, select Applications & Frameworks.
  3. On the Data Engineering tab, click the EzPresto endpoint in the EzPresto tile. The EzPresto UI opens in a new tab.
  4. In the Query Details section, verify that Finished is selected. Selected options have a visible checkmark.

    You can see the query that ran to populate the Retail Store View bar chart in the Retail Store Analysis Dashboard.
  5. Click on the Query ID to see the query details.

  6. To see a visualized query plan and metadata for the query, click Live Plan and hover over different areas of the visualized plan. You can also click on various parts of the visualized plan to zoom in on details.

End of Tutorial

You have completed this tutorial. This tutorial demonstrated the integration of the HPE Ezmeral Unified Analytics Software SQL query engine (EzPresto) with Superset to visualize the results of a query on data sets made available through the default Presto database connection. This tutorial also showed you how to monitor queries from the EzPresto Cluster Monitoring tool.