Connecting to CSV and Parquet Data in an External S3 Data Source via Hive Connector

Describes how to use the Hive connector with Presto in HPE Ezmeral Unified Analytics Software to connect to CSV and Parquet data in S3-based external data sources.

You can connect HPE Ezmeral Unified Analytics Software to any external S3-based data source through the Hive connector and Presto to access CSV and Parquet data. For example, you can connect HPE Ezmeral Unified Analytics Software to an external Data Fabric, Iceberg, or Spark cluster to access CSV and Parquet data in S3 storage within these data sources.

Connection Requirements

Connecting to an S3-based external data source has the following requirements:
  • You must have read/write access on the S3 data source.
  • You must provide the required information to connect, including the:
    • Access key
    • Secret key
    • S3 directory where files are stored
    • S3 endpoint
Connection fields vary depending on the type of data source you connect to (Data Fabric, Iceberg, Spark, and so on); however, the S3-related fields (credentials, directory, and endpoint) are always required, regardless of which data source holds the S3 data.
IMPORTANT
A Hive Metastore is not required. HPE Ezmeral Unified Analytics Software scans the files in the S3 data source to get metadata and determine data types.

For more information about the PrestoDB Hive connector, see Hive Connector.
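Under the hood, the S3 fields in the connection drawer correspond to standard PrestoDB Hive connector catalog properties. The following is a hedged sketch of an equivalent catalog properties file; the property names come from the PrestoDB Hive connector documentation, while the connector name and the metastore discovery behavior are specific to your deployment, and all values shown are placeholders:

```properties
# Sketch of a PrestoDB Hive catalog properties file with the S3 settings
# that correspond to the connection fields above. Values are placeholders.
connector.name=hive-hadoop2
hive.s3.aws-access-key=<access-key>
hive.s3.aws-secret-key=<secret-key>
hive.s3.endpoint=https://10.10.10.100:9000
hive.s3.path-style-access=true
```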

Connecting to an S3-Based Data Source

To connect to an S3-based data source:

  1. Sign in to HPE Ezmeral Unified Analytics Software.
  2. In the left navigation bar, select Data Engineering > Data Sources.
  3. On the Structured Data tab, click Add New.
  4. In the Hive tile, click Create Connection.
  5. In the drawer that opens, enter values in the required fields. An asterisk (*) denotes a required field.
    • For Hive Metastore, select Discovery.
    • In the Optional Fields search field, find and add the following fields:
      Field Name                   Description
      Hive S3 AWS Access Key       Enter the AWS access key.
      Hive S3 AWS Secret Key       Enter the AWS secret key.
      Hive S3 Endpoint             Enter the S3 endpoint.
      Hive S3 Path Style Access    Select the checkbox.
  6. Click Connect. Upon successful connection, the data source is available on the Data Sources page.
  7. In the left navigation bar, select Data Engineering > Data Sources.
  8. In the data source tile, click Query using Data Catalog to access the S3 data.
  9. On the Data Catalog page, identify the data source and select the schema to view the data.
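The S3 directory value used in the steps above is an s3:// URI that names a bucket and a key prefix within it. A minimal sketch of how such a URI decomposes, using Python's standard library:

```python
from urllib.parse import urlparse

def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

bucket, prefix = split_s3_uri("s3://prestodemo/tpch002")
print(bucket, prefix)  # prestodemo tpch002
```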

Example: Connect to HPE Ezmeral Data Fabric Object Store to access Parquet files

This example demonstrates how to connect HPE Ezmeral Unified Analytics Software to Parquet data in HPE Ezmeral Data Fabric Object Store (S3-based data store).
TIP
To connect to HPE Ezmeral Data Fabric Object Store, you need working knowledge of HPE Ezmeral Data Fabric Object Store, read/write access to a bucket (granted via IAM policies), and the ability to generate the access key and secret key in HPE Ezmeral Data Fabric Object Store.

In this example, a bucket named prestodemo exists in HPE Ezmeral Data Fabric Object Store. Inside the bucket is a tpch002 directory with nation and nationalhistory sub-directories that contain Parquet files.

The following image shows the HPE Ezmeral Data Fabric Object Store prestodemo bucket and directories within the bucket:

Creating the HPE Ezmeral Unified Analytics Software connection to HPE Ezmeral Data Fabric Object Store requires the following information from HPE Ezmeral Data Fabric Object Store:
  • Access key
  • Secret key
  • S3 directory where files are stored (Example: s3://prestodemo/tpch002)
  • S3 endpoint (This is the HPE Ezmeral Data Fabric Object Store IP address and port, for example: https://10.10.10.100:9000)
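Because the endpoint here is a raw IP address rather than a DNS name, path-style access matters: virtual-hosted addressing would prepend the bucket name as a DNS subdomain of the endpoint, which cannot resolve against an IP. A hypothetical illustration of the two request URL styles (the object key shown is a made-up example):

```python
def s3_object_url(endpoint: str, bucket: str, key: str, path_style: bool = True) -> str:
    """Build the request URL for an object under the two S3 addressing styles."""
    scheme, host = endpoint.split("://", 1)
    if path_style:
        # Path-style: the bucket appears in the URL path (works with IP endpoints).
        return f"{scheme}://{host}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a DNS subdomain of the endpoint.
    return f"{scheme}://{bucket}.{host}/{key}"

print(s3_object_url("https://10.10.10.100:9000", "prestodemo", "tpch002/nation/part-0.parquet"))
# https://10.10.10.100:9000/prestodemo/tpch002/nation/part-0.parquet
```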

To connect HPE Ezmeral Unified Analytics Software to an S3 directory in an external HPE Ezmeral Data Fabric Object Store:

  1. Sign in to HPE Ezmeral Unified Analytics Software.
  2. In the left navigation bar, select Data Engineering > Data Sources.
  3. On the Structured Data tab, click Add New.
  4. In the Hive tile, click Create Connection.
  5. In the drawer that opens, complete the required fields:
    TIP
    • If a field is not displayed in the drawer, use the search field under Optional Fields to find and add it.
    • No external Hive Metastore is required; HPE Ezmeral Unified Analytics Software internally parses and scans the files to get the data types and metadata.
    Field Name                   Example                      Notes
    Name                         demoparquet                  Unique name for the data source connection.
    Hive Metastore               Discovery                    System scans files to discover metadata and data types.
    Data Dir                     s3://prestodemo/tpch002      S3 directory where files are stored.
    File Type                    Parquet                      Select Parquet or CSV depending on the type of data in the S3 directory.
    Hive S3 AWS Access Key                                    The access key generated by the Data Fabric Object Store. Under Optional Fields, search for and select this field.
    Hive S3 AWS Secret Key                                    The secret key generated by the Data Fabric Object Store. Under Optional Fields, search for and select this field.
    Hive S3 Endpoint             https://10.10.10.100:9000    The Data Fabric Object Store connection URL. Under Optional Fields, search for and select this field.
    Hive S3 Path Style Access                                 Select the checkbox. Under Optional Fields, search for and select this field.
  6. Click Connect. The system displays the message, "Successfully added data source," and the new data source (demoparquet) tile appears on the Data Sources page.

  7. In the new data source tile (demoparquet), click Query using Data Catalog. You can see the demoparquet source listed.

  8. Select the schema (default) under the data source to view the data sets.
    NOTE
    Each subdirectory in the S3 directory displays as a table in HPE Ezmeral Unified Analytics Software, and each Parquet file in the subdirectory is a row in the table.
  9. Click on a table (nation) and then click the Data Preview tab to view the Parquet data.
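Beyond the Data Catalog UI, the discovered tables can also be queried programmatically through Presto. The following is a hedged sketch using the open-source presto-python-client (`prestodb` package), assuming the catalog name matches the data source created above (demoparquet); the host, port, and user are placeholders for your deployment's Presto coordinator:

```python
# Sketch of querying the discovered table with the open-source
# presto-python-client; the endpoint details are placeholders.
QUERY = "SELECT * FROM demoparquet.default.nation LIMIT 5"

def fetch_nation_rows(host: str, port: int, user: str):
    import prestodb  # imported lazily so the sketch loads without the package
    conn = prestodb.dbapi.connect(
        host=host, port=port, user=user,
        catalog="demoparquet", schema="default",
    )
    cur = conn.cursor()
    cur.execute(QUERY)
    return cur.fetchall()

# Example call (requires a reachable Presto coordinator):
# rows = fetch_nation_rows("presto.example.com", 8080, "analyst")
```

Authentication details (TLS, tokens) vary by deployment; consult your administrator for the coordinator endpoint and credentials.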