Configuring a Spark Application to Access External S3 Object Storage

Describes configuration options for connecting Spark to external S3 object storage.

You can configure a Spark application to connect to an external S3 data source directly or through the S3 proxy layer in HPE Ezmeral Unified Analytics Software.

The following diagram shows the two ways that applications in Unified Analytics access external S3 data sources: through a direct connection from the application to the data source, as depicted by 1, or through the S3 proxy layer, as depicted by 2, 3, and 4.

The S3 proxy layer securely connects Unified Analytics to external data sources, such as AWS S3, MinIO S3, and HPE Ezmeral Data Fabric Object Store.

When you configure a Spark application to access an S3 data source through the S3 proxy layer, you do not have to provide the access credentials or ask an administrator for access to the data source. Your Unified Analytics administrator creates the connections to external S3 data sources and provides the required access credentials (access key and secret key) at that time. Your administrator also grants permissions on the data sources. Your access to the data sources is authorized through Unified Analytics.
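By contrast, with a direct connection the application itself must carry the endpoint and the credentials. The following is a minimal sketch of what that can look like, assuming the standard Hadoop S3A connector properties (fs.s3a.endpoint, fs.s3a.access.key, and so on) passed to Spark with the spark.hadoop. prefix. The helper names direct_s3a_conf and to_spark_submit_args, the endpoint URL, and the key values are hypothetical placeholders and are not taken from Unified Analytics. The exact properties that Unified Analytics requires for each method are covered in the topics that follow.

```python
# Illustrative only: builds Spark properties for a DIRECT S3 connection.
# Property names are the standard Hadoop S3A connector settings; the helper
# functions and all example values are hypothetical placeholders.

def direct_s3a_conf(endpoint, access_key, secret_key, path_style=True):
    """Return a dict of Spark properties for a direct S3A connection."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style access is commonly needed for non-AWS stores such as MinIO.
        "spark.hadoop.fs.s3a.path.style.access": str(path_style).lower(),
        # Enable TLS only when the endpoint URL uses https.
        "spark.hadoop.fs.s3a.connection.ssl.enabled":
            str(endpoint.startswith("https://")).lower(),
    }


def to_spark_submit_args(conf):
    """Flatten the properties into repeated '--conf key=value' arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    conf = direct_s3a_conf(
        "http://minio.example.internal:9000",  # placeholder endpoint
        "EXAMPLE_ACCESS_KEY",                  # placeholder credentials
        "EXAMPLE_SECRET_KEY",
    )
    print(" ".join(to_spark_submit_args(conf)))
```

In this sketch the access key and secret key travel with the application, which is the credential handling that the S3 proxy layer is designed to remove from the application configuration.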

To see the external S3 data sources that your administrator configured for you, sign in to the Unified Analytics UI, go to Data Engineering > Data Sources, and select the Object Store Data tab.

The following image shows an example of the Object Store Data tab, with a tile for each connected external S3 data source.

The following topics describe each of the methods (direct or S3 proxy) for connecting Spark to an external S3 data source.