Connecting to External S3 Object Stores
Describes how to connect HPE Ezmeral Unified Analytics Software to external S3 object storage in AWS, MinIO, HPE Ezmeral Data Fabric Object Store, and HPE GreenLake for File Storage.
Administrators can connect HPE Ezmeral Unified Analytics Software to object storage in AWS S3, MinIO, HPE Ezmeral Data Fabric Object Store, and HPE GreenLake for File Storage. Users can then access data in the connected data sources through clients, such as Spark and Kubeflow notebooks, without providing an access or secret key.
When you configure the data source connection, you provide HPE Ezmeral Unified Analytics Software with the access credentials (access key and secret key); the user does not need the access credentials because HPE Ezmeral Unified Analytics Software uses a proxy to communicate with clients.
Clients talk to the HPE Ezmeral Unified Analytics Software proxy through the data source endpoint URL and pass JWT tokens to authenticate users. Users configure clients to talk to the connected object store. Users provide the client with the data source name and endpoint URL (as they appear on the data source tile in the HPE Ezmeral Unified Analytics Software UI), as well as the bucket they want the client to access.
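As a minimal sketch of the client-side picture described above, the following Python snippet collects what a user hands to a client: the data source name, the endpoint URL, the bucket, and a JWT token, with no access or secret key. The class name, field names, example values, and the path-style object URL are assumptions made for illustration; they are not part of the HPE Ezmeral Unified Analytics Software API, and how a given client passes the token to the proxy depends on that client.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ObjectStoreClientSettings:
    """Hypothetical container for what a user gives a client.

    Note that there is no access key or secret key here: those are held
    by the platform, and the user is identified by a JWT token instead.
    """
    data_source_name: str  # as shown on the data source tile
    endpoint_url: str      # as shown on the data source tile
    bucket: str            # bucket the client should access
    jwt_token: str         # token that identifies the user

    def __post_init__(self):
        # Reject endpoint values that are not http(s) URLs with a host.
        parsed = urlparse(self.endpoint_url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            raise ValueError(f"not a usable endpoint URL: {self.endpoint_url!r}")
        # Reject blank values for the remaining settings.
        for field_name in ("data_source_name", "bucket", "jwt_token"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")

    def object_url(self, key: str) -> str:
        """Build <endpoint>/<bucket>/<key> (path-style addressing is assumed)."""
        return f"{self.endpoint_url.rstrip('/')}/{self.bucket}/{key.lstrip('/')}"


# Example values are placeholders, not real endpoints or tokens.
settings = ObjectStoreClientSettings(
    data_source_name="my-minio",
    endpoint_url="http://example-proxy.local:30000",
    bucket="sales-data",
    jwt_token="<user-jwt>",
)
print(settings.object_url("2024/report.csv"))
```

Keeping these four values together makes it easy to see that nothing secret about the object store itself lives on the client side.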
How to Connect HPE Ezmeral Unified Analytics to Object Storage
- Sign in to HPE Ezmeral Unified Analytics Software.
- In the left navigation bar, select Data Engineering > Data Sources.
- On the Data Sources screen, select the Object Store Data tab.
NOTE: By default, a local-s3 Ezmeral Data Fabric tile is displayed. This Ezmeral Data Fabric version of S3 is a local S3 version used internally by HPE Ezmeral Unified Analytics Software and cannot be deleted. Do not connect to this data source.
- Click Add New Data Source.
- Click the Add… button in one of the tiles (HPE Ezmeral Data Fabric Object Store, Amazon, MinIO, or HPE GreenLake for File Storage).
- In the drawer that opens, enter the connection properties:
- HPE Ezmeral Data Fabric Object Store
- To connect to HPE Ezmeral Data Fabric Object Store, provide the following information:
- Name - Enter a unique name for the data source.
- Endpoint - Enter the HPE Ezmeral Data Fabric Object Store URL, for example: https://<ip-address>:9000
To connect to a secured HPE Ezmeral Data Fabric Object Store, enter the fully qualified domain name (FQDN) of the external HPE Ezmeral Data Fabric Object Store node, for example: https://<FQDN-of-external-DF-s3-node>:9000
- Access Key - Enter the HPE Ezmeral Data Fabric Object Store access key.
- Secret Key - Enter the HPE Ezmeral Data Fabric Object Store secret key.
- Insecure - Select this option only for POCs or demos; do not select it for production environments. If you do not select this option, you must add the root CA certificate for a secured connection. For a secure HPE Ezmeral Data Fabric Object Store connection, enter the path to the root CA certificate on the node that you specified as the endpoint. Typically, the root CA certificate path is: /opt/mapr/conf/ca/chain-ca.pem
- AWS S3
- To connect to AWS S3, provide the following information:
- Name - Enter a unique name for the data source.
- Endpoint - Enter the AWS S3 URL, for example: https://s3.us-east-2.amazonaws.com
- Access Key - Enter the AWS S3 access key.
TIP: The access key and secret key are associated with the IAM user in AWS. The IAM policy associated with the user should permit access to buckets. For example, the IAM policy should grant the user read, write, and/or create access on buckets.
- Secret Key - Enter the AWS S3 secret key.
- AWS Region - Enter the AWS region.
- MinIO
- To connect to MinIO, provide the following information:
- Name - Enter a unique name for the data source.
- Endpoint - Enter the MinIO URL.
- Access Key - Enter the MinIO access key.
- Secret Key - Enter the MinIO secret key.
- Insecure - Only select this option for POCs or demos; do not select for production environments. When the option is not selected, you must add the root CA certificate for a secured connection.
- Root Certificate - This is a TLS mode configuration. Add the root CA certificate bundle.
- HPE GreenLake for File Storage
- To connect to HPE GreenLake for File Storage, provide the following information:
- Name - Enter a unique name for the data source.
- Endpoint - Enter the HPE GreenLake for File Storage URL.
- Access Key - Enter the HPE GreenLake for File Storage access key.
- Secret Key - Enter the HPE GreenLake for File Storage secret key.
- Insecure - Only select this option for POCs or demos; do not select for production environments. When the option is not selected, you must add the root CA certificate for a secured connection.
- Root Certificate - This is a TLS mode configuration. Add the root CA certificate bundle.
- Click Add. The data source is connected and a new tile for the data source displays on the Data Sources screen.
IMPORTANT: The data source name and endpoint URL display on the tile. Users need this information to connect their clients to the data source. Users can navigate to the Data Sources screen to get the information. See Accessing Data in External S3 Object Stores.
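The connection properties above follow a common pattern: a name, an endpoint URL, an access key and secret key, and, for the data sources that offer the Insecure option, a root CA certificate whenever Insecure is not selected. The following Python sketch expresses that pattern as a checklist an administrator could run over the values before entering them in the drawer. The class and function names are hypothetical, and the checks are an illustration of the rules stated in this topic, not the validation that HPE Ezmeral Unified Analytics Software itself performs.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse


@dataclass
class ConnectionProperties:
    """Hypothetical mirror of the fields in the connection drawer."""
    name: str
    endpoint: str
    access_key: str
    secret_key: str
    insecure: bool = False                  # POCs and demos only
    root_certificate: Optional[str] = None  # path to, or contents of, the root CA


def check_properties(props: ConnectionProperties) -> List[str]:
    """Return a list of problems; an empty list means nothing is missing."""
    problems = []
    if not props.name.strip():
        problems.append("Name is required")
    parsed = urlparse(props.endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        problems.append("Endpoint must be an http(s) URL")
    if not props.access_key or not props.secret_key:
        problems.append("Access Key and Secret Key are required")
    # A secured connection needs the root CA certificate.
    if not props.insecure and not props.root_certificate:
        problems.append("Root CA certificate is required unless Insecure is selected")
    return problems


# Placeholder values; the certificate path is the typical Data Fabric location.
example = ConnectionProperties(
    name="external-df",
    endpoint="https://10.0.0.5:9000",
    access_key="<access-key>",
    secret_key="<secret-key>",
    root_certificate="/opt/mapr/conf/ca/chain-ca.pem",
)
print(check_properties(example))
```

The AWS S3 drawer described above does not list the Insecure or Root Certificate fields, so the last check is relevant only to the other data source types.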
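For the AWS S3 data source, the tip about the IAM policy can be made concrete with a small example. The Python sketch below builds one possible policy document that grants create and list access on a bucket and read and write access on its objects. The bucket name and function name are hypothetical, and the exact set of actions a deployment needs is an assumption; this topic states only that the policy should grant read, write, and/or create access on buckets.

```python
import json


def bucket_access_policy(bucket: str) -> dict:
    """Build an example IAM policy document for a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions: create the bucket and list its contents.
                "Effect": "Allow",
                "Action": ["s3:CreateBucket", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level permissions: read and write objects in the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# "example-bucket" is a placeholder bucket name.
print(json.dumps(bucket_access_policy("example-bucket"), indent=2))
```

The resulting JSON can be attached to the IAM user whose access key and secret key are entered for the data source. Narrow the actions and resources to what the workloads actually require.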