Storage

This article describes how storage is used in a deployment and how datasets are made available to the containerized clusters.

Container Local Data Storage

HDFS is provisioned within the containers that comprise a virtual Hadoop cluster when that cluster is created. The underlying storage for the HDFS data nodes in the containers resides on local disks in the physical servers hosting those containers. The deployment refers to this set of local disks as node storage. Because HDFS in a virtual cluster is backed by node storage, its data does not persist beyond the life of that virtual cluster.
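
For example, a job that writes results to the cluster-local HDFS should copy anything worth keeping to persistent storage before the virtual cluster is deleted. The following minimal Python sketch uses pyarrow to do this; the namenode hostname, port, and paths are hypothetical, and the node needs a Hadoop client configuration with libhdfs for the connection to work.

    from pyarrow import fs

    # Connect to the virtual cluster's HDFS (hostname/port are assumptions).
    hdfs = fs.HadoopFileSystem(host="namenode.example.local", port=8020)
    local = fs.LocalFileSystem()

    # HDFS here is backed by node storage, so copy results you want to keep
    # to a persistent location before the virtual cluster is torn down.
    fs.copy_files(
        "/user/demo/results.parquet",          # source path in HDFS
        "/mnt/persistent/results.parquet",     # destination on persistent storage
        source_filesystem=hdfs,
        destination_filesystem=local,
    )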

Ephemeral Storage

Ephemeral storage is built from the local storage in each host. It is used for the disk volumes that back the local storage for each virtual node. Installing a host reserves a subset of the local disks on that host for node storage. Linux physical volumes are created on those disks and are then combined into a Linux volume group. A Linux logical volume is then created from this volume group. This logical volume is assigned to the Linux container subsystem, which in turn allocates portions of the logical volume to the containers running on that host for use as local storage within those containers.
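
The sequence below is a minimal sketch of those steps expressed as Python subprocess calls to the standard LVM tools (pvcreate, vgcreate, lvcreate). The device names and the volume group and logical volume names are hypothetical; the deployment performs this provisioning automatically during host installation.

    import subprocess

    # Local disks reserved for node storage (device names are hypothetical).
    disks = ["/dev/sdb", "/dev/sdc"]

    # 1. Create a Linux physical volume on each reserved disk.
    for disk in disks:
        subprocess.run(["pvcreate", disk], check=True)

    # 2. Combine the physical volumes into a single Linux volume group.
    subprocess.run(["vgcreate", "node-storage-vg"] + disks, check=True)

    # 3. Create one logical volume spanning the volume group; the container
    #    subsystem then allocates portions of it to containers on this host.
    subprocess.run(
        ["lvcreate", "-l", "100%FREE", "-n", "node-storage-lv", "node-storage-vg"],
        check=True,
    )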

Persistent Storage using HPE Ezmeral Data Fabric

A deployment of HPE Ezmeral Runtime Enterprise must use an implementation of HPE Ezmeral Data Fabric for persistent storage. You can choose which implementation of HPE Ezmeral Data Fabric you use.

HPE Ezmeral Data Fabric on Bare Metal

HPE Ezmeral Data Fabric on Bare Metal is an implementation of HPE Ezmeral Data Fabric that runs on physical or virtual machines that are not part of the HPE Ezmeral Runtime Enterprise deployment. You can connect from the HPE Ezmeral Runtime Enterprise deployment to a bare metal implementation of HPE Ezmeral Data Fabric as external storage.

Typically, you would choose this option if you have an existing deployment of HPE Ezmeral Data Fabric and you are adding a deployment of HPE Ezmeral Runtime Enterprise to your environment.

To use this implementation as tenant/persistent storage in HPE Ezmeral Runtime Enterprise, you must do the following:
  • Do not specify any disks as tenant/persistent storage during the Platform Controller Setup portion of the installation procedure.
  • After you have installed and verified HPE Ezmeral Runtime Enterprise and configured Gateway hosts, you must register the implementation as tenant/persistent storage as described in HPE Ezmeral Data Fabric as Tenant/Persistent Storage.

HPE Ezmeral Data Fabric on Kubernetes

HPE Ezmeral Data Fabric on Kubernetes is an implementation of HPE Ezmeral Data Fabric in a Kubernetes cluster instead of on physical or virtual servers.

To use this implementation as tenant/persistent storage in HPE Ezmeral Runtime Enterprise, you must do the following:
  • Do not specify any disks as tenant/persistent storage during the Platform Controller Setup portion of the installation procedure.
  • After you have installed and verified HPE Ezmeral Runtime Enterprise and configured Gateway hosts, you must create a new Kubernetes Data Fabric cluster and register that cluster for tenant/persistent storage as described in Creating a New Data Fabric Cluster.

Embedded Data Fabric

Embedded Data Fabric is not supported on new deployments of HPE Ezmeral Runtime Enterprise releases later than 5.3.x.

This option is available only if you are upgrading from a 5.3.x version of HPE Ezmeral Runtime Enterprise and that deployment has an existing Embedded Data Fabric. If your deployment has an existing Embedded Data Fabric, that implementation was registered as tenant/persistent storage during the Platform Controller Setup portion of the HPE Ezmeral Runtime Enterprise installation procedure.

For more information about the different implementations of HPE Ezmeral Data Fabric, and about host and other requirements when implementing HPE Ezmeral Data Fabric on Kubernetes, see HPE Ezmeral Data Fabric on Kubernetes Administration.

Compute and Storage Separation

Getting the maximum flexibility from a container-based solution requires the ability to scale compute and storage resources independently. It is also essential that Big Data datasets can persist beyond the lifespan of any single Big Data compute cluster. The DataTap and IOBoost technologies allow virtual clusters to access remote data regardless of location or format.

A DataTap creates a logical data lake overlay that allows access to shared data in the enterprise storage devices. This allows users to run Big Data and ML/DL jobs using the existing enterprise storage without needing to make time-consuming copies or transfers of data to local disks. IOBoost augments DataTap's flexibility by adding an application-aware data caching and tiering server to ensure high-speed remote data delivery.
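
As an illustration, a Spark job running in a virtual cluster can read shared data in place through a DataTap path rather than copying it in first. The sketch below assumes a DataTap named TenantStorage, a hypothetical Parquet dataset path, and a cluster image whose Hadoop/Spark stack includes the DataTap (dtap://) connector.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("datatap-example").getOrCreate()

    # Read the shared dataset in place through the DataTap; no copy of the
    # data is made onto the virtual cluster's local disks.
    df = spark.read.parquet("dtap://TenantStorage/warehouse/events.parquet")
    df.groupBy("event_type").count().show()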

This persistent storage can also serve as filesystem mount storage (FS mounts). The filesystem mount feature automatically adds mounts to virtual nodes/containers, allowing them to access POSIX data directly, as if it resided in local directories. You can use this feature to provide common files to all of the virtual nodes/containers in a given tenant, such as a shared configuration file used by every virtual node/container in the Marketing tenant. This eliminates the need to copy common files to each virtual node/container manually.
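
For example, a process in any virtual node/container of the tenant could read a shared configuration file directly from the mount. In the sketch below, the mount point and file name are hypothetical; the actual path depends on how the FS mount is configured.

    import json

    # Path where the tenant's FS mount appears inside each container
    # (mount point and file name are assumptions for illustration).
    CONFIG_PATH = "/bd-fs-mnt/TenantShare/common-config.json"

    # Every virtual node/container in the tenant sees the same file, so
    # nothing needs to be copied into individual containers.
    with open(CONFIG_PATH) as f:
        config = json.load(f)

    print(config.get("log_level", "INFO"))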

All applications running in containers can natively access data across the HPE Persistent Storage fabric via both DataTaps and FS mounts. Persistent volumes are seamlessly available across clusters from this persistent data fabric.
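
For instance, an administrator can confirm which persistent volumes a Kubernetes cluster has provisioned from the storage fabric by using the standard Kubernetes Python client. This is a generic sketch; the storage class names reported will depend on the deployment.

    from kubernetes import client, config

    # Assumes kubeconfig access to the cluster; code running inside a pod
    # would call config.load_incluster_config() instead.
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Show each persistent volume, the storage class backing it, and its phase.
    for pv in v1.list_persistent_volume().items:
        print(pv.metadata.name, pv.spec.storage_class_name, pv.status.phase)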

Operating System Storage

For all host types, the recommended storage for the operating system is two 960 GB SSDs in a RAID 1 configuration. See Host Requirements for detailed storage requirements and recommendations.
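
If the operating system disks are mirrored with Linux software RAID (md), a quick health check like the sketch below can confirm that the RAID 1 array is present; hardware RAID controllers must be checked with the vendor's own tools instead.

    # Read the kernel's software RAID status (Linux md devices only).
    with open("/proc/mdstat") as f:
        mdstat = f.read()

    print(mdstat)
    if "raid1" not in mdstat:
        print("Warning: no RAID 1 array reported in /proc/mdstat")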