Data Tiering
Provides an overview of what tiering is, its various types, and how it works in the HPE Ezmeral Data Fabric.
- Low-cost storage as an additional storage tier in the Data Fabric cluster for storing file data that is less frequently accessed ("warm" data) in an erasure-coded volume.
- Third-party cloud object storage as an additional storage tier in the Data Fabric cluster for storing file data that is rarely accessed or archived ("cold" data).
See also: Working with Tiered Volumes
Where is data tiered?
For "warm" data, the Data Fabric allows you to offload data to specific nodes or low-cost hardware in a topology. The Data Fabric uses erasure coding to protect data on the low-cost hardware. Erasure coding also reduces the storage overhead to between 1.2x and 1.5x. See Overview of Tiers for more information on erasure coding.
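The overhead figure follows from simple arithmetic: a scheme with k data fragments and m parity fragments stores (k + m) / k raw bytes per logical byte. The sketch below illustrates that calculation. The function name and the two example schemes are assumptions chosen to land on the ends of the quoted range; they are not a list of schemes the Data Fabric supports.

```python
def storage_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per logical byte for a k+m erasure-coding scheme."""
    if data_fragments <= 0 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and no negative parity")
    return (data_fragments + parity_fragments) / data_fragments


# Illustrative schemes only (assumed, not taken from the product documentation).
for k, m in [(10, 2), (4, 2)]:
    print(f"{k}+{m}: {storage_overhead(k, m):.2f}x")
# prints "10+2: 1.20x" then "4+2: 1.50x"
```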
For "cold" data, the Data Fabric allows you to offload your cluster data to public, private, and hybrid clouds. You can offload data to remote clouds from vendors such as Amazon AWS, Google Cloud Platform, Microsoft Azure, IBM Cleversafe, Hitachi HCP, and Minio. This allows you to tap into cloud-scale capacity.
The Data Fabric allows you to configure a volume at the time of volume creation for either warm or cold tier, but not both. If you do not know the type of tier to associate with the volume, you can still create a volume that is tiering-enabled and associate a specific tier later with the volume. However, volumes not enabled for tiering at the time of volume creation cannot be enabled for tiering after the volume is created. You cannot modify the type of tier associated with the volume after the volume is created.
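The rules in this paragraph can be read as a small state model. The following is a minimal sketch of that reading, not the Data Fabric's implementation: the `Volume` class, `TieringError`, and `associate_tier` are hypothetical names introduced only for illustration, and the sketch treats the tier type as fixed from the moment it is first associated.

```python
class TieringError(Exception):
    """Raised when a request breaks one of the rules described above."""


class Volume:
    """Hypothetical in-memory model of a volume's tiering settings."""

    VALID_TIERS = ("warm", "cold")

    def __init__(self, name, tiering_enabled, tier_type=None):
        self.name = name
        # Fixed at creation: there is deliberately no setter for this flag.
        self._tiering_enabled = tiering_enabled
        self._tier_type = None
        if tier_type is not None:
            self.associate_tier(tier_type)

    @property
    def tiering_enabled(self):
        return self._tiering_enabled

    @property
    def tier_type(self):
        return self._tier_type

    def associate_tier(self, tier_type):
        if tier_type not in self.VALID_TIERS:
            raise TieringError("tier type must be 'warm' or 'cold', not both")
        if not self._tiering_enabled:
            raise TieringError("volume was not enabled for tiering at creation")
        if self._tier_type is not None:
            raise TieringError("tier type cannot be changed once associated")
        self._tier_type = tier_type


undecided = Volume("logs", tiering_enabled=True)
undecided.associate_tier("cold")  # allowed: tier chosen after creation
```

Attempting `Volume("scratch", tiering_enabled=False).associate_tier("warm")` raises `TieringError` in this model, mirroring the rule that tiering cannot be enabled after the volume is created.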
When you create a volume and configure it for warm or cold tiering by associating a warm or cold tier, a storage policy (referred to as a rule in the CLI), and an offload schedule, the Data Fabric automatically moves the data out of the volume and into the tier, and then purges the data from the volume on the Data Fabric cluster to release disk space on the cluster.
For tiering-enabled volumes, however, the hard quota you set is the total space allocated for the volume, irrespective of where the volume data resides (cluster or tier). Writes fail when the volume's disk space usage reaches the quota assigned to the volume, whether the volume data is local (on the cluster) or remote (on the tier). Also, to recall volume data back to the Data Fabric cluster, the volume must have disk space equivalent to the amount of data being recalled from the tier. You can retrieve and view disk space usage metrics for a tiering-enabled volume, including the amount of data offloaded to the tier, using the Control System, the CLI, or the REST API.
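The quota behavior can be sketched as simple accounting in which offloaded data still counts against the hard quota. The function names and parameters below are hypothetical, and the model is a simplification under stated assumptions: a write is accepted as long as it does not push total usage past the quota, and a recall needs local free space at least equal to the recalled amount.

```python
GIB = 1024 ** 3


def write_allowed(hard_quota, local_bytes, tiered_bytes, incoming_bytes):
    """Usage counts local AND offloaded data, so offloading frees disk, not quota."""
    used = local_bytes + tiered_bytes
    return used + incoming_bytes <= hard_quota


def recall_allowed(recall_bytes, free_local_bytes):
    """A recall needs local space equal to the amount of data being recalled."""
    return free_local_bytes >= recall_bytes


# 100 GiB quota, 10 GiB still on the cluster, 85 GiB already offloaded.
print(write_allowed(100 * GIB, 10 * GIB, 85 * GIB, 10 * GIB))  # False
print(write_allowed(100 * GIB, 10 * GIB, 85 * GIB, 5 * GIB))   # True
print(recall_allowed(20 * GIB, 15 * GIB))                       # False
```

In the first call only 10 GiB is held locally, yet the write is rejected because the 85 GiB on the tier still counts toward the 100 GiB quota.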
How frequently is data offloaded?
Offloading is governed by the criteria in the volume's storage policy and by the volume's offload schedule. The default criteria are as follows:
- For erasure coding (warm tier), the Data Fabric applies default criteria for offloading data: a modification timestamp of 1 day.
- For remote archiving (cold tier), the Data Fabric does not associate default criteria. You can use the Control System, CLI, and REST API to manually trigger an offload of volume data.
The default schedules are as follows:
- For erasure coding (warm tier), the Data Fabric automatically uses the default Automatic Tiering Scheduler, which uses internal policies to decide when to schedule the offload operation.
- For remote archiving (cold tier), the Data Fabric does not associate a default schedule. You can use the Control System, CLI, and REST API to manually trigger an offload of volume data.
Even when you manually trigger an offload, the Data Fabric offloads data only if the data meets the criteria defined in the storage policy. In addition, for warm-tier volumes, the Data Fabric offloads data only if the object (stripe) has data exceeding 90% of the object payload; if an object has data less than 90% of the object payload, the object is not offloaded and the metadata tables are not updated. For more information, see Data Offload and Purge.
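The two conditions described above can be sketched as a pair of checks. This is an illustrative model only: the function names are hypothetical, the default criterion is interpreted here as "not modified for at least one day" (an assumption about what a modification timestamp of 1 day means), and a stripe at exactly 90% is treated as not offloadable because the text only says that data must exceed 90%.

```python
from datetime import datetime, timedelta


def meets_default_warm_criteria(last_modified, now, min_age=timedelta(days=1)):
    """Assumed reading of the default rule: unmodified for at least min_age."""
    return now - last_modified >= min_age


def stripe_is_offloadable(bytes_in_stripe, stripe_payload_bytes, threshold_percent=90):
    """Warm tier only: the stripe must hold MORE than 90% of its payload."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return bytes_in_stripe * 100 > threshold_percent * stripe_payload_bytes


now = datetime(2024, 1, 10, 12, 0)
print(meets_default_warm_criteria(datetime(2024, 1, 8, 12, 0), now))  # True
print(meets_default_warm_criteria(datetime(2024, 1, 10, 9, 0), now))  # False
print(stripe_is_offloadable(95, 100))  # True
print(stripe_is_offloadable(80, 100))  # False
```

In this model a manual trigger changes only when the checks run, not their outcome: data that fails either check stays on the cluster.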
What is the MAST Gateway?
The Data Fabric automated storage tiering (MAST) Gateway acts as the centralized entry point for all the tiering operations. CLDB assigns tiering-enabled volumes to MAST Gateways for processing all tiering operations for the volume. For more information, see Overview of MAST Gateway.
How is compressed and encrypted data transferred and stored?
- The warm-tier volume is enabled for data-at-rest encryption (dare).
- The cold-tier volume is enabled for tier encryption (tierencryption).
Data in the volume is transferred and stored as-is, compressed or uncompressed, on the tier. You can set up replication, snapshots, and mirror volumes for tiering-enabled volumes. See Data Replication, Snapshots, Mirroring, Auditing, and Metrics Collection for more information.
How are reads, writes, and deletes handled?
When a client tries to read offloaded data, the Data Fabric processes the read request differently for warm-tiered and cold-tiered data, and for standard and mirror volumes. Similarly, when a client writes to a tiered volume, the Data Fabric processes appends and overwrites differently. See Data Reads, Writes, and Recalls for more information.
Data, once offloaded, is purged from the Data Fabric cluster to release the disk space. When you delete an entire file, part of a file, or a snapshot, the corresponding objects are also removed from the tier. See Data Compaction for more information.
Enabling Tiering
To enable tiering, see Enabling Tiering.