Data Tiering

Conceptual information about data tiering.

Data that is active and frequently accessed is categorized as hot data.

Data that is rarely accessed is categorized as warm data or cold data.

Hot, warm, and cold data are identified based on the rules and policies set by the administrator.

The storage used for hot data is referred to as the hot tier. The storage used for warm data is referred to as the erasure-coded (EC) tier, or warm tier, and the mechanism used to store cold data is referred to as the cold tier.

Data starts off as hot data when it is first written to local storage on a fabric. It is then classified as warm or cold based on the storage policies configured for the data on the data fabric.

With the default replication factor of 3, data stored on a fabric requires three times the disk space of the regular volume on premium hardware. After data is offloaded to the cloud, the space it used in the volume on the data fabric cluster (including data in the namespace container) is freed, and only the volume metadata in the namespace container remains 3-way replicated on the data fabric cluster.
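
The space accounting described above can be illustrated with a short calculation. This is a rough sketch rather than a sizing tool: the function names, the 100 GiB volume, and the 1 GiB metadata size are assumptions made up for illustration. Only the replication factor of 3 comes from the text.

```python
GIB = 1024 ** 3


def raw_space_before_offload(logical_bytes: int, replication: int = 3) -> int:
    """Raw disk space used by a replicated volume (default replication is 3)."""
    return logical_bytes * replication


def raw_space_after_offload(metadata_bytes: int, replication: int = 3) -> int:
    """After offload, only the volume metadata stays on the cluster, replicated."""
    return metadata_bytes * replication


logical = 100 * GIB   # hypothetical volume holding 100 GiB of data
metadata = 1 * GIB    # hypothetical metadata size, chosen only for illustration

before = raw_space_before_offload(logical)
after = raw_space_after_offload(metadata)

print(f"before offload: {before // GIB} GiB raw")
print(f"after offload:  {after // GIB} GiB raw")
print(f"freed:          {(before - after) // GIB} GiB raw")
```

With these assumed figures, the volume occupies 300 GiB of raw disk before offload and 3 GiB afterwards. Actual metadata sizes depend on the volume and are not stated here.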

Data can be set up to be offloaded automatically to a volume on a low-cost storage alternative on the data fabric cluster, called a warm tier. Alternatively, data can be offloaded to low-cost storage on a third-party cloud object store such as S3, called a cold tier.

Data Fabric provides rule-based automated tiering functionality that allows you to seamlessly integrate with:
  • Low-cost storage as an additional storage tier in the data fabric cluster, for storing less frequently accessed file data (warm data) in an erasure-coded volume.
  • Third-party cloud object storage as an additional storage tier in the data fabric cluster, for storing file data that is rarely accessed or archived (cold data).
In this way, valuable on-premises storage resources can be used for more active (hot) file data and applications, while warm or cold file data can be retained at minimum cost for compliance, historical, or other business reasons. The data fabric provides consistent and simplified access to and management of the data.
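
To make the idea of rule-based classification concrete, the following minimal sketch labels data as hot, warm, or cold by how long it has been idle. It does not show the product's rule syntax: the `TieringRule` class, the `classify` function, and the 30-day and 180-day thresholds are hypothetical, and treating warm and cold as two steps on one idle-time scale is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TieringRule:
    """Hypothetical rule: idle-time thresholds chosen by the administrator."""
    warm_after: timedelta
    cold_after: timedelta


def classify(last_access: datetime, now: datetime, rule: TieringRule) -> str:
    """Return 'hot', 'warm', or 'cold' based on time since last access."""
    idle = now - last_access
    if idle >= rule.cold_after:
        return "cold"
    if idle >= rule.warm_after:
        return "warm"
    return "hot"


# Assumed thresholds, for illustration only.
rule = TieringRule(warm_after=timedelta(days=30), cold_after=timedelta(days=180))
now = datetime(2024, 1, 1)

for days_idle in (1, 45, 400):
    tier = classify(now - timedelta(days=days_idle), now, rule)
    print(f"idle {days_idle} days -> {tier}")
```

In a real deployment, the criteria come from the storage policies the administrator configures, which may use conditions other than idle time.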

Once offloaded, data is purged from the data fabric cluster to release the disk space. When you delete an entire file, part of a file, or a snapshot, the corresponding objects are removed from the tier.

When a client tries to read offloaded data, the data fabric processes read requests for warm-tiered and cold-tiered data in standard and mirror volumes differently. Similarly, when a client writes to a tiered volume, the data fabric processes appends and overwrites differently.

To manage data offloading, you must first create storage policies. See Administering Storage Policies to learn more about managing storage policies.

To offload data, you must create remote targets. See Creating a Remote Target to add a new remote target.

You can schedule data offloading. See <add link to schedules> for further information on creating a schedule.