Erasure coding

In HPE Ezmeral Runtime Enterprise, HPE Ezmeral Data Fabric on Kubernetes supports storage tiers that use erasure coding for data. Erasure coding (EC) is a method of protecting data on lower-cost hardware that also reduces storage overhead to a range of 1.2x-1.5x the size of the data. EC ensures that if data becomes corrupted, it can be reconstructed using information about the data that is present elsewhere.

In HPE Ezmeral Runtime Enterprise, HPE Ezmeral Data Fabric on Kubernetes provides rule-based automated data tiering functions that offload less frequently used data to specific nodes or low-cost hardware. Typically, erasure coding is used to store "warm" tier data.

TIP

For an excellent introduction to erasure coding, see this tech talk.

Erasure coding is a data protection method in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media. If a fragment becomes corrupted or unavailable, it can be reconstructed from the data and redundant pieces that are stored elsewhere.
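The idea of rebuilding a lost fragment from redundant data can be illustrated with a deliberately simplified sketch. It uses a single XOR parity fragment, which can repair only one loss; this is an assumption made for illustration and is not the encoding that Data Fabric uses. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments: List[bytes]) -> bytes:
    """Compute one parity fragment as the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def reconstruct(fragments: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the rest."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can repair only one loss")
    repaired = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        # XOR of the parity with every surviving fragment yields the lost one.
        repaired[missing[0]] = reduce(xor_bytes, survivors, parity)
    return repaired


data = [b"warm", b"tier", b"data"]   # three equal-length data fragments
parity = make_parity(data)           # one redundant fragment (a toy 3+1 layout)
damaged = [b"warm", None, b"data"]   # the second fragment has been lost
restored = reconstruct(damaged, parity)
print(restored == data)
```

Schemes with more than one parity fragment need a stronger code than plain XOR, but the principle is the same: the surviving fragments plus the redundant pieces are enough to recompute what was lost.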

A key decision involved in setting up erasure coding is selecting the erasure coding scheme. Considerations include how many nodes you can afford, how long you can tolerate waiting for a failed data node to be rebuilt, and how many failures you expect to occur.

Erasure coding schemes are expressed as numbers separated by plus signs (+):

  • When the scheme does not include local parity, two numbers are used. For example, 10+2 indicates a scheme without local parity where 10 is the number of data nodes and 2 is the number of parity nodes. In general, these schemes are written as m+n, where m is the number of data nodes and n is the number of parity nodes.

  • When the scheme includes local parity, three numbers are used. For example, 10+2+2 indicates a scheme with local parity where 10 is the number of data nodes, followed by 2 local parity nodes, followed by 2 global parity nodes.
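The notation above can be read mechanically. The following is a minimal sketch, not a Data Fabric API: the `Scheme` type and the `parse_scheme` and `storage_overhead` helpers are hypothetical names, and the overhead is computed with the usual definition of total nodes divided by data nodes.

```python
from typing import NamedTuple


class Scheme(NamedTuple):
    data: int           # number of data nodes
    local_parity: int   # number of local parity nodes (0 if none)
    global_parity: int  # number of (global) parity nodes


def parse_scheme(text: str) -> Scheme:
    """Parse 'm+n' or 'data+local+global' into a Scheme."""
    parts = [int(p) for p in text.split("+")]
    if any(p <= 0 for p in parts):
        raise ValueError(f"all counts must be positive: {text!r}")
    if len(parts) == 2:
        m, n = parts
        return Scheme(data=m, local_parity=0, global_parity=n)
    if len(parts) == 3:
        return Scheme(*parts)
    raise ValueError(f"expected two or three numbers: {text!r}")


def storage_overhead(scheme: Scheme) -> float:
    """Raw storage used per unit of data: total nodes / data nodes."""
    total = scheme.data + scheme.local_parity + scheme.global_parity
    return total / scheme.data


for text in ("10+2", "4+2", "10+2+2"):
    scheme = parse_scheme(text)
    print(text, scheme, f"{storage_overhead(scheme):.2f}x")
```

Under this definition, 10+2 and 4+2 give overheads of 1.2x and 1.5x, which matches the range quoted earlier in this topic.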

For erasure coding schemes without local parity, the recommended total number of nodes is m+2n (rather than m+n) to ensure Data Fabric self-healing and proper operation after n failures. With m+2n nodes, n failures will self-heal with no operator intervention. For example, the recommended total number of nodes when you select a 3+2 erasure coding scheme is seven: three data nodes plus twice the two parity nodes (3 + 2×2 = 7).
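The sizing rule for schemes without local parity is simple arithmetic, sketched here for a few schemes. The helper names are hypothetical and the schemes other than 3+2 are chosen only as illustrations.

```python
def minimum_nodes(m: int, n: int) -> int:
    """Nodes needed to hold every fragment of an m+n scheme."""
    return m + n


def recommended_nodes(m: int, n: int) -> int:
    """Nodes recommended so that n failures can self-heal: m + 2n."""
    return m + 2 * n


for m, n in ((3, 2), (4, 2), (10, 2)):
    print(f"{m}+{n}: minimum {minimum_nodes(m, n)}, "
          f"recommended {recommended_nodes(m, n)}")
```

The n extra nodes give the cluster somewhere to rebuild the fragments of up to n failed nodes without waiting for an operator to add hardware.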

Although data can continue to be read after experiencing n failures with only m+n nodes, performance is significantly reduced because each read requires rebuilding data fragments. Also, manual intervention is required to protect the data from further failures. Data will not be erasure coded if only m nodes are available.

In erasure coding schemes with local parity, data nodes are divided into groups, with each group having a local parity node. Recovery from a failed node is faster because fewer nodes must be read when rebuilding the failed node.
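The following sketch shows why local parity shortens recovery. It assumes that the data nodes are split evenly into one group per local parity node and that a single failed data node is rebuilt from the other members of its group plus the group's local parity node. The actual Data Fabric layout may differ, and the function names are hypothetical.

```python
from typing import List


def local_groups(data_nodes: int, local_parity_nodes: int) -> List[List[int]]:
    """Split data node indexes evenly into one group per local parity node."""
    if data_nodes % local_parity_nodes != 0:
        raise ValueError("this sketch assumes equally sized groups")
    size = data_nodes // local_parity_nodes
    return [list(range(g * size, (g + 1) * size))
            for g in range(local_parity_nodes)]


def reads_to_rebuild(data_nodes: int, local_parity_nodes: int = 0) -> int:
    """Nodes read to rebuild one failed data node."""
    if local_parity_nodes == 0:
        # Without local parity, m surviving fragments are needed.
        return data_nodes
    size = data_nodes // local_parity_nodes
    # (size - 1) surviving group members plus the group's local parity node.
    return (size - 1) + 1


print(local_groups(10, 2))
print("10+2   reads:", reads_to_rebuild(10))
print("10+2+2 reads:", reads_to_rebuild(10, 2))
```

Under these assumptions, rebuilding one data node in a 10+2+2 scheme reads five nodes instead of ten, at the cost of the two additional local parity nodes.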

For detailed information about erasure coding and a list of recommended coding schemes, see Erasure Coding Scheme for Data Protection and Recovery in the HPE Ezmeral Data Fabric documentation.