Erasure coding
In HPE Ezmeral Runtime Enterprise, HPE Ezmeral Data Fabric on Kubernetes supports storage tiers that use erasure coding for data, and provides rule-based automated data tiering functions to offload less frequently used data to specific nodes or low-cost hardware. Typically, erasure coding is used when storing "warm" tier data. Erasure coding is a method of protecting data on lower-cost hardware that also reduces storage overhead to the range of 1.2x-1.5x.
For an excellent introduction to erasure coding, see this tech talk.
Erasure coding (EC) is a data protection method in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media. EC ensures that if data becomes corrupted, it can be reconstructed using information about the data that is present elsewhere.
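The reconstruction idea can be illustrated with the simplest possible erasure code: a single XOR parity fragment. This is only a sketch of the concept; Data Fabric uses more sophisticated schemes, and the helper names here are assumptions for illustration, not a Data Fabric API.

```python
# Minimal single-parity erasure coding sketch (illustrative only).
# One parity fragment lets any ONE lost fragment be rebuilt from the rest.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(fragments: list[bytes]) -> bytes:
    """Parity fragment is the XOR of all (equal-size) data fragments."""
    parity = fragments[0]
    for frag in fragments[1:]:
        parity = xor_bytes(parity, frag)
    return parity

def reconstruct(surviving: list[bytes]) -> bytes:
    """Rebuild one lost fragment by XOR-ing every surviving fragment
    (data and parity) together: the lost piece is whatever cancels out."""
    return make_parity(surviving)

data = [b"frag", b"ment", b"s_ok"]   # equal-size data fragments
parity = make_parity(data)

# Simulate losing data[1] and rebuilding it from the surviving fragments.
rebuilt = reconstruct([data[0], data[2], parity])
assert rebuilt == data[1]
```

Real EC schemes generalize this so that several fragments can be lost and recovered, at the cost of storing more parity.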
A key decision involved in setting up erasure coding is selecting the erasure coding scheme. Considerations include how many nodes you can afford, how long you can tolerate waiting for a failed data node to be rebuilt, and how many failures you expect to occur.
Erasure coding schemes are expressed as numbers separated by the + (plus) sign:

- When the scheme does not include local parity, two numbers are used. For example, 10+2 indicates a scheme without local parity, where 10 is the number of data nodes and 2 is the number of parity nodes. Generally, these schemes are expressed as m+n.
- When the scheme includes local parity, three numbers are used. For example, 10+2+2 indicates a scheme with local parity, where 10 is the number of data nodes, followed by 2 local parity nodes, followed by 2 global parity nodes.
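The notation above can be unpacked mechanically. The following hypothetical helper (an illustration, not a Data Fabric API) splits a scheme string into its node counts and derives the storage overhead as total fragments divided by data fragments, which is where figures like 1.2x come from:

```python
def parse_scheme(scheme: str) -> dict:
    """Parse an erasure coding scheme string such as '10+2' or '10+2+2'.
    (Hypothetical helper for illustration; not a Data Fabric API.)"""
    parts = [int(p) for p in scheme.split("+")]
    if len(parts) == 2:
        # m+n: data nodes plus (global) parity nodes, no local parity.
        data, local, global_ = parts[0], 0, parts[1]
    elif len(parts) == 3:
        # m+l+g: data nodes, local parity nodes, global parity nodes.
        data, local, global_ = parts
    else:
        raise ValueError(f"unrecognized scheme: {scheme!r}")
    total = data + local + global_
    return {
        "data": data,
        "local_parity": local,
        "global_parity": global_,
        "overhead": total / data,  # e.g. 10+2 -> 1.2x, 10+2+2 -> 1.4x
    }

print(parse_scheme("10+2"))    # overhead 1.2x
print(parse_scheme("10+2+2"))  # overhead 1.4x
```

Note how both example schemes land inside the 1.2x-1.5x overhead range mentioned earlier.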
For erasure coding schemes without local parity, the recommended total number of nodes is m+2n (rather than m+n) to ensure Data Fabric self-healing and proper operation after n failures. With m+2n nodes, n failures will self-heal with no operator intervention. For example, the recommended total number of nodes when you select a 3+2 erasure coding scheme is seven: three data nodes plus twice the number of parity nodes (2 × 2 = 4).

Although data can continue to be read after experiencing n failures with only m+n nodes, performance is significantly reduced because each read requires rebuilding data fragments. Also, manual intervention is required to protect the data from further failures. Data will not be erasure coded if only m nodes are available.
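The node-count guidance above can be sketched as a small check. The helper below is an assumption-laden illustration (the function name and return strings are invented here, not part of any Data Fabric tooling):

```python
def ec_status(m: int, n: int, healthy_nodes: int) -> str:
    """Describe the state of an m+n scheme (no local parity) for a given
    healthy node count, following the guidance above. Illustrative only."""
    if healthy_nodes >= m + 2 * n:
        # Recommended size: n failures self-heal without intervention.
        return "self-healing"
    if healthy_nodes >= m + n:
        # Reads survive n failures but are degraded, and manual
        # intervention is needed to protect against further failures.
        return "operational-degraded"
    if healthy_nodes >= m:
        # Data will not be erasure coded with only m nodes available.
        return "not-erasure-coded"
    return "insufficient-nodes"

# A 3+2 scheme: seven nodes recommended, five minimum for EC.
print(ec_status(3, 2, 7))  # self-healing
print(ec_status(3, 2, 5))  # operational-degraded
print(ec_status(3, 2, 3))  # not-erasure-coded
```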
In erasure coding schemes with local parity, data nodes are divided into groups, with each group having a local parity node. Recovery from a failed node is faster because fewer nodes must be read when rebuilding the failed node.
For detailed information about erasure coding and a list of recommended coding schemes, see Erasure Coding Scheme for Data Protection and Recovery in the HPE Ezmeral Data Fabric documentation.