Understanding Replication
Describes how replication works, and how to configure the replication factor.
Volumes are stored as pieces called containers, which hold files, directories, and other
data. By default, the maximum container size is 32 GB. The HPE Ezmeral Data Fabric administrator
sets the maximum container size using the cldb.container.sizemb parameter
(see the config commands). Containers are replicated to
protect data. Normally, each container has three copies stored on separate nodes to provide
uninterrupted access to all data, even if a node fails.
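For example, an administrator could adjust the maximum container size with the config commands, roughly as follows (the value shown is hypothetical; cldb.container.sizemb is the parameter described above):

    # Set the maximum container size to 16 GB (16384 MB); the default is 32768 MB
    maprcli config save -values '{"cldb.container.sizemb":"16384"}'

    # Confirm the current setting
    maprcli config load -keys cldb.container.sizemb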
For each volume, you can specify a desired and minimum data replication factor, and a desired and minimum namespace (name container) replication factor.
When this capability is enabled, the CLDB manages namespace container replication separately from data container replication. Use this capability when the volume has a low data replication factor but you want higher namespace replication.
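For example, the data and namespace replication factors can be set when a volume is created or modified, roughly as sketched below (the volume name, mount path, and values are hypothetical; replication, minreplication, nsreplication, and nsminreplication are the volume parameters discussed in this section):

    # Create a volume with data replication 3 (minimum 2) and
    # higher namespace replication 5 (minimum 3)
    maprcli volume create -name projects.vol -path /projects \
        -replication 3 -minreplication 2 \
        -nsreplication 5 -nsminreplication 3

    # Raise only the namespace replication factors on an existing volume
    maprcli volume modify -name projects.vol -nsreplication 5 -nsminreplication 3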
- The replication factor is the number of replicated copies that you need for
  normal operation and data protection. When the number of copies falls below
  the desired replication factor, but remains equal to or above the minimum
  replication factor, the CLDB actively creates additional copies of the
  container while trying to minimize the impact of making the additional copy.
  Re-replication occurs after the timeout specified in the
  cldb.fs.mark.rereplicate.sec parameter (configurable using the configuration API).
  The replication factor can range from 1 to 6 (default: 3).
- The minimum replication factor is the smallest number of copies you need in
  order to adequately protect against data loss. When the number of copies
  falls below this minimum value, re-replication occurs aggressively if data is
  being actively written to the container. If the enforceminreplicationforio
  property is set to true (see the example after this list), writes succeed
  only when the minimum replication factor requirement is met. If the
  enforceminreplicationforio property is set to true and the minimum number of
  copies is not available, the client is asked to retry. In the case of a:
  - Hard mount, the client might try for up to 10 minutes and then return an error
  - Soft mount, the client might return an error
  NOTE: Even if the enforceminreplicationforio property (configurable at the
  volume level) is set to true, the requirement to maintain a minimum number of
  copies is not enforced during writes until new copies of all containers
  associated with the volume are created.
- The namespace replication factor is the number of namespace container
  replicated copies that you need for normal operation and data protection.
  When the number of copies falls below the desired replication factor, but
  remains equal to or above the minimum replication factor, the CLDB actively
  creates additional copies of the container while trying to minimize the
  impact of making the additional copy. Re-replication occurs after the timeout
  specified in the cldb.fs.mark.rereplicate.sec parameter (configurable using
  the configuration API). The namespace replication factor can range from 1 to 6 (default: 3).
- The minimum namespace replication factor is the smallest number of namespace
  container replicated copies you need in order to adequately protect against
  data loss. When the replication factor falls below this minimum value,
  re-replication occurs aggressively if data is being actively written to the
  container. If the enforceminreplicationforio property (configurable at the
  volume level) is set to true, writes succeed only when the minimum namespace
  replication factor requirement is met. If this property is set to true and
  the minimum number of copies is not available, the client is asked to retry.
  In the case of a:
  - Hard mount, the client tries for up to 10 minutes and then returns an error
  - Soft mount, the client returns an error
  NOTE: Even if the enforceminreplicationforio property is set to true, the
  presence of the minimum number of copies is not enforced during writes until
  new copies of all containers associated with the volume are created.
NOTE: The maximum replication setting of 6 does not apply to mapr.cldb.internal volume containers (CID-1). The number of CID-1 container replicas is always equal to the number of CLDB nodes in the cluster.
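As a sketch of how these settings might be applied (the volume name and values are hypothetical; enforceminreplicationforio is the volume-level property and cldb.fs.mark.rereplicate.sec is the cluster-wide parameter described above):

    # Require the minimum replication factor to be met before writes succeed
    maprcli volume modify -name projects.vol -enforceminreplicationforio true

    # Change the re-replication timeout (in seconds) that the CLDB waits
    # before re-replicating an under-replicated container
    maprcli config save -values '{"cldb.fs.mark.rereplicate.sec":"1800"}'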
If any containers in the CLDB volume fall below the minimum replication factor,
the cluster is inaccessible until aggressive re-replication restores the minimum level of
replication. If a disk failure is detected, any data stored on the failed disk is
re-replicated without regard to the timeout specified in the
cldb.fs.mark.rereplicate.sec parameter.
If all copies of a container (neither under- nor over-replicated) are on the same rack, HPE Ezmeral Data Fabric automatically detects this and, after 12 hours, redistributes the copies so that they are not all on the same rack. If a container is under-replicated and HPE Ezmeral Data Fabric cannot find a different rack for the new copy, creation of the copy is deferred. If another rack is still unavailable for the new copy after 3 hours, HPE Ezmeral Data Fabric creates the copy on the same rack; if this results in all copies of the container being on the same rack, HPE Ezmeral Data Fabric redistributes the copies after 12 hours. During replication, HPE Ezmeral Data Fabric also tries to defer scenarios in which all copies end up on the same rack. According to the deferral policy:
- If a container has fewer copies than the minimum replication factor but more than one, and both copies end up on the same rack, HPE Ezmeral Data Fabric tries to create the third copy on a different rack for up to 3 hours.
- If a container has more copies than the minimum replication factor but fewer than the desired replication factor, and all copies are on the same rack, HPE Ezmeral Data Fabric tries to create the next copy on a different rack for up to 3 hours.
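To check where the copies of a particular container currently reside, for example to verify that they are spread across racks, you can inspect the container from the command line; the volume name and container ID below are hypothetical:

    # List the containers that belong to a volume
    maprcli dump volumeinfo -volumename projects.vol -json

    # Show the nodes that host the replicas of one container
    maprcli dump containerinfo -ids 2216 -json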
If you do not set the namespace (NS) replication and minimum namespace replication values
explicitly, they assume the same values as the (data) replication and minimum replication
values, respectively. This means that any changes to the replication and minreplication
parameters are also reflected in nsreplication and nsminreplication. Once nsreplication or
nsminreplication is modified, or is specified explicitly when the volume is created,
nsreplication and nsminreplication take on values independent of replication and minreplication.
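A sketch of this behavior, using hypothetical volume names and values:

    # nsreplication was never set explicitly, so it follows replication
    maprcli volume modify -name projects.vol -replication 4   # nsreplication also becomes 4

    # Once nsreplication is set explicitly, the two values diverge
    maprcli volume modify -name projects.vol -nsreplication 5
    maprcli volume modify -name projects.vol -replication 3   # nsreplication stays 5

    # Inspect the current replication settings
    maprcli volume info -name projects.vol -json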
Table Replication vs Mirroring - Understanding the Differences
This section describes the advantages of both Table Replication and Mirroring, to let you determine the best option for your use case.
Advantages of Table Replication
- Table replication replicates each table update almost immediately, typically within seconds (subject to compute and network resources). Mirroring has a much larger RTO (recovery time objective), measured in minutes.
- Table replication also transmits less data, because it transmits only the actual physical rows and nothing else.
- In table replication, both endpoints are READ-WRITE masters, with the option of two-way, multi-master replication.
- Table replication proceeds from Source Table > Destination Gateway(s) > Destination Table, which provides reasonable isolation between the two endpoint clusters. The source table talks only to the destination gateway(s).
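For instance, table replication between two clusters can be set up with the table replica autosetup command, which copies the existing table data and then replicates new updates; the table paths and cluster name below are hypothetical, and the -multimaster option is assumed here to be how two-way replication is enabled:

    # One-way replication from a source table to a replica on another cluster
    maprcli table replica autosetup -path /data/src_table \
        -replica /mapr/remote.cluster/data/replica_table

    # Two-way (multi-master) replication between the two tables
    maprcli table replica autosetup -path /data/src_table \
        -replica /mapr/remote.cluster/data/replica_table -multimaster true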
For tables and streams, table replication is usually the right choice. However, there are exceptions where mirroring is the better choice.
Advantages of Mirroring
- Because a volume mirror represents a moment in time, there is a higher probability of recovering from a volume mirror than from multiple tables.
- You can retain old states of a mirror. If you have deleted data from your tables and table replication has already replicated those deletions, you can still recover the data from a mirror.
- Mirrors are helpful during development. Create a read-write mirror and use it for development; revert it to the last mirrored state and start over. The point is that you can revert the entire volume to a known state, as needed.
- Use local mirror(s) to increase read throughput.
- You can use mirrors to obtain traceability and reproducibility during data operations such as machine learning. You can have separate mirrors for different clusters, and operations on one mirror do not affect the others.
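As a sketch of how a local mirror might be created and synchronized (the source volume, mirror volume, mount path, and cluster name are hypothetical):

    # Create a local mirror of an existing source volume
    maprcli volume create -name projects.mirror -path /projects-mirror \
        -type mirror -source projects.vol@my.cluster

    # Start a mirroring operation to bring the mirror up to date
    maprcli volume mirror start -name projects.mirror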