Understanding Replication

Describes how replication works, and how to configure the replication factor.

Volumes are stored as pieces called containers that contain files, directories, and other data. By default, the maximum container size is 32 GB. The HPE Ezmeral Data Fabric administrator sets the maximum container size using the cldb.container.sizemb parameter (see the config commands). Containers are replicated to protect data. Normally, each container has three copies stored on separate nodes to provide uninterrupted access to all data, even if a node fails.
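As a rough illustration of what the container size limit and replication mean for capacity, the following sketch estimates a lower bound on the number of containers for a volume and the raw space used across all copies. The function name and the assumption that cldb.container.sizemb is expressed in MB (inferred from the parameter name) are illustrative; real container counts depend on how data is actually distributed.

```python
import math

# 32 GB expressed in MB. Assumption: cldb.container.sizemb is in MB.
DEFAULT_MAX_CONTAINER_MB = 32 * 1024


def estimate_footprint(volume_mb, replication=3,
                       max_container_mb=DEFAULT_MAX_CONTAINER_MB):
    """Return (lower bound on container count, raw MB across all copies)."""
    # Containers are rarely filled to the maximum, so this is only a floor.
    min_containers = math.ceil(volume_mb / max_container_mb)
    raw_mb = volume_mb * replication
    return min_containers, raw_mb


# Hypothetical 100 GB volume with the default three copies.
print(estimate_footprint(100 * 1024))
```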

For each volume, you can specify a desired and minimum data replication factor, and a desired and minimum namespace (name container) replication factor.

When enabled, the CLDB manages namespace container replication separately from data container replication. Use this capability when the volume has a low data replication factor but you want a higher namespace replication factor.

NOTE
Each namespace container parameter (nsreplication or nsminreplication) must be equal to or greater than the equivalent data replication parameter (replication or minreplication).
  • The replication factor is the number of replicated copies that you need for normal operation and data protection. When the number of copies falls below the desired replication factor, but remains equal to or above the minimum replication factor, the CLDB actively creates additional copies of the container while trying to minimize the impact of making an additional copy. Re-replication occurs after the timeout specified in the cldb.fs.mark.rereplicate.sec parameter (configurable using the configuration API). The replication factor can range from 1 to 6 (default: 3).
  • The minimum replication factor is the smallest number of copies you need in order to adequately protect against data loss. When the number of copies falls below this minimum, re-replication occurs aggressively if data is being actively written to the container. If the enforceminreplicationforio property is set to true, writes succeed only when the minimum replication factor requirements are met; if the minimum number of copies is not available, the client is asked to retry. In the case of a:
    • Hard mount, the client might try for up to 10 minutes and then return an error
    • Soft mount, the client might return an error
    The minimum replication factor can range from 1 to 6 (default: 2). In all cases, the minimum replication factor cannot be greater than the replication factor. When you increase the minimum replication factor, if the enforceminreplicationforio property (configurable at the volume level) is set to true, the requirement to maintain a minimum number of copies is not enforced during writes until new copies of all containers associated with the volume are created.
  • The namespace replication factor is the number of namespace container copies that you need for normal operation and data protection. When the number of copies falls below the desired replication factor, but remains equal to or above the minimum replication factor, the CLDB actively creates additional copies of the container while trying to minimize the impact of making an additional copy. Re-replication occurs after the timeout specified in the cldb.fs.mark.rereplicate.sec parameter (configurable using the configuration API). The namespace replication factor can range from 1 to 6 (default: 3).
  • The minimum namespace replication factor is the smallest number of namespace container copies you need in order to adequately protect against data loss. When the number of copies falls below this minimum, re-replication occurs aggressively if data is being actively written to the container. If the enforceminreplicationforio property (configurable at the volume level) is set to true, writes succeed only when the minimum replication factor requirements are met. If this property is set to true and the minimum number of copies is not available, the client is asked to retry. In the case of a:
    • Hard mount, the client tries for up to 10 minutes and then returns an error
    • Soft mount, the client returns an error
    The system does not wait for lost replicas to become available again. The minimum namespace replication factor can range from 1 to 6 (default: 2). In all cases, the minimum replication factor cannot be greater than the replication factor. When you increase the minimum replication factor, if the enforceminreplicationforio property is set to true, the presence of the minimum number of copies is not enforced during writes until new copies of all containers associated with the volume are created.
    NOTE
    The maximum replication setting of 6 does not apply to mapr.cldb.internal volume containers (CID-1). The number of CID-1 container replicas is always equal to the number of CLDB nodes in the cluster.
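The constraints in the list above (each factor between 1 and 6, minimum not greater than desired, namespace values not smaller than the data values) can be checked before applying settings. The following is a minimal sketch; the function name and error messages are hypothetical, and it does not model the mapr.cldb.internal (CID-1) exception.

```python
MIN_FACTOR, MAX_FACTOR = 1, 6


def validate_replication(replication=3, minreplication=2,
                         nsreplication=3, nsminreplication=2):
    """Return a list of rule violations; an empty list means consistent."""
    errors = []
    settings = {
        "replication": replication,
        "minreplication": minreplication,
        "nsreplication": nsreplication,
        "nsminreplication": nsminreplication,
    }
    # Every factor must lie in the documented 1..6 range.
    for name, value in settings.items():
        if not MIN_FACTOR <= value <= MAX_FACTOR:
            errors.append(f"{name}={value} is outside {MIN_FACTOR}..{MAX_FACTOR}")
    # The minimum factor cannot be greater than the desired factor.
    if minreplication > replication:
        errors.append("minreplication exceeds replication")
    if nsminreplication > nsreplication:
        errors.append("nsminreplication exceeds nsreplication")
    # Namespace values must be the same as or larger than the data values.
    if nsreplication < replication:
        errors.append("nsreplication is smaller than replication")
    if nsminreplication < minreplication:
        errors.append("nsminreplication is smaller than minreplication")
    return errors


print(validate_replication())                    # defaults are consistent
print(validate_replication(replication=2, minreplication=3,
                           nsreplication=3, nsminreplication=3))
```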

If any container in the CLDB volume falls below the minimum replication factor, the cluster is inaccessible until aggressive re-replication restores the minimum level of replication. If a disk failure is detected, any data stored on the failed disk is re-replicated without regard to the timeout specified in the cldb.fs.mark.rereplicate.sec parameter.
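To make the thresholds concrete, the sketch below classifies a container by its number of live copies, shows how a detected disk failure bypasses the re-replication timeout, and models what a writer sees when enforceminreplicationforio is set. All function names and return strings are hypothetical. Treating writes as unrestricted when the property is false is an inference from the text, not a documented guarantee.

```python
def replication_state(copies, replication=3, minreplication=2):
    """Classify a container by comparing live copies with the two factors."""
    if copies < minreplication:
        return "below-minimum"       # aggressive re-replication if actively written
    if copies < replication:
        return "under-replicated"    # re-replicated after the timeout
    return "at-or-above-desired"


def rereplication_wait_sec(timeout_sec, disk_failed):
    """Wait before re-replicating; a detected disk failure skips the timeout."""
    return 0 if disk_failed else timeout_sec


def write_outcome(copies, minreplication=2, enforce=False, mount="hard"):
    """Describe what a client sees when it writes to the container."""
    if not enforce or copies >= minreplication:
        return "write succeeds"
    # The client is asked to retry; the result depends on the mount type.
    if mount == "hard":
        return "retries for up to 10 minutes, then error"
    return "error"


print(replication_state(2))                        # one copy lost
print(rereplication_wait_sec(3600, disk_failed=True))  # 3600 is an arbitrary timeout
print(write_outcome(1, enforce=True, mount="soft"))
```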

If all copies of a container that is neither under-replicated nor over-replicated are on the same rack, HPE Ezmeral Data Fabric automatically detects this and, after 12 hours, redistributes the copies so that they are not all on the same rack. If a container is under-replicated and HPE Ezmeral Data Fabric cannot find a different rack for the new copy, creation of the copy is deferred. If another rack is still unavailable after 3 hours, HPE Ezmeral Data Fabric creates the copy on the same rack; if this results in all copies of the container being on the same rack, HPE Ezmeral Data Fabric redistributes the copies after 12 hours. During replication, HPE Ezmeral Data Fabric also tries to defer placements that would leave all copies on the same rack. Under the deferral policy:

  • If a container has fewer copies than the "minimum replication" but greater than 2 and if both copies end up on the same rack, then HPE Ezmeral Data Fabric tries to create the third copy on a different rack for up to 3 hours.
  • If a container has more copies than the minimum but fewer than the desired number, and all copies are on the same rack, then HPE Ezmeral Data Fabric tries to create the next copy on a different rack for up to 3 hours.
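The rack-placement timing described above can be sketched as two small decision functions. This is a simplified model, not the product's implementation: the function names are hypothetical, and "a different rack" is taken to mean any candidate rack that does not already hold a copy.

```python
DEFER_HOURS = 3          # how long placement waits for a different rack
REDISTRIBUTE_HOURS = 12  # how long before same-rack copies are redistributed


def next_copy_rack(existing_racks, candidate_racks, hours_waited):
    """Decide where an under-replicated container's next copy should go.

    existing_racks: rack of each current copy.
    candidate_racks: racks that can currently accept a new copy.
    Returns ("place", rack) or ("defer", None).
    """
    used = set(existing_racks)
    other = [rack for rack in candidate_racks if rack not in used]
    if other:
        return ("place", other[0])
    if hours_waited < DEFER_HOURS:
        return ("defer", None)          # keep waiting for another rack
    same = [rack for rack in candidate_racks if rack in used]
    return ("place", same[0]) if same else ("defer", None)


def needs_redistribution(racks, hours_on_same_rack):
    """True when all copies share one rack and the 12-hour wait has passed."""
    all_on_one_rack = len(racks) > 1 and len(set(racks)) == 1
    return all_on_one_rack and hours_on_same_rack >= REDISTRIBUTE_HOURS


print(next_copy_rack(["rack1", "rack1"], ["rack1"], hours_waited=1))
print(next_copy_rack(["rack1", "rack1"], ["rack1"], hours_waited=3))
print(needs_redistribution(["rack1", "rack1", "rack1"], hours_on_same_rack=12))
```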

If you do not set the namespace (NS) replication and minimum namespace replication values explicitly, they assume the same values as (data) replication and minimum replication, respectively. This means that all changes to the (data) replication and minreplication parameters are also reflected in nsreplication and nsminreplication. If nsreplication or nsminreplication is modified, or specified during volume creation, nsreplication and nsminreplication start assuming values different from replication and minreplication.
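The inheritance rule above can be modeled with a small class. This sketch follows one literal reading of the text: once either namespace value is set, both stop following the data values, and a value that was not given is frozen at whatever it resolved to at that moment. The class and method names are hypothetical, and the actual product behavior may differ in that detail.

```python
class VolumeReplication:
    """Toy model of how namespace factors follow the data factors."""

    def __init__(self, replication=3, minreplication=2,
                 nsreplication=None, nsminreplication=None):
        self.replication = replication
        self.minreplication = minreplication
        self._ns = None  # None means "follow the data values"
        if nsreplication is not None or nsminreplication is not None:
            self.set_namespace(nsreplication, nsminreplication)

    @property
    def nsreplication(self):
        return self.replication if self._ns is None else self._ns[0]

    @property
    def nsminreplication(self):
        return self.minreplication if self._ns is None else self._ns[1]

    def set_namespace(self, nsreplication=None, nsminreplication=None):
        # Pin both values; an omitted one keeps its currently resolved value.
        self._ns = (
            nsreplication if nsreplication is not None else self.nsreplication,
            nsminreplication if nsminreplication is not None else self.nsminreplication,
        )


vol = VolumeReplication()
vol.replication = 4
print(vol.nsreplication)       # still follows replication
vol.set_namespace(nsreplication=5)
vol.replication = 3
print(vol.nsreplication, vol.nsminreplication)
```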

Table Replication vs Mirroring - Understanding the Differences

This section describes the advantages of both Table Replication and Mirroring, to let you determine the best option for your use case.

Advantages of Table Replication
  1. Table replication replicates each table update within seconds (subject to compute and network resources). Mirroring has a much larger RTO (recovery time objective), in minutes.
  2. Table replication also transmits less data because it transmits only the actual physical rows and nothing else.
  3. In table replication, both endpoints are READ-WRITE masters with the option of two-way multi-master replication.
  4. Table replication proceeds from Source Table > Destination Gateway(s) > Destination Table, which provides reasonable isolation between the two endpoint clusters. The source table talks only to the Destination Gateway(s).
When using mirroring, avoid placing table replication sources in the mirror volume. Doing so creates problems if the mirror is broken and promoted.

For tables and streams, table replication is usually the right choice. However, there are exceptions where mirroring is the best choice.

Advantages of Mirroring
  1. Since a volume mirror represents a moment in time, there is a higher probability of recovering from a volume than from multiple tables.
  2. You can retain old states of a mirror. If you have deleted a bunch of data in your tables and table replication has replicated those changes, then you can recover your data from a mirror.
  3. Mirrors are helpful during development. Create a read-write mirror and use it for development, then revert it to the last mirrored state and start over. The point is that you can revert the entire volume to a known state, as needed.
  4. Use local mirror(s) to increase read throughput.
  5. You can use mirrors to obtain traceability and reproducibility during data operations such as machine learning. You can have separate mirrors for different clusters, and operations on one mirror do not affect the other.