Preparing Clusters for Table Replication
Preparing clusters for table replication includes configuring gateways on destination clusters, configuring the mapr-clusters.conf file on the source cluster, and, if the clusters are secure, setting up secure communication between the clusters. After you prepare the clusters for table replication, you can set up replication between tables.
Before You Begin
The following topics identify the concepts to understand and the tasks to complete before you set up your environment for table replication.
- Plan which replication topology you want to use. For information about the various topologies, see Supported replication topologies.
- In general, if you are replicating tables, you should store them in their own volumes to avoid overlap with volume mirroring. Otherwise, if a source volume fails, a table in a promoted mirror can lag behind the table's replica. For example, suppose /vol mirrors to /vol.mirror and contains a table srcTab that replicates to /replVol/replTab. If /vol fails, /vol.mirror/srcTab may lag behind /replVol/replTab when /vol.mirror is promoted. To avoid this problem, starting with the 6.1 release, after HPE Ezmeral Data Fabric Database promotes a mirror volume, replication terminates with REPLICA_STATE_UNEXPECTED for any tables in that volume. The following sample output shows this behavior:
[mapr]# /opt/mapr/bin/maprcli table replica list -path /vol.mirror/srcTab -refreshnow true -json
{
  "timestamp":1534805233244,
  "timeofday":"2018-08-20 03:47:13.244 GMT-0700 PM",
  "status":"OK",
  "total":1,
  "data":[
    {
      "cluster":"mirrorSrc",
      "table":"/replVol/replTab",
      "type":"MapRDB",
      "replicaPath":"/replVol/replTab",
      "replicaState":"REPLICA_STATE_UNEXPECTED",
      "paused":false,
      "throttle":false,
      "idx":1,
      "networkencryption":false,
      "synchronous":false,
      "networkcompression":"lz4",
      "propagateExistingData":false,
      "isUptodate":true,
      "minPendingTS":0,
      "maxPendingTS":0,
      "bytesPending":0,
      "putsPending":0,
      "bucketsPending":0,
      "uuid":"8b4563e1-884d-7852-f257-078c397b5b00",
      "copyTableCompletionPercentage":0,
      "errors":{
        "Code":"ErrReplicaTableUpstreamMismatch",
        "Host":"10.10.104.35",
        "Msg":"OpenStream: Upstream table does not match original Upstream cluster mirrorSrc table /replVol/replTab"
      }
    }
  ]
}
This change in behavior applies only to tables that have replication enabled starting in release 6.1. See Table Replication States for more details.
- Ensure that your user ID has the readAce permission on the volume where the source tables are located and the writeAce permission on the volumes where the replicas are located. For information about how to set permissions on volumes, see Setting Whole Volume ACEs; a sketch of the commands also appears after this list.
- Ensure that you have administrative authority on the clusters that you plan to use.
- If you upgraded your source cluster from a previous version of data-fabric, enable table replication by running this maprcli command:
maprcli cluster feature enable -name mfs.feature.db.repl.support
- Depending on your use case, replication requires the installation of gateways and may also require the HBase client. For more information about installation requirements, see Service Layout Guidelines for Replication.
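As an example of the volume permissions described above, the following is a minimal sketch using maprcli volume modify, assuming hypothetical volume names srcVol (holding the source tables) and replVol (holding the replicas) and a hypothetical user appuser. Note that -readAce and -writeAce replace the volume's existing ACE expression, so include any other principals that still need access:

maprcli volume modify -name srcVol -readAce "u:appuser"
maprcli volume modify -name replVol -writeAce "u:appuser"

The readAce on the source volume lets appuser read the source tables, and the writeAce on the replica volume lets appuser write to the replica tables.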
Setting Up Table Replication
The following steps show how to set up your environment for table replication, including setup for secure clusters.
- In the mapr-clusters.conf file on every node in your source cluster, add an entry that lists the CLDB nodes that are in the destination cluster. This step is required so that the source cluster can communicate directly with the destination cluster's CLDB nodes. See mapr-clusters.conf for the format to use for the entries; a sample entry also appears after these steps.
- On the destination cluster, configure gateways through which the source cluster sends updates to the destination cluster. See Configuring Gateways for Table and Stream Replication.
- If your clusters are secure, configure each cluster so that you can access them all from one cluster. This way, you need not log in to each secure cluster separately and run maprcli commands locally on them. See Configuring Secure Clusters for Running Commands Remotely for more information.
- If your clusters are secure, add a cross-cluster ticket to the source cluster so that it can replicate data to the destination cluster. See Configuring Secure Clusters for Cross-Cluster Mirroring and Replication.
- Optional: If your clusters are secure, configure your source cluster so that you can use the Control System to set up and administer table replication from the source cluster to the destination cluster. These steps make it convenient to use the Control System for setting up and managing replication involving two secure clusters. However, before following them, perform these prerequisite tasks.
NOTE:
- Ensure that both clusters are managed by the same team or group. The UIDs and GIDs of the users that are able to log in to the Control System on the source cluster must exactly match their UIDs and GIDs on the destination cluster. This restriction applies only to access to both clusters through the Control System, and does not apply to access to both clusters through the maprcli. If the clusters are managed by different teams or groups, use the maprcli instead of the Control System to set up and manage table replication involving two secure clusters.
- Ensure that the proper file-system and table permissions are in place on both clusters. Otherwise, any user who can log into the Control System and has the same UID or GID on the destination cluster will be able to set up replication either from the source cluster to the destination cluster or vice versa. A user could create one or more tables on the destination cluster, enable replication to them from the source cluster, load the new tables with data from the source cluster, and start replication. A user could also create tables on the source cluster, enable replication to them from tables in the destination cluster, load the new tables with data from the destination cluster, and start replication.
- On the source cluster, generate a service ticket by using the maprlogin command:
maprlogin generateticket -type service -cluster <destination cluster> -user mapr -duration <duration> -out <output folder>
Where -duration is the length of time before the ticket expires. You can specify the value in either of these formats: [Days:]Hours:Minutes or Seconds. A combined sketch of this step and the ones that follow appears at the end of this section.
- On every node of the source cluster, append the service ticket to the /opt/mapr/conf/mapruserticket file that was created when you secured the source cluster:
cat <path and filename of the service ticket> >> /opt/mapr/conf/mapruserticket
- Restart the web server by running the maprcli node services command. For the syntax of this command, see node services.
- Add the following two properties to the core-site.xml file. For Hadoop 2.7.0, edit the file /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml.
<property>
  <name>hadoop.proxyuser.mapr.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapr.groups</name>
  <value>*</value>
</property>
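As referenced in the first step above, the following is a hedged sample entry for the source cluster's mapr-clusters.conf file; the destination cluster name, hostnames, and security setting are hypothetical placeholders, and 7222 is the default CLDB port:

destCluster secure=true cldb1.dest.example.com:7222 cldb2.dest.example.com:7222 cldb3.dest.example.com:7222

Each CLDB node in the destination cluster appears as host:port on the same line; see mapr-clusters.conf for the authoritative format and the full list of supported properties.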
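The following is a minimal end-to-end sketch of the secure-cluster steps above (generate the service ticket, append it to mapruserticket, and restart the web server), assuming a hypothetical destination cluster named destCluster, a hypothetical output path /tmp/destCluster_svcticket, and a 30-day duration. The web-server service name (apiserver on recent releases, webserver on older ones) and the node name are assumptions to adapt to your cluster:

# On the source cluster, generate a 30-day service ticket for the destination cluster.
maprlogin generateticket -type service -cluster destCluster -user mapr -duration 30:0:0 -out /tmp/destCluster_svcticket

# Append the ticket to the mapruserticket file; repeat this on every node of the source cluster.
cat /tmp/destCluster_svcticket >> /opt/mapr/conf/mapruserticket

# Restart the web server so that it picks up the new ticket (service name is an assumption; adjust for your release).
maprcli node services -nodes <webserver-node> -name apiserver -action restart

After these commands and the core-site.xml change above, you can administer replication between the two secure clusters from the source cluster's Control System.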