Data Replication, Snapshots, Mirroring, Auditing, and Metrics Collection
Provides an overview of data replication, snapshots, mirroring, auditing, and metrics collection for tiering-enabled volumes.
Replication
Data from one of the replica containers is first offloaded, and then the data in all the replica containers is purged. The file system stores only the metadata after the data is offloaded. The offload is considered successful only when the data on all active replicas has been purged (that is, removed from the storage pool to release the disk space on the data-fabric filesystem). If, during the offload, the node on which one of the replicas resides is down, the data on that container is purged once the node comes back up.
In the tiering architecture, although the data is moved to the storage tier, the namespace of the volume continues to be three-way replicated. As a result, the metadata related to the namespace container still carries a 3x storage cost.
The offloaded replica containers are recalled when the whole volume is recalled. When a replica is reinstated in the cluster as a result of a recall operation, a resynchronization occurs to bring all the replicas up to date from the designated master container.
Snapshots
You can associate a snapshot schedule with tiering-enabled volumes. When the data in the volume is offloaded, associated snapshots are also offloaded, and the file system stores only the metadata. If the whole volume is recalled, the snapshots are also recalled to the data-fabric filesystem. When recalled snapshots are offloaded again, the rules for data offload apply to the snapshots as well.
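For example, the following is a minimal sketch of associating an existing snapshot schedule with a tiering-enabled volume. The volume name (tiervol) and schedule ID (2) are placeholders, and the parameters should be confirmed against the maprcli documentation for your release:
# List the available schedules and note the ID of the schedule to use
maprcli schedule list -json
# Associate schedule ID 2 with the tiering-enabled volume tiervol (placeholders)
maprcli volume modify -name tiervol -schedule 2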
Mirroring
You can create tiering-enabled source volumes and associate them with tiering-enabled mirror volumes. You cannot associate tiering-enabled mirror volumes with standard volumes that are not tiering-enabled, and vice versa. Only homogeneous combinations of mirror and standard volumes are supported; heterogeneous combinations are not supported.
When synchronization of the tiering-enabled mirror volume with the (local or remote) tiering-enabled source volume is triggered (either manually or automatically based on a schedule), the mirror volume synchronizes with the source volume if the source volume data is local (that is, not yet tiered). If the source volume data is tiered, the tiering-enabled mirror volume synchronizes with the tiered data fetched by the MAST Gateway that is assigned to the source volume. Incremental changes in the mirror volume are offloaded based on the offload rules associated with the tiering-enabled mirror volume.
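For example, the following sketch creates a tiering-enabled mirror volume for a tiering-enabled source volume and starts a manual synchronization. The volume and cluster names are placeholders, and the tiering-related parameter name (-tieringenable) is an assumption to verify against the volume create documentation for your release:
# Create a tiering-enabled mirror volume whose source is the tiering-enabled
# volume srcvol on cluster my.cluster.com (names are placeholders)
maprcli volume create -name mirrorvol -type mirror \
  -source srcvol@my.cluster.com -tieringenable true
# Trigger a manual synchronization of the mirror volume
maprcli volume mirror start -name mirrorvol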
- Using Tiering-Enabled Mirror Volumes for Disaster Recovery
- You can create a secondary, cost-optimized disaster recovery cluster for a primary three-way replicated cluster. To do this, create two clusters: a primary tiering-enabled cluster with no active schedule to automatically offload data, and an associated secondary cluster where primary cluster data is mirrored and then aggressively offloaded to the tier. While the primary or source cluster continues to be three-way replicated, if the secondary, disaster recovery cluster data is:
- Erasure coded (warm tier), it provides space savings in the range of 1.2x to 1.5x.
- On third-party cloud storage (cold tier), it can be three-way replicated on a low-cost storage alternative.
If an offload or recall operation is interrupted (for example, by a concurrent mirror synchronization), the tierjobstatus command for the offload or recall job shows the AbortedInternal status.
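For example, a minimal sketch of checking the job status for a volume (the volume name is a placeholder):
# Show the status of the current or most recent offload/recall job for the volume
maprcli volume tierjobstatus -name tiervol -json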
Auditing
The data-fabric audit feature lets you log audit records of cluster-administration operations and operations on the data in the volume. Scheduled (and automatically triggered) tiering operations, such as offload and compaction, are not audited. However, if auditing is enabled at the cluster level, manually triggered volume-level tiering operations, such as offload, recall, and abort, are audited in the CLDB audit logs. For example, you can see a record similar to the following in the /opt/mapr/logs/cldbaudit.log.json file for the volume offload command:
{"timestamp":{"$date":"2018-06-07T15:34:28.580Z"},"resource":"vol1","operation":"volumeOffload","uid":0,"clientip":"10.20.30.40","status":0}
If auditing is enabled for data in the tiering-enabled volume and the files within it, file-level tiering operations, such as offload and recall, that are triggered using the REST API, hadoop commands, or the dot interface are audited in the FS audit logs (the /var/mapr/local/<hostname>/audit/5661/FSAudit.log-<*>.json file). See Auditing Data Access Operations for the list of file-level tiering operations that are audited. You can selectively enable or disable auditing of these operations; see Selective Auditing of File-System, Table, and Stream Operations Using the CLI for more information. For example, you can see records similar to the following in the /var/mapr/local/<hostname>/audit/5661/FSAudit.log-<*>.json file for the file offload command:
/mapr123/Cloudpool19//var/mapr/local/abc.sj.us/audit/5660/FSAudit.log-2018-09-12-001.json:1:{"timestamp":{"$date":"2018-09-12T05:47:04.199Z"},"operation":"FILE_OFFLOAD","uid":0,"ipAddress":"10.20.35.45","srcFid":"3184.32.131270","volumeId":16558233,"status":0}
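File-level records can be located in the FS audit logs in the same way. A minimal sketch using grep; the audit log path (including the port-number directory) is a placeholder that should match your node's audit log location:
# Search the local FS audit logs for FILE_OFFLOAD records
grep -n '"operation":"FILE_OFFLOAD"' /var/mapr/local/$(hostname)/audit/5661/FSAudit.log-*.json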
Both the tier rule list and tier list commands are audited in the /opt/mapr/logs/cldbaudit.log.json file, as well as in the /opt/mapr/mapr-cli-audit-log/audit.log.json file. The records in the audit log look similar to the following:
{"timestamp":{"$date":"2018-06-13T09:15:24.004Z"},"resource":"cluster","operation":"offloadRuleList","uid":0,"clientip":"10.10.81.14","status":0}
{"timestamp":{"$date":"2018-06-13T09:14:42.304Z"},"resource":"cluster","operation":"tierList","uid":0,"clientip":"10.10.81.14","status":0}
When auditing operations like tierjobstatus and tierjobabort, the coalesce interval set at the volume level is not honored. You may see multiple records of the same operation from the same client in the log.
Read requests processed using cache-volumes or erasure-coded volumes are not audited because when the file is accessed, the request first goes to the front-end volume and the operation is audited there. The audit record contains the ID of the front-end volume (volid) and primary file ID (fid). However, the write to the cache-volume for a volume-level recall of data is audited in the audit logs on the file server hosting the cache-volume with the primary file ID (fid). The write to the cache-volume for a file-level recall of data is not audited.
In addition, you can enable auditing of offload and recall events at both the volume and file levels by enabling auditing for filetieroffloadevent and filetierrecallevent at the volume level. By default, auditing is disabled for filetieroffloadevent and filetierrecallevent. If you enable auditing for filetieroffloadevent and filetierrecallevent using the dataauditops parameter with the volume create or volume modify command (see the sketch after the following list), the following are audited in the FS audit log:
- For filetieroffloadevent, files offloaded by running the file offload command, or (only) files purged on the Data Fabric file system after running the volume offload command.
- For filetierrecallevent, files recalled by running the file recall or volume recall command.
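The following sketch shows how these events might be enabled on an existing volume using the dataauditops parameter. The volume name is a placeholder, and the exact value syntax (such as the + prefixes) is an assumption to confirm against the volume modify documentation:
# Enable auditing of offload and recall events on a tiering-enabled volume
# (value syntax for -dataauditops is assumed; verify for your release)
maprcli volume modify -name tiervol \
  -dataauditops +filetieroffloadevent,+filetierrecallevent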
For example, you can see a record similar to the following in the /var/mapr/local/<hostname>/audit/5661/FSAudit.log-<*>.json file if auditing is enabled at the volume level for filetieroffloadevent:
abc.sj.us/audit/5661/FSAudit.log-2018-06-07-001.json:{"timestamp":{"$date":"2018-06-07T07:27:58.810Z"},"operation":"FILE_TIER_OFFLOAD_EVENT","uid":2000,"ipAddress":"1}
Collecting Metrics
When volume metrics collection is enabled for a tiering-enabled volume, read and write operations on the volume generate records in the metrics log similar to the following:
{"ts":1534960230000,"vid":248672388,"RDT":0.0,"RDL":0.0,"RDO":0.0,"WRT":363622.7,"WRL":7209.0,"WRO":2580.0}
{"ts":1534960250000,"vid":248672388,"RDT":363686.7,"RDL":2856.0,"RDO":2847.0,"WRT":0.0,"WRL":0.0,"WRO":0.0}
Tiering-related operations do not generate metrics records. That is, volume-level and file-level offload, recall, and abort operations are not logged in the metrics log. However, the volumes created to support tiering (such as the cache-volume, the metadata volume, and the erasure-coded volume) have metrics collection enabled, and the metrics records for these volumes are logged with the ID of the associated parent or front-end volume. That is, read operations on the cache-volume are logged with the ID of the associated front-end volume. For example, you can see records similar to the following in the metrics log file for the volume:
{"ts":1534968850000,"vid":209801522,"RDT":6328.5,"RDL":161.0,"RDO":158.0,"WRT":0.0,"WRL":0.0,"WRO":0.0}
{"ts":1534968860000,"vid":209801522,"RDT":234669.7,"RDL":5241.0,"RDO":5143.0,"WRT":0.0,"WRL":0.0,"WRO":0.0}
See
Enabling Volume Metric Collection and Collecting Volume Metrics for more
information.