Best Practices for Backing Up HPE Ezmeral Data Fabric Information
Lists the best practices and performance considerations to follow when backing up HPE Ezmeral Data Fabric information.
To back up configuration information and data from your HPE Ezmeral Data Fabric cluster, install the appropriate Linux backup client from your backup software provider on the servers in your HPE Ezmeral Data Fabric cluster. The backup client user must have the proper filesystem and volume permissions. For details on how to configure HPE Ezmeral Data Fabric volume permissions, see Creating Volume-level ACLs and Managing Access Controls.
Backup Configuration Data
By default, all installation files for each server in the HPE Ezmeral Data Fabric cluster are stored in a single directory on that server. To ensure that you back up all of the configuration files, HPE Ezmeral Data Fabric supported applications, and log files, back up the /opt/mapr directory on every server in the cluster.
Note that the /opt/mapr location includes all log files. Log files can add a significant amount of data to your backup environment, so evaluate whether they are needed for your business continuity requirements. To back up only the configuration files for the cluster, back up the /opt/mapr/conf directory on every server in the cluster.
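As a minimal sketch of the per-server step above, the configuration directory can be archived with tar. The function name, the backup destination, and the timestamped archive name are illustrative, not part of the product; point the arguments at /opt/mapr/conf (or /opt/mapr for a full backup including logs) and your backup staging area:

```shell
# Sketch: archive a Data Fabric configuration directory on one node.
# backup_conf and its arguments are illustrative names.
backup_conf() {
    conf_dir="$1"      # e.g. /opt/mapr/conf
    backup_dir="$2"    # e.g. a directory your backup agent picks up
    stamp="$(date +%Y%m%d-%H%M%S)"
    # -p preserves file permissions; -C / stores paths relative to root.
    tar -czpf "${backup_dir}/mapr-conf-$(hostname)-${stamp}.tar.gz" \
        -C / "${conf_dir#/}"
}
```

For example, `backup_conf /opt/mapr/conf /backup/mapr` would be run on each server, typically from cron or from the backup agent's pre-script hook.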
Backup Volume Data
The recommended way to back up and restore HPE Ezmeral Data Fabric data is to enable and configure snapshots and volume mirroring of your data to another HPE Ezmeral Data Fabric cluster. This approach ensures that your business continuity and disaster recovery needs are met.
See the following links for setting up snapshots, mirroring, and table and stream replication:
- Snapshots: Managing Snapshots
- Mirrors: Mirror Volumes
- Data Fabric DB Table Replication: Managing Table Replication
- Data Fabric Streams Replication: Stream Replication
If you do not have a secondary cluster to mirror your data, back up your volumes by specifying the following path in your Linux backup agent: /mapr/cluster_name/ (for example, /mapr/my.cluster.com/).
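Before adding the cluster path to the backup agent's include list, it can help to confirm that the path is actually present and readable on the backup host. This check is illustrative and not part of the product:

```shell
# Illustrative check: verify the cluster path exists and is readable
# before pointing the backup agent at it.
check_cluster_path() {
    path="$1"   # e.g. /mapr/my.cluster.com/
    [ -d "$path" ] && [ -r "$path" ]
}
```

For example: `check_cluster_path /mapr/my.cluster.com/ || echo "cluster path not reachable" >&2`.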
Performance Considerations When Backing Up Large Data Sets
You can run into bandwidth and performance limitations when you specify only one path to your HPE Ezmeral Data Fabric cluster, so that all of the data in your volumes is backed up through a single Linux host agent. The bottleneck can occur because of the size of the data you are backing up (large file sizes) or because of the number of files in your directory structure (millions of files in one directory). To avoid this bottleneck, divide the volume paths among multiple backup hosts. For example:
HPE Ezmeral Data Fabric Linux Host 1 (hostname1):
/mapr/my.cluster.com/volume1
/mapr/my.cluster.com/volume2
/mapr/my.cluster.com/volume3
HPE Ezmeral Data Fabric Linux Host 2 (hostname2):
/mapr/my.cluster.com/volume4
/mapr/my.cluster.com/volume5
/mapr/my.cluster.com/volume6
HPE Ezmeral Data Fabric Linux Host 3 (hostname3):
/mapr/my.cluster.com/volume7
/mapr/my.cluster.com/volume8
/mapr/my.cluster.com/volume9
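The host-to-volume split above can be sketched as a per-host script. Here `cp -a` is only a stand-in for whatever your backup agent actually runs, and the function and paths are illustrative:

```shell
# Sketch: each backup host copies only its assigned volumes.
# "cp -a" stands in for the real backup agent; -a preserves
# permissions, ownership, and timestamps.
backup_volumes() {
    mount_root="$1"; shift   # e.g. /mapr/my.cluster.com
    dest="$1"; shift         # backup staging directory on this host
    for vol in "$@"; do
        mkdir -p "${dest}/${vol}"
        cp -a "${mount_root}/${vol}/." "${dest}/${vol}/"
    done
}

# On hostname1: backup_volumes /mapr/my.cluster.com /backup volume1 volume2 volume3
# On hostname2: backup_volumes /mapr/my.cluster.com /backup volume4 volume5 volume6
# On hostname3: backup_volumes /mapr/my.cluster.com /backup volume7 volume8 volume9
```

Because each host reads a disjoint set of volumes, the three hosts run in parallel and the aggregate backup bandwidth scales with the number of hosts.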
Preserve Metadata About the Volumes
To preserve metadata such as permissions and Access Control Expression (ACE) rules, run a pre-script process as the mapr user in your backup agent. For example, in the pre-script configuration for your cluster's host agent, you would run:
maprcli volume dump create -name volume1 -dumpfile volume1_fulldump1 -e statefile1
Some backup software may require "stderr" or "stdout" codes to run pre- or post-processing scripts within the product. In that case, you may need to write a bash script that writes the dump file to a location of your choice, and ensure that your backup agent is configured to back up that directory. Consult your backup software provider's documentation. For information on creating volume dumps, see Create and Maintain Volume Dump File.
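A wrapper of that kind could be sketched as follows. The function name, the dump directory layout, and the MAPRCLI override are illustrative assumptions; only the maprcli volume dump create invocation itself comes from the documentation above:

```shell
# Sketch of a pre-script run as the mapr user: dump one volume's
# metadata, report success or failure via the exit code, and leave
# the dump files where the backup agent will pick them up.
predump() {
    volume="$1"                       # e.g. volume1
    dump_dir="$2"                     # directory the agent backs up
    maprcli_cmd="${MAPRCLI:-maprcli}" # overridable for testing
    mkdir -p "$dump_dir"
    if "$maprcli_cmd" volume dump create \
            -name "$volume" \
            -dumpfile "${dump_dir}/${volume}_fulldump1" \
            -e "${dump_dir}/${volume}_statefile1"; then
        echo "dump of ${volume} complete"
    else
        # A non-zero return tells the backup agent the pre-script failed.
        echo "dump of ${volume} failed" >&2
        return 1
    fi
}
```

For example, `predump volume1 /backup/mapr-dumps` would run before the backup job, with /backup/mapr-dumps included in the agent's backup paths.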