HPE Data Fabric Database Binary DiffTables
Compares the row key, column family, timestamp, and value of each table cell in each specified HPE Data Fabric Database table. Then, it generates one or two directories with sequence files that you can use to either make a HPE Data Fabric Database table identical to its master or merge the rows from two HPE Data Fabric Database tables.
Sequence files are binary flat files. To convert the sequence file into a format that is easier to understand, use the FormatResults utility.
By default, the DiffTables utility considers both the source table and the destination table to be a master table. Therefore, it generates two directories with sequence files. These sequence files contain the puts required to update each table so that it contains a superset of the rows defined in both tables at the time at which the utility was run.
When you specify a master table, the DiffTables utility generates one of the following output directories:
- opsForDst. A directory containing sequence files that correspond to each put and delete required to make the destination table identical to the source table.
 - opsForSrc. A directory containing sequence files that correspond to each put and delete required to make the source table identical to the destination table.
 
A user with write permissions on a table can run the hbase
          org.apache.hadoop.hbase.mapreduce.Import command to implement the puts and
        deletes specified in the sequence files. 
Required Permissions
The user that runs the DiffTables utility must have the following permissions:
- The permission 
readAceon the volumes where the tables are located. - The permission for column reads (
readperm) on each table. 
For information about how to set permissions on volumes, see Setting Whole Volume ACEs.
For information about how to set permissions on tables, see Enabling Table and Stream Authorizations with ACEs.
mapr user is not treated as a
        superuser. HPE Data Fabric Database does not allow the mapr
        user to run this utility unless that user is given the relevant permission or permissions
        with access-control expressions.Syntax
hbase com.mapr.fs.hbase.tools.mapreduce.DiffTables
  -src <source table path>
    -dst <destination table path>
      -outdir <output directory> 
    [-master <src|dst> ] The master table to use for the diff. 
    [-first_exit] Exit when first difference is found.
    [-columns <comma separated list of family[:column]> ]
    [-starttime <start diff at timestamp>]
    [-endtime <end diff at timestamp>] 
    [-maxversions] <max number of versions to diff>
    [-mapreduce] <true|false> (default: true)]
    [-numthreads <numThreads> (default:16, valid only when -mapreduce is
      false)]
      [-cmpmeta <true|false> (default: true)]
    Parameters
| Parameter | Description | 
|---|---|
| src | 
                 The path to the source table. 
  | 
            
| dst | 
                 The path to the destination table. 
  | 
            
| master | 
                 The table that is considered to be the master table. The values are
                      | 
            
first_exit | 
              
                 By default, the utility compares all the table cells in the specified tables. Use this parameter if you want to exit after the first difference is identified between the tables. The parameter takes no value.  | 
            
outdir | 
              The path to a directory in which to place the generated sequence files. The utility creates the specified directory. If the specified directory already exists, the command fails. | 
| columns | By default, the utility compares all columns. If you do not want to compare all
                columns in the table, you can specify specific columns to include in the comparison.
                Provide a comma-separated list of column families or columns from a certain column
                family (column family:qualifier). For example, use the following syntax to include
                the column family purchases and the column stars in the reviews column family:
                  -columns purchases,reviews:stars
               | 
            
| starttime | By default, the utility compares all table values regardless of their associated timestamp. You can specify a timestamp to indicate the table cell version at which to start the comparison. The timestamp is a long integer value that is either user-defined or system-defined (epoch in milliseconds). Values with timestamps lower than the specified timestamp will not be included in the comparison. | 
| endtime | By default, the utility compares all table values regardless of their associated timestamp. Values with timestamps greater than or equal to the specified timestamp will not be included in the comparison. | 
| maxversions | By default, all versions from the master table are included in the comparison. If you do not want to compare all versions, use this parameter to specify the number of recent versions to include in the comparison. | 
mapreduce | 
              
                 A Boolean value that specifies whether or not to use
                  a MapReduce program to perform the comparison. The default, preferred method is to
                  use a MapReduce program ( When this parameter is set to   | 
            
numthreads | 
              When -mapreduce is
                    false, this parameter specifies the number of threads allocated
                  to perform the comparison. The default is 16. If additional CPU resources are
                  available, you might want to increase the number of thread to achieve better
                  performance. | 
            
cmpmeta | 
              A Boolean value that specifies whether or not to compare
                  table metadata such as column families and ACEs. The default is to compare
                  metadata (true). | 
            
Examples
The following example compares two HPE Data Fabric Database tables:[user@hostname ~]$ hbase com.mapr.fs.hbase.tools.mapreduce.DiffTables -src /customerTableA -dst /customerTableB -outdir /customerTableABCompare
2015-03-04 18:04:52,059 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
DiffTablesMeta completed. Metadata of the two tables is same.
...
Mapreduce job completed. The tables mismatch.
NUM_ROWS_MISMATCH_IN_SRC:32; NUM_ROWS_MISMATCH_IN_DST:30. Please check diff in /customerTableABCompare