gfsck

Describes how you can use the gfsck command, under the supervision of HPE Ezmeral Data Fabric Support or Engineering, to perform consistency checks and appropriate repairs on a volume, or a volume snapshot.

You can use the gfsck command when the local fsck either repairs or loses some containers at the highest epoch.

For an overview of using the gfsck command, see Using Global File System Checking.

Permissions Required

You must be the root user to run this command. To check tiering-enabled volumes, however, you must be the mapr user.

Syntax

/opt/mapr/bin/gfsck
    [-h] [--help]
    [-c] [--clear]
    [-d] [--debug]
    [-b] [--dbcheck]
    [-r] [--repair]
    [-y] [--assume-yes]
    [-Gquick] [--check-tiermetadata-only]
    [-Gfull]  [--check-tiermetadata-full]
    [-Dquick] [--check-tierdata-presence]
    [-Dfull]  [--check-tierdata-crc]
    [-J] [--skip-tier-log-replay]
    [-D] [--crc]
    [-S3] [--only-object-store]
    [cluster=cluster-name (default=default)]
    [rwvolume=volume-name (default=null)]
    [snapshot=snapshot-name (default=null)]
    [snapshotid=snapshot-id (default=0)]
    [fid=fid (default=null)]
    [cid=cid (default=0)]
    [startCid=cid (default=0)]
    [rIdx=<repl index>] (replication index, only enabled with [-D] [--crc])
    [fidThreads=<check crc thread count for fid>] (default:16, max:128)
    [cidThreads=<check crc thread count for cid>] (default:16, max:128)
    [scanthreads=inode scanner threads count (default:10, max:1000)]   

Parameters

-h|--help
Description: Prints usage text
User who must use this option: Either root or mapr.
-c|--clear
Description: Clears previous warnings before performing the global filesystem check.
User who must use this option: Either root or mapr.
-d|--debug
Description: Provides information for debugging.
User who must use this option: Either root or mapr.
-b|--dbcheck
Description: Checks that every key in a tablet is within that tablet's startKey and endKey range. This option is I/O intensive, so use this option only if you suspect database inconsistency.
User who must use this option: root

When used with S3 volumes, this option validates that the versionIds of objects in a given partition are less than the maxVersionId stored in the Partition Map Entry.

User who must use this option with S3 volumes: mapr.

-r|--repair
Description: Reports and repairs the inconsistencies detected by -Gquick, -Gfull, -Dquick, and -Dfull. Repair is not supported for snapshots and mirrors.
User who must use this option: root
-y|--assume-yes
Description: If specified, containers without valid copies (as reported by CLDB) are deleted automatically. If not specified, gfsck pauses for user input: yes to delete, no to exit gfsck, or Ctrl-C to quit.
User who must use this option: Either root or mapr.
-D|--crc
Description: Validates the CRC of the data present in the volume. The data can be either local or offloaded.

You can use this option at the volume, container, snapshot, and filelet levels. gfsck reports corruption found at each level.

User who must use this option: root

-S3|--only-object-store
Description: Checks objects in each bucket of a given Object Store volume or Object Store mirror volume for metadata inconsistencies.

User who must use this option: mapr.

cluster
Description: Specifies the name of the cluster (default: default cluster)
User who must use this option: Either root or mapr.
rwvolume
Description: Specifies the name of the volume (default: null)
User who must use this option: Either root or mapr.
fid
Description: Checks data CRC for the master copy of the specified fid. To check any other copy, use the rIdx option. You must use fid only with the --crc option.
User who must use this option: mapr
cid
Description: Checks data CRC for the master copy of the specified container ID. To check any other copy, use the rIdx option. The default value of 0 denotes that all containers are checked. You must use cid only with the --crc option.
User who must use this option: mapr
startCid
Description: startCid is only applicable with the option --crc rwvolume=<volumename>.

Use this option to start verification from a specific container instead of from the first container of the volume. If startCid is not provided, the --crc option checks the data CRC of all the containers.

For example, assume that a volume has containers 205 ... 2055 ... 2900 ... 3000 ... 5000 ... 9999.

You can use the startCid option to start verification from container 3000, and all containers prior to 3000 will be skipped.

User who must use this option: mapr
rIdx
Description: Specifies the replication index of the copy of the data (for the specified fid or cid) to check for errors.

Use only with -D or --crc and either fid or cid.

For example, -D fid=2510.32.131204 rIdx=0 checks only the data for copy 1 of the specified fid.

User who must use this option: mapr
fidThreads
Description: Specifies the number of threads for scanning fids (default:16, max:128). You must use fidThreads only with the --crc option.
User who must use this option: mapr
cidThreads
Description: Specifies the number of threads for scanning container IDs (default:16, max:128). You must use cidThreads only with the --crc option.
User who must use this option: mapr
scanthreads
Description: Specifies the number of threads for scanning inodes (default:10, max:1000)
User who must use this option: Either root or mapr.
snapshot
Description: Specifies the name of the snapshot (default: null)
User who must use this option: Either root or mapr.
snapshotid
Description: Specifies the snapshot ID (default: 0)
User who must use this option: Either root or mapr.
Tier Options
-Gquick|--check-tiermetadata-only
Description: Checks whether the entries in the metadata tables maintained internally for objects and tiers (the mapping between the Virtual Cluster Descriptor (VCD) map and the object map) are consistent, and reports an error if they are not.
User who must use this option: mapr
-Gfull|--check-tiermetadata-full
Description: Checks whether the entries in the metadata tables maintained internally for objects and containers (the mapping between the VCD map and the object map, along with the mapping between the VCD map and the MFS metadata) are consistent, and reports an error if they are not.
User who must use this option: mapr
-Dquick|--check-tierdata-presence
Description: Specified with either -Gquick or -Gfull. Checks and reports whether the object in the metadata tables exists in the tier.
User who must use this option: mapr
-Dfull|--check-tierdata-crc
Description: Specified with either -Gquick or -Gfull. Validates the data CRC for the object in the metadata tables.
User who must use this option: mapr
-J|--skip-tier-log-replay
Description: Skips replaying transactions from internal dot files if a tier operation ends abruptly. Data Fabric recommends that you use this option when running the GFSCK utility on tiered volumes.
User who must use this option: Either root or mapr.
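
Several of the constraints above can be checked before the command is run: fid, cid, startCid, rIdx, fidThreads, and cidThreads are documented as valid only with -D or --crc, and the thread counts have stated maximums. The following is a minimal sketch of such a pre-flight check, not part of gfsck. The function name check_gfsck_args is hypothetical, and both the cidThread and cidThreads spellings are accepted because this page uses both.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a gfsck argument list (not part of gfsck).
# Prints one line per problem and returns non-zero if any problem was found.
check_gfsck_args() (
    crc=no
    problems=0
    # First pass: note whether -D or --crc is present.
    for arg in "$@"; do
        case "$arg" in
            -D|--crc) crc=yes ;;
        esac
    done
    # Second pass: apply the constraints documented on this page.
    for arg in "$@"; do
        name=${arg%%=*}
        value=${arg#*=}
        case "$arg" in
            fid=*|cid=*|startCid=*|rIdx=*|fidThreads=*|cidThread=*|cidThreads=*)
                if [ "$crc" = no ]; then
                    echo "$name requires -D or --crc"
                    problems=$((problems + 1))
                fi
                ;;
        esac
        # Only the thread-count options have a documented maximum.
        case "$arg" in
            fidThreads=*|cidThread=*|cidThreads=*) max=128 ;;
            scanthreads=*) max=1000 ;;
            *) continue ;;
        esac
        case "$value" in
            ''|*[!0-9]*)
                echo "$name must be a number"
                problems=$((problems + 1))
                ;;
            *)
                if [ "$value" -gt "$max" ]; then
                    echo "$name exceeds the maximum of $max"
                    problems=$((problems + 1))
                fi
                ;;
        esac
    done
    [ "$problems" -eq 0 ]
)
```

For instance, check_gfsck_args -D rwvolume=rocky startCid=3000 prints nothing and succeeds, whereas check_gfsck_args rwvolume=rocky startCid=3000 reports that startCid requires -D or --crc.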

Examples

  1. Debug Mode

    In debug mode, run the gfsck command on the read/write volume named mapr.cluster.root:

    /opt/mapr/bin/gfsck rwvolume=mapr.cluster.root -d

    Sample output is as follows:

    Starting GlobalFsck:
      clear-mode            = false
      debug-mode            = true
      dbcheck-mode          = false
      repair-mode           = false
      assume-yes-mode       = false
      cluster               = my.cluster.com
      rw-volume-name        = mapr.cluster.root
      snapshot-name         = null
      snapshot-id           = 0
      user-id               = 0
      group-id              = 0
    
      get volume properties ...
        rwVolumeName = mapr.cluster.root (volumeId = 205374230, rootContainerId = 2049, isMirror = false)
    
      put volume mapr.cluster.root in global-fsck mode ...
    
      get snapshot list for volume mapr.cluster.root ...
    
      starting phase one (get containers) for volume mapr.cluster.root(205374230) ...
        container 2049 (latestEpoch=3, fixedByFsck=false)
        got volume containers map
      done phase one
    
      starting phase two (get inodes) for volume mapr.cluster.root(205374230) ...
        get container inode list for cid 2049
          +inodelist: fid=2049.32.131224 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
          +inodelist: fid=2049.33.131226 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
          +inodelist: fid=2049.34.131228 pfid=-1.33.131226 typ=4 styp=0 nch=0 dMe:false dRec: false
          +inodelist: fid=2049.35.131230 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
          +inodelist: fid=2049.36.131232 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
          +inodelist: fid=2049.38.262312 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
          +inodelist: fid=2049.39.262314 pfid=-1.38.262312 typ=1 styp=0 nch=0 dMe:false dRec: false
        got container inode lists (totalThreads=1)
      done phase two
    
      starting phase three (get fidmaps & tabletmaps) for volume mapr.cluster.root(205374230) ...
        got fidmap lists (totalFidmapThreads=0)
        got tabletmap lists (totalTabletmapThreads=0)
      done phase three
    
    === Start of GlobalFsck Report ===
    
    file-fidmap-filelet union --
    2049.39.262314:P     --> primary (nchunks=0)      --> AllOk
    no errors
    
    table-tabletmap-tablet union --
    empty
    
    orphan directories --
    none
    
    orphan kvstores --
    none
    
    orphan files --
    none
    
    orphan fidmaps --
    none
    
    orphan tables --
    none
    
    orphan tabletmaps --
    none
    
    orphan dbkvstores --
    none
    
    orphan dbfiles --
    none
    
    orphan dbinodes --
    none
    
    containers that need repair --
    none
    
    incomplete snapshots that need to be deleted --
    none
    
    user statistics --
    containers          = 1
    directories         = 2
    kvstores            = 0
    files               = 1
    fidmaps             = 0
    filelets            = 0
    tables              = 0
    tabletmaps          = 0
    schemas             = 0
    tablets             = 0
    segmaps             = 0
    spillmaps           = 0
    overflowfiles       = 0
    bucketfiles         = 0
    spillfiles          = 0
    
    === End of GlobalFsck Report ===
    
    remove volume mapr.cluster.root from global-fsck mode (ret = 0) ...
    
    GlobalFsck completed successfully (7142 ms); Result: verify succeeded 

    To verify if the object is present on the tier, run the gfsck command on the tiering-enabled read/write volume named for_test5:

    NOTE
    This example is valid for -Dfull as well. Replace -Dquick with -Dfull.
    /opt/mapr/bin/gfsck rwvolume=for_test5 -Gfull -Dquick

    Sample output is as follows:

    Starting GlobalFsck:
      clear-mode            = false
      debug-mode            = false
      dbcheck-mode          = false
      repair-mode           = false
      assume-yes-mode       = false
      cluster               = Cloudpool19
      rw-volume-name        = for_test5
      snapshot-name         = null
      snapshot-id           = 0
      user-id               = 2000
      group-id              = 2000
    
      get volume properties ...
    
      put volume for_test5 in global-fsck mode ...
    
      get snapshot list for volume for_test5 ...
    
      starting phase one (get containers) for volume for_test5(16558233) ...  
        got volume containers map
    
    done phase one
    
      starting phase two (get inodes) for volume for_test5(16558233) ...
        got container inode lists
      done phase two
    
      starting phase three (get fidmaps & tabletmaps) for volume for_test5(16558233) ...
        got fidmap lists
        got tabletmap lists
        completed secondary index field path info gathering
        completed secondary index consistency check
        Starting DeferMapCheck..
        completed DeferMapCheck
      done phase three
    
      === Start of GlobalFsck Report ===
    
      file-fidmap-filelet union --
        no errors
    
      table-tabletmap-tablet union --
        empty
    
      containers that need repair --
        none
    
      user statistics --
        containers          = 6
        directories         = 6
        files               = 1
        filelets            = 2
        tables              = 0
        tablets             = 0
    
      === End of GlobalFsck Report ===
    Putting volume into TierGlobalFsck mode . . . . .
    
    === Start of TierGlobalFsck Report ===
    TierVolumeGfsck completed, corruption not found
      total number of containers scanned           6
      total number of vcds verified                6722
      total number of objects verified             18
      total number of vcds skipped                 0
      total number of objects skipped              0
      total number of vcds that need repair        0
      total number of objects that need repair     0
    === End of TierGlobalFsck Report ===
    
    removing volume from TierGlobalFsck mode
    remove volume for_test5 from global-fsck mode (ret = 0)
    
    GlobalFsck completed successfully (37039 ms); Result: verify succeeded
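
Scripts that wrap gfsck can inspect the summary lines shown in the two sample outputs above. The sketch below assumes the output keeps the wording seen in these samples ("Result: verify succeeded" and the "that need repair" counters). The function names are hypothetical, and matching on text rather than on an exit code is an assumption, because this page does not document gfsck exit codes.

```shell
# Hypothetical helpers for scripts that wrap gfsck.
# Both read captured gfsck output on standard input.

# Succeeds only if the output contains the "Result: verify succeeded" summary.
gfsck_verify_succeeded() {
    grep -q 'Result: verify succeeded'
}

# Prints the sum of the "need repair" counters in the TierGlobalFsck report.
tier_repair_count() {
    awk '/total number of (vcds|objects) that need repair/ { sum += $NF }
         END { print sum + 0 }'
}
```

A wrapper could save the output of a run to a file, then pass that file to each helper on standard input and raise an alert if gfsck_verify_succeeded fails or tier_repair_count prints a non-zero value.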
  2. Verifying CRC of a Filelet
    # /opt/mapr/bin/gfsck  -D fid=2085.32.131412  --debug
    verifying data crc
      mode          =       fid
      fid           =       2085.32.131412
      debug-mode    =       true
      repair-mode   =       false
      cluster       =       default
      replication index     =       -1
      user-id       =       0
      group-id      =       0
    
    crc validate result for fid : 2085.32.131412
      total local cluster/vcds verified : 51
      total local cluster/vcds corrupted : 0
      total local cluster/vcds skipped: 0
      total purged cluster/vcds verified : 0
      total purged cluster/vcds corrupted : 0
      total purged cluster/vcds skipped: 0
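
The run above checks the master copy only (replication index -1 in the output). To check the other copies, the rIdx option is passed once per copy. The following dry-run sketch prints, but does not run, one command per replication index. The function name is hypothetical, the number of copies is supplied by the caller, and 0-based indexing is inferred from the rIdx description on this page.

```shell
# Hypothetical dry-run helper: prints (does not run) one gfsck CRC command
# per replication index for a fid.
#   $1 = fid, $2 = number of copies to check (supplied by the caller)
print_fid_crc_commands() {
    fid=$1
    replicas=$2
    idx=0
    while [ "$idx" -lt "$replicas" ]; do
        echo "/opt/mapr/bin/gfsck -D fid=$fid rIdx=$idx"
        idx=$((idx + 1))
    done
}
```

Printing the commands first makes it possible to review them before running them as the appropriate user.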
  3. Verifying CRC at a Container Level
    For CRC checks at the container level, the output is not displayed on the terminal. Instead it is written to the /opt/mapr/log/gfsck.log file. Sample output is as follows:
    /opt/mapr/bin/gfsck  -D rwvolume=rocky
    verifying data crc
      mode          =       volume
      rwVolumeName          =       rocky
      fid thread count      =       16
      cid thread count      =       16
      debug-mode    =       false
      repair-mode   =       false
      cluster       =       default
      replication index     =       -1
      user-id       =       0
      group-id      =       0
      total containers : 6
      total container skipped : 0
      data crc verification completed with no errors 
  4. Check an HPE Object Store volume without corruption

    Step 1: Extract the volume ID of a given bucket:

    /opt/mapr/server/mrconfig s3 bucketinfo kbuck1
      bucketdirfid 2503.43.131380
      oltFid 2503.44.131382
      odtFid 2503.48.131390
      f2oFid 2503.51.131396
      volid 96531604
      creationTime 1642581709849
      accountName defaul
    Step 2: Obtain the volume name using the volume ID.
    /opt/mapr/bin/maprcli volume list  -columns volumename,volumeid | grep 96531604
        mapr.s3bucketVol.00000003   96531604   
    Step 3: Run gfsck on the volume.
    su mapr -c "/opt/mapr/bin/gfsck -S3 rwvolume=mapr.s3bucketVol.00000003 -d"
      Starting GlobalFsck:
        clear-mode            = false
        debug-mode            = true
        dbcheck-mode          = false
        repair-mode           = false
        assume-yes-mode       = false
        verify-only-object-store      = true
        cluster               = ec-cluster
        rw-volume-name        = mapr.s3bucketVol.00000003
        snapshot-name         = null
        snapshot-id           = 0
        cid           = 0
        fid           = null  
        user-id               = 5000
        group-id              = 5000
    
       file-fidmap-filelet union --
          256001024.54.131402:P       --> primary (nchunks=2)         --> AllOk
          256001024.54.131402:F       --> fidmap  (256001024.55.131404)       --> AllOk
          256001024.54.131402:0       --> filelet (256001027.32.131270)       --> Visited
          256001024.54.131402:1       --> filelet (256001029.32.131338)       --> Visited
          256001024.56.131406:P       --> primary (nchunks=8)         --> AllOk
          256001024.56.131406:F       --> fidmap  (256001024.57.131408)       --> AllOk
          256001024.56.131406:0       --> filelet (256001026.45.131320)       --> Visited
          256001024.56.131406:1       --> filelet (256001030.45.131276)       --> Visited
          256001024.56.131406:2       --> filelet (256001027.41.131272)       --> Visited
          256001024.56.131406:3       --> filelet (256001028.32.131334)       --> Visited
          256001024.56.131406:4       --> filelet (256001029.41.131340)       --> Visited
          256001024.56.131406:5       --> filelet (256001026.46.131322)       --> Visited
          256001024.56.131406:6       --> filelet (256001030.46.131278)       --> Visited
          256001024.56.131406:7       --> filelet (256001029.42.131342)       --> Visited
          no errors
    
        get volume properties ...
          rwVolumeName = mapr.s3bucketVol.00000003 (volumeId = 96531604, rootContainerId = 2503, isMirror = false)
          volume:mapr.s3bucketVol.00000003, snapshotName:mapr.gfsck.snap.mapr.s3bucketVol.00000003.1642584648822, snapshotId:256000052, rootContainerId:256001024, will be doing object store check
    
      s3 bucket verification report --
          S3Bucket:256001024.43.131380 => AllOk
          S3Bucket:256001024.43.131380 Stats => numObjectsScanned:5, numObjectsVerified:4, numObjectsNeedsRepair:0, numObjectsStatusUnknown:0, numTinyObjects:1, numSmallObjects:1, numFSObjects:2, numUnreachableSmallObjects:0
          total unreachable jumbo/large objects:0
    The fields in the bucket verification report are as follows:
    • numTinyObjects: Number of tiny objects per bucket in the volume.
    • numSmallObjects: Number of small objects per bucket in the volume.
    • numFSObjects: Number of large/jumbo objects per bucket in the volume.
    • numObjectsNeedsRepair: Number of objects that need to be repaired.
    • numUnreachableSmallObjects: Number of small objects that have an entry in ODT with no corresponding entry in OLT table.
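
Steps 1 and 2 above can be scripted by parsing the command output rather than reading it by eye. The helpers below are a sketch under the assumption that the output layout matches the samples in this example (a "volid" line in the bucketinfo output, and the volume name followed by the volume ID in the volume list). The function names are hypothetical.

```shell
# Hypothetical helpers mirroring Steps 1 and 2; they only parse text.

# Reads "mrconfig s3 bucketinfo" output on stdin and prints the volid value.
bucketinfo_volid() {
    awk '$1 == "volid" { print $2 }'
}

# Reads "maprcli volume list -columns volumename,volumeid" output on stdin and
# prints the name of the volume whose ID equals $1. Unlike grep, this matches
# the whole ID column, so a partial ID does not match a longer one.
volume_name_for_id() {
    awk -v id="$1" '$2 == id { print $1 }'
}

# Possible use on a cluster node (not executed here):
#   volid=$(/opt/mapr/server/mrconfig s3 bucketinfo kbuck1 | bucketinfo_volid)
#   name=$(/opt/mapr/bin/maprcli volume list -columns volumename,volumeid \
#            | volume_name_for_id "$volid")
#   su mapr -c "/opt/mapr/bin/gfsck -S3 rwvolume=$name -d"
```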
  5. Check an HPE Object Store volume with corruption

    Step 1: Extract the volume ID of a given bucket:

    /opt/mapr/server/mrconfig s3 bucketinfo kbuck2
      bucketdirfid 2503.43.131380
      oltFid 2503.44.131382
      odtFid 2503.48.131390
      f2oFid 2503.51.131396
      volid 96531653
      creationTime 1642581709849
      accountName defaul
    Step 2: Obtain the volume name using the volume ID.
    /opt/mapr/bin/maprcli volume list  -columns volumename,volumeid | grep 96531653
        mapr.s3bucketVol.00000006   96531653   
    Step 3: Run gfsck on the volume.
    su mapr -c "/opt/mapr/bin/gfsck -S3 rwvolume=mapr.s3bucketVol.00000006 -d"
      Starting GlobalFsck:
        clear-mode            = false
        debug-mode            = true
        dbcheck-mode          = false
        repair-mode           = false
        assume-yes-mode       = false
        verify-only-object-store      = true
        cluster               = ec-cluster
        rw-volume-name        = mapr.s3bucketVol.00000006
        snapshot-name         = null
        snapshot-id           = 0
        cid           = 0
        fid           = null  
        user-id               = 5000
        group-id              = 5000
    
       file-fidmap-filelet union --
        256001038.54.131402:P       --> primary (nchunks=2)         --> AllOk
        256001038.54.131402:F       --> fidmap  (256001038.55.131404)       --> AllOk
        256001038.54.131402:0       --> filelet (256001041.32.131270)       --> Visited
        256001038.54.131402:1       --> filelet (256001043.32.131338)       --> Visited
        256001038.56.131406:P       --> primary (nchunks=8)         --> NeedsRepair
        256001038.56.131406:F       --> fidmap  (256001038.57.131408)       --> NeedsRepair
        256001038.56.131406:0       --> filelet (256001040.45.131320)       --> Visited
        256001038.56.131406:1       --> filelet (256001044.45.131276)       --> Visited
        256001038.56.131406:2       --> filelet (256001041.41.131272)       --> Visited
        256001038.56.131406:3       --> filelet (256001042.32.131334)       --> DeleteInFidmap
        256001038.56.131406:4       --> filelet (256001043.41.131340)       --> Visited
        256001038.56.131406:5       --> filelet (256001040.46.131322)       --> Visited
        256001038.56.131406:6       --> filelet (256001044.46.131278)       --> Visited
        256001038.56.131406:7       --> filelet (256001043.42.131342)       --> Visited
    
      s3 bucket verification report --
        S3Bucket:256001038.43.131380 => NeedsRepair
        S3Bucket:256001038.43.131380 Stats => numObjectsScanned:5, numObjectsVerified:3, numObjectsNeedsRepair:1, numObjectsStatusUnknown:0, numTinyObjects:1, numSmallObjects:1, numFSObjects:2, numUnreachableSmallObjects:0
        total unreachable jumbo/large objects:0
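
In a long report, the entries that need attention can be filtered out of the file-fidmap-filelet union. The sketch below assumes that AllOk and Visited are the only healthy statuses, which is inferred from the two sample outputs on this page rather than from a documented list. The function name is hypothetical.

```shell
# Hypothetical helper: reads gfsck -d output on stdin and prints the
# file-fidmap-filelet entries whose final status is neither AllOk nor Visited.
fidmap_problem_entries() {
    awk '/-->/ && $NF != "AllOk" && $NF != "Visited" { print $1, $NF }'
}
```

Applied to the output above, this would list the 256001038.56.131406 primary and fidmap entries as NeedsRepair and filelet 3 as DeleteInFidmap.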
  6. Check an HPE Object Store table range
    su mapr -c "/opt/mapr/bin/gfsck -S3 rwvolume=mapr.s3bucketVol.00000003 -d -b"
      Starting GlobalFsck:
        clear-mode            = false
        debug-mode            = true
        dbcheck-mode          = true
        repair-mode           = false
        assume-yes-mode       = false
        verify-only-object-store      = true
        cluster               = ec-cluster
        rw-volume-name        = mapr.s3bucketVol.00000003
        snapshot-name         = null
        snapshot-id           = 0
        cid           = 0
        fid           = null  
        user-id               = 5000
        group-id              = 5000