gfsck
Describes how you can use the gfsck command, under the supervision of HPE Ezmeral Data Fabric Support or Engineering, to perform consistency checks and appropriate repairs on a volume or a volume snapshot.
You can use the gfsck command when the local fsck either repairs or loses some containers at the highest epoch.
For an overview of using the GFSCK command, see Using Global File System Checking.
Permissions Required
Although you need to be the root user to run this command, checking tiering-enabled volumes requires you to be the mapr user.
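For example (a minimal sketch; myvolume and mytieredvol are hypothetical volume names):
# As the root user, check a standard read/write volume:
/opt/mapr/bin/gfsck rwvolume=myvolume -d
# As the mapr user, check a tiering-enabled volume:
su mapr -c "/opt/mapr/bin/gfsck rwvolume=mytieredvol -Gquick"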
Syntax
/opt/mapr/bin/gfsck
[-h] [--help]
[-c] [--clear]
[-d] [--debug]
[-b] [--dbcheck]
[-r] [--repair]
[-y] [--assume-yes]
[-Gquick] [--check-tiermetadata-only]
[-Gfull] [--check-tiermetadata-full]
[-Dquick] [--check-tierdata-presence]
[-Dfull] [--check-tierdata-crc]
[-J] [--skip-tier-log-replay]
[-D] [--crc]
[-S3] [--only-object-store]
[cluster=cluster-name (default=default)]
[rwvolume=volume-name (default=null)]
[snapshot=snapshot-name (default=null)]
[snapshotid=snapshot-id (default=0)]
[fid=fid (default=null)]
[cid=cid (default=0)]
[startCid=cid (default=0)]
[rIdx=<replication index>] (only enabled with [-D] [--crc])
[fidThreads=<check crc thread count for fid> (default:16, max:128)]
[cidThreads=<check crc thread count for cid> (default:16, max:128)]
[scanthreads=inode scanner threads count (default:10, max:1000)]
Parameters
- -h|--help
- Description: Prints usage text
- -c|--clear
- Description: Clears previous warnings before performing the global filesystem check.
- -d|--debug
- Description: Provides information for debugging.
- -b|--dbcheck
- Description: Checks that every key in a tablet is within that tablet's startKey and endKey range. This option is I/O intensive, so use this option only if you suspect database inconsistency.
- -r|--repair
- Description: Reports and repairs the inconsistencies detected by -Gquick, -Gfull, -Dquick, and -Dfull. Repair is not supported for snapshots and mirrors. (See the combined sketch after this parameter list.)
- -y|--assume-yes
- Description: If specified, containers without valid copies (as reported by the CLDB) are deleted automatically. If not specified, gfsck pauses for user input: yes to delete, no to exit gfsck, or Ctrl-C to quit.
- -D|--crc
- Description: Provides validation of the CRC of the data present in the volume. The data can be either local or offloaded. You can use this option at the volume, container, snapshot, and filelet levels. gfsck reports corruption found at each level. User who must use this option: root.
- -S3|--only-object-store
- Description: Checks objects in each bucket of a given Object Store volume or Object Store mirror volume for metadata inconsistencies. User who must use this option: mapr.
- cluster
- Description: Specifies the name of the cluster (default: default cluster)
- rwvolume
- Description: Specifies the name of the volume (default: null)
- fid
- Description: Checks data CRC for the master copy of the specified fid. To check any other copy, use the rIdx option. Use fid only with the --crc option.
- cid
- Description: Checks data CRC for the master copy of the specified container ID. To check any other copy, use the rIdx option. The default value of 0 means that all containers are checked. Use cid only with the --crc option.
- startCid
- Description: startCid is applicable only with the --crc rwvolume=<volume-name> options. Use this option to start verification from a specific container instead of from the first container of the volume. If startCid is not provided, the --crc option checks the data CRC of all containers. For example, if a volume contains containers 205, 2055, 2900, ..., 3000, ..., 5000, ..., 9999, you can use startCid to start verification from container 3000; all containers before 3000 are skipped. (See the combined sketch after this parameter list.)
- rIdx
- Description: Specifies the index of the copy of the data (of either a fid or a cid) to check for errors. Use only with -D or --crc and either fid or cid. For example, -D fid=2510.32.131204 rIdx=0 checks the data for copy 1 of the specified fid only.
- fidThreads
- Description: Specifies the number of threads for scanning fids (default: 16, max: 128). Use fidThreads only with the --crc option.
- cidThreads
- Description: Specifies the number of threads for scanning container IDs (default: 16, max: 128). Use cidThreads only with the --crc option.
- scanthreads
- Description: Specifies the number of threads for scanning inodes (default: 10, max: 1000)
- snapshot
- Description: Specifies the name of the snapshot (default: null)
- snapshotid
- Description: Specifies the snapshot ID (default: 0)
- -Gquick|--check-tiermetadata-only
- Description: Checks whether the entries in the metadata tables maintained internally for objects and tiers (the mapping between the Virtual Cluster Descriptor (VCD) map and the object map) are consistent, and reports an error if not.
- -Gfull|--check-tiermetadata-full
- Description: Checks whether the entries in the metadata tables maintained internally for objects and containers (the mapping between the VCD map and the object map, along with the mapping between the VCD map and the MFS metadata) are consistent, and reports an error if not.
- -Dquick|--check-tierdata-presence
- Description: Specified with either -Gquick or -Gfull. Checks and reports whether the objects in the metadata tables exist in the tier.
- -Dfull|--check-tierdata-crc
- Description: Specified with either -Gquick or -Gfull. Validates the data CRC for the objects in the metadata tables.
- -J|--skip-tier-log-replay
- Description: Skips replaying transactions from internal dot files if a tier operation ends abruptly. Data Fabric recommends that you use this option when running the GFSCK utility on tiered volumes.
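The following sketch shows how several of these options combine; myTieredVol and myVol are hypothetical volume names, and these commands should be run only under the supervision of HPE Ezmeral Data Fabric Support or Engineering:
# Check tier metadata and tier data CRC on a tiering-enabled volume, repair any
# inconsistencies found, skip tier log replay, and assume yes at prompts
# (run as the mapr user because the volume is tiering-enabled):
su mapr -c "/opt/mapr/bin/gfsck rwvolume=myTieredVol -Gfull -Dfull -r -J -y"
# Verify data CRC on a volume, starting the scan at container 3000 instead of
# the first container, with larger thread pools:
/opt/mapr/bin/gfsck -D rwvolume=myVol startCid=3000 fidThreads=32 cidThreads=32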
Examples
- Debug Mode
In debug mode, run the gfsck command on the read/write volume named mapr.cluster.root:
/opt/mapr/bin/gfsck rwvolume=mapr.cluster.root -d
Sample output is as follows:
Starting GlobalFsck:
  clear-mode = false
  debug-mode = true
  dbcheck-mode = false
  repair-mode = false
  assume-yes-mode = false
  cluster = my.cluster.com
  rw-volume-name = mapr.cluster.root
  snapshot-name = null
  snapshot-id = 0
  user-id = 0
  group-id = 0

get volume properties ...
  rwVolumeName = mapr.cluster.root (volumeId = 205374230, rootContainerId = 2049, isMirror = false)
put volume mapr.cluster.root in global-fsck mode ...
get snapshot list for volume mapr.cluster.root ...

starting phase one (get containers) for volume mapr.cluster.root(205374230) ...
  container 2049 (latestEpoch=3, fixedByFsck=false)
  got volume containers map
done phase one

starting phase two (get inodes) for volume mapr.cluster.root(205374230) ...
  get container inode list for cid 2049
    +inodelist: fid=2049.32.131224 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
    +inodelist: fid=2049.33.131226 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
    +inodelist: fid=2049.34.131228 pfid=-1.33.131226 typ=4 styp=0 nch=0 dMe:false dRec: false
    +inodelist: fid=2049.35.131230 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
    +inodelist: fid=2049.36.131232 pfid=-1.16.2 typ=4 styp=0 nch=0 dMe:false dRec: false
    +inodelist: fid=2049.38.262312 pfid=-1.16.2 typ=2 styp=0 nch=0 dMe:false dRec: false
    +inodelist: fid=2049.39.262314 pfid=-1.38.262312 typ=1 styp=0 nch=0 dMe:false dRec: false
  got container inode lists (totalThreads=1)
done phase two

starting phase three (get fidmaps & tabletmaps) for volume mapr.cluster.root(205374230) ...
  got fidmap lists (totalFidmapThreads=0)
  got tabletmap lists (totalTabletmapThreads=0)
done phase three

=== Start of GlobalFsck Report ===
file-fidmap-filelet union --
  2049.39.262314:P --> primary (nchunks=0) --> AllOk
  no errors
table-tabletmap-tablet union -- empty
orphan directories -- none
orphan kvstores -- none
orphan files -- none
orphan fidmaps -- none
orphan tables -- none
orphan tabletmaps -- none
orphan dbkvstores -- none
orphan dbfiles -- none
orphan dbinodes -- none
containers that need repair -- none
incomplete snapshots that need to be deleted -- none
user statistics --
  containers = 1
  directories = 2
  kvstores = 0
  files = 1
  fidmaps = 0
  filelets = 0
  tables = 0
  tabletmaps = 0
  schemas = 0
  tablets = 0
  segmaps = 0
  spillmaps = 0
  overflowfiles = 0
  bucketfiles = 0
  spillfiles = 0
=== End of GlobalFsck Report ===

remove volume mapr.cluster.root from global-fsck mode (ret = 0) ...
GlobalFsck completed successfully (7142 ms); Result: verify succeeded
To verify whether the objects are present on the tier, run the gfsck command on the tiering-enabled read/write volume named for_test5. NOTE: This example is valid for -Dfull as well; replace -Dquick with -Dfull.
/opt/mapr/bin/gfsck rwvolume=for_test5 -Gfull -Dquick
Sample output is as follows:
Starting GlobalFsck:
  clear-mode = false
  debug-mode = false
  dbcheck-mode = false
  repair-mode = false
  assume-yes-mode = false
  cluster = Cloudpool19
  rw-volume-name = for_test5
  snapshot-name = null
  snapshot-id = 0
  user-id = 2000
  group-id = 2000

get volume properties ...
put volume for_test5 in global-fsck mode ...
get snapshot list for volume for_test5 ...

starting phase one (get containers) for volume for_test5(16558233) ...
  got volume containers map
done phase one

starting phase two (get inodes) for volume for_test5(16558233) ...
  got container inode lists
done phase two

starting phase three (get fidmaps & tabletmaps) for volume for_test5(16558233) ...
  got fidmap lists
  got tabletmap lists
  completed secondary index field path info gathering
  completed secondary index consistency check
  Starting DeferMapCheck..
  completed DeferMapCheck
done phase three

=== Start of GlobalFsck Report ===
file-fidmap-filelet union -- no errors
table-tabletmap-tablet union -- empty
containers that need repair -- none
user statistics --
  containers = 6
  directories = 6
  files = 1
  filelets = 2
  tables = 0
  tablets = 0
=== End of GlobalFsck Report ===

Putting volume into TierGlobalFsck mode . . . . .

=== Start of TierGlobalFsck Report ===
TierVolumeGfsck completed, corruption not found
  total number of containers scanned 6
  total number of vcds verified 6722
  total number of objects verified 18
  total number of vcds skipped 0
  total number of objects skipped 0
  total number of vcds that need repair 0
  total number of objects that need repair 0
=== End of TierGlobalFsck Report ===

removing volume from TierGlobalFsck mode
remove volume for_test5 from global-fsck mode (ret = 0)
GlobalFsck completed successfully (37039 ms); Result: verify succeeded
- Verifying CRC of a Filelet
# /opt/mapr/bin/gfsck -D fid=2085.32.131412 --debug
verifying data crc
  mode = fid
  fid = 2085.32.131412
  debug-mode = true
  repair-mode = false
  cluster = default
  replication index = -1
  user-id = 0
  group-id = 0
crc validate result for fid : 2085.32.131412
  total local cluster/vcds verified : 51
  total local cluster/vcds corrupted : 0
  total local cluster/vcds skipped: 0
  total purged cluster/vcds verified : 0
  total purged cluster/vcds corrupted : 0
  total purged cluster/vcds skipped: 0
- Verifying CRC at a Container Level
For CRC checks at the container level, the output is not displayed on the terminal. Instead, it is written to the /opt/mapr/log/gfsck.log file. Sample output is as follows:
/opt/mapr/bin/gfsck -D rwvolume=rocky
verifying data crc
  mode = volume
  rwVolumeName = rocky
  fid thread count = 16
  cid thread count = 16
  debug-mode = false
  repair-mode = false
  cluster = default
  replication index = -1
  user-id = 0
  group-id = 0
total containers : 6
total container skipped : 0
data crc verification completed with no errors
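Because the container-level results are written to the log file rather than the terminal, you can follow the progress of a long-running check by tailing the file named above:
tail -f /opt/mapr/log/gfsck.log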
- Check an HPE Object Store volume without corruption
Step 1: Extract the volume ID of a given bucket:
/opt/mapr/server/mrconfig s3 bucketinfo kbuck1
bucketdirfid 2503.43.131380
oltFid 2503.44.131382
odtFid 2503.48.131390
f2oFid 2503.51.131396
volid 96531604
creationTime 1642581709849
accountName defaul
Step 2: Obtain the volume name using the volume ID:
/opt/mapr/bin/maprcli volume list -columns volumename,volumeid | grep 96531604
mapr.s3bucketVol.00000003  96531604
Step 3: Run gfsck on the volume:
su mapr -c "/opt/mapr/bin/gfsck -S3 rwvolume=mapr.s3bucketVol.00000003 -d"
Starting GlobalFsck:
  clear-mode = false
  debug-mode = true
  dbcheck-mode = false
  repair-mode = false
  assume-yes-mode = false
  verify-only-object-store = true
  cluster = ec-cluster
  rw-volume-name = mapr.s3bucketVol.00000003
  snapshot-name = null
  snapshot-id = 0
  cid = 0
  fid = null
  user-id = 5000
  group-id = 5000

file-fidmap-filelet union --
  256001024.54.131402:P --> primary (nchunks=2) --> AllOk
  256001024.54.131402:F --> fidmap (256001024.55.131404) --> AllOk
  256001024.54.131402:0 --> filelet (256001027.32.131270) --> Visited
  256001024.54.131402:1 --> filelet (256001029.32.131338) --> Visited
  256001024.56.131406:P --> primary (nchunks=8) --> AllOk
  256001024.56.131406:F --> fidmap (256001024.57.131408) --> AllOk
  256001024.56.131406:0 --> filelet (256001026.45.131320) --> Visited
  256001024.56.131406:1 --> filelet (256001030.45.131276) --> Visited
  256001024.56.131406:2 --> filelet (256001027.41.131272) --> Visited
  256001024.56.131406:3 --> filelet (256001028.32.131334) --> Visited
  256001024.56.131406:4 --> filelet (256001029.41.131340) --> Visited
  256001024.56.131406:5 --> filelet (256001026.46.131322) --> Visited
  256001024.56.131406:6 --> filelet (256001030.46.131278) --> Visited
  256001024.56.131406:7 --> filelet (256001029.42.131342) --> Visited
  no errors

get volume properties ...
  rwVolumeName = mapr.s3bucketVol.00000003 (volumeId = 96531604, rootContainerId = 2503, isMirror = false)
volume:mapr.s3bucketVol.00000003, snapshotName:mapr.gfsck.snap.mapr.s3bucketVol.00000003.1642584648822, snapshotId:256000052, rootContainerId:256001024, will be doing object store check

s3 bucket verification report --
  S3Bucket:256001024.43.131380 => AllOk
  S3Bucket:256001024.43.131380 Stats => numObjectsScanned:5, numObjectsVerified:4, numObjectsNeedsRepair:0, numObjectsStatusUnknown:0, numTinyObjects:1, numSmallObjects:1, numFSObjects:2, numUnreachableSmallObjects:0
total unreachable jumbo/large objects:0
The fields in the bucket verification report are as follows:
- numTinyObjects: Number of tiny objects per bucket in the volume.
- numSmallObjects: Number of small objects per bucket in the volume.
- numFSObjects: Number of large/jumbo objects per bucket in the volume.
- numObjectsNeedsRepair: Number of objects that need to be repaired.
- numUnreachableSmallObjects: Number of small objects that have an entry in the ODT table but no corresponding entry in the OLT table.
- Check an HPE Object Store volume with corruption
Step 1: Extract the volume ID of a given bucket:
/opt/mapr/server/mrconfig s3 bucketinfo kbuck2
bucketdirfid 2503.43.131380
oltFid 2503.44.131382
odtFid 2503.48.131390
f2oFid 2503.51.131396
volid 96531653
creationTime 1642581709849
accountName defaul
Step 2: Obtain the volume name using the volume ID:
/opt/mapr/bin/maprcli volume list -columns volumename,volumeid | grep 96531653
mapr.s3bucketVol.00000006  96531653
Step 3: Run gfsck on the volume:
su mapr -c "/opt/mapr/bin/gfsck -S3 rwvolume=mapr.s3bucketVol.00000006 -d"
Starting GlobalFsck:
  clear-mode = false
  debug-mode = true
  dbcheck-mode = false
  repair-mode = false
  assume-yes-mode = false
  verify-only-object-store = true
  cluster = ec-cluster
  rw-volume-name = mapr.s3bucketVol.00000006
  snapshot-name = null
  snapshot-id = 0
  cid = 0
  fid = null
  user-id = 5000
  group-id = 5000

file-fidmap-filelet union --
  256001038.54.131402:P --> primary (nchunks=2) --> AllOk
  256001038.54.131402:F --> fidmap (256001038.55.131404) --> AllOk
  256001038.54.131402:0 --> filelet (256001041.32.131270) --> Visited
  256001038.54.131402:1 --> filelet (256001043.32.131338) --> Visited
  256001038.56.131406:P --> primary (nchunks=8) --> NeedsRepair
  256001038.56.131406:F --> fidmap (256001038.57.131408) --> NeedsRepair
  256001038.56.131406:0 --> filelet (256001040.45.131320) --> Visited
  256001038.56.131406:1 --> filelet (256001044.45.131276) --> Visited
  256001038.56.131406:2 --> filelet (256001041.41.131272) --> Visited
  256001038.56.131406:3 --> filelet (256001042.32.131334) --> DeleteInFidmap
  256001038.56.131406:4 --> filelet (256001043.41.131340) --> Visited
  256001038.56.131406:5 --> filelet (256001040.46.131322) --> Visited
  256001038.56.131406:6 --> filelet (256001044.46.131278) --> Visited
  256001038.56.131406:7 --> filelet (256001043.42.131342) --> Visited

s3 bucket verification report --
  S3Bucket:256001038.43.131380 => NeedsRepair
  S3Bucket:256001038.43.131380 Stats => numObjectsScanned:5, numObjectsVerified:3, numObjectsNeedsRepair:1, numObjectsStatusUnknown:0, numTinyObjects:1, numSmallObjects:1, numFSObjects:2, numUnreachableSmallObjects:0
total unreachable jumbo/large objects:0
- Check an HPE Object Store table range
su mapr -c "/opt/mapr/bin/gfsck -S3 rwvolume=mapr.s3bucketVol.00000003 -d -b"
Starting GlobalFsck:
  clear-mode = false
  debug-mode = true
  dbcheck-mode = true
  repair-mode = false
  assume-yes-mode = false
  verify-only-object-store = true
  cluster = ec-cluster
  rw-volume-name = mapr.s3bucketVol.00000003
  snapshot-name = null
  snapshot-id = 0
  cid = 0
  fid = null
  user-id = 5000
  group-id = 5000