fsck

Detects and fixes inconsistencies in the file system.

Use the file system check (fsck) utility to detect and fix inconsistencies in the file system.

Every storage pool has its own log to journal updates to the storage pool. The system performs all operations to a storage pool transactionally by journaling all operations to the log, before applying them to storage pool metadata. If file system is not shutdown cleanly, some metadata blocks may not persist. However, on the next load of the storage pool, log recovery takes care of these metadata blocks by replaying the records in the log. The fsck utility also replays the log before it checks the metadata consistency in a storage pool. The fsck utility walks the storage pool in question to verify all Data Fabric file system metadata (and data correctness if specified on the command line), and reports all potentially lost or corrupt containers, directories, tables, files, filelets, and blocks in the storage pool. The fsck utility:

  • Checks whether all files and directories are reachable and all directory entries are valid.
  • Checks whether BTrees are consistent for various inode types (such as files and directories).
  • Walks the container file and visits every inode in the container to check that no block is owned by two inodes. Aso, verifies the consistency of bitmaps of inodes and blocks.
  • Checks consistency of snapshots.
  • Visits every allocated block in the storage pool and recovers any blocks that are part of corrupted inodes.
  • Checks consistency of HPE Ezmeral Data Fabric Database metadata.
  • Checks consistency of tabletmap, tablets, buckets, and spill files.

The fsck utility can be used on an offline storage pool after a node failure, after a disk failure, or after a Data Fabric file system process crash, or simply to verify the consistency of data for suspected software bugs.

Typical process flow:

  • Execute the fsck command on the storage pools (or disks) as specified in the following discussion.
  • Execute the gfsck command on the cluster, volumes, or snapshots that were affected.

You can run the fsck command in two modes:

  • Verification mode - fsck only reports errors; it does not attempt to fix or modify any data on disk. You can run fsck in verification mode on an offline storage pool at any time, and it will report errors if there is inconsistency. If it does not report any errors, you can bring up the storage pool online without any risk of data loss. To run the fsck utility in verification mode, use any parameter except the -r parameter.
  • Repair mode - fsck attempts to repair a bad storage pool. When you run the fsck utility in repair mode on a storage pool, some volumes might need a global fsck (gfsck) after bringing the storage pool online. There is potential for loss of data in this case. To run the fsck utility in repair mode, use the -r parameter.

Using the /opt/mapr/server/fsck utility with the -r option produces different results depending on the scenario. The fsck utility does not interpret the scenario nor does it have a safe mode.

  • If a disk is offline because of an imbalanced b-tree, using fsck -r may result in data loss from bad containers, and data loss if additional replicas are unavailable.
  • If a disk is offline because of an I/O error, using fsck -r produces indeterminate results. A disk that is returning I/O errors is questionable in terms of data content and reliability. For example, an operation that completed on the disk but was never returned, may have partial data remaining on the disk. Using fsck -r retains any partial data.
  • If a disk is offline because of slow I/O, using fsck -r does not produce data loss.

The most conservative usage of fsck is to first run fsck without the -r option (verification mode) and check the output. If the output returns errors, then run fsck with the -r option.

Syntax

/opt/mapr/server/fsck [{<device-paths>}] or [-n <sp name>]
	 -l <log filename> ; default /opt/mapr/logs/fsck.log.<ts>.<pid>
	 -p <mfs port> ; default 5660
	 -N to disable status bar
	 -P to purge deleted containers in repair
	 -h for help
	 -j to skip log replay
	 -m <memory in MB> to set cache size for blocks 
	 -d to check data blocks crc
	 -b to check db consistency
	 -C to specify the container-id when using -d
	 -I to specify inode-number when using -d and -C
	 -r to repair ; USE WITH CAUTION AS IT CAN LEAD TO LOSS OF DATA

Parameters

Parameter

Description

-b Checks database consistency.
-C Optional with the -d option. Specifies the read-write container ID on which CRC (cyclic redundancy check) must be performed. If this option is not specified with the -d option, the CRC is performed on all the containers on the storage pool.

-d

Performs a CRC on data blocks. By default, fsck will not validate the CRC of user data pages. Enabling this check causes the check to take a while to complete.

<device-paths>

Paths to the disks that make up the storage pool.

NOTE
Before running fsck, use the mrconfig disk remove command to remove all the disks from file system. For example:
/opt/mapr/server/mrconfig disk remove /dev/sdb
/opt/mapr/server/fsck /dev/sdb

-h

Help

-I

Optional with the -C option. Specifies the inode number on which CRC (cyclic redundancy check) must be performed on the container specified by the -C option. If this option is not specified with the -C option, the CRC is performed on all the inodes.

-j

Skips log replay. Should be set only when log recovery fails. Log recovery can fail if the damaged blocks of a disk belong to the log, or if log recovery finds some CRC errors in the metadata blocks. *Using this parameter will typically lead to larger data loss. *

-l

The log filename. Default: /opt/mapr/logs/fsck.log.<ts>.<pid>

-m

Sets the cache size for blocks (MB).

-n

Storage pool name. This option works only if all the disks are in disktab. Otherwise, you must individually specify all the disks that make up the storage pool, using the <device-paths> parameter.

-N Disables the status bar.

-p

The file system port. Default: 5660

-P Purges deleted containers in repair.

-r

Runs in repair mode. USE WITH CAUTION AS THIS CAN LEAD TO LOSS OF DATA.