Addressing Data Alarms
Describes the data alarms and how to mitigate them.
When a disk fails, data on that disk becomes unavailable. As a result, you will probably see one of these two data alarms along with a Disk Failure alarm:
- Data Unavailable (VOLUME_ALARM_DATA_UNAVAILABLE) - if there was only one copy of the data and it was on the failed disk, or if the data was replicated more than once but all disks with that data failed
- Data Under Replicated (VOLUME_ALARM_DATA_UNDER_REPLICATED) - if data on the failed disk is replicated elsewhere, but the minimum replication factor is not met as a result of the failed disk
- Run the following command to identify which storage pools are offline:
  [user@host] /opt/mapr/server/mrconfig sp list | grep Offline
- For each storage pool reported by the previous command, run the following command, where <sp> specifies the name of an offline storage pool:
  [user@host] /opt/mapr/server/fsck -n <sp> -r
  When you run fsck with the -r option, it identifies corrupt blocks and removes them. If there are no corrupt blocks, fsck clears the error condition so you can bring the storage pool back online.
  NOTE: Using the /opt/mapr/server/fsck utility with the -r flag to repair a filesystem risks data loss. Call support before using /opt/mapr/server/fsck -r.
- Verify that all Data Unavailable volume alarms are cleared. If Data Unavailable volume alarms persist, contact support or post on answers.mapr.com.
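The first two steps above can be sketched as a small helper that turns the storage pool listing into one fsck command per offline pool. This is a minimal sketch, not a supported tool: the function names are hypothetical, and the line format of the mrconfig sp list output is an assumption stated in the comments. The generated commands deliberately omit -r, so they only verify.

```shell
#!/bin/sh
# Sketch only: extract the names of offline storage pools from
# `mrconfig sp list` output supplied on standard input.
# ASSUMPTION: each pool is reported on one line of the form
#   SP 1: name SP2, Offline, size ..., path /dev/sdc
# Adjust the pattern if the output on your cluster differs.
offline_sp_names() {
    sed -n 's/^.*name \([^,]*\), Offline.*$/\1/p'
}

# Print (do not run) a verification-mode fsck command for each offline pool.
# The -r option is left out on purpose, so nothing is repaired or removed.
print_verify_commands() {
    offline_sp_names | while IFS= read -r sp; do
        printf '/opt/mapr/server/fsck -n %s\n' "$sp"
    done
}
```

With the functions loaded into the current shell, a possible invocation is /opt/mapr/server/mrconfig sp list | print_verify_commands. Review the printed commands before running any of them.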
If there are any Data Under Replicated volume alarms in the cluster, the problem can be repaired automatically by re-replicating the data and placing it on another disk. After you allow a reasonable amount of time for re-replication, verify that the under-replication alarms are cleared.
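Checking whether the data alarms have cleared can be reduced to counting the two alarm names in an alarm listing. The following is a minimal sketch: the function name is hypothetical, and only the alarm identifiers come from this page. How the listing is produced (for example with maprcli alarm list) is an assumption to check against the documentation for your release.

```shell
#!/bin/sh
# Sketch only: count lines in an alarm listing (read from standard input)
# that mention either data alarm described on this page.
# ASSUMPTION: the listing contains the alarm identifiers as plain text.
count_data_alarms() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so the status is neutralized with `|| true`.
    grep -c -E 'VOLUME_ALARM_DATA_(UNAVAILABLE|UNDER_REPLICATED)' || true
}
```

A count of 0 suggests that both alarm types are cleared. Any other value means at least one volume still needs attention.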
Using the /opt/mapr/server/fsck utility with the -r option produces different results depending on the scenario. The fsck utility does not interpret the scenario, nor does it have a safe mode.
- If a disk is offline because of an imbalanced b-tree, using fsck -r may result in data loss from bad containers and data loss if additional replicas are unavailable.
- If a disk is offline because of an I/O error, using fsck -r produces indeterminate results. A disk that is throwing I/O errors is questionable in terms of data content and reliability. For example, an operation that completed on the disk but was never returned may have partial data remaining on the disk. Using fsck -r retains any partial data.
- If a disk is offline because of slow I/O, using fsck -r does not produce data loss.
The recommended approach to using fsck -r is to first run fsck without the -r option (verification mode) and check the output. If the output is OK, then run fsck with the -r option.
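The verify-first sequence can be sketched as a wrapper that always runs verification mode and runs repair mode only after an operator has read the output and confirmed. This is a minimal sketch under stated assumptions: the function name and the FSCK override variable are hypothetical, and the script does not try to judge the fsck output or exit status itself, because what counts as acceptable output is not specified here.

```shell
#!/bin/sh
# Sketch only: verify first, repair only after explicit confirmation.
# FSCK is a hypothetical override so the flow can be rehearsed with a stub.
FSCK="${FSCK:-/opt/mapr/server/fsck}"

verify_then_repair() {
    sp="$1"

    # Step 1: verification mode (no -r); fsck reports but does not repair.
    # The exit status is ignored on purpose: the operator reads the output.
    "$FSCK" -n "$sp" || true

    # Step 2: the operator decides whether the output is acceptable.
    printf 'Is the output for %s OK? Type "yes" to run fsck -r: ' "$sp" >&2
    read -r answer
    if [ "$answer" != "yes" ]; then
        echo "Skipping repair of $sp" >&2
        return 1
    fi

    # Step 3: repair mode. This risks data loss; see the scenarios above.
    "$FSCK" -n "$sp" -r
}
```

A possible invocation is verify_then_repair <sp>, where <sp> is the name of an offline storage pool. Any answer other than yes leaves the storage pool untouched.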