Replacing a Failed Disk
This procedure describes how to use the mrconfig utility to replace a failed disk that is part of a storage pool on HPE Ezmeral Data Fabric on Kubernetes on HPE Ezmeral Runtime Enterprise.
Prerequisites
-
Required access rights:
-
Platform Administrator or Kubernetes Cluster Administrator access rights are required to download the admin kubeconfig file, which is needed to access Kubernetes cluster pods (see Downloading Admin Kubeconfig).
-
You must be logged on as the root user on the nodes that contain the disk and on which the Kubernetes cluster is running.
-
You have identified the disk that has failed and needs replacement.
About this task
During this procedure, you place the pod in maintenance mode and take the storage pool offline. After you replace the failed disk, you will use the mrconfig utility to recreate the storage pool, and then bring the storage pool and pod back online.
You must use the mrconfig utility to perform this task. Using the equivalent maprcli commands is not supported.
Procedure
-
Use the kubectl exec command to access the CLDB or MFS pod that contains the storage pool with the failed disk. For example:
kubectl exec -it cldb-0 -n myclusternode1 -- /bin/bash
If needed, use the kubectl get pods -n <cluster-name> command to get the list of pods, and then determine the CLDB or MFS pod in which you want to run the mrconfig commands.
-
Place the pod in maintenance mode by entering the following command:
sudo touch /opt/mapr/kubernetes/maintenance
-
Use the mrconfig sp list command to list the storage pools that are in the pod. In the following example, there is one storage pool, SP1, with path /dev/drive0:
mrconfig sp list
ListSPs resp: status 0:1
No. of SPs (1), totalsize 224491 MB, totalfree 221235 MB
SP 0: name SP1, Online, size 224491 MB, free 221235 MB, path /dev/drive0
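If you are scripting this check, the storage-pool path can be extracted from the sp list output with standard text tools. The following is a minimal sketch that assumes the line format shown above; the awk pattern is an illustration, not part of the mrconfig documentation:

```shell
# Sample `mrconfig sp list` output, in the format shown above.
# In a live pod you would capture the real output instead:
#   sp_list_output="$(/opt/mapr/server/mrconfig sp list)"
sp_list_output='ListSPs resp: status 0:1
No. of SPs (1), totalsize 224491 MB, totalfree 221235 MB
SP 0: name SP1, Online, size 224491 MB, free 221235 MB, path /dev/drive0'

# Each "SP N:" line ends with the pool path, so print the last field.
sp_path="$(printf '%s\n' "$sp_list_output" | awk '/^SP [0-9]+:/ {print $NF}')"
echo "$sp_path"
```

The extracted path can then be reused in the sp offline, sp online, and disk commands later in this procedure.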
-
Make note of the other disk drives in the storage pool.
Later in this procedure you will remove and then add the other disks in the storage pool that contains the failed disk. You can display the disks in the storage pool by entering the mrconfig dg list <path> command, where <path> is the path of the storage pool. In the output of the command, the drive paths of the disks in the group are listed at the end of the lines that start with SubDG.
-
Mark the storage pool as offline.
For example:
mrconfig sp offline /dev/drive0
-
Verify the storage pool is offline by examining the output of the mrconfig sp list command. For example:
mrconfig sp list
ListSPs resp: status 0:1
No. of SPs (1), totalsize 0 MB, totalfree 0 MB
SP 0: name SP1, Offline, size 2575449 MB, free 0 MB, path /dev/drive0
-
Remove the failed disk from the configuration.
CAUTION: Removing a disk destroys the data on the disk, so ensure that all data on the disk is backed up and replicated before removing the disk.
For example:
mrconfig disk remove /dev/drive0
- Replace the disk hardware. Follow the instructions for the system and disk you are replacing to remove the disk from the system and install the replacement disk.
-
Initialize the replaced disk by using the mrconfig disk init command. For example:
mrconfig disk init -F /dev/drive0
Disk guid: 7cc56e064fd1e1fe:60a6bfaa0693a2
-
Load the replaced disk by using the mrconfig disk load command. For example:
/opt/mapr/server/mrconfig disk load /dev/drive0
guid FEE1D14F-066E-C57C-A293-06AABFA66000 dgguid 00000000-0000-0000-0000-000000000000
-
One disk at a time, use the mrconfig utility to remove, initialize, and load the other disks that were part of the storage pool that contained the replaced disk. After you finish this step, the replaced disk and the remaining disks in the storage pool have been initialized and loaded.
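The per-disk cycle described in this step can be scripted. The sketch below only prints the commands it would run (a dry run); /dev/drive1 and /dev/drive2 are placeholder paths for the remaining disks you noted earlier, and /opt/mapr/server/mrconfig is the usual install path:

```shell
MRCONFIG=/opt/mapr/server/mrconfig

# Print (without executing) the remove/init/load cycle for each disk.
# Drop the echo-only behavior with care: disk remove destroys data.
plan_disk_cycle() {
    for disk in "$@"; do
        echo "$MRCONFIG disk remove $disk"   # remove disk from configuration
        echo "$MRCONFIG disk init -F $disk"  # -F forces initialization
        echo "$MRCONFIG disk load $disk"     # load the re-initialized disk
    done
}

# Placeholders for the remaining disks in your storage pool.
plan_disk_cycle /dev/drive1 /dev/drive2
```

Reviewing the printed plan before running the commands for real helps avoid removing a disk that is not part of the affected storage pool.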
-
Use the mrconfig dg create raid0 command to create a disk group of type raid0 that includes the disks in the storage pool. For example:
/opt/mapr/server/mrconfig dg create raid0 /dev/drive0 /dev/drive1 /dev/drive2
CreateDG disks(3) stripeDepth(0) layout(3)
-
Create a concatenated disk group with mrconfig dg create concat by specifying the primary drive. For example:
mrconfig dg create concat /dev/drive0
CreateDG disks(1) stripeDepth(0) layout(2)
At this point, you can use the mrconfig dg list command to see the layout of the disk group and which disk is the primary disk. The primary disk can be used in other commands to refer to the disk group as a whole.
-
Make the storage pool from the newly-created disk group.
For example:
/opt/mapr/server/mrconfig sp make -F /dev/drive0
-
Bring the storage pool online.
For example:
mrconfig sp online /dev/drive0
-
List the storage pools and verify the storage pool is online.
For example:
mrconfig sp list
ListSPs resp: status 0:1
No. of SPs (1), totalsize 2510595 MB, totalfree 2509693 MB
SP 0: name SP2, Online, size 2510595 MB, free 2509693 MB, path /dev/drive0
The storage pool is identified by its path. The name of the storage pool is generated automatically, and is not necessarily retained when you recreate a storage pool for a given path.
-
Bring the pod out of maintenance mode:
sudo rm -f /opt/mapr/kubernetes/maintenance
-
(Optional) Verify that the Data Fabric cluster pods are operational. For example, you can execute the edf report ready command.
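As a recap, the command sequence in this procedure can be summarized in one dry-run script. Nothing here is executed; the plan is printed for review. /dev/drive0 is the drive path of the storage pool being rebuilt, and /dev/drive1 and /dev/drive2 are placeholders for the other disks in the pool:

```shell
# Dry-run recap of the disk-replacement procedure. The commands are
# printed, not executed, so the plan can be reviewed step by step.
MRCONFIG=/opt/mapr/server/mrconfig
PRIMARY=/dev/drive0                 # path of the storage pool / failed disk
OTHERS="/dev/drive1 /dev/drive2"    # placeholder paths for the other disks

plan="$(cat <<EOF
touch /opt/mapr/kubernetes/maintenance
$MRCONFIG sp offline $PRIMARY
$MRCONFIG disk remove $PRIMARY
# physically replace the disk hardware here
$MRCONFIG disk init -F $PRIMARY
$MRCONFIG disk load $PRIMARY
# repeat disk remove / disk init -F / disk load for: $OTHERS
$MRCONFIG dg create raid0 $PRIMARY $OTHERS
$MRCONFIG dg create concat $PRIMARY
$MRCONFIG sp make -F $PRIMARY
$MRCONFIG sp online $PRIMARY
rm -f /opt/mapr/kubernetes/maintenance
EOF
)"
printf '%s\n' "$plan"
```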