HPE Ezmeral Data Fabric Issues
You can view the status of HPE Ezmeral Data Fabric services in the following locations:
- Virtual clusters: Services tab of the Cluster Details screen, or the Services tab of the Training Cluster Details or Deployment Cluster Details screen, as appropriate.
- Kubernetes virtual clusters: Services Status tab of the Kubernetes Cluster Details screen.
Checking Service Status
If the HPE Ezmeral Data Fabric (MapR) service does not appear in any Services tab, then it may not be running. You can determine the status of this service by executing the following commands:
- Deployment Controller host:
    docker ps -a
- Kubernetes Data Fabric Master node (if the deployment includes a Kubernetes Data Fabric cluster):
    kubectl get po -A
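On a busy host the raw output of these commands can be long. The sketch below filters it down to the entries that need attention. It assumes the commands are run with `--no-headers` (kubectl) and `--format '{{.Names}}\t{{.Status}}'` (docker) so that the columns are predictable. Both function names are hypothetical helpers, not part of any HPE tooling.

```shell
# Hypothetical helpers (not HPE tooling): filter status listings so that
# only the entries needing attention remain.

# Reads `kubectl get po -A --no-headers` output on stdin. The columns are
# NAMESPACE NAME READY STATUS RESTARTS AGE, so STATUS is field 4.
# Prints "namespace/name status" for every pod that is neither Running
# nor Completed (for example CrashLoopBackOff, Pending, Error).
list_unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Reads `docker ps -a --format '{{.Names}}\t{{.Status}}'` output on stdin.
# Prints "name: status" for every container whose status does not start
# with "Up" (for example "Exited (1) 5 minutes ago").
list_stopped_containers() {
  awk -F '\t' '$2 !~ /^Up/ { print $1 ": " $2 }'
}
```

For example, pipe `kubectl get po -A --no-headers` into `list_unhealthy_pods`, or `docker ps -a --format '{{.Names}}\t{{.Status}}'` into `list_stopped_containers`. Note that a pod can report Running while some of its containers are not ready (a READY value such as 0/1); this sketch does not flag that case.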
Troubleshooting Errors
This article provides guidance in case any of the HPE Ezmeral Data Fabric services go into an ERROR state (red dot), or if you need to remove stale node IDs.
HPE Ezmeral Data Fabric Service | Description | Diagnostic Steps / Corrective Action |
Container Location Database (CLDB) | Tracks critical metadata about every container in Data Fabric, cluster file servers, and node activity. Running the CLDB service on multiple nodes distributes lookup operations across those nodes for load balancing and also provides high availability. | Restart the CLDB service, as described in the external documentation. |
Warden | A lightweight Java application that runs on every node in a cluster and coordinates cluster services. On each node, Warden starts, stops, or restarts the appropriate services and allocates the correct amount of memory to them. | Get more context on the error by looking at the Warden logs (located at …). Follow the troubleshooting steps in the external documentation. Consider restarting the ZooKeeper and Warden services, as described in the external documentation. |
POSIX Clients | HPE Ezmeral Data Fabric POSIX clients allow Docker to read and write directly and securely to the file system exposed by HPE Ezmeral Data Fabric FUSE (Filesystem in Userspace). | Turn on HPE Ezmeral Data Fabric tracing to collect more information, as described in the external documentation. |
AdminApp | The web application that allows users and administrators to control and configure an HPE Ezmeral Data Fabric cluster. | The admin application is normally controlled by the Warden process, which should restart it if it fails. The primary repair action is to tell Warden on the appropriate node to restart this service. |
ZooKeeper | A coordination service for distributed applications. It provides a shared hierarchical namespace that is organized like a standard file system. | Check the logs in /opt/mapr/zookeeper/zookeeper-3.4.11/logs/. |
Fileserver | The mapr-fileserver service is the process that actually stores data on disks. It must be running on every machine that stores data. Running more file servers increases both failure tolerance and overall I/O bandwidth. | Warden tries three times to restart the service automatically. After an interval (30 minutes by default), Warden again tries three times to restart the service. The interval can be configured using the parameter … |
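Two of the corrective actions in the table, reading a service's logs and asking Warden to restart a service, can be scripted. This is an illustrative sketch only: `recent_log_errors` simply greps a log directory (such as the ZooKeeper log directory listed above) for ERROR and FATAL lines, and `restart_service` assumes that the `maprcli` CLI is available on a cluster node and that `maprcli node services` accepts `-name`, `-action restart`, and `-nodes`. Apart from `cldb`, the service names to pass are not given in this article, and both function names are hypothetical.

```shell
# Hypothetical helpers; verify them against your Data Fabric release
# before relying on them.

# Prints the last COUNT (default 20) lines containing ERROR or FATAL from
# every file under DIR, each prefixed with its file name.
recent_log_errors() {
  local dir=$1 count=${2:-20}
  if [ ! -d "$dir" ]; then
    echo "recent_log_errors: not a directory: $dir" >&2
    return 2
  fi
  grep -r -H -E 'ERROR|FATAL' "$dir" | tail -n "$count"
}

# Asks Warden (through maprcli) to restart SERVICE on NODE.
# Dry run by default: prints the command instead of running it.
# Set DRY_RUN=0 in the environment to actually execute it.
restart_service() {
  local service=$1 node=$2
  if [ -z "$service" ] || [ -z "$node" ]; then
    echo "usage: restart_service SERVICE NODE" >&2
    return 2
  fi
  if [ "${DRY_RUN:-1}" = "0" ]; then
    maprcli node services -name "$service" -action restart -nodes "$node"
  else
    echo "maprcli node services -name $service -action restart -nodes $node"
  fi
}
```

Defaulting to a dry run means the printed command can be reviewed before anything is restarted on a production node.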
Removing Stale ID Node Records from the MapR Cluster
Stale IDs will appear in the output as shown here:
…
Execute the following command to delete the …
Do not delete the corresponding valid host entry, which does not have the …
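The marker string that distinguishes a stale record from the valid host entry is not spelled out here, so the hypothetical helper below takes it as an argument. Assuming the node listing prints one record per line and stale records carry that marker, it selects only the marked lines, so a valid host entry (which lacks the marker) cannot end up in the list of records to delete. Review its output before passing any ID to the delete command.

```shell
# Hypothetical helper. Reads a node listing on stdin (one record per line)
# and prints only the lines that contain MARKER, the string that flags a
# stale record. Lines without MARKER, such as the valid host entry for the
# same node, are never printed.
select_stale_records() {
  local marker=$1
  if [ -z "$marker" ]; then
    echo "usage: select_stale_records MARKER < listing" >&2
    return 2
  fi
  # -F: treat MARKER as a fixed string, not a regular expression.
  # "|| true": finding no stale records is not an error.
  grep -F -- "$marker" || true
}
```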