Cluster Alarms

Cluster alarms indicate problems that affect the cluster as a whole. The following sections describe the Data Fabric cluster alarms.

CLDB Low Memory Alarm

UI Column

Cluster freespace above CLDB heapsize

Logged As

CLUSTER_ALARM_CLDB_HEAPSIZE

Meaning

The CLDB process needs more memory to cache containers.

Resolution

The CLDB heap size is no longer sufficient for the CLDB to cache containers. To resolve this, increase the CLDB memory settings on all CLDB nodes, using the same value for the minimum and maximum heap sizes. The alarm text includes the minimum amount of memory that the CLDB requires. To accommodate future growth, set these values somewhat higher. For example, if the alarm indicates that the CLDB needs 4000 MB, set the minimum and maximum heap sizes to a larger value, such as 4400 MB.
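The following sketch shows the sizing rule as a worked example. It pads the minimum reported by the alarm with a growth margin and then rounds up. It makes these assumptions:

* The margin is 10%, which is the ratio in the 4000 MB to 4400 MB example.
* Values are in MB.
* The result is rounded up to a multiple of 100 MB.

The function name, the margin, and the rounding step are hypothetical choices, not product requirements.

```python
def recommended_cldb_heap_mb(required_mb: int, headroom_pct: int = 10, step_mb: int = 100) -> int:
    """Suggest a CLDB heap size (MB) above the minimum reported by the alarm.

    Hypothetical helper: adds headroom_pct percent for future growth and
    rounds up to a multiple of step_mb. Integer math avoids float rounding.
    """
    if required_mb <= 0 or step_mb <= 0 or headroom_pct < 0:
        raise ValueError("required_mb and step_mb must be positive; headroom_pct must be >= 0")
    # required_mb * (1 + headroom_pct / 100), kept in hundredths of an MB
    padded_hundredths = required_mb * (100 + headroom_pct)
    # Ceiling division: number of step_mb units needed to cover the padded size
    units = -(-padded_hundredths // (100 * step_mb))
    return units * step_mb


# Use the same value for both the minimum and maximum heap sizes.
heap = recommended_cldb_heap_mb(4000)
print(heap)  # 4400
```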

The CLDB memory settings are controlled by the following parameters in the warden.conf file located in $MAPR_HOME/conf/:

service.command.cldb.heapsize.max=<max heap size>
service.command.cldb.heapsize.min=<min heap size>

Restart the Warden service on each CLDB node after you edit the warden.conf file.
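If you script this change across several CLDB nodes, the edit might look like the following minimal sketch. It assumes the key=value layout of the two parameters above and sets both of them to the same value. The helper name is hypothetical. The helper works on the file text rather than on the file itself, so you can review the result before you write it back and restart Warden.

```python
HEAP_KEYS = (
    "service.command.cldb.heapsize.min",
    "service.command.cldb.heapsize.max",
)


def set_cldb_heap(conf_text: str, heap_mb: int) -> str:
    """Return warden.conf text with the CLDB min and max heap set to heap_mb.

    Hypothetical helper: existing keys are rewritten in place, missing keys
    are appended, and all other lines (including comments) are left as-is.
    """
    lines = conf_text.splitlines()
    found = set()
    for i, line in enumerate(lines):
        key, sep, _ = line.partition("=")
        key = key.strip()
        if sep and key in HEAP_KEYS:
            lines[i] = f"{key}={heap_mb}"
            found.add(key)
    # Append any heap key that was not already present in the file
    for key in HEAP_KEYS:
        if key not in found:
            lines.append(f"{key}={heap_mb}")
    return "\n".join(lines) + "\n"


sample = (
    "# CLDB heap settings\n"
    "service.command.cldb.heapsize.max=4000\n"
    "service.command.cldb.heapsize.min=3000\n"
)
print(set_cldb_heap(sample, 4400), end="")
```

Keep a copy of the original warden.conf before you write the returned text back on each CLDB node.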

License Near Expiration

UI Column: License Near Expiration Alarm
Logged As: CLUSTER_ALARM_LICENSE_NEAR_EXPIRATION
Meaning: The Enterprise Edition license associated with the cluster is within 30 days of expiration.
Resolution: Renew the Enterprise Edition license.
Configuration: Configurable at cluster level. See Configuring the Alarm Threshold Using the CLI for more information.

License Expired

UI Column: License Expiration Alarm
Logged As: CLUSTER_ALARM_LICENSE_EXPIRED
Meaning: The Enterprise Edition license associated with the cluster has expired. Enterprise Edition features have been disabled.
Resolution: Renew the Enterprise Edition license.
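The two license alarms differ only in how close the license is to its expiration date. The following sketch classifies a license against the default 30-day window. It illustrates the distinction only and is not the product's implementation. It assumes that a license counts as expired from the day after its expiration date. The function name and date handling are hypothetical.

```python
from datetime import date
from typing import Optional


def license_alarm(expiry: date, today: date, near_days: int = 30) -> Optional[str]:
    """Return the alarm name that would apply, or None if neither applies.

    Hypothetical classification: expired once today is past the expiry date,
    near expiration when the expiry date is within near_days days.
    """
    days_left = (expiry - today).days
    if days_left < 0:
        return "CLUSTER_ALARM_LICENSE_EXPIRED"
    if days_left <= near_days:
        return "CLUSTER_ALARM_LICENSE_NEAR_EXPIRATION"
    return None


print(license_alarm(date(2025, 6, 30), date(2025, 6, 10)))  # CLUSTER_ALARM_LICENSE_NEAR_EXPIRATION
```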

Cluster Almost Full

UI Column: Cluster Almost Full
Logged As: CLUSTER_ALARM_CLUSTER_ALMOST_FULL
Meaning: The cluster storage is almost full. The percentage of storage used before this alarm is triggered is 90% by default, and is controlled by the configuration parameter cldb.cluster.almost.full.percentage.
Resolution: Reduce the amount of data stored in the cluster. If the alarm is raised while the cluster storage is less than 90% full, use the config load command to check the value of the cldb.cluster.almost.full.percentage parameter. If necessary, adjust the value with the config save command.
Configuration: Configurable at cluster level. See Configuring the Alarm Threshold Using the CLI for more information.
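The following sketch compares used storage against the cldb.cluster.almost.full.percentage value, which is the condition this alarm describes. It assumes that the alarm fires when usage reaches or exceeds the threshold. The function name is hypothetical. The sketch also shows why the alarm can appear when usage is below 90%: a lowered threshold triggers it earlier.

```python
def cluster_almost_full(used_bytes: int, capacity_bytes: int, threshold_pct: int = 90) -> bool:
    """Return True if used storage has reached threshold_pct percent of capacity.

    threshold_pct stands in for cldb.cluster.almost.full.percentage (default 90).
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return used_bytes * 100 >= threshold_pct * capacity_bytes


TB = 10**12
print(cluster_almost_full(85 * TB, 100 * TB))                    # False: 85% < 90%
print(cluster_almost_full(85 * TB, 100 * TB, threshold_pct=80))  # True: threshold lowered to 80%
```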

Cluster Full

UI Column: Cluster Full
Logged As: CLUSTER_ALARM_CLUSTER_FULL
Meaning: The cluster storage is full. MapReduce operations have been halted.
Resolution: Free up some space on the cluster.

Maximum Licensed Nodes Exceeded Alarm

UI Column: Licensed Nodes Exceeded Alarm
Logged As: CLUSTER_ALARM_LICENSE_MAXNODES_EXCEEDED
Meaning: The cluster has exceeded the number of nodes specified in the license.
Resolution: Remove some nodes, or upgrade the license to accommodate the added nodes.

New Cluster Features Disabled

UI Column: New Cluster Features Disabled
Logged As: CLUSTER_ALARM_NEW_FEATURES_DISABLED
Meaning: Features added in version 2.0 or 3.0 are not enabled on the cluster.
Resolution: Enable the latest features for the data-fabric version that you are currently running.

Upgrade in Progress

UI Column: Software Installation & Upgrades
Logged As: CLUSTER_ALARM_UPGRADE_IN_PROGRESS
Meaning: A rolling upgrade of the cluster is in progress.
Resolution: No action is required. Performance may be affected during the upgrade, but the cluster should still function normally. After the upgrade is complete, the alarm is cleared.

VIP Assignment Failure

UI Column: VIP Assignment Alarm
Logged As: CLUSTER_ALARM_UNASSIGNED_VIRTUAL_IPS
Meaning: Core software was unable to assign a VIP to any NFS server.
Resolution: Check the VIP configuration, and make sure that at least one of the NFS servers in the VIP pool is up and running. See Setting Up VIPs for NFS. This alarm can also indicate that a VIP's hostname exceeds the maximum allowed length of 16 characters. Check the log file /opt/mapr/logs/nfsmon.log for additional information.
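Before you check nfsmon.log, you can rule out the hostname-length cause with a check like the following. It assumes that the 16-character limit stated above applies to the full hostname string. The function name and sample hostnames are hypothetical.

```python
MAX_VIP_HOSTNAME_LEN = 16  # limit stated in the resolution above


def overlong_vip_hostnames(hostnames):
    """Return the hostnames longer than the maximum allowed VIP hostname length."""
    return [name for name in hostnames if len(name) > MAX_VIP_HOSTNAME_LEN]


vips = ["nfs-vip-01", "nfs-vip-east-datacenter-01"]
print(overlong_vip_hostnames(vips))  # ['nfs-vip-east-datacenter-01']
```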

DARE Enabled

UI Column: DARE Enabled Alarm
Logged As: CLUSTER_ALARM_DARE_COPY_MASTER_KEY
Meaning: Data-at-rest encryption (DARE) is enabled on the cluster.
Resolution: When DARE is enabled on the cluster, a data-at-rest encryption master key file is generated and stored in the /opt/mapr/conf/tokens folder on the CLDB node. Before dismissing the alarm, make a backup of the /opt/mapr/conf/tokens folder. For an upgraded cluster, you must also back up the dare.master.key stored in /opt/mapr/conf/. Loss of the master key file or the /opt/mapr/conf/tokens folder can be catastrophic and irreversible and might result in loss of data.

DARE Incompatible

UI Column: DARE Incompatible Alarm
Logged As: CLUSTER_ALARM_DARE_INCOMPATIBLE
Meaning: Not all nodes on the cluster are enabled for data-at-rest encryption (DARE).
Resolution: DARE may be enabled on some nodes in the cluster but not yet on others. Enable DARE on all the nodes before dismissing the alarm.

Too Many Snapshots

UI Column: Too Many Snapshots
Logged As: CLUSTER_ALARM_TOO_MANY_SNAPSHOT_CONTAINERS
Meaning: There are too many snapshots on this cluster.
Resolution: Delete snapshots from the cluster before dismissing the alarm.

Service Endpoints Changed

UI Column

One of the following:

API Server endpoints changed for <cluster-name>
API Server endpoints changed for <cluster-name>. Added cluster <cluster-name>.
API endpoints changed. Updated from Cluster Group primary.
API Server endpoints changed by updating ClusterGroup Primary.
API Server endpoints changed. Removed cluster <cluster-name>
API Server endpoints changed by resetTable.

Logged As

CLUSTER_ALARM_CLUSTERGROUP_ENDPOINTS_UPDATED

Meaning

This is an informational alarm. It indicates that one or more API server endpoints (IP addresses) have changed.

NOTE

Data Fabric may require the API server endpoints (IP addresses) for high availability.

API server endpoints change when any of the following occurs:

A new cluster is added to the cluster group/global namespace
An existing cluster is removed from the cluster group/global namespace
The primary cluster for the cluster group/global namespace changes
An API server is added to or removed from any of the member clusters of the cluster group

Resolution

Download the updated API server endpoints from the UI by following the instructions in Viewing the Fabric Endpoint, or by using the clustergroup get cgtable command. To clear the alarm, dismiss it manually in the UI.
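To see which endpoints changed, you can compare your previously saved list of API server endpoints with the newly downloaded one. The following sketch assumes that both lists have already been extracted as plain IP address strings. It does not parse the output of the clustergroup get cgtable command, whose format is not described here. The function name and sample addresses are hypothetical.

```python
def diff_endpoints(old, new):
    """Return (added, removed) endpoints between two endpoint lists, sorted."""
    old_set, new_set = set(old), set(new)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    return added, removed


before = ["192.0.2.10", "192.0.2.11"]
after = ["192.0.2.11", "192.0.2.12"]
added, removed = diff_endpoints(before, after)
print("added:", added)      # added: ['192.0.2.12']
print("removed:", removed)  # removed: ['192.0.2.10']
```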

Insights running in trial mode

UI Column: Insights running in trial mode
Logged As: CLUSTER_ALARM_INSIGHTS_TRIAL_MODE
Meaning: The Insights feature enabled on the cluster is running in trial mode, with the Hive Metastore using the Derby RDBMS to store Insights table metadata.
Resolution: Associate the Hive Metastore with a production-grade RDBMS, such as PostgreSQL or MySQL. The alarm is cleared once this is done.
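To check whether the metastore still uses Derby, you can inspect the javax.jdo.option.ConnectionURL property in hive-site.xml. This is the standard Hive setting for the metastore database connection. The following sketch is not the mechanism the alarm uses. It treats the metastore as Derby-backed in either of these cases:

* The connection URL starts with jdbc:derby:.
* The property is missing. Hive's default metastore is an embedded Derby database.

The function name and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

CONNECTION_URL_KEY = "javax.jdo.option.ConnectionURL"


def metastore_uses_derby(hive_site_xml: str) -> bool:
    """Return True if the Hive metastore connection URL points at Derby.

    A missing property is treated as Derby, because Hive's default metastore
    is an embedded Derby database.
    """
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == CONNECTION_URL_KEY:
            url = (prop.findtext("value") or "").strip()
            return url.startswith("jdbc:derby:")
    return True


mysql_site = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-db.example.com:3306/hive</value>
  </property>
</configuration>"""
print(metastore_uses_derby(mysql_site))  # False
```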

HPE Ezmeral Data Fabric – Customer-Managed 7.9.0 Documentation
Abstract	This site contains documentation for the customer-managed platform of the HPE Ezmeral Data Fabric version 7.9.0 including installation, configuration, administration, and reference content, as well as content for the associated bundled ecosystem components and drivers.
Published	April 2025
Edition	7.9.0
Topic last updated	2024-08-29