Cluster Alarms

Cluster alarms indicate problems that affect the cluster as a whole. The following sections describe the Data Fabric cluster alarms.

CLDB Low Memory Alarm

UI Column
Cluster freespace above CLDB heapsize
Logged As
CLUSTER_ALARM_CLDB_HEAPSIZE
Meaning
The CLDB process needs more memory to cache containers.
Resolution
The CLDB heap size is no longer sufficient for the CLDB to cache containers. Increase the CLDB memory settings on all CLDB nodes, using the same value for the minimum and maximum heap sizes. The alarm text states the minimum amount of memory required; to accommodate future growth, set both values somewhat higher. For example, if the alarm indicates that the CLDB needs 4000 MB, set the minimum and maximum heap sizes to a larger value such as 4400 MB.

The CLDB memory settings are controlled by the following parameters in the warden.conf file, located in $MAPR_HOME/conf/:

service.command.cldb.heapsize.max=<max heap size>
service.command.cldb.heapsize.min=<min heap size>

Restart the Warden service on each CLDB node after you edit the warden.conf file.
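The sizing rule and the warden.conf edit described above can be sketched in Python. This is a minimal illustration, not a supported tool: the function names are hypothetical, the 10% headroom only mirrors the 4000 MB to 4400 MB example, and the sketch assumes the heap values are written as plain numbers of megabytes. Check the existing entries in your warden.conf before adopting the same format.

```python
import re

# Parameter names taken from the warden.conf description above.
HEAP_KEYS = (
    "service.command.cldb.heapsize.max",
    "service.command.cldb.heapsize.min",
)


def recommended_heap_mb(required_mb: int, headroom_percent: int = 10) -> int:
    """Return the minimum reported by the alarm plus headroom, rounded up.

    The 10% default is an assumption that matches the 4000 -> 4400 example.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return -(-required_mb * (100 + headroom_percent) // 100)


def set_cldb_heap(conf_text: str, heap_mb: int) -> str:
    """Set both CLDB heap keys in warden.conf text to the same value."""
    for key in HEAP_KEYS:
        pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
        line = f"{key}={heap_mb}"
        if pattern.search(conf_text):
            # Replace the existing setting in place.
            conf_text = pattern.sub(line, conf_text)
        else:
            # Append the setting if the key is not present yet.
            conf_text = conf_text.rstrip("\n") + "\n" + line + "\n"
    return conf_text
```

For example, recommended_heap_mb(4000) yields 4400, and passing that value to set_cldb_heap rewrites both keys to the same number. The Warden restart on each CLDB node is still a separate, manual step.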

License Near Expiration

UI Column
License Near Expiration Alarm
Logged As
CLUSTER_ALARM_LICENSE_NEAR_EXPIRATION
Meaning
The Enterprise Edition license associated with the cluster is within 30 days of expiration.
Resolution
Renew the Enterprise Edition license.
Configuration
Configurable at cluster level. See Configuring the Alarm Threshold Using the CLI for more information.

License Expired

UI Column
License Expiration Alarm
Logged As
CLUSTER_ALARM_LICENSE_EXPIRED
Meaning
The Enterprise Edition license associated with the cluster has expired. Enterprise Edition features have been disabled.
Resolution
Renew the Enterprise Edition license.

Cluster Almost Full

UI Column
Cluster Almost Full
Logged As
CLUSTER_ALARM_CLUSTER_ALMOST_FULL
Meaning
The cluster storage is almost full. The percentage of storage used before this alarm is triggered is 90% by default, and is controlled by the configuration parameter cldb.cluster.almost.full.percentage.
Resolution
Reduce the amount of data stored in the cluster. If the cluster storage is less than 90% full, check the cldb.cluster.almost.full.percentage parameter via the config load command, and adjust it if necessary via the config save command.
Configuration
Configurable at cluster level. See Configuring the Alarm Threshold Using the CLI for more information.
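As a rough sketch of the check-and-adjust step, the following Python helpers build the argument lists for the config load and config save commands. The helper names are hypothetical, and the -keys and -values options are given as we understand the maprcli syntax; confirm them against the command reference for your release before running anything. The code only constructs the commands and does not execute them.

```python
import json

# Parameter name taken from the alarm description above.
PARAM = "cldb.cluster.almost.full.percentage"


def load_threshold_cmd() -> list[str]:
    """Build the command that reads the current threshold."""
    return ["maprcli", "config", "load", "-keys", PARAM]


def save_threshold_cmd(percent: int) -> list[str]:
    """Build the command that stores a new threshold (1 to 100)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # The value is passed as a JSON object, with the number as a string.
    values = json.dumps({PARAM: str(percent)})
    return ["maprcli", "config", "save", "-values", values]
```

On a cluster node, each list could be passed to subprocess.run(cmd, check=True); keeping the arguments as a list avoids shell quoting problems with the JSON value.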

Cluster Full

UI Column
Cluster Full
Logged As
CLUSTER_ALARM_CLUSTER_FULL
Meaning
The cluster storage is full. MapReduce operations have been halted.
Resolution
Free up some space on the cluster.

Maximum Licensed Nodes Exceeded Alarm

UI Column
Licensed Nodes Exceeded Alarm
Logged As
CLUSTER_ALARM_LICENSE_MAXNODES_EXCEEDED
Meaning
The cluster has exceeded the number of nodes specified in the license.
Resolution
Remove some nodes, or upgrade the license to accommodate the added nodes.

New Cluster Features Disabled

UI Column
New Cluster Features Disabled
Logged As
CLUSTER_ALARM_NEW_FEATURES_DISABLED
Meaning
Features added in version 2.0 or 3.0 are not enabled on the cluster.
Resolution
Enable the latest features for the data-fabric version that you are currently running.

Upgrade in Progress

UI Column
Software Installation & Upgrades
Logged As
CLUSTER_ALARM_UPGRADE_IN_PROGRESS
Meaning
A rolling upgrade of the cluster is in progress.
Resolution
No action is required. Performance may be affected during the upgrade, but the cluster should still function normally. After the upgrade is complete, the alarm is cleared.

VIP Assignment Failure

UI Column
VIP Assignment Alarm
Logged As
CLUSTER_ALARM_UNASSIGNED_VIRTUAL_IPS
Meaning
Core software was unable to assign a VIP to any NFS servers.
Resolution
Check the VIP configuration, and make sure that at least one of the NFS servers in the VIP pool is up and running. See Setting Up VIPs for NFS. This alarm can also indicate that a VIP's hostname exceeds the maximum allowed length of 16 characters. Check the log file /opt/mapr/logs/nfsmon.log for additional information.
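The hostname-length cause can be ruled out quickly with a small check such as the one below. It is a sketch only: the function name is hypothetical, and it assumes the limit of 16 applies to the number of characters in the hostname exactly as it is configured for the VIP.

```python
# Limit taken from the resolution text above.
MAX_VIP_HOSTNAME_LENGTH = 16


def overlong_vip_hostnames(hostnames, limit=MAX_VIP_HOSTNAME_LENGTH):
    """Return the hostnames that are longer than the allowed limit."""
    return [name for name in hostnames if len(name) > limit]
```

An empty result means that none of the supplied hostnames exceeds the limit, so the VIP pool configuration and nfsmon.log are the next places to look.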

DARE Enabled

UI Column
DARE Enabled Alarm
Logged As
CLUSTER_ALARM_DARE_COPY_MASTER_KEY
Meaning
Data-at-rest encryption (DARE) is enabled on the cluster.
Resolution
When DARE is enabled on the cluster, a data-at-rest encryption master key file is generated and stored in the /opt/mapr/conf/tokens folder on the CLDB node. Before dismissing the alarm, make a backup of the /opt/mapr/conf/tokens folder. For an upgraded cluster, you must also back up the dare.master.key stored in /opt/mapr/conf/. Loss of the master key file or the /opt/mapr/conf/tokens folder can be catastrophic and irreversible and might result in loss of data.
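One way to take the backup described above is to archive the tokens folder, and the legacy key file if one exists, into a single compressed file. The sketch below uses only the Python standard library; the function name, the archive naming scheme, and the destination directory are assumptions for illustration and are not part of the product.

```python
import tarfile
import time
from pathlib import Path


def backup_dare_keys(tokens_dir, dest_dir, legacy_key=None):
    """Archive the tokens folder (and an optional legacy key file).

    Returns the path of the .tar.gz archive that was written.
    """
    tokens_dir = Path(tokens_dir)
    dest_dir = Path(dest_dir)
    if not tokens_dir.is_dir():
        raise FileNotFoundError(f"tokens folder not found: {tokens_dir}")
    dest_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the file name keeps earlier backups from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"dare-keys-{stamp}.tar.gz"

    with tarfile.open(str(archive), "w:gz") as tar:
        # Adds the folder recursively under its own name.
        tar.add(str(tokens_dir), arcname=tokens_dir.name)
        # On upgraded clusters, also include the key file kept in conf/.
        if legacy_key is not None and Path(legacy_key).is_file():
            tar.add(str(legacy_key), arcname=Path(legacy_key).name)
    return archive
```

A call such as backup_dare_keys("/opt/mapr/conf/tokens", "/root/dare-backup", "/opt/mapr/conf/dare.master.key") would typically be run as a user that can read the key files. The resulting archive contains key material, so copy it off the node and restrict access to it.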

DARE Incompatible

UI Column
DARE Incompatible Alarm
Logged As
CLUSTER_ALARM_DARE_INCOMPATIBLE
Meaning
Not all nodes on the cluster are enabled for data-at-rest encryption (DARE).
Resolution
DARE is enabled on some nodes in the cluster, but other nodes are not yet enabled for DARE. Enable DARE on all nodes before dismissing the alarm.

Too Many Snapshots

UI Column
Too Many Snapshots
Logged As
CLUSTER_ALARM_TOO_MANY_SNAPSHOT_CONTAINERS
Meaning
There are too many snapshots on this cluster.
Resolution
Delete snapshots from the cluster before dismissing the alarm.

Service Endpoints Changed

UI Column
One of the following:
  • API Server endpoints changed for <cluster-name>
  • API Server endpoints changed for <cluster-name>. Added cluster <cluster-name>.
  • API endpoints changed. Updated from Cluster Group primary.
  • API Server endpoints changed by updating ClusterGroup Primary.
  • API Server endpoints changed. Removed cluster <cluster-name>
  • API Server endpoints changed by resetTable.
Logged As
CLUSTER_ALARM_CLUSTERGROUP_ENDPOINTS_UPDATED
Meaning
This is an informational alarm. It indicates a change in one or more API server endpoints, that is, IP addresses.
NOTE
The API server endpoints/IP addresses may be required by Data Fabric for high availability.
API server endpoints change when one of the following occurs:
  • A new cluster is added to the cluster group/global namespace
  • An existing cluster is removed from the cluster group/global namespace
  • The primary cluster for the cluster group/global namespace changes
  • An API server is added to or removed from any of the member clusters of the cluster group
Resolution
Download the updated API server endpoints from the UI by following the instructions in Viewing the Fabric Endpoint, or by using the clustergroup get cgtable command. This alarm is not cleared automatically; dismiss it manually in the UI.

Insights Running in Trial Mode

UI Column
Insights running in trial mode
Logged As
CLUSTER_ALARM_INSIGHTS_TRIAL_MODE
Meaning
The insights feature that is enabled on the cluster is running in trial mode with Hive Metastore using the Derby RDBMS to store insights table metadata.
Resolution
Associate the Hive Metastore with a production-grade RDBMS such as PostgreSQL or MySQL. The alarm is cleared once this is done.