Monitoring the Cluster
Explains how to view cluster health, utilization metrics (disk, memory, and CPU), and alarms using either the Control System or the CLI.
Monitoring Cluster Health Using the Control System
Procedure
The Overview page displays the following panes:
- Node Health — the health of the nodes on the cluster, by service (default) or topology
- Active Alarms — a summary of active alarms for the cluster
- Cluster Utilization — CPU, memory, and disk space usage
- Yarn — the number of running and queued applications, the number of Node Managers, and the percentage of memory and CPUs used relative to the amount configured
Viewing Cluster Utilization Information on the Control System
About this task
The Cluster Utilization pane on the Overview page displays the following:
- CPU — Percentage of cores currently utilized and total cores
- Memory — Percentage of memory (in GB) currently utilized and total memory (in GB)
- Disk — Percentage of space (in GB) currently utilized and total disk space (in GB)
The Cluster Utilization pane also shows the amount of raw data and the savings (in percentage) after compression.
The Utilization Trend pane shows CPU, memory, and disk usage trends for the last 24 hours by default. You can select a preset or specify a custom time range.
You can zoom in (by clicking and dragging the cursor in the pane) for a more granular view. Click Reset Zoom to zoom out and return to the selected date/time range view. If any alarms were raised during the selected date/time range, the Alarms pane above shows:
- When the alarm was raised
- The severity of the alarm, indicated by an icon: error, warning, or information
Monitoring Cluster Alarms on the Control System
About this task
See Viewing Active Cluster Alarms for more information.
Retrieving Cluster Information Using the CLI or REST API
About this task
The basic command to retrieve cluster health and disk space information is:
maprcli dashboard info -cluster <cluster>
The utilization field in the output shows the total and utilized amounts of disk space, memory, and CPU for the cluster, which can also be viewed on the Control System. For example:
# /opt/mapr/bin/maprcli dashboard info -json
{
  "timestamp":1525230746268,
  "timeofday":"2018-05-01 08:12:26.268 GMT-0700 PM",
  "status":"OK",
  "total":1,
  "data":[
    {
      ...
      "utilization":{
        "cpu":{
          "util":7,
          "total":8,
          "active":0
        },
        "memory":{
          "total":15886,
          "active":11281
        },
        "disk_space":{
          "total":273,
          "active":0
        },
        "compression":{
          "compressed":0,
          "uncompressed":0
        },
        "tiering":{
          "logicalUsed":0,
          "replicatedLogicalUsed":0,
          "replicatedTotalUsed":0,
          "ecTotalUsed":0,
          "cvTotalUsed":0,
          "offloaded":0,
          "recalled":0
        }
      },
      ...
    }
  ]
}
For information on all the fields returned by this command, see dashboard info.
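To post-process the utilization field in a script, a minimal Python sketch might look like the following. It is an illustration under stated assumptions, not part of the product: the helper names (fetch_dashboard_info, percent, summarize_utilization) are hypothetical, the maprcli path is taken from the example above, and treating active/total as "used out of total" and cpu.util as a percentage is an interpretation of the sample output rather than something this page confirms. The embedded sample is trimmed to the utilization fields shown above, with the elided fields left out.

```python
import json
import subprocess


def fetch_dashboard_info(maprcli="/opt/mapr/bin/maprcli"):
    """Run `maprcli dashboard info -json` and return the parsed output.

    Requires a node where maprcli is installed; not called in this sketch.
    """
    result = subprocess.run(
        [maprcli, "dashboard", "info", "-json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def percent(active, total):
    """Return active/total as a percentage, or None when total is zero."""
    if not total:
        return None
    return round(100.0 * active / total, 1)


def summarize_utilization(info):
    """Summarize the utilization field of the first entry in "data"."""
    util = info["data"][0]["utilization"]
    return {
        # Assumption: "util" is already a percentage of CPU in use.
        "cpu_util": util["cpu"]["util"],
        "cpu_total": util["cpu"]["total"],
        # Assumption: "active" is the amount in use out of "total".
        "memory_pct": percent(util["memory"]["active"], util["memory"]["total"]),
        "disk_pct": percent(util["disk_space"]["active"], util["disk_space"]["total"]),
    }


# Trimmed copy of the sample output above (elided fields omitted).
SAMPLE = json.loads("""
{
  "status": "OK",
  "total": 1,
  "data": [
    {
      "utilization": {
        "cpu": {"util": 7, "total": 8, "active": 0},
        "memory": {"total": 15886, "active": 11281},
        "disk_space": {"total": 273, "active": 0}
      }
    }
  ]
}
""")

if __name__ == "__main__":
    print(summarize_utilization(SAMPLE))
```

On a cluster node, passing the result of fetch_dashboard_info() to summarize_utilization() instead of SAMPLE would summarize live data, assuming the output has the same shape as the example.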