Warden
Describes the Warden daemon that monitors and restarts services if they terminate.
Warden is a lightweight Java application that runs on all the nodes in a cluster and coordinates cluster services. Warden’s job on each node is to start, stop, or restart the appropriate services and to allocate the correct amount of memory to them. Warden makes extensive use of the znode abstraction discussed in the ZooKeeper section of this document to monitor the state of cluster services.
Each service running in a cluster has a corresponding znode in the ZooKeeper namespace, named in the pattern /services/<servicename>/<hostname>. Warden’s Watcher interface monitors znodes for changes and acts when a znode is created or deleted, or when child znodes of a monitored znode are created or deleted.
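As a rough illustration of this pattern, the sketch below uses the stock Apache ZooKeeper Java client to watch the children of a service znode. The connection string, port, and znode path are placeholder assumptions; this shows the general watcher pattern, not Warden’s actual code.

import org.apache.zookeeper.*;
import java.util.List;

// Watch the children of a service znode and react when instances come or go.
public class ServiceWatcherSketch {
    public static void main(String[] args) throws Exception {
        Watcher watcher = event -> {
            // NodeChildrenChanged fires when a child znode is created or deleted;
            // ZooKeeper watches are one-shot, so a real monitor would re-register here
            if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
                System.out.println("Service state changed under " + event.getPath());
            }
        };
        ZooKeeper zk = new ZooKeeper("localhost:5181", 30000, watcher); // placeholder host:port
        // Register a watch on the children of a hypothetical service znode
        List<String> hosts = zk.getChildren("/services/fileserver", watcher);
        System.out.println("Current instances: " + hosts);
        Thread.sleep(Long.MAX_VALUE); // keep the session alive to receive events
    }
}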
Warden configuration is contained in the warden.conf file, which lists service triplets in the form <servicename>:<number of nodes>:<dependencies>. The <number of nodes> element of the triplet controls the number of concurrent instances of the service that can run on the cluster. Some services are restricted to one running instance per cluster, while others, such as the File Server, can run on every node. Warden monitors changes to its configuration file in real time.
When a configuration triplet lists another service as a dependency, Warden starts that service only after the dependency service is running.
Memory Management with the Warden
System administrators can configure how the cluster’s memory is allocated to the operating system, file system, and Hadoop services. The configuration files /opt/mapr/conf/warden.conf and /opt/mapr/conf/conf.d/warden.<servicename>.conf include parameters that define how much of the memory on a node is allocated to each of these consumers.
- The service.<servicename>.heapsize.percent parameter controls the percentage of system memory allocated to the named service.
- The service.<servicename>.heapsize.max parameter defines the maximum heap size used when invoking the service.
- The service.<servicename>.heapsize.min parameter defines the minimum heap size used when invoking the service.
For example, the service.command.os.heapsize.percent, service.command.os.heapsize.max, and service.command.os.heapsize.min parameters in the warden.conf file control the amount of memory that Warden allocates to the host operating system before allocating memory to other services. For each service, the effective heap size is derived from the three parameters as follows:

max(heapsize.min, min(heapsize.max, total-memory * heapsize.percent / 100))
For more information, see Memory Allocation for Nodes.
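To make the arithmetic concrete, here is a minimal sketch of that calculation in Java, assuming memory values in megabytes; the class, method name, and sample values are illustrative, not taken from Warden’s source.

// Effective heap size per the formula above (values in MB).
public final class HeapSizeSketch {
    static long effectiveHeapMB(long minMB, long maxMB, long totalMemMB, double percent) {
        long byPercent = (long) (totalMemMB * percent / 100.0);
        return Math.max(minMB, Math.min(maxMB, byPercent));
    }

    public static void main(String[] args) {
        // Hypothetical node with 256 GB of RAM and min=256, max=750, percent=3:
        // 3% of 262144 MB is 7864 MB, which the max parameter caps at 750 MB
        System.out.println(effectiveHeapMB(256, 750, 262144, 3.0)); // prints 750
    }
}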
The Warden and Failover
- CLDB Failover
- ZooKeeper contains a znode corresponding to the active primary CLDB. This znode is monitored by the secondary CLDBs. When the primary CLDB’s znode is deleted, the secondary CLDBs recognize that the primary CLDB is no longer running and contact ZooKeeper in an attempt to become the new primary CLDB. The first CLDB to get a lock on the znode in ZooKeeper becomes the new primary instance (a sketch of this election pattern follows this list).
- ResourceManager Failover
- Starting in version 4.0.2, if the node running the ResourceManager fails and the Warden on that node is unable to restart it, Warden starts a new instance of the ResourceManager on another node. The Warden on every ResourceManager node watches the ResourceManager’s znode for changes. When the active ResourceManager’s znode is deleted, the Wardens on the other ResourceManager nodes attempt to launch the ResourceManager. The Warden on each ResourceManager node works with ZooKeeper to ensure that only one ResourceManager is running in the cluster. For failover to occur in this manner, at least two nodes in the cluster must include the ResourceManager role, and the cluster must use the zero-configuration failover implementation.
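The election described above, where the first contender to grab the znode becomes primary, is a standard ZooKeeper pattern built on ephemeral znodes. The following sketch shows its general shape with the stock ZooKeeper Java client; the znode path, timeout, and class structure are illustrative assumptions, not CLDB or Warden source.

import org.apache.zookeeper.*;

// Contend for a lock znode; the winner acts as primary, the losers watch and retry.
public class LeaderElectionSketch implements Watcher {
    private static final String LOCK_PATH = "/election/primary"; // hypothetical path
    private final ZooKeeper zk;

    public LeaderElectionSketch(String connectString) throws Exception {
        zk = new ZooKeeper(connectString, 30000, this);
    }

    // An ephemeral znode disappears when its creator's session dies,
    // which is what lets the standbys detect a failed primary.
    void runForPrimary() throws KeeperException, InterruptedException {
        try {
            zk.create(LOCK_PATH, new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("Acquired the lock: acting as primary");
        } catch (KeeperException.NodeExistsException e) {
            zk.exists(LOCK_PATH, true); // lost the race; watch for NodeDeleted
            System.out.println("Standby: watching " + LOCK_PATH);
        }
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDeleted) {
            try {
                runForPrimary(); // the primary is gone; contend for the lock again
            } catch (Exception ignored) { }
        }
    }
}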
The Warden and Pluggable Services
Services can be plugged into Warden’s monitoring infrastructure by setting up an individual configuration file for each supported service in the /opt/mapr/conf/conf.d directory, named in the pattern warden.<servicename>.conf. The <servicename>:<number of nodes>:<dependencies> triplets for a pluggable service are stored in the individual warden.<servicename>.conf files, not in the main warden.conf file. Pluggable services include the following:
- Hue
- HTTP-FS
- The Hive metastore
- HiveServer2
- Spark-Master
- mapr-apiserver
- mapr-collectd
- mapr-drill
- mapr-elasticsearch
- mapr-fluentd
- mapr-grafana
- mapr-hbase
- mapr-hbasethrift
- mapr-historyserver
- mapr-hive
- mapr-hivemetastore
- mapr-hiveserver2
- mapr-hivewebchat
- mapr-httpfs
- mapr-hue
- mapr-impala
- mapr-impalacatalog
- mapr-impalaserver
- mapr-impalastore
- mapr-kafka
- mapr-kibana
- mapr-ksql
- mapr-livy
- mapr-nodemanager
- mapr-objectstore
- mapr-opentsdb
- mapr-resourcemanager
- mapr-schema
- mapr-sentry
- mapr-spark
- mapr-sqoop2
- mapr-storm
- mapr-tez
- mapr-timelineserver
- mapr-webserver
A package can contain multiple services. For example, mapr-spark contains all of the Spark services, including the Spark Thrift Server and the Spark Master. After you install a package and run the configure.sh utility, the associated Warden files are present in /opt/mapr/conf/conf.d.
The Warden daemon monitors the znodes for a configured component’s service and restarts the service as specified by the configuration triplet. The configuration file also specifies resource limits for the service, ports used by the service (if any), and a location for log files.
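As a small illustration of this layout, the sketch below scans conf.d and prints the services triplet from each Warden file. It assumes only what is described above (the warden.<servicename>.conf file-name pattern and a services= entry inside each file); the code itself is hypothetical.

import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

// List warden.<servicename>.conf files and print each one's services triplet.
public class ConfDScanSketch {
    public static void main(String[] args) throws IOException {
        Path confD = Paths.get("/opt/mapr/conf/conf.d");
        try (Stream<Path> files = Files.list(confD)) {
            files.filter(p -> p.getFileName().toString().matches("warden\\..*\\.conf"))
                 .forEach(p -> {
                     try (Stream<String> lines = Files.lines(p)) {
                         lines.filter(l -> l.startsWith("services="))
                              .forEach(l -> System.out.println(p.getFileName() + ": " + l));
                     } catch (IOException e) {
                         System.err.println("Could not read " + p + ": " + e.getMessage());
                     }
                 });
        }
    }
}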
In the triplet <servicename>:<number of nodes>:<dependencies>, the <number of nodes> element can be set to all. The value all specifies that the service is to be started on every node on which the service is installed.
For example, consider the entry services=kvstore:all;cldb:all:kvstore;hoststats:all:kvstore. This entry specifies the following:
- Start kvstore on all the nodes on which it is installed.
- Start cldb on all the nodes on which it is installed, but wait until kvstore is up on all nodes. In other words, cldb depends on kvstore being up.
- Start hoststats on all the nodes on which it is installed, but wait until kvstore is up on all nodes. In other words, hoststats depends on kvstore being up.
As another example, consider the entry resourcemanager:1:cldb. Here, only one instance of resourcemanager is started, and only after cldb is up. If this instance of resourcemanager goes down, Warden notices that the number of running instances is below the specified count and automatically handles the failover. If multiple instances of resourcemanager are started, Warden terminates all the extra instances.
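A configuration entry like the ones above decomposes mechanically into (service, count, dependency) triplets. The following parser is a minimal sketch of that decomposition, assuming only the <servicename>:<number of nodes>:<dependencies> format documented here; it is not Warden’s actual parser.

import java.util.*;

// Parse a warden "services" value into (name, count, dependency) triplets.
public class TripletParserSketch {
    record Triplet(String name, String count, Optional<String> dependency) { }

    static List<Triplet> parse(String entry) {
        List<Triplet> triplets = new ArrayList<>();
        for (String spec : entry.split(";")) {
            String[] parts = spec.split(":");
            triplets.add(new Triplet(
                    parts[0],                                // service name
                    parts[1],                                // "all" or an instance count
                    parts.length > 2 ? Optional.of(parts[2]) // optional dependency
                                     : Optional.empty()));
        }
        return triplets;
    }

    public static void main(String[] args) {
        // The example entry from the text above
        parse("kvstore:all;cldb:all:kvstore;hoststats:all:kvstore")
                .forEach(System.out::println);
    }
}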
Dependencies are usually handled internally. Some non-core components do have dependencies among themselves, for example:

services=nodemanager:all:resourcemanager
hbmaster:all:cldb
hbregionserver:all:hbmaster

Here:

- nodemanager depends on resourcemanager
- hbmaster depends on cldb
- hbregionserver depends on hbmaster
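To make the resulting startup order concrete, here is a minimal topological-sort sketch over the example pairs above. It is illustrative only; Warden’s real startup logic also tracks the live state of each dependency through ZooKeeper rather than computing a static order.

import java.util.*;

// Derive a startup order in which every dependency precedes its dependent.
public class StartupOrderSketch {
    public static void main(String[] args) {
        // service -> dependency, taken from the example entries above
        Map<String, String> deps = Map.of(
                "nodemanager", "resourcemanager",
                "hbmaster", "cldb",
                "hbregionserver", "hbmaster");
        List<String> order = new ArrayList<>();
        for (String service : deps.keySet()) visit(service, deps, order);
        System.out.println(order); // e.g. [cldb, hbmaster, hbregionserver, resourcemanager, nodemanager]
    }

    static void visit(String service, Map<String, String> deps, List<String> order) {
        if (order.contains(service)) return;       // already scheduled
        if (deps.containsKey(service))
            visit(deps.get(service), deps, order); // schedule the dependency first
        order.add(service);
    }
}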