Describes the YARN configuration options.
YARN configuration options are stored in the /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/yarn-site.xml file and are editable by the root user. This file contains configuration information that overrides the default values for YARN parameters. Overrides of the default values for core configuration properties are stored in the Default YARN parameters file.
To override a default value for a property, specify the new value within the <configuration> tags, using the following format:

<property>
  <name> </name>
  <value> </value>
  <description> </description>
</property>
The following configuration lists describe the possible entries that you can place between the <name> tags and between the <value> tags. The <description> tag is optional but recommended for maintainability.
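As an illustration, the following is a minimal sketch of a complete yarn-site.xml that overrides a single property. The choice of yarn.log-aggregation-enable is only an example; any property in the lists that follow is overridden the same way, with one <property> element per override inside the <configuration> element.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- One <property> element per overridden parameter. -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Aggregate container logs (illustrative override).</description>
  </property>
</configuration>
```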
Configuration for ResourceManager
Comprises the following parameters:
- yarn.resourcemanager.hostname
- The hostname of the ResourceManager. The configure.sh command automatically sets this value to the IP address that you provide with the -RM option.
Default value: {IP Address}
- yarn.resourcemanager.scheduler.address
- The hostname and port of the scheduler interface.
Example value: ${yarn.resourcemanager.hostname}:8030
- yarn.resourcemanager.resource-tracker.address
- The hostname and port of the ResourceManager resource tracker, which NodeManagers use to communicate with the ResourceManager.
Example value: ${yarn.resourcemanager.hostname}:8025
- yarn.resourcemanager.address
- The address of the applications manager interface in the ResourceManager.
Example value: ${yarn.resourcemanager.hostname}:8041
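A sketch of how the ResourceManager properties above might appear in yarn-site.xml. The IP address 10.10.10.1 is hypothetical (configure.sh normally sets the hostname from the -RM option), and the ports are the example values listed above. The address properties reuse the hostname through Hadoop's ${...} variable expansion, so changing the hostname updates all three addresses.

```xml
<configuration>
  <!-- Hypothetical ResourceManager IP; configure.sh normally sets this from -RM. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>10.10.10.1</value>
  </property>
  <!-- Scheduler interface, built from the hostname property above. -->
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <!-- Resource tracker used by NodeManagers. -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8025</value>
  </property>
  <!-- Applications manager interface. -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8041</value>
  </property>
</configuration>
```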
Configuration for NodeManager
Comprises the following parameters:
- yarn.nodemanager.container-localizer.log.level
- Default Value: INFO
- Description: Sets the log level for the container localizer. The supported values are INFO, DEBUG, and WARN. By default, the logs are available in the ApplicationMaster log location; depending on your cluster configuration, they may instead be available in the application's localized log directory. This functionality is available by default starting in EEP 7.1.0. For previous EEP versions, request the patch. See Applying a Patch.
- yarn.nodemanager.max-retry-file-delete
- Default Value: 2
- Description: Defines how many times the NodeManager can attempt to delete application-related directories from a volume when Spark is configured to use the mounted NFS directory instead of the /tmp directory on the local filesystem. Increasing the value for this property can prevent application cache data from accumulating in the volume. This functionality is available by default starting in EEP 7.1.0. For previous EEP versions, request the patch. See Applying a Patch.
- yarn.nodemanager.kill-container-child-process
- Default Value: false
- Description: Enables the NodeManager to automatically run the kill -9 command to end processes that hang after YARN stops containers. Set to true to enable this behavior. This functionality is available by default starting in EEP 7.1.0. For previous EEP versions, request the patch. See Applying a Patch.
- yarn.nodemanager.container-executor.class
- Default Value: org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
- Description: Identifies how containers are executed. The default, LinuxContainerExecutor, lets jobs run as the user who submitted them.
NOTE
If a system user (a user with a user ID less than 500) needs to submit a job, you must add the user to the container-executor.cfg file. The user mapr is already configured as an allowed system user.
- yarn.nodemanager.aux-services
- Default Value: mapreduce_shuffle, mapr_direct_shuffle
- Description: Specifies the auxiliary shuffle services that must be configured for MapReduce to run.
- yarn.nodemanager.aux-services.mapreduce_shuffle.class
- Default Value: org.apache.hadoop.mapred.ShuffleHandler
- Description: Specifies the class for the mapreduce_shuffle auxiliary service. This property, in conjunction with the other shuffle properties, sets direct shuffle as the default shuffle for MapReduce.
- yarn.nodemanager.aux-services.mapr_direct_shuffle.class
- Default Value: com.mapr.hadoop.mapred.LocalVolumeAuxService
- Description: Specifies the class for the mapr_direct_shuffle auxiliary service. This property, in conjunction with the other shuffle properties, sets direct shuffle as the default shuffle for MapReduce.
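The following sketch shows how three of the NodeManager overrides described above might be combined in yarn-site.xml. The values are illustrative only: 5 for yarn.nodemanager.max-retry-file-delete is an arbitrary example rather than a recommendation, and all three properties are available by default only starting in EEP 7.1.0 (earlier versions need the patch).

```xml
<configuration>
  <!-- Raise container localizer logging from the default INFO to DEBUG. -->
  <property>
    <name>yarn.nodemanager.container-localizer.log.level</name>
    <value>DEBUG</value>
  </property>
  <!-- Illustrative value: allow 5 delete attempts instead of the default 2. -->
  <property>
    <name>yarn.nodemanager.max-retry-file-delete</name>
    <value>5</value>
  </property>
  <!-- Run kill -9 on processes that hang after YARN stops their containers. -->
  <property>
    <name>yarn.nodemanager.kill-container-child-process</name>
    <value>true</value>
  </property>
</configuration>
```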
Configuration for Timeline Server Security with MapR-SASL
Comprises the following parameter:
- yarn.timeline-service.http-authentication.type
- Default Value: com.mapr.security.maprauth.MaprDelegationTokenAuthenticationHandler
- Description: The authentication type used for the timeline server HTTP endpoint.
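A minimal sketch of this setting in yarn-site.xml. Because the value shown is the default, setting it explicitly is presumably only needed if it was previously changed.

```xml
<configuration>
  <!-- MapR-SASL authentication for the timeline server HTTP endpoint. -->
  <property>
    <name>yarn.timeline-service.http-authentication.type</name>
    <value>com.mapr.security.maprauth.MaprDelegationTokenAuthenticationHandler</value>
  </property>
</configuration>
```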
Configuration for Timeline Server Security with Kerberos
Comprises the following parameter:
- yarn.timeline-service.http-authentication.type
- Default Value: com.mapr.security.maprauth.MaprDelegationTokenAuthenticationHandler
- Description: The authentication type used for the timeline server HTTP endpoint.
- yarn.timeline-service.http-authentication.kerberos.principal
- Default Value: A principal, for example HTTP/nodex@NODEX
- Description: The Kerberos service principal for the timeline server HTTP endpoint.
- yarn.timeline-service.http-authentication.kerberos.keytab
- Default Value: Path to the keytab, for example /opt/mapr/conf/mapr.keytab
- Description: The Kerberos keytab for the timeline server HTTP endpoint.
- yarn.timeline-service.principal
- Default Value:
mapr/nodex@NODEX
- Description: The Kerberos principal for the timeline reader. The NodeManager principal is used for the timeline collector because the collector runs as an auxiliary service inside the NodeManager.
- yarn.timeline-service.keytab
- Default Value: Path to the keytab, for example /opt/mapr/conf/mapr.keytab
- Description: The Kerberos keytab for the timeline reader. The NodeManager keytab is used for the timeline collector because the collector runs as an auxiliary service inside the NodeManager.
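A sketch of the Kerberos timeline server settings above in yarn-site.xml. The host nodex and realm NODEX are placeholders carried over from the example values; substitute the principals and keytab path used in your cluster.

```xml
<configuration>
  <property>
    <name>yarn.timeline-service.http-authentication.type</name>
    <value>com.mapr.security.maprauth.MaprDelegationTokenAuthenticationHandler</value>
  </property>
  <!-- HTTP endpoint principal and keytab (placeholder host and realm). -->
  <property>
    <name>yarn.timeline-service.http-authentication.kerberos.principal</name>
    <value>HTTP/nodex@NODEX</value>
  </property>
  <property>
    <name>yarn.timeline-service.http-authentication.kerberos.keytab</name>
    <value>/opt/mapr/conf/mapr.keytab</value>
  </property>
  <!-- Timeline reader principal and keytab; the collector uses the NodeManager's. -->
  <property>
    <name>yarn.timeline-service.principal</name>
    <value>mapr/nodex@NODEX</value>
  </property>
  <property>
    <name>yarn.timeline-service.keytab</name>
    <value>/opt/mapr/conf/mapr.keytab</value>
  </property>
</configuration>
```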
Configuration for Container Logs
Comprises the following parameters:
- yarn.nodemanager.log-dirs
- Default Value: /opt/mapr/hadoop/hadoop-<version>/logs/userlogs/<applicationID>/<containerID>/<filename>.log
- Description: The location where container logs are stored on the node. An application's log directory is ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories are named container_${contid}. Each container directory contains the stderr, stdout, and syslog files generated by that container.
NOTE
You can find the application ID associated with your job in the Control
System.
- yarn.log-aggregation-enable
- Default Value: false
- Description: Indicates whether the logs are aggregated.
- yarn.nodemanager.log.retain-seconds
- Default Value: 10800 (3 hours)
- Description: Specifies how long, in seconds, user logs are retained on the node when log aggregation is disabled.
- yarn.log-aggregation.retain-seconds
- Default Value: -1
- Description: Specifies the number of seconds to retain aggregated logs when log aggregation is enabled. The default value of -1 disables the deletion of logs.
- yarn.log-aggregation.retain-check-interval-seconds
- Default Value: -1
- Description: The interval, in seconds, between aggregated log retention checks. If set to 0 or a negative value, the interval is computed as one-tenth of the aggregated log retention time.
NOTE
Setting this to a low value may cause unnecessary
log retention checks.
- yarn.nodemanager.remote-app-log-dir
- Default Value: /tmp/logs
- Description: The location on the filesystem where the logs are
aggregated.
- yarn.nodemanager.remote-app-log-dir-suffix
- Default Value: logs
- Description: The suffix for the directory that stores the aggregated logs for
each user.
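As a sketch, the container log properties above could be combined to enable aggregation with a finite retention period. The 7-day retention is an illustrative choice: 7 × 24 × 3600 = 604800 seconds. With the check interval left at its default of -1, the interval is computed as one-tenth of the retention time, that is 60480 seconds (16.8 hours).

```xml
<configuration>
  <!-- Aggregate container logs to the file system. -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Illustrative retention: 7 days = 604800 seconds. -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- Default -1: checks run every 604800 / 10 = 60480 seconds. -->
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
  <!-- Default aggregation location and per-user directory suffix. -->
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
</configuration>
```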
Configuration for Apache Shuffle
You can disable Direct Shuffle and enable Apache Shuffle for MapReduce applications through
the following setting:
- yarn.nodemanager.aux-services
- Value: mapreduce_shuffle
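A minimal sketch of the Apache Shuffle setting above in yarn-site.xml. It covers only this one property; compared with the default value, mapr_direct_shuffle is simply omitted from the list.

```xml
<configuration>
  <!-- Run only the Apache shuffle service; mapr_direct_shuffle is omitted. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```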
Configuration for Aggregated Logs
- hadoop.users.acl.mapping
- Specifies which users are permitted to access application logs generated by other users. Access applies to applications that are either running or have completed, provided that log aggregation is enabled. Changing this property requires restarting the ResourceManager, the NodeManager, and related components, and either cleaning up the existing aggregated logs or setting the hadoop.users.acl.force.init property to true.
- Value: u:<user_name>,g:<group_name>=u:<app_user_name>,g:<app_group_name>;...
- Example value: u:userA=u:appUserA,u:appUserB;u:userB,g:groupB=u:appUserA,g:appGroupC
- Description: In this example, userA is granted read access to the logs of running applications owned by appUserA and appUserB, and userB and all members of groupB are granted log read access for running applications owned by appUserA and by members of appGroupC. If a user ACL is defined multiple times, only the most recent definition is applied.
- hadoop.users.acl.force.init
- Default value: false
- Description: Specifies whether to force overwriting of ACEs for existing aggregated log files during log aggregation. When this property is set to false, the user must manually delete all existing aggregated logs after any change to the hadoop.users.acl.mapping property. When set to true, YARN automatically updates all ACEs after the ResourceManager and NodeManager are restarted. However, enabling this option may increase the startup time of the ResourceManager.
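A sketch of the aggregated log ACL properties above, assuming they are placed in yarn-site.xml like the other properties on this page. The mapping value is the example value from the hadoop.users.acl.mapping entry; hadoop.users.acl.force.init is set to true so that existing aggregated logs do not have to be deleted by hand after the change.

```xml
<configuration>
  <!-- Example mapping: who may read logs of which users' applications. -->
  <property>
    <name>hadoop.users.acl.mapping</name>
    <value>u:userA=u:appUserA,u:appUserB;u:userB,g:groupB=u:appUserA,g:appGroupC</value>
  </property>
  <!-- Rewrite ACEs on existing aggregated logs when the services restart. -->
  <property>
    <name>hadoop.users.acl.force.init</name>
    <value>true</value>
  </property>
  <!-- The mapping applies only when log aggregation is enabled. -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
```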
To reset all permissions to their default settings:
- Remove the hadoop.users.acl.mapping property from the configuration file.
- Set the following properties to true:
  - hadoop.users.acl.force.init
  - yarn.log-aggregation-enable
- Restart the following services to apply the changes:
  - ResourceManager
  - NodeManager
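After the reset steps above, the relevant part of the configuration file might look like the following sketch. It assumes the same yarn-site.xml as the other properties on this page; the hadoop.users.acl.mapping property no longer appears.

```xml
<configuration>
  <!-- hadoop.users.acl.mapping has been removed from this file. -->
  <property>
    <name>hadoop.users.acl.force.init</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
```

The ResourceManager and NodeManager still need to be restarted for the reset to take effect.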