yarn-site.xml

Describes the YARN configuration options.

YARN configuration options are stored in the /opt/mapr/hadoop/hadoop-2.x.x/etc/hadoop/yarn-site.xml file and are editable by the root user. This file contains configuration information that overrides the default values for YARN parameters. Overrides of the default values for core configuration properties are stored in the Default YARN parameters file.

To override a default value for a property, specify the new value within the <configuration> tags, using the following format:

<property>
  <name> </name>
  <value> </value>
  <description> </description>
</property>

The following configuration lists describe the possible entries that you can place between the <name> tags and between the <value> tags. The <description> tag is optional but recommended for maintainability.
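For illustration, a minimal yarn-site.xml that overrides a single property could look like the sketch below. The XML declaration and the root <configuration> element follow the usual Hadoop configuration file layout. The property chosen (yarn.nodemanager.kill-container-child-process, described later on this page) and its value are examples only, not a recommended setting.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Illustrative override: each <property> element replaces one default value -->
  <property>
    <name>yarn.nodemanager.kill-container-child-process</name>
    <value>true</value>
    <description>Kill processes that hang after YARN stops containers.</description>
  </property>
</configuration>
```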

Configuration for ResourceManager

Comprises the following parameters:
yarn.resourcemanager.hostname
The hostname of the ResourceManager.

The configure.sh command automatically sets this value to the IP address that you provide with the -RM option.

Default value: {IP Address}

yarn.resourcemanager.scheduler.address
The hostname and port of the ResourceManager's scheduler interface.

Example value: ${yarn.resourcemanager.hostname}:8030

yarn.resourcemanager.resource-tracker.address
The hostname and port of the ResourceManager's resource tracker interface, which NodeManagers use to communicate with the ResourceManager.

Example value: ${yarn.resourcemanager.hostname}:8025

yarn.resourcemanager.address
The address of the applications manager interface in the ResourceManager.

Example value: ${yarn.resourcemanager.hostname}:8041
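Because Hadoop expands ${...} references to other configuration properties when a value is read, the address properties can all be derived from yarn.resourcemanager.hostname, so relocating the ResourceManager only requires changing one value. The fragment below is a sketch of that pattern for the inside of the <configuration> element. The IP address is hypothetical, and the ports are the example values listed above, which may differ on your cluster.

```xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <!-- Hypothetical IP address; configure.sh normally sets this from the -RM option -->
  <value>10.10.10.20</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>${yarn.resourcemanager.hostname}:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <!-- Reference the hostname property, not this property itself -->
  <value>${yarn.resourcemanager.hostname}:8041</value>
</property>
```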

Configuration for NodeManager

Comprises the following parameters:
yarn.nodemanager.container-localizer.log.level
Default Value: INFO
Description: Sets the log level for the container localizer. Supported values are INFO, DEBUG, and WARN. By default, the logs are written to the ApplicationMaster log location; depending on your cluster configuration, they may instead be written to the application’s localized log directory. This functionality is available by default starting in EEP 7.1.0. For previous EEP versions, request the patch. See Applying a Patch.
yarn.nodemanager.max-retry-file-delete
Default Value: 2
Description: Defines how many times the NodeManager can attempt to delete application-related directories from a volume when Spark is configured to use the mounted NFS directory instead of the /tmp directory on the local filesystem. Increasing the value for this property can prevent application cache data from accumulating in the volume. This functionality is available by default starting in EEP 7.1.0. For previous EEP versions, request the patch. See Applying a Patch.
yarn.nodemanager.kill-container-child-process
Default Value: false
Description: Enables the NodeManager to automatically run the kill -9 command on processes that hang after YARN stops containers. Set to true to enable this behavior. This functionality is available by default starting in EEP 7.1.0. For previous EEP versions, request the patch. See Applying a Patch.
yarn.nodemanager.container-executor.class
Default Value: org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
Description: Specifies the class that launches and runs containers.
The default, LinuxContainerExecutor, runs containers as the user who submitted the job.
NOTE
If a system user (a user with a user ID lower than 500) needs to submit jobs, you must add the user to the allowed.system.users list in the container-executor.cfg file. The mapr user is already configured as an allowed system user.
yarn.nodemanager.aux-services
Default Value: mapreduce_shuffle, mapr_direct_shuffle
Description: Lists the auxiliary services that the NodeManager runs to provide shuffle for MapReduce. MapReduce requires a shuffle service to run.
yarn.nodemanager.aux-services.mapreduce_shuffle.class
Default Value: org.apache.hadoop.mapred.ShuffleHandler
Description: Specifies the class that implements the mapreduce_shuffle service (Apache Shuffle).
yarn.nodemanager.aux-services.mapr_direct_shuffle.class
Default Value: com.mapr.hadoop.mapred.LocalVolumeAuxService
Description: Specifies the class that implements the mapr_direct_shuffle service. Together with mapreduce.job.shuffle.provider.services, this property makes Direct Shuffle the default shuffle for MapReduce.
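The fragment below sketches how the NodeManager properties above fit together inside the <configuration> element. The shuffle entries restate the documented defaults and show the naming convention: each service listed in yarn.nodemanager.aux-services is bound to its implementing class through a yarn.nodemanager.aux-services.<service>.class property. The troubleshooting values (DEBUG, 5, true) are illustrative assumptions, and those three properties require EEP 7.1.0 or later, or the corresponding patch.

```xml
<!-- Shuffle services (documented defaults): each name in aux-services
     is bound to a class by yarn.nodemanager.aux-services.<name>.class -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,mapr_direct_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapr_direct_shuffle.class</name>
  <value>com.mapr.hadoop.mapred.LocalVolumeAuxService</value>
</property>

<!-- Troubleshooting overrides (illustrative values; EEP 7.1.0 or later) -->
<property>
  <name>yarn.nodemanager.container-localizer.log.level</name>
  <!-- One of INFO (the default), DEBUG, or WARN -->
  <value>DEBUG</value>
</property>
<property>
  <name>yarn.nodemanager.max-retry-file-delete</name>
  <!-- Raised from the default of 2 so directory deletion is retried more often -->
  <value>5</value>
</property>
<property>
  <name>yarn.nodemanager.kill-container-child-process</name>
  <!-- Run kill -9 on processes that hang after YARN stops their containers -->
  <value>true</value>
</property>
```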

Configuration for Timeline Server Security with MapR-SASL

Comprises the following parameter:
yarn.timeline-service.http-authentication.type
Default Value: com.mapr.security.maprauth.MaprDelegationTokenAuthenticationHandler
Description: The authentication used for the timeline server HTTP endpoint.
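As a sketch, the MapR-SASL setting is a single property inside the <configuration> element. The value below is the documented default handler class, so writing it out explicitly mainly documents the intended setting.

```xml
<!-- Timeline server HTTP authentication with MapR-SASL (documented default) -->
<property>
  <name>yarn.timeline-service.http-authentication.type</name>
  <value>com.mapr.security.maprauth.MaprDelegationTokenAuthenticationHandler</value>
</property>
```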

Configuration for Timeline Server Security with Kerberos

Comprises the following parameters:
yarn.timeline-service.http-authentication.type
Default Value: com.mapr.security.maprauth.MaprDelegationTokenAuthenticationHandler
Description: The authentication used for the timeline server HTTP endpoint.
yarn.timeline-service.http-authentication.kerberos.principal
Default Value: principal (for example, HTTP/nodex@NODEX)
Description: The Kerberos service principal for the timeline server HTTP endpoint.
yarn.timeline-service.http-authentication.kerberos.keytab
Default Value: path to the keytab (for example, /opt/mapr/conf/mapr.keytab)
Description: The Kerberos keytab for the timeline server HTTP endpoint.
yarn.timeline-service.principal
Default Value: mapr/nodex@NODEX
Description: The Kerberos principal for the timeline reader. The timeline collector uses the NodeManager principal instead, because it runs as an auxiliary service inside the NodeManager.
yarn.timeline-service.keytab
Default Value: path to the keytab (for example, /opt/mapr/conf/mapr.keytab)
Description: The Kerberos keytab for the timeline reader. The timeline collector uses the NodeManager keytab instead, because it runs as an auxiliary service inside the NodeManager.
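Put together, the Kerberos settings above might look like the following fragment for the inside of the <configuration> element. The principals and keytab paths are the example values from this page; nodex and NODEX are placeholders for your node's hostname and Kerberos realm and must be replaced.

```xml
<property>
  <name>yarn.timeline-service.http-authentication.type</name>
  <value>com.mapr.security.maprauth.MaprDelegationTokenAuthenticationHandler</value>
</property>
<property>
  <name>yarn.timeline-service.http-authentication.kerberos.principal</name>
  <!-- Placeholder host and realm: HTTP principal for the timeline server endpoint -->
  <value>HTTP/nodex@NODEX</value>
</property>
<property>
  <name>yarn.timeline-service.http-authentication.kerberos.keytab</name>
  <value>/opt/mapr/conf/mapr.keytab</value>
</property>
<property>
  <name>yarn.timeline-service.principal</name>
  <!-- Placeholder host and realm: principal for the timeline reader -->
  <value>mapr/nodex@NODEX</value>
</property>
<property>
  <name>yarn.timeline-service.keytab</name>
  <value>/opt/mapr/conf/mapr.keytab</value>
</property>
```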

Configuration for MapReduce

Comprises the following parameter:
mapreduce.job.shuffle.provider.services
Default Value: mapr_direct_shuffle
Description: The default shuffle service for MapReduce. The value must be one of the services listed in the yarn.nodemanager.aux-services property.
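The following sketch shows the relationship described above: the value of mapreduce.job.shuffle.provider.services names a service that is also listed in yarn.nodemanager.aux-services. Both values are the documented defaults.

```xml
<!-- mapr_direct_shuffle must also appear in yarn.nodemanager.aux-services -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,mapr_direct_shuffle</value>
</property>
<property>
  <name>mapreduce.job.shuffle.provider.services</name>
  <value>mapr_direct_shuffle</value>
</property>
```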

Configuration for Container Logs

Comprises the following parameters:
yarn.nodemanager.log-dirs
Default Value: /opt/mapr/hadoop/hadoop-<version>/logs/userlogs (log files are written to <applicationID>/<containerID>/<filename>.log under this directory)
Description: The location on the node where container logs are stored. An application's log directory is ${yarn.nodemanager.log-dirs}/application_${appid}. Each container's log directory, within the application's log directory, is named container_${contid}. Each container directory contains the stderr, stdout, and syslog files generated by that container.
NOTE
You can find the application ID associated with your job in the Control System.
yarn.log-aggregation-enable
Default Value: false
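Description: Whether to enable log aggregation. When set to true, the logs of each container are collected into yarn.nodemanager.remote-app-log-dir after the application completes.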
yarn.nodemanager.log.retain-seconds
Default Value: 10800 (3 hours)
Description: The number of seconds to retain user logs on the node. Applies only when log aggregation is disabled.
yarn.log-aggregation.retain-seconds
Default Value: -1
Description: The number of seconds to retain aggregated logs when log aggregation is enabled. The default value, -1, disables deletion of aggregated logs.
yarn.log-aggregation.retain-check-interval-seconds
Default Value: -1
Description: The interval between aggregated log retention checks. If set to 0 or a negative value, then the value is computed as one-tenth of the aggregated log retention time.
NOTE
Setting this to a low value may cause unnecessary log retention checks.
yarn.nodemanager.remote-app-log-dir
Default Value: /tmp/logs
Description: The location on the filesystem where the logs are aggregated.
yarn.nodemanager.remote-app-log-dir-suffix
Default Value: logs
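Description: The suffix for the directory that stores each user's aggregated logs. Aggregated logs are stored under ${yarn.nodemanager.remote-app-log-dir}/${user}/${yarn.nodemanager.remote-app-log-dir-suffix}.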
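As an example of how the log settings interact, the fragment below enables aggregation and keeps aggregated logs for 7 days (7 × 24 × 3600 = 604800 seconds). The retention period is an illustrative choice, not a recommendation. Because the check interval is left at -1, retention checks run at one-tenth of the retention time, here every 60480 seconds.

```xml
<!-- Sketch: enable log aggregation and keep aggregated logs for 7 days -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- 7 days x 24 hours x 3600 seconds = 604800 -->
  <value>604800</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <!-- A negative value means: check every one-tenth of the retention time -->
  <value>-1</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>logs</value>
</property>
```

With these values, a user's aggregated logs are stored under /tmp/logs/<user>/logs.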

Configuration for Apache Shuffle

You can disable Direct Shuffle and enable Apache Shuffle for MapReduce applications through the following setting:
yarn.nodemanager.aux-services
Value: mapreduce_shuffle
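A sketch of the Apache Shuffle override follows. Listing only mapreduce_shuffle removes mapr_direct_shuffle from the NodeManager's auxiliary services; the class property restates the default for mapreduce_shuffle from the NodeManager section and is optional. Because mapreduce.job.shuffle.provider.services must name a service listed in yarn.nodemanager.aux-services, check that its value is consistent with this change. Auxiliary services are loaded when the NodeManager starts, so the NodeManagers likely need a restart for the change to take effect.

```xml
<!-- Sketch: use Apache Shuffle instead of Direct Shuffle -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <!-- Optional: restates the default class for mapreduce_shuffle -->
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```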