Installing Hadoop and YARN

This topic describes how to use package managers to download and install Hadoop and YARN services from the EEP repository.

About the Hadoop and YARN Packages

Beginning with core 6.2.0 and EEP 7.0.0, Hadoop and YARN services are no longer included in the data-fabric repository for core packages. They are provided as ecosystem components in the EEP repository. For example:

Old location: https://package.ezmeral.hpe.com/releases/v<version>/redhat/

New location: https://package.ezmeral.hpe.com/releases/MEP/MEP-<version>/redhat/
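
As a rough illustration, the new repository URL can be assembled from an EEP version string. The function name eep_repo_url is hypothetical, and the redhat path segment is copied from the example above; repositories for other operating systems presumably use a different final segment.

```shell
#!/bin/sh
# Hypothetical helper: build the EEP repository URL for a given EEP version.
# The pattern follows the "New location" example; the path uses "MEP"
# even though the component set is called EEP.
eep_repo_url() {
    version="$1"
    printf 'https://package.ezmeral.hpe.com/releases/MEP/MEP-%s/redhat/\n' "$version"
}

# Example: print the repository URL for EEP 7.0.0.
eep_repo_url 7.0.0
```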

In addition, some new packages have been added. The following table describes each package:
Package | Description
mapr-hadoop-util | This package is new for Release 6.2.0. This package contains the essential libraries to run hadoop fs and hadoop mfs shell commands, plus the minimal required Hadoop libraries for core to be able to function. On a data-fabric core node, mapr-hadoop-util is the minimal package you need to install for Hadoop shell commands, and for data-fabric operations, such as maprlogin, to work. mapr-core and mapr-hadoop-client automatically pull in mapr-hadoop-util, so you don't need to install it explicitly.
mapr-hadoop-client | This package is new for Release 6.2.0. This package contains the Hadoop job clients (MR and YARN). Clients can submit jobs to a server running mapr-hadoop-core. mapr-hadoop-client is sufficient to run all hadoop mfs and hadoop fs commands, and submit MapReduce jobs to whichever server is running mapr-hadoop-core.
mapr-hadoop-core | This package contains all the required libraries to run MapReduce jobs locally. Installing mapr-hadoop-core installs mapr-hadoop-client and mapr-hadoop-util as dependencies.
mapr-nodemanager | Installs the NodeManager service. This package installs mapr-hadoop-core as a dependency.
mapr-resourcemanager | Installs the ResourceManager service. This package installs mapr-hadoop-core as a dependency.
mapr-historyserver | Installs the HistoryServer service. This package installs mapr-hadoop-core as a dependency.
mapr-timelineserver | Installs the TimelineServer service.
mapr-httpfs | Installs the HttpFS service. Beginning with EEP 9.0.0, HttpFS is part of Hadoop.

Note that the mapr-mapreduce2 package has been removed and is no longer available. It is obsoleted by mapr-hadoop-core, which now contains all of its contents.

For package dependency information, see Package Dependencies.
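
The dependency relationships in the package table can be sketched as a small lookup. This is only an illustration: the function names direct_dep and pulled_in are hypothetical, and only the dependencies stated in this topic are encoded. Packages whose dependencies are not stated here (mapr-timelineserver and mapr-httpfs) return nothing, which does not mean they have none.

```shell
#!/bin/sh
# Hypothetical helper: print the direct dependency of a package,
# using only the relationships described in the package table.
direct_dep() {
    case "$1" in
        mapr-nodemanager|mapr-resourcemanager|mapr-historyserver)
            echo mapr-hadoop-core ;;
        mapr-hadoop-core)
            echo mapr-hadoop-client ;;
        mapr-hadoop-client|mapr-core)
            echo mapr-hadoop-util ;;
    esac
}

# Follow the chain of dependencies and print every package pulled in.
pulled_in() {
    pkg="$1"
    while dep=$(direct_dep "$pkg"); [ -n "$dep" ]; do
        echo "$dep"
        pkg="$dep"
    done
}

# Example: list what installing mapr-nodemanager brings along.
pulled_in mapr-nodemanager
```

The package manager resolves these dependencies itself during installation, so a lookup like this is useful only for planning.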

Where to Install the Packages

Before installing Hadoop and YARN, plan which cluster nodes will run each service. The following table describes where to install the packages:
On these nodes | Install these packages
All nodes where you need access to the file system | mapr-hadoop-util
Designated nodes where Hadoop or YARN services are needed (install only the packages you need) | mapr-nodemanager, mapr-resourcemanager, mapr-historyserver, mapr-timelineserver, mapr-httpfs
Nodes where Hadoop or YARN services are installed | mapr-hadoop-core
Client nodes and nodes where applications will be launched | mapr-hadoop-client
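
When planning a cluster, the placement table above can be captured as a per-role lookup. The sketch below is an assumption-laden illustration: the role names (filesystem, client, nodemanager, and so on) and the function name packages_for_role are hypothetical and are not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: map a planned node role to the packages that the
# placement table lists for it. Role names are invented for illustration.
packages_for_role() {
    case "$1" in
        filesystem)
            echo "mapr-hadoop-util" ;;
        client)
            echo "mapr-hadoop-client" ;;
        nodemanager|resourcemanager|historyserver|timelineserver|httpfs)
            # Service nodes get the service package plus mapr-hadoop-core.
            echo "mapr-$1 mapr-hadoop-core" ;;
        *)
            echo "unknown role: $1" >&2
            return 1 ;;
    esac
}

# Example: packages for a node that will run the NodeManager service.
packages_for_role nodemanager
```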

Installing Hadoop and YARN Packages

The following steps use the operating-system package managers to download and install Hadoop and YARN packages from the EEP repository:

  1. Change to the root user or use sudo, and then install the packages with the package manager for your operating system:
    • On RHEL, CentOS, or Oracle Linux, use the yum command to install the services that you want to run on the node.

      Syntax

      yum install <package_name> <package_name> <package_name> 
      Example
      yum install mapr-hadoop-util mapr-nodemanager mapr-httpfs
    • On SLES, use the zypper command to install the services that you want to run on the node. (SLES support might be limited; for more information, see Operating System Support Matrix.)

      Syntax

      zypper install <package_name> <package_name> <package_name>
      Example
      zypper install mapr-hadoop-util mapr-nodemanager mapr-httpfs
    • On Ubuntu, use the apt-get commands to update the Ubuntu package cache and install the services that you want to run on the node.
      1. Update the Ubuntu package cache:
        apt-get update 
      2. Install the services:

        Syntax

        apt-get install <package_name> <package_name> <package_name> 
        Example
        apt-get install mapr-hadoop-util mapr-nodemanager mapr-httpfs
  2. On each node, run configure.sh with the -R option. Use the -HS option to specify the HistoryServer host, and include the -TL option only if the timeline server is installed on the cluster. For example:
    configure.sh -R -HS <hostname> -TL <hostname>
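
The two steps above can be sketched as a dry-run script that prints the commands instead of running them. This is an assumption-based illustration: the function names are hypothetical, the operating-system identifiers are the ID values found in /etc/os-release (rhel, centos, ol, sles, ubuntu), and the hostnames are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the install command for an
# operating system, given its /etc/os-release ID and a package list.
install_command() {
    os_id="$1"
    shift
    case "$os_id" in
        rhel|centos|ol) echo "yum install $*" ;;
        sles)           echo "zypper install $*" ;;
        ubuntu)         echo "apt-get update && apt-get install $*" ;;
        *)
            echo "unsupported OS: $os_id" >&2
            return 1 ;;
    esac
}

# Hypothetical helper: print the configure.sh invocation. The -TL option
# is appended only when a timeline server host is supplied.
configure_command() {
    hs_host="$1"
    tl_host="${2:-}"
    cmd="configure.sh -R -HS $hs_host"
    if [ -n "$tl_host" ]; then
        cmd="$cmd -TL $tl_host"
    fi
    echo "$cmd"
}

# Example with placeholder hostnames.
install_command ubuntu mapr-hadoop-util mapr-nodemanager mapr-httpfs
configure_command hs.example.com tl.example.com
```

Printing the commands first makes it easy to review them before running them as root or with sudo on each node.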