hadoop jar

The hadoop jar command runs a program contained in a JAR file. Users can bundle their MapReduce code in a JAR file and execute it using this command.

Syntax

hadoop jar <jar> [<arguments>]

Parameters

The following command parameters are supported for hadoop jar:

Parameter      Description

<jar>          The JAR file.

<arguments>    Arguments to the program specified in the JAR file.

Examples

Streaming Application

Hadoop streaming applications are run using the hadoop jar command. The Hadoop streaming utility enables you to create and run MapReduce applications with any executable or script as the mapper and/or the reducer.

$ hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /bin/wc

The -input, -output, -mapper, and -reducer streaming command options are all required for streaming jobs. The mapper and the reducer can each be either an executable or a Java class. For more information and further examples of streaming applications, see Hadoop Streaming at the Apache project's page.
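Because a streaming mapper or reducer given as an executable reads its input from standard input and writes its output to standard output, a candidate pair can be tried locally with an ordinary pipeline before a job is submitted. The following is a minimal sketch of that idea, not part of the Hadoop distribution: the mapper and reducer functions and the sample text are hypothetical, and sort stands in for the ordering that Hadoop performs between the map and reduce phases.

```shell
#!/bin/sh
# Hypothetical mapper: emit one word per line.
mapper() { tr -s ' ' '\n'; }

# Hypothetical reducer: count identical adjacent lines, then
# print "word<TAB>count". The input must already be sorted.
reducer() { uniq -c | awk '{ print $2 "\t" $1 }'; }

# Illustrative sample input; sort approximates the shuffle step.
result=$(printf 'the quick fox\nthe lazy dog\n' | mapper | sort | reducer)

printf '%s\n' "$result"
```

The pipeline prints one word and its count per line, separated by a tab. If the result looks right, the same commands, saved as executable scripts, could be named in the -mapper and -reducer options of a streaming job.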

Running from a JAR file

The Word Count program is another example of a program that you run with the hadoop jar command. The wordcount functionality is built into the hadoop-0.20.2-dev-examples.jar file. You pass the full path to the JAR file, followed by the program name and its arguments, to the hadoop jar command. Hadoop then reads the JAR file and runs the requested program.

The Word Count program reads files from an input directory, counts the words, and writes the results of the application to files in an output directory.

$ hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount /myvolume/in /myvolume/out
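MapReduce jobs generally refuse to write into an output directory that already exists, so it can help to wrap the Word Count command in a small script that checks first. The following is a sketch under stated assumptions: run_wordcount and DRY_RUN are hypothetical names introduced here for illustration, and the paths are the ones from the example above. With DRY_RUN set to 1, the function only prints the command it would run, which makes it safe to try on a machine without a cluster.

```shell
#!/bin/sh
# run_wordcount and DRY_RUN are hypothetical helpers, not part of Hadoop.
run_wordcount() {
    jar=$1
    in_dir=$2
    out_dir=$3

    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of submitting a job.
        echo "hadoop jar $jar wordcount $in_dir $out_dir"
        return 0
    fi

    # hadoop fs -test -e exits with status 0 when the path exists.
    if hadoop fs -test -e "$out_dir"; then
        echo "output directory $out_dir already exists" >&2
        return 1
    fi

    hadoop jar "$jar" wordcount "$in_dir" "$out_dir"
}

# Dry run with the paths from the example above.
DRY_RUN=1
run_wordcount /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar \
    /myvolume/in /myvolume/out
```

Leaving DRY_RUN unset, or setting it to 0, makes the function perform the existence check and then submit the job.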