Running Jobs on the WebHCat Server
About this task
REST Calls in WebHCat
The base URI for REST calls in WebHCat is http://<host>:<port>/templeton/v1/. The following table lists elements that can be appended to the base URI, covering server information and DDL commands.
| URI | Description |
|---|---|
| Server Information | |
| /status | Shows the WebHCat server status. |
| /version | Shows the WebHCat server version. |
| DDL Commands | |
| /ddl/database | Lists existing databases. |
| /ddl/database/<mydatabase> | Shows properties for the database named mydatabase. |
| /ddl/database/<mydatabase>/table | Lists the tables in the database named mydatabase. |
| /ddl/database/<mydatabase>/table/<mytable> | Shows the table definition for the table named mytable in the database named mydatabase. |
| /ddl/database/<mydatabase>/table/<mytable>/property | Shows the table properties for the table named mytable in the database named mydatabase. |
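For example, assuming the WebHCat server is running locally on the default port (50111), you can exercise the status and DDL resources with curl:
# server health check
curl -s 'http://localhost:50111/templeton/v1/status'
# list existing databases (DDL calls take the user.name parameter)
curl -s 'http://localhost:50111/templeton/v1/ddl/database?user.name=<username>'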
Launching a MapReduce Job with WebHCat
About this task
The job that WebHCat launches, TempletonControllerJob, has one map task. The map task launches the actual job from the REST API call. Check the status of both jobs and the contents of the output directory.
Procedure
- Copy the MapReduce example job to the MapRFS layer:
hadoop fs -put /opt/mapr/hadoop/hadoop-<version>/hadoop-<version>-dev-examples.jar /user/mapr/webhcat/examples.jar
- Use the curl utility to launch the job:
curl -s -d jar=examples.jar -d class="terasort" -d arg=teragen.test -d arg=whop3 'http://localhost:50111/templeton/v1/mapreduce/jar?user.name=<username>'
Launching a Streaming MapReduce Job with WebHCat
Procedure
- Use the curl utility to launch the job:
curl -s -d input=teragen.test -d output=mycounts -d mapper=/bin/cat -d reducer="/usr/bin/wc -w" 'http://localhost:50111/templeton/v1/mapreduce/streaming?user.name=<username>'
- Check the job status for both WebHCat jobs on the JobTracker page in the Control System.
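For comparison, the REST call above corresponds roughly to a direct Hadoop streaming invocation like the following sketch (the streaming jar path varies by release and is illustrative):
# the streaming jar location is an assumption; adjust for your Hadoop version
hadoop jar /opt/mapr/hadoop/hadoop-<version>/contrib/streaming/hadoop-<version>-dev-streaming.jar -input teragen.test -output mycounts -mapper /bin/cat -reducer '/usr/bin/wc -w'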
Launching a Pig Job with WebHCat
Procedure
- Copy a data file into MapRFS:
hadoop fs -put $HIVE_HOME/examples/files/kv1.txt /user/<user name>/
- Create a test.pig file with the following contents:
A = LOAD 'kv1.txt' using PigStorage('\u0001') AS(key:INT, value:chararray);
STORE A INTO 'pig.output';
- Copy the test.pig file into the MapR filesystem:
hadoop fs -put test.pig /user/<user name>/
- Run the Pig REST API command:
curl -s -d file=test.pig -d arg=-v 'http://localhost:50111/templeton/v1/pig?user.name=<username>'
- Monitor the contents of the pig.output directory (example commands follow this procedure).
- Check the JobTracker page for two jobs: TempletonControllerJob and PigLatin.
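A minimal sketch for monitoring the output, assuming pig.output is created under your home directory in MapRFS with standard part-file names:
# list the output directory
hadoop fs -ls /user/<user name>/pig.output
# print the stored records
hadoop fs -cat /user/<user name>/pig.output/part-*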
Launching a Hive Job with WebHCat
Procedure
- Create a table:
curl -s -d execute="create+external+table+ext3(t+TIMESTAMP)+location+'/user/<user name>/ext3'" 'http://localhost:50111/templeton/v1/hive?user.name=<username>'
- Load data into the table:
curl -s -d execute="insert+overwrite+table+ext3+select+*+from+datetable" 'http://localhost:50111/templeton/v1/hive?user.name=<username>'
- List the tables:
curl -s -d execute="show+tables" -d statusdir='hive.output' 'http://localhost:50111/templeton/v1/hive?user.name=<username>'
The list of tables is in hive.output/stdout.
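To read the captured output, a sketch assuming statusdir resolves to hive.output under your home directory in MapRFS:
# stdout of the show tables command holds the table list
hadoop fs -cat /user/<user name>/hive.output/stdout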
The Job Queue
About this task
To show HCatalog jobs for a particular user, navigate to the following address:
http://<hostname>:<port>/templeton/v1/queue/?user.name=<username>
The default port for HCatalog is 50111.
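For example, to query a local server on the default port with curl (the username mapr is illustrative):
# list the job IDs of HCatalog jobs that belong to user mapr
curl -s 'http://localhost:50111/templeton/v1/queue/?user.name=mapr'
Appending one of the returned job IDs to the queue URI shows the status of that individual job.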