Running MapReduce Jobs with HBase
About this task
Because the MapReduce tools are bundled in the hbase-server.jar file, you can invoke them in either of two equivalent ways. For example, to export table t1:
$ hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar export t1 /user/mapr/t1
or
$ hbase org.apache.hadoop.hbase.mapreduce.Export t1 /user/mapr/t4
Both commands run the same Export tool included in the hbase-server.jar file, so the result is the same:
$ hadoop fs -ls /user/mapr/t1/
Found 2 items
-rwxr-xr-x 3 mapr mapr 0 2019-11-11 15:00 /user/mapr/t1/_SUCCESS
-rw-r--r-- 3 mapr mapr 249 2019-11-11 15:00 /user/mapr/t1/part-m-00000
$ hadoop fs -ls /user/mapr/t4/
Found 2 items
-rwxr-xr-x 3 mapr mapr 0 2019-11-11 15:09 /user/mapr/t4/_SUCCESS
-rw-r--r-- 3 mapr mapr 249 2019-11-11 15:09 /user/mapr/t4/part-m-00000
$
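The file listings show that both runs wrote a part file of the same size. To go a step further, you can compare the two part files byte for byte; this is a quick local check, assuming the export paths shown above and a writable /tmp directory:

```shell
# Copy the sequence files written by the two equivalent export commands
# to the local file system, then compare them. Identical contents produce
# no diff output and an exit status of 0.
hadoop fs -cat /user/mapr/t1/part-m-00000 > /tmp/t1.part
hadoop fs -cat /user/mapr/t4/part-m-00000 > /tmp/t4.part
diff /tmp/t1.part /tmp/t4.part && echo "exports match"
```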
The following is an example of the full output:
$ hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar export t1 /user/mapr/t1
19/11/11 14:59:41 INFO mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false
19/11/11 14:59:42 INFO mapreduce.TableMapReduceUtil: Configured mapr.hbase.default.db hbase
19/11/11 14:59:42 INFO client.ConnectionFactory: ConnectionFactory receives mapr.hbase.default.db(hbase), set clusterType(HBASE_ONLY), user(mapr), hbase_admin_connect_at_construction(false)
19/11/11 14:59:42 INFO zookeeper.RecoverableZooKeeper: Process identifier=TokenUtil-getAuthToken connecting to ZooKeeper ensemble=node5.cluster.com:5181
19/11/11 14:59:43 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x2c306a57 connecting to ZooKeeper ensemble=node5.cluster.com:5181
19/11/11 14:59:43 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x100044f486eff26
19/11/11 14:59:45 INFO impl.TimelineClientImpl: Timeline service address: https://node5.cluster.com:8190/ws/v1/timeline/
19/11/11 14:59:45 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to node5.cluster.com/192.168.33.15:8032
19/11/11 14:59:47 INFO client.ConnectionFactory: mapr.hbase.default.db unsetDB is neither MapRDB or HBase, set HBASE_MAPR mode since mapr client is installed.
19/11/11 14:59:47 INFO client.ConnectionFactory: ConnectionFactory receives mapr.hbase.default.db(unsetDB), set clusterType(HBASE_MAPR), user(mapr), hbase_admin_connect_at_construction(false)
19/11/11 14:59:47 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6b63e6ad connecting to ZooKeeper ensemble=node5.cluster.com:5181
19/11/11 14:59:48 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
19/11/11 14:59:48 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x100044f486eff2a
19/11/11 14:59:48 INFO mapreduce.JobSubmitter: number of splits:1
19/11/11 14:59:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572957695341_0001
19/11/11 14:59:48 INFO mapreduce.JobSubmitter: Kind: HBASE_AUTH_TOKEN, Service: 9161aa11-2f19-4b20-82f8-9678db86e0a7, Ident: (org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier@0)
19/11/11 14:59:49 INFO security.ExternalTokenManagerFactory: Initialized external token manager class - com.mapr.hadoop.yarn.security.MapRTicketManager
19/11/11 14:59:51 INFO impl.YarnClientImpl: Submitted application application_1572957695341_0001
19/11/11 14:59:51 INFO mapreduce.Job: The url to track the job: https://node5.cluster.com:8090/proxy/application_1572957695341_0001/
19/11/11 14:59:51 INFO mapreduce.Job: Running job: job_1572957695341_0001
19/11/11 15:00:05 INFO mapreduce.Job: Job job_1572957695341_0001 running in uber mode : false
19/11/11 15:00:05 INFO mapreduce.Job: map 0% reduce 0%
19/11/11 15:00:13 INFO mapreduce.Job: map 100% reduce 0%
19/11/11 15:00:15 INFO mapreduce.Job: Job job_1572957695341_0001 completed successfully
19/11/11 15:00:15 INFO mapreduce.Job: Counters: 42
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=136674
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
MAPRFS: Number of bytes read=59
MAPRFS: Number of bytes written=249
MAPRFS: Number of read operations=11
MAPRFS: Number of large read operations=0
MAPRFS: Number of write operations=39
Job Counters
Launched map tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=6111
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=6111
Total vcore-seconds taken by all map tasks=6111
Total megabyte-seconds taken by all map tasks=6257664
DISK_MILLIS_MAPS=3056
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=59
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=68
CPU time spent (ms)=1620
Physical memory (bytes) snapshot=246943744
Virtual memory (bytes) snapshot=3582681088
Total committed heap usage (bytes)=287309824
HBase Counters
BYTES_IN_REMOTE_RESULTS=0
BYTES_IN_RESULTS=93
MILLIS_BETWEEN_NEXTS=518
NOT_SERVING_REGION_EXCEPTION=0
NUM_SCANNER_RESTARTS=0
NUM_SCAN_RESULTS_STALE=0
REGIONS_SCANNED=1
REMOTE_RPC_CALLS=0
REMOTE_RPC_RETRIES=0
RPC_CALLS=3
RPC_RETRIES=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=249
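The sequence files written by Export can be loaded back into a table with the companion Import tool. The following is a sketch; the target table name t1copy is hypothetical, and the table must already exist with the same column families as the source:

```shell
# Import the data written by Export into an existing table.
# t1copy is a hypothetical destination table created beforehand
# (for example, in the hbase shell) with the same column families as t1.
hbase org.apache.hadoop.hbase.mapreduce.Import t1copy /user/mapr/t1
```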
The following table shows the tools included in the hbase-server.jar file:
Name¹ | Class² | Description |
---|---|---|
rowcounter | RowCounter | Count rows in an HBase table |
CellCounter | CellCounter | Count cells in an HBase table |
export | Export | Write table data to the HPE Ezmeral Data Fabric file system |
import | Import | Import data written by Export |
importtsv | ImportTsv | Import data in TSV format |
completebulkload | LoadIncrementalHFiles | Complete a bulk data load |
copytable | CopyTable | Export a table from the local cluster to a peer cluster |
verifyrep | VerifyReplication | Compare the data from tables in two different clusters. NOTE: This function does not work for incrementColumnValues cells, because the timestamp is changed after the cell is appended to the log. |
WALPlayer | WALPlayer | Replay WAL files |
exportsnapshot | ExportSnapshot | Export the specified snapshot to a given file system |
¹ Name is used with the hadoop jar form of the command: hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar <name> ...
² Class is used with the hbase form of the command: hbase org.apache.hadoop.hbase.mapreduce.<class> ...
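As a concrete illustration of the two invocation forms, the rowcounter tool can be run either by its short name or by its fully qualified class name. This is a sketch, assuming a table named t1 exists:

```shell
# Short-name form: pass the tool name to the bundled jar.
hadoop jar /opt/mapr/hbase/hbase-1.1.13/lib/hbase-server-1.1.13.0-mapr-1912.jar rowcounter t1

# Class-name form: pass the fully qualified class to the hbase launcher.
# Both commands run the same RowCounter MapReduce job.
hbase org.apache.hadoop.hbase.mapreduce.RowCounter t1
```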