Loading Data into Binary Tables
Bulkload operations can be performed as a full bulkload or as an incremental bulkload.
The most common way to load data into HPE Ezmeral Data Fabric Database binary tables is with put operations. At large scale, however, bulk loads offer a performance advantage over put operations.
Bulk loading is supported by the following tools, which can be used for both full and incremental bulkload operations:
- The HPE Ezmeral Data Fabric Database binary CopyTable utility, which copies HPE Ezmeral Data Fabric Database binary table data, table metadata, access control expressions, and more to another HPE Ezmeral Data Fabric Database binary table. The utility is invoked as:
hbase com.mapr.fs.hbase.tools.mapreduce.CopyTable
- The ImportFiles utility, which imports HFile or Result files into HPE Ezmeral Data Fabric Database binary tables. For example:
hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -Dmapred.reduce.tasks=2 -inputDir < input directory, for example: /test/tabler.kv > -table < table name, for example: /table2 > [ -format < Result|HFile > ] [ -sample < true|false > ] [ -mapOnly < true|false > ]
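The ImportFiles synopsis above can be turned into a concrete command line with a small helper. The following is a minimal sketch: the function name import_files_cmd is hypothetical, it only prints the command rather than running it, and it uses only the options shown in the synopsis. The default of 2 reduce tasks simply mirrors the example above and is not a recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) an ImportFiles command line.
# Arguments: <input directory> <table path> <format: Result|HFile> [reduce tasks]
import_files_cmd() {
  local input_dir="$1"
  local table="$2"
  local format="$3"
  local reducers="${4:-2}"   # 2 mirrors the documented example

  # The synopsis lists only Result and HFile as valid formats.
  case "$format" in
    Result|HFile) ;;
    *) echo "format must be Result or HFile" >&2; return 1 ;;
  esac

  printf 'hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -Dmapred.reduce.tasks=%s -inputDir %s -table %s -format %s\n' \
    "$reducers" "$input_dir" "$table" "$format"
}

# Example: build the command for the paths used in the synopsis.
import_files_cmd /test/tabler.kv /table2 Result
```

Printing the command first makes it easy to review the input directory and table path before submitting the job on a cluster.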
Full Bulk Loads
Full bulkload operations offer the best performance advantage because they skip the write-ahead log (WAL) typical of HPE Ezmeral Data Fabric Database binary table operations. Full bulkload operations can only be performed on empty tables that have the bulkload attribute set to true. This value can be set to true only when creating a table.
When you set the bulkload attribute, you cannot enable replication on the table. Because this setting effectively disables logging on the table, HPE Ezmeral Data Fabric Database also does not capture log data that Elasticsearch can use to index the table.
To create an HPE Ezmeral Data Fabric Database binary table for bulkloading, use one of the following:
- The maprcli table create command with the -bulkload parameter set to true.
- The Apache HBase shell create command with the BULKLOAD parameter set to true. For example:
hbase> create '/a0','f1', BULKLOAD => 'true'
If you want to pre-split a table, separate the BULKLOAD parameter from the SPLITS parameter. For example:
hbase> create '/t1', 'f1', {SPLITS => ['10', '20', '30']}, {BULKLOAD => 'true'}
- The Control System, with the Will table be bulkload? option set to Yes under table PROPERTIES.
Because the bulkload attribute can be set to true only at table creation, the following restrictions apply afterward:
- You cannot use the maprcli table edit command to set the bulkload parameter to true again.
- You cannot use the Apache HBase shell alter command to set the BULKLOAD parameter to true again.
- In the Control System, the Will table be bulkload? option cannot be modified after table creation.
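The full bulk load workflow described in this section can be sketched as a dry run that prints each command in order. This is an illustrative sketch only: the function name full_bulkload_plan is hypothetical, the -path parameter of maprcli table create and maprcli table edit is assumed here (it is not shown on this page), and the final step of setting bulkload to false after loading is an assumption inferred from the restrictions listed above. Nothing is executed against a cluster.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands for a full bulk load; nothing is executed.
# Arguments: <table path> <input directory>
full_bulkload_plan() {
  local table="$1"
  local input_dir="$2"

  # 1. Create an empty table in bulkload mode.
  #    Assumption: the table path is passed with -path.
  echo "maprcli table create -path $table -bulkload true"

  # 2. Load the files with ImportFiles (options taken from the synopsis above).
  echo "hbase com.mapr.fs.hbase.tools.mapreduce.ImportFiles -Dmapred.reduce.tasks=2 -inputDir $input_dir -table $table"

  # 3. Assumption: take the table out of bulkload mode when the load finishes.
  #    After this, the attribute cannot be set back to true.
  echo "maprcli table edit -path $table -bulkload false"
}

# Example using the table and input paths from this page.
full_bulkload_plan /a0 /test/tabler.kv
```

Keeping the three steps in one place makes the one-way nature of the bulkload attribute explicit: a table that needs another full bulk load would have to be created again rather than edited.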
Incremental Bulk Loads
Incremental bulk loads can add data to existing tables concurrently with other table operations, with better performance than put operations. This type of bulk load makes use of write-ahead log files.
You can use incremental bulk loads to ingest large amounts of data into an existing table. Tables remain available for standard client operations such as put, get, and scan while the bulk load is in progress. A table can perform multiple incremental bulk load operations simultaneously.
Whether you create a table with the maprcli table create command, with the HBase shell's create command, or in the Control System, incremental loads are supported by default.