Set Up Compression with HBase
Using compression with HBase reduces the number of bytes transmitted over the network and stored on disk. These benefits often outweigh the performance cost of compressing the data on every write and decompressing it on every read.
GZip Compression
GZip compression is included with most Linux distributions and works natively with HBase.
To use GZip compression, set the COMPRESSION attribute on each column family when you create
the table in the HBase shell. For example:
create 'mytable', {NAME=>'colfam', COMPRESSION=>'gz'}
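To check that the setting was applied, you can inspect the table descriptor from the same shell session. This is a minimal sketch that assumes the table 'mytable' from the example above already exists. The output lists the attributes of each column family, including COMPRESSION:

```
# Show the table descriptor, including each column family's COMPRESSION attribute
describe 'mytable'
```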
LZ4 Compression
The LZ4 algorithm gives a slightly worse compression ratio than the LZO algorithm, which
in turn gives a worse ratio than algorithms like DEFLATE (the algorithm used by GZip).
However, LZ4 compression speeds are similar to LZO and several times faster than DEFLATE,
and LZ4 decompression can be significantly faster than LZO.
Here is an example of configuring LZ4 compression:
create 'mytable1', {NAME=>'colfam', COMPRESSION=>'lz4'}
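Because compression is set per column family, one table can use different algorithms for different families. The sketch below illustrates this. The table name 'mytable_mixed' and the family names 'archive' and 'hot' are hypothetical placeholders, not names from this documentation:

```
# 'archive' favors compression ratio (GZip); 'hot' favors speed (LZ4)
create 'mytable_mixed', {NAME=>'archive', COMPRESSION=>'gz'}, {NAME=>'hot', COMPRESSION=>'lz4'}
```

A split like this may suit tables where one family holds rarely read data and another is read frequently, but the right choice depends on your workload.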
Snappy Compression
The Snappy compression algorithm is optimized for speed rather than compression ratio. Snappy compression is included in the core Data Fabric installation, and no additional configuration is required.
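Following the pattern of the GZip and LZ4 examples, a Snappy-compressed column family could be created as sketched below. The table name 'mytable2' is a placeholder, and the value 'snappy' assumes the standard HBase codec name for this algorithm:

```
# Create a table whose column family uses Snappy compression
create 'mytable2', {NAME=>'colfam', COMPRESSION=>'snappy'}
```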