Apache HBase is the Hadoop database: a distributed, scalable big data store. Use Apache HBase when you need random, real-time read/write access to big data. This section describes how to use HBase with the MapR Platform, but does not duplicate Apache documentation.

The goal of Apache HBase is to host very large tables – billions of rows with millions of columns – atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and Hadoop-compatible filesystems, such as the MapR file system.
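To make the terms "versioned" and "column-oriented" more concrete, the following is a minimal in-memory sketch of a Bigtable-style data model: each row key maps to a sparse set of columns, and each column keeps several timestamped versions of its value. This is a toy for illustration only, not the HBase client API. The class name `ToyVersionedTable`, the `max_versions` parameter, and the sample row keys and column names are all hypothetical.

```python
import time
from collections import defaultdict


class ToyVersionedTable:
    """Toy model of a versioned, column-oriented table (NOT the HBase API)."""

    def __init__(self, max_versions=3):
        # Hypothetical retention limit: how many versions to keep per cell.
        self.max_versions = max_versions
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first.
        # Rows are sparse: only columns that were written are stored.
        self.rows = defaultdict(dict)

    def put(self, row, column, value, timestamp=None):
        """Write one cell version; the oldest versions beyond the limit are dropped."""
        ts = timestamp if timestamp is not None else time.time_ns()
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row, column, versions=1):
        """Random read: return up to `versions` newest (timestamp, value) pairs."""
        return self.rows.get(row, {}).get(column, [])[:versions]

    def scan(self, start_row, stop_row):
        """Yield (row, newest cell per column) for start_row <= row < stop_row."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, {col: cells[0] for col, cells in self.rows[row].items()}


table = ToyVersionedTable(max_versions=2)
table.put("user#001", "info:email", "a@example.com", timestamp=1)
table.put("user#001", "info:email", "b@example.com", timestamp=2)
table.put("user#001", "info:email", "c@example.com", timestamp=3)

# Only the two newest versions survive, newest first.
print(table.get("user#001", "info:email", versions=2))
# [(3, 'c@example.com'), (2, 'b@example.com')]
```

In a real deployment, rows are kept sorted by key and split across many servers, which this single-process sketch does not attempt to model. It only shows why a read can address any row directly and why a cell can return more than one version.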

Installing Apache HBase on a MapR cluster involves storing all HBase components in a single volume mapped to the /hbase directory in the cluster. Tables are stored in a flat namespace and are not grouped logically with related files. Because all Apache HBase data resides in one volume, only one set of storage policies can be applied to the entire Apache HBase datastore. Mirrors and snapshots of the HBase volume do not provide functional replication of the datastore. Despite this limitation, mirrors can be used to back up HLogs and HFiles, providing a recovery point for Apache HBase data.

This section documents how to work with HBase on the MapR Converged Data Platform. You can also refer to the documentation available from the Apache HBase project.

NOTE The HPE Ezmeral Data Fabric Database provides native storage for table data, compatible with the HBase API. For new applications, consider using HPE Ezmeral Data Fabric Database binary tables for increased performance, more versatile table operations, and easier cluster administration.