Optimizing HPE Data Fabric Database Tables Search by ID

Starting from the 1904 release (EEP 6.0.2, EEP 6.1.1, and EEP 6.2.0), search by ID is supported with Hive HPE Data Fabric Database JSON tables.

About this task

Property of Optimization

Prerequisites

The property name is hive.mapr.db.json.fetch.by.id.task.conversion and the value has a boolean type and by default is set to true, which means it is enabled.

Procedure

To disable optimization, set hive.mapr.db.json.fetch.by.id.task.conversion to false.

Conditions for Optimization

Procedure

This optimizer is designed for queries such as:

SELECT *
FROM <mapr_db_json_table>
WHERE _id = <constant_string_value>;

or:

SELECT *
FROM <mapr_db_json_table>
WHERE _id = <constant_string_value> AND (<condition_1>) AND (<condition_2>) ... AND (<condition_N>);

or:

SELECT *
FROM <mapr_db_json_table>
WHERE <Constant false operator>

where _id is a key column of HPE Data Fabric Database JSON table. It provides usage of the findById() method of the HPE Data Fabric Database JSON table. The following functionality is not supported:

joins
group by
distinct
lateral view
subquery
create table as select (CTAS) or insert
analyze
single source

The predicate is not actually a part of the filter, so it is ignored by push down:

SELECT * FROM t WHERE (CASE WHEN _id = 'value_a' THEN 2 ELSE 4 END) > 3;

Using Optimization

Procedure

Consider the following HPE Data Fabric Database JSON table:

CREATE TABLE t(doc_id string, col1 string, col2 string) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES("maprdb.table.name" = "/user/mapr/db_json_table","maprdb.column.id" = "doc_id");

Run the EXPLAIN command:

EXPLAIN SELECT col1 FROM t WHERE doc_id='id_004';

The following output is produced:

STAGE DEPENDENCIES:
 Stage-0 is a root stage
                        
STAGE PLANS:
 Stage: Stage-0
  MapR DB JSON Fetch By Id Operator
  limit: -1
  Processor Tree:
  TableScan
  alias: t_small
  filterExpr: (doc_id = 'id_004') (type: boolean)
  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
  Filter Operator
   predicate: (doc_id = 'id_004') (type: boolean)
   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
   Select Operator
   expressions: col1 (type: string)
   outputColumnNames: _col0
   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
   ListSink

An important part of a query plan is that it shows if optimization is available for the query:

STAGE PLANS:
Stage: Stage-0
MapR DB JSON Fetch By Id Operator

HPE Data Fabric 8.0.0 Software Documentation
Abstract	This site contains documentation for HPE Data Fabric Software version 8.0.0 including installation, configuration, administration, and reference content, as well as content for the associated bundled ecosystem components and drivers.
Published	July 2026
Edition	8.0.0
Topic last updated	2021-09-13