Optimizing HPE Ezmeral Data Fabric Database Tables Search by ID
Starting from the 1904 release (EEP 6.0.2, EEP 6.1.1, and EEP 6.2.0), search by ID is supported with Hive HPE Ezmeral Data Fabric Database JSON tables.
About this task
Property of Optimization
Prerequisites
hive.mapr.db.json.fetch.by.id.task.conversion
and the value has a boolean type and by default is set to true
, which means it is enabled.Procedure
To disable optimization, set
hive.mapr.db.json.fetch.by.id.task.conversion
to
false
.
Conditions for Optimization
Procedure
This optimizer is designed for queries such as:
SELECT *
FROM <mapr_db_json_table>
WHERE _id = <constant_string_value>;
or:SELECT *
FROM <mapr_db_json_table>
WHERE _id = <constant_string_value> AND (<condition_1>) AND (<condition_2>) ... AND (<condition_N>);
or:SELECT *
FROM <mapr_db_json_table>
WHERE <Constant false operator>
where _id
is
a key column of HPE Ezmeral Data Fabric Database JSON table. It
provides usage of the findById()
method of the HPE Ezmeral Data Fabric Database JSON table. The following functionality is
not supported: - joins
- group by
- distinct
- lateral view
- subquery
- create table as select (CTAS) or insert
- analyze
- single source
SELECT * FROM t WHERE (CASE WHEN _id = 'value_a' THEN 2 ELSE 4 END) > 3;
Using Optimization
Procedure
-
Consider the following HPE Ezmeral Data Fabric Database JSON
table:
CREATE TABLE t(doc_id string, col1 string, col2 string) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES("maprdb.table.name" = "/user/mapr/db_json_table","maprdb.column.id" = "doc_id");
-
Run the
EXPLAIN
command:EXPLAIN SELECT col1 FROM t WHERE doc_id='id_004';
-
The following output is produced:
An important part of a query plan is that it shows if optimization is available for the query:STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 MapR DB JSON Fetch By Id Operator limit: -1 Processor Tree: TableScan alias: t_small filterExpr: (doc_id = 'id_004') (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Filter Operator predicate: (doc_id = 'id_004') (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: col1 (type: string) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE ListSink
STAGE PLANS: Stage: Stage-0 MapR DB JSON Fetch By Id Operator