Non-Hashed and Hashed Indexes
You can create non-hashed and hashed indexes for queries on JSON tables in HPE Ezmeral Data Fabric Database.
Non-hashed indexes support conditional queries with an ORDER BY clause because HPE Ezmeral Data Fabric Database sorts the data in non-hashed indexes. When processing ORDER BY queries, Drill does not have to perform sort operations on the data.
Hashed indexes support the same conditional queries as non-hashed indexes, but they do not have a guaranteed sort order. Hashed indexes enable HPE Ezmeral Data Fabric Database to evenly distribute new writes on an index across logical partitions to avoid hot spotting. Drill must perform a sort for ORDER BY queries that use hashed indexes. Sorting the data can increase the CPU costs and negatively impact performance. See Hashed Indexes for additional information.
If you notice performance issues with ORDER BY queries that use hashed indexes, review the query plans to see if the plans include sort and merge operations. If this is the case, create non-hashed indexes to support the queries and achieve the best performance.
Examples of Hashed and Non-Hashed Index Plans for an ORDER BY Query
SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75 ORDER BY L_LINENUMBER;
Hashed Index Plan Example
maprcli table index add -path /drill/testdata/tpch/sf1/maprdb/json/hash/lineitem -index l_hash_comp_1 -indexedfields L_LINENUMBER,L_ORDERKEY -includedfields L_LINESTATUS,L_QUANTITY -hashed true
EXPLAIN PLAN FOR SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75 ORDER BY L_LINENUMBER;
00-00 Screen
00-01 Project(L_LINESTATUS=[$0], L_QUANTITY=[$1])
00-02 SingleMergeExchange(sort0=[2])
01-01 SelectionVectorRemover
01-02 Sort(sort0=[$2], dir0=[ASC])
01-03 Project(L_LINESTATUS=[$2], L_QUANTITY=[$3], L_LINENUMBER=[$0])
01-04 Scan(table=[[si, tpch_sf1_maprdb_hash, lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/hash/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75})), indexName=l_hash_comp_1], columns=[`L_LINENUMBER`, `L_ORDERKEY`, `L_LINESTATUS`, `L_QUANTITY`]]])
Reading the query plan, you can see that Drill uses the hashed index in the plan, as indicated by indexName=l_hash_comp_1. To process the query, HPE Ezmeral Data Fabric Database can use the index, but Drill must sort and merge the data, as indicated by the Sort and SingleMergeExchange operations in the query plan.
Using the hashed index plan for this ORDER BY query requires additional processing and negatively impacts performance.
Non-Hashed Index Plan Example
maprcli table index
add
command on a table, lineitem, as
shown:maprcli table index add -path /drill/testdata/tpch/sf1/maprdb/json/range/lineitem -index l_comp_1 -indexedfields L_LINENUMBER,L_ORDERKEY -includedfields L_LINESTATUS,L_QUANTITY
EXPLAIN PLAN FOR SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75 ORDER BY L_LINENUMBER;
00-00 Screen
00-01 Project(L_LINESTATUS=[$0], L_QUANTITY=[$1])
00-02 Project(L_LINESTATUS=[$2], L_QUANTITY=[$3], L_LINENUMBER=[$0])
00-03 Scan(table=[[si, tpch_sf1_maprdb_range, lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/range/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75})), indexName=l_comp_1], columns=[`L_LINENUMBER`, `L_ORDERKEY`, `L_LINESTATUS`, `L_QUANTITY`]]])
Reading the query plan, you can see that Drill uses the non-hashed index plan, as indicated by indexName=l_comp_1. To process the query, HPE Ezmeral Data Fabric Database uses the index and Drill does not have to perform sort and merge operations on the data, as indicated by the absence of the Sort and SingleMergeExchange operations in the query plan. HPE Ezmeral Data Fabric Database sorted the data in the index when the index was created.