Covering and Non-Covering Queries
Drill uses a cost-based approach to determine an optimal query plan. When queries are eligible for index planning, the queries are either covering or non-covering.
For covering queries, only the index is needed to process the query. Drill creates an index-based query plan that includes an index scan. Covering queries avoid the overhead of fetching data from the primary table.
planner.index.force_sort_noncovering
option for Drill to return the
results of a non-covering query in sorted order. See Index Planning and Execution
Configuration OptionsIndexes for covering and non-covering queries can contain indexed fields, or a combination of indexed and included fields. HPE Ezmeral Data Fabric Database stores included fields in the index. Each field added to the index increases the storage requirement for the index. As the storage size increases, the cost of reading the index also increases. Likewise, for the cost of adding and updating documents. Consider the impact on storage and updates when adding included fields to an index.
- For information about how Drill selects a query plan, see Selection and Execution of Secondary Indexes.
- For information about the types of queries that qualify for index-based plans, see Queries that Benefit from Secondary Indexes.
- For index concepts, see Secondary Index Concepts.
Covering and Non-Covering Query Examples
A query can be covering or non-covering based on the fields referenced in the query and the fields on which an index is created and/or includes.
The following query examples use an index, l_comp_1, created on a table, lineitem.
maprcli table index add
command, as
shown:maprcli table index add -path /drill/testdata/tpch/sf1/maprdb/json/range/lineitem -index l_comp_1 -indexedfields L_LINENUMBER,L_ORDERKEY -includedfields L_LINESTATUS,L_QUANTITY
Covering Query Example
SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75;
Because the l_comp_1 index includes all fields referenced in the query, Drill creates a query plan that uses the index only.
Running the EXPLAIN PLAN FOR command with the query shows that Drill created a query plan that only uses the index to process the query:EXPLAIN PLAN FOR SELECT L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75;
00-00 Screen
00-01 Project(L_LINESTATUS=[$0], L_QUANTITY=[$1])
00-02 Scan(table=[[si, tpch_sf1_maprdb_range, lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/range/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75})), indexName=l_comp_1], columns=[`L_LINESTATUS`, `L_QUANTITY`]]])
Reading the query plan, you can see that the plan includes an index scan, as indicated by groupscan=[JsonTableGroupScan and indexName. Drill and HPE Ezmeral Data Fabric Database can process this query using only the index.
Non-Covering Query Example
SELECT L_RETURNFLAG, L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75;
Because the l_comp_1 index does not include the L_RETURNFLAG field, Drill creates a query plan that uses the index, but also includes a join on the primary table.
EXPLAIN PLAN FOR SELECT L_RETURNFLAG, L_LINESTATUS, L_QUANTITY FROM lineitem WHERE L_LINENUMBER = 1 AND L_ORDERKEY BETWEEN 40 AND 75;
00-00 Screen
00-01 Project(L_RETURNFLAG=[$0], L_LINESTATUS=[$1], L_QUANTITY=[$2])
00-02 Project(L_RETURNFLAG=[$2], L_LINESTATUS=[$3], L_QUANTITY=[$4])
00-03 Project(L_LINENUMBER=[$0], L_ORDERKEY=[$1], L_RETURNFLAG=[$2], L_LINESTATUS=[$3], L_QUANTITY=[$4])
00-04 RowKeyJoin(condition=[=($5, $6)], joinType=[inner])
00-06 Scan(table=[[si, tpch_sf1_maprdb_range, lineitem]], groupscan=[RestrictedJsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/range/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75}))], columns=[`L_LINENUMBER`, `L_ORDERKEY`, `L_RETURNFLAG`, `L_LINESTATUS`, `L_QUANTITY`, `_id`], rowcount=60012.15000000001]])
00-05 Scan(table=[[si, tpch_sf1_maprdb_range, lineitem]], groupscan=[JsonTableGroupScan [ScanSpec=JsonScanSpec [tableName=maprfs:///drill/testdata/tpch/sf1/maprdb/json/range/lineitem, condition=(((L_LINENUMBER = {"$numberLong":1}) and (L_ORDERKEY >= {"$numberLong":40})) and (L_ORDERKEY <= {"$numberLong":75})), indexName=l_comp_1], columns=[`_id`]]])
Reading the query plan, you can see that the plan includes an index scan, as indicated by the groupscan=[JsonTableGroupScan and indexName, and also a scan on the primary table, as indicated by the groupscan=[RestrictedJsonTableGroupScan and the RowKeyJoin. To process this query, Drill and HPE Ezmeral Data Fabric Database can use the index, but HPE Ezmeral Data Fabric Database must also use the rowkey to perform a join on the primary table to fetch data in the L_RETURNFLAG field.
If this query ran on a regular basis, you could remove the l_comp_1 index and create a new index that includes all fields referenced in the query, including the L_RETURNFLAG field, to improve query performance. However, running a query only once or a few times may not justify the overhead of removing the old index and creating a new index.