Index Planning in Drill
Index planning reduces the I/O operation costs associated with full table scans. If an index is available, Drill can use the index to improve query performance.
SELECT L_LINENUMBER FROM lineitem ORDER BY L_LINENUMBER;
- GROUP BY queries, as shown in the following example where L_COMMITDate is an
indexed field in the index selected for the query plan:
SELECT L_COMMITDate FROM lineitem GROUP BY L_COMMITDate;
- JOIN queries, as shown in the following example where L_ORDERKEY and O_ORDERKEY
are indexed fields and L_LINESTATUS is an included field in the index selected for the
query
plan:
SELECT L.L_LINESTATUS FROM lineitem L, orders O WHERE L.L_ORDERKEY=O.O_ORDERKEY;
NOTEIf the planner picks two indexes, one for lineitem and one for orders, a sort merge join is used instead of a hash join. - Queries with DISTINCT projections, as shown in the following examples where
L_LINENUMBER is an indexed field in the index selected for the query
plan:
SELECT DISTINCT L_LINENUMBER FROM lineitem; SELECT COUNT(DISTINCT L_LINENUMBER) FROM lineitem;
Drill can use indexes for queries that GROUP BY or ORDER BY the leading fields in an index. Drill does not use indexes for queries that GROUP BY or ORDER BY the trailing or included fields in an index.
When a query contains GROUP BY and ORDER BY operations on the leading indexed column, Drill can use the sort order of the index to create index-based query plans that use streaming aggregates and merge joins to improve query performance.
You can run the EXPLAIN PLAN FOR command with a query to see the query plan that Drill creates. See Covering and Non-Covering Queries for more information about index planning in Drill.