Designing Secondary Indexes
It is important that you create secondary indexes that provide the most benefit to your HPE Ezmeral Data Fabric Database JSON queries. This topic describes a general design approach that includes identifying query patterns, using common query patterns involving filters and ordering to determine which indexes to create, weighing the benefits of indexes against their update and storage costs, and taking into consideration index limitations.
The following diagram summarizes the general approach to designing your indexes:
Before designing your secondary indexes in relationship to your queries, make sure you understand the index feature, how to set up and use indexing, the commands used to perform tasks, and how to query the data through your application. The following cover these topics:
Identify Query Patterns
Query patterns, such as queries with filter conditions and ORDER BY clauses, indicate where indexes can improve performance. If a query does not contain selective filters, the overhead of using an index may cost more than a full table scan. You should also define your indexes so a single index benefits either multiple queries or individual queries that you run most often.
See Selection and Execution of Secondary Indexes to understand how HPE Ezmeral Data Fabric Database chooses which secondary indexes to use and how they improve performance.
Determine Potential Indexes Based on Query Patterns
Identified Query Pattern | Potential Indexes to Create |
Compares individual fields with selective filter conditions |
Define single field indexes on the fields that you compare against. Verify that the fields contain supported data types. |
Filters against specific combinations of fields |
Define composite field indexes instead of single field indexes. Specify the sequence of the index keys so fields that appear in equality conditions are the prefixes in the keys. See Using Multiple Indexes to Optimize Query Conditions for additional guidance on defining composite vs single field indexes. |
Accesses a subset of fields in a document, but does not filter or sort on these fields | Add those fields as included fields in indexes. |
Filters on a subfield in a nested document | Define the index key on the subfield. |
Filters on subfields in nested documents that are array elements | Define the index key using a container field path: for example,
arrayField[].subField . |
Filters and projects using a container field path |
Define the container field path as both an indexed field and included field. See Covering Indexes and Container Field Paths for more details. |
Filters on individual elements of an array, which can appear in any position in the array | Define an index using a container field path: for example,
arrayField[] . |
Issues Drill SQL queries with filter conditions that contain CAST expressions | Specify the CAST function when defining the index key. |
Sorts on fields | Define the sequence and order direction of the index keys to match the sequence and order direction of the fields your query sorts. If the sort order of the index keys matches the insertion order of documents, define hashed indexes. |
Sorts on one set of fields and filters on another set using equality conditions |
Define a composite index so that fields using equality conditions are the prefixes in the index keys, followed by the sort fields. |
Evaluate Tradeoffs and Limitations
When you design your indexes, remember that HPE Ezmeral Data Fabric Database must synchronize each index when you insert and update documents in the corresponding JSON table. This impacts the throughput performance of inserts and updates because HPE Ezmeral Data Fabric Database must perform additional writes. The impact increases with each additional index.
HPE Ezmeral Data Fabric Database performs the synchronization operation asynchronously, which minimizes throughput overhead. The consequence is that an index may be inconsistent relative to its JSON table. If your application cannot tolerate lag time between the update to the JSON table and the update to the index, you should take that into consideration when deciding whether to index specific fields.
See Asynchronous Secondary Index Updates for more details about this feature.
Indexes increase your storage requirements. The storage size depends on the number of indexed and included fields in the index and the size of values stored in those fields. As the size of the index increases, the cost of reading the index also increases.
Consider the storage costs when creating indexes and deciding on the fields to add to the index.
When designing your indexes, make sure HPE Ezmeral Data Fabric Database indexes support the functionality you need. For example, it may not be possible to create an index on a particular field path.
See Restrictions on Secondary Indexes for a complete list.