Examples of Designing Secondary Indexes
These examples illustrate the concepts behind designing your secondary indexes. Although the examples focus on query patterns and do not account for sizing, storage, and updates, you should always weigh the benefits of indexes against these other requirements.
Assume that you have a HPE Ezmeral Data Fabric Database JSON table with the following customer data:
{ "_id": "10000", "FullName": { "LastName": "Smith", "FirstName": "John" }, "Address": { "Street": "123 SE 22nd St.", "City": "Oakland", "State": "CA", "Zipcode": "94601-1001" }, "Gender": "M", "AccountBalance": 999.99, "Email": "john.smith@company.com", "Phones": [ {"Type": "Home", "Number": "555-555-1234"}, {"Type": "Mobile", "Number": "555-555-5678"}, {"Type": "Work", "Number": "555-555-9012"} ], "Hobbies": ["Baseball", "Cooking", "Reading"], "DateOfBirth": "10/1/1985" }
The following table contains fields in the document that are candidates for indexing based on the sample queries:
Query # | Query | Candidate Fields for Indexing |
---|---|---|
1 |
Find all customers who were born in the 1970s. |
|
2 |
Find all customers who have an account balance greater than $10K. Order the information in descending order of balance. |
|
3 |
List customers who live in California, ordering the list by
|
|
4 |
Find the ids and emails of customers who live in a specific zip code. |
|
5 |
Find customers who live in a specific set of states and have an account balance less than a specific value. |
|
6 |
Find male customers having a last name starting with the letter "S." |
|
7 | Find all customers who have "Reading" as a hobby. |
|
8 | Find all customers who have a mobile phone number with a prefix of "650". |
|
Index | Rationale |
---|---|
Simple index on DateOfBirth |
Optimizes the range condition on You need not create a hashed index, because it is unlikely that the order of
|
Simple index on AccountBalance , specified as a
descending key |
|
Composite index on:
|
|
Simple index with:
|
|
Composite index on:
|
|
Simple index on Hobbies[] |
Optimizes the equality condition on array elements of Hobbies
in Query 7:
{"$eq":{"Hobbies[]":"Reading"}} |
Composite index on:
|
Optimizes the following two conditions in Query 8:
|
Example with Multiple Container Field Paths
The following example references documents that store the high and low temperatures for each day in a week. They use an array to store the data, where each element in the array corresponds to a day of the week. For each day of the week, there is a two-element array of nested documents. The nested documents indicate whether the temperature corresponds to the high or low for that day. Typically, the outermost array has seven elements, one for each day of the week. But in cases where data is unavailable, the document has only the available days.
{ "_id": "001", "temps": [{"hiLo": [{"type": "hi", "temp": 61}, {"type": "lo", "temp": 49}], "dow": "Sun"}, {"hiLo": [{"type": "hi", "temp": 74}, {"type": "lo", "temp": 51}], "dow": "Mon"}, {"hiLo": [{"type": "hi", "temp": 75}, {"type": "lo", "temp": 51}], "dow": "Tue"}, {"hiLo": [{"type": "hi", "temp": 74}, {"type": "lo", "temp": 52}], "dow": "Wed"}, {"hiLo": [{"type": "hi", "temp": 78}, {"type": "lo", "temp": 54}], "dow": "Thu"}, {"hiLo": [{"type": "hi", "temp": 75}, {"type": "lo", "temp": 53}], "dow": "Fri"}, {"hiLo": [{"type": "hi", "temp": 75}, {"type": "lo", "temp": 54}], "dow": "Sat"}], "weekOf": "4/29/2018" } { "_id": "002", "temps": {"hiLo": [{"type": "hi","temp": 81}, {"type": "lo", "temp": 60}], "dow":"Sat"}, "weekOf": "5/12/2018" } { "_id": "003", "temps": [{"hiLo": [{"type": "hi", "temp": 80}, {"type": "lo", "temp": 55}], "dow": "Sun"}, {"hiLo": [{"type": "hi", "temp": 78}, {"type": "lo", "temp": 54}], "dow": "Mon"}, {"hiLo": [{"type": "hi", "temp": 79}, {"type": "lo", "temp": 54}], "dow": "Tue"}, {"hiLo": [{"type": "hi", "temp": 77}, {"type": "lo", "temp": 53}], "dow": "Wed"}, {"hiLo": [{"type": "hi", "temp": 79}, {"type": "lo", "temp": 54}], "dow": "Thu"}, {"hiLo": [{"type": "hi", "temp": 77}, {"type": "lo", "temp": 54}], "dow": "Fri"}, {"hiLo": [{"type": "hi", "temp": 78}, {"type": "lo", "temp": 54}], "dow": "Sat"}], "weekOf": "5/13/2018" }
Suppose you frequently run the following queries:
Query | Description | Documents Returned |
---|---|---|
|
Find weeks where any day has either a high or low temperature of 60. | 002 |
|
Find weeks and the high/low temperatures for all days on those weeks where any day of the week has a high temperature of at least 80. | 002, 003 |
To optimize these queries, you should define an index with the following fields:
- Indexed fields:
temps[].hiLo[].temp
,temps[].hiLo[].type
- Included fields:
weekOf
,temps[].hiLo
By defining the composite index with temps[].hiLo[].temp
as the first
indexed field, the index can optimize both queries.
By adding weekOf
as an included field, the index is a covering index for
the first query. By adding temps[].hiLo
, the index becomes a covering index
for the second query as well. Note that you must add this included field even though the
sub-fields are also indexed fields. This is due to how indexes with container field paths
store data. For more details, see Covering Indexes and Container Field Paths.