Types of Secondary Indexes
HPE Ezmeral Data Fabric Database JSON supports several index types, including simple indexes, composite indexes, hashed indexes, and indexes with casting. This section describes the properties of these indexes and the situations where each provides value.
The following diagram illustrates the different properties of indexes and index fields. Lines connecting properties represent properties that can be used in combination with one another. Click on the text in the diagram for a description of each property.
Indexed vs Included Fields
An index consists of indexed and included fields. Indexed fields are also referred to as index keys. The following lists describe the characteristics of each type of field:
- Indexed Fields
-
- Determine the sort order of the index and the order of the query result when used
- Allow filter conditions and ORDER BY conditions defined on these fields to be optimized
- Included Fields (sometimes referred to as covered fields)
-
- Do not affect the sort order of the index or the order of the query result
- Can avoid the need to read the base table if all required fields are included in the index
In general, you should define indexed fields on fields you filter and order on, and included fields on fields you reference but do not filter and order.
The following example illustrates when you would define an indexed vs an included field in your index. Assume you have a HPE Ezmeral Data Fabric Database JSON table with the following sample data that contains customer information.
{ "_id": "10000", "FullName": { "LastName": "Smith", "FirstName": "John" }, "Address": { "Street": "123 SE 22nd St.", "City": "Oakland", "State": "CA", "Zipcode": "94601-1001" }, "Gender": "M", "AccountBalance": 999.99, "Email": "john.smith@company.com", "Phones": [ {"Type": "Home", "Number": "555-555-1234"}, {"Type": "Mobile", "Number": "555-555-5678"}, {"Type": "Work", "Number": "555-555-9012"} ], "Hobbies": ["Baseball", "Cooking", "Reading"], "DateOfBirth": "10/1/1985" }
Your query does the following:
- Filters on
Address.Zipcode
- Selects
FullName.FirstName
andFullName.LastName
Since your query filters on Address.Zipcode
, you should include that field
as an indexed field. Also, because this query only needs the fields
FullName.FirstName
and FullName.LastName
, you can set
FullName
as an included field. The result is that this query will only
need to read from the index and will not need to look at the original table. Other queries
that, for example, need to read the phone numbers or address would still need to go back to
the base table.
maprcli table index add -path /customerInfo -index zipCodeIdx \
-indexedfields Address.Zipcode \
-includedfields FullName
There are additional differences in how indexed and included fields behave. The following table summarizes these differences:
Indexed Field | Included Field |
---|---|
There are some restrictions in the data types of indexed fields. See Data Types and Secondary Index Fields for the complete list of types. | Data types of included fields can be any type. There is no data type restriction. |
The collective size of all indexed fields is a maximum of 32KB. | Included fields do not affect the size limit of an index. |
Adding indexed fields increases the cost of key comparisons when scanning the index, due to the increase in the index key size. | Adding included fields does not impact the index scan cost. |
Included fields influence whether an index is a covering index for a query. See Covering Indexes for more information about this concept.
Indexed Field Sort Order
AccountBalance
in descending order, create the following
index.maprcli table index add -path /customerInfo -index BalanceIdx \
-indexedfields AccountBalance:-1
Simple vs Composite Indexes
Simple indexes are indexes with a single indexed field (or key). Composite indexes have more than one key. In both cases, you can define zero or more included fields. See Simple Indexes and Composite Indexes for additional details.
Hashed vs Non-Hashed Indexes
By default, indexes are stored in sort order across the index key values. This can lead to hotspots if the sort order of the index keys match the order data that is inserted into the JSON table. For example, if the indexed field has monotonically increasing timestamp values, such as the date a document is created, the tail end of the index becomes a hotspot. Hashed indexes avoid hotspotting by evenly distributing index writes across a number of logical partitions.
idx
on table,
tab
, with a single key,
idxKeyCol
.maprcli table index add -path /tab -index idx -indexedfields idxKeyCol \
-hashed true
See Hashed Indexes for further details.
Casting
age
field to an INT type and the height
field to
a FLOAT
type.maprcli table index add -path /castTable -index castIdx \
-indexedfields '$CAST(age@INT)','$CAST(height@FLOAT)'
See Using Casts in Secondary Indexes for further details.