Types of Secondary Indexes

HPE Ezmeral Data Fabric Database JSON supports several index types, including simple indexes, composite indexes, hashed indexes, and indexes with casting. This section describes the properties of these indexes and the situations where each provides value.

The following diagram illustrates the different properties of indexes and index fields. Lines connecting properties represent properties that can be used in combination with one another. Click on the text in the diagram for a description of each property.

Simple vs Composite IndexesSimple vs Composite IndexesHashed vs Non-Hashed IndexesHashed vs Non-Hashed IndexesIndexed vs Included FieldsIndexed vs Included FieldsIndexed Field Sort OrderIndexed Field Sort OrderCastingCasting

Indexed vs Included Fields

An index consists of indexed and included fields. Indexed fields are also referred to as index keys. The following lists describe the characteristics of each type of field:

Indexed Fields
  • Determine the sort order of the index and the order of the query result when used
  • Allow filter conditions and ORDER BY conditions defined on these fields to be optimized
Included Fields (sometimes referred to as covered fields)
  • Do not affect the sort order of the index or the order of the query result
  • Can avoid the need to read the base table if all required fields are included in the index

In general, you should define indexed fields on fields you filter and order on, and included fields on fields you reference but do not filter and order.

The following example illustrates when you would define an indexed vs an included field in your index. Assume you have a HPE Ezmeral Data Fabric Database JSON table with the following sample data that contains customer information.

{
   "_id": "10000",
   "FullName": {
      "LastName": "Smith",
      "FirstName": "John"
   },
   "Address": {
      "Street": "123 SE 22nd St.",
      "City": "Oakland",
      "State": "CA",
      "Zipcode": "94601-1001"
   },
   "Gender": "M",
   "AccountBalance": 999.99,
   "Email": "john.smith@company.com",
   "Phones": [
        {"Type": "Home", "Number": "555-555-1234"},
        {"Type": "Mobile", "Number": "555-555-5678"},
        {"Type": "Work", "Number": "555-555-9012"}
   ],
   "Hobbies": ["Baseball", "Cooking", "Reading"],
   "DateOfBirth": "10/1/1985"
}

Your query does the following:

  1. Filters on Address.Zipcode
  2. Selects FullName.FirstName and FullName.LastName

Since your query filters on Address.Zipcode, you should include that field as an indexed field. Also, because this query only needs the fields FullName.FirstName and FullName.LastName, you can set FullName as an included field. The result is that this query will only need to read from the index and will not need to look at the original table. Other queries that, for example, need to read the phone numbers or address would still need to go back to the base table.

maprcli table index add -path /customerInfo -index zipCodeIdx \
   -indexedfields Address.Zipcode \
   -includedfields FullName

There are additional differences in how indexed and included fields behave. The following table summarizes these differences:

Indexed Field Included Field
There are some restrictions in the data types of indexed fields. See Data Types and Secondary Index Fields for the complete list of types. Data types of included fields can be any type. There is no data type restriction.
The collective size of all indexed fields is a maximum of 32KB. Included fields do not affect the size limit of an index.
Adding indexed fields increases the cost of key comparisons when scanning the index, due to the increase in the index key size. Adding included fields does not impact the index scan cost.

Included fields influence whether an index is a covering index for a query. See Covering Indexes for more information about this concept.

Indexed Field Sort Order

You can define each field in your index key to sort in either ascending or descending order. The default is ascending. Typically, you define the sort order to match the ORDER BY clause in your query. This allows the HPE Ezmeral Data Fabric Database to avoid performing an explicit sort. For example, if you issue queries where you return AccountBalance in descending order, create the following index.
maprcli table index add -path /customerInfo -index BalanceIdx \
   -indexedfields AccountBalance:-1

Simple vs Composite Indexes

Simple indexes are indexes with a single indexed field (or key). Composite indexes have more than one key. In both cases, you can define zero or more included fields. See Simple Indexes and Composite Indexes for additional details.

Hashed vs Non-Hashed Indexes

By default, indexes are stored in sort order across the index key values. This can lead to hotspots if the sort order of the index keys match the order data that is inserted into the JSON table. For example, if the indexed field has monotonically increasing timestamp values, such as the date a document is created, the tail end of the index becomes a hotspot. Hashed indexes avoid hotspotting by evenly distributing index writes across a number of logical partitions.

The following example creates a hashed index named idx on table, tab, with a single key, idxKeyCol.
maprcli table index add -path /tab -index idx -indexedfields idxKeyCol \
    -hashed true 

See Hashed Indexes for further details.

Casting

You can CAST individual indexed fields to a specific data type. This is applicable when Drill SQL queries contain CAST expressions. The following example creates an index that casts the age field to an INT type and the height field to a FLOAT type.
maprcli table index add -path /castTable -index castIdx \
   -indexedfields '$CAST(age@INT)','$CAST(height@FLOAT)'

See Using Casts in Secondary Indexes for further details.

NOTE
This feature only applies for queries issued through the Drill SQL interface. The OJAI API does not have CAST support.