Schema provisioning using a table function

Schema provisioning defines explicit schemas so that Apache Drill reliably interprets text/JSON/MapR‑DB JSON data.

You can define a schema for text files, JSON files, and MapR-DB JSON tables. The schema can be provided in two ways:
  1. Placed in the root directory of the text table. For more information, see https://drill.apache.org/docs/create-or-replace-schema/.
  2. Supplied to individual queries using a table function.

This topic describes the second approach, in which the schema is provided through a table function.

Use cases

In Apache Drill, users can query different storage and data formats. Some formats have a schema (Parquet), some have an inferred schema (JSON), and for some formats data is read as VARCHAR only (CSV). In the latter case, the user has to add casts to indicate which type is actually stored in the file, although there are cases where casts cannot help and the query fails. Even when casts do help, at some point the user may want to avoid repeating the same casts in every query (casting an HBase column to DOUBLE, casting a CSV column to DOUBLE, and so on). Providing the schema up front helps resolve these issues.

The following are the major error cases: incorrect schema inference and schema change exceptions.

Incorrect schema inference
This example assumes a JSON file with two records:
{a: 10} {a: 10.1}

Apache Drill infers the column type BIGINT from the first record and fails with the following exception when it reads the second:

Error: INTERNAL_ERROR ERROR: You tried to write a Float8 type when you are using a ValueWriter of type NullableBigIntWriterImpl.

Because casts are not pushed down into the scan, adding casts also does not help and the same error is returned:
SELECT cast(a AS DOUBLE) FROM file.json

The same schema change issues occur if the column has a mix of data types that do not belong to the same group (for example, integer, decimal, and varchar). There is currently an option that helps the user query such data by treating all values as VARCHAR, but it is applicable only to the JSON format.
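The option referenced here is Drill's `store.json.all_text_mode` session option. A minimal sketch of using it (the file path is hypothetical):

```sql
-- Read all JSON values as VARCHAR so mixed numeric types do not
-- trigger a schema change error in the reader
ALTER SESSION SET `store.json.all_text_mode` = true;

-- Values now come back as text and must be cast explicitly
SELECT CAST(a AS DOUBLE) FROM dfs.`/tmp/file.json`;
```

Note that this trades the schema change error for per-query casts, which is exactly the repetition that schema provisioning avoids.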

Schema evolution
This example assumes that there are two json files, both of which have the same column but different numeric types.
myDir /
              |- File 1: {a: 10}
              |- File 2: {a: 10.1}
If a user queries either file individually, the query succeeds; however, when querying the files together, a SchemaChangeException may occur on blocking operators.
SELECT a, count(*) FROM myDir GROUP BY a

UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.

Previous schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`$f1` (INT:REQUIRED)]], selectionVector=NONE]

Incoming schema BatchSchema [fields=[[`a` (FLOAT8:OPTIONAL)], [`$f1` (INT:REQUIRED)]], selectionVector=NONE]

This can be resolved by adding casts in the query:
SELECT cast(a AS DOUBLE), count(*) FROM myDir GROUP BY 1
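Alternatively, the fix can be expressed once by provisioning the schema through the table function described in the next section, so every query against the directory reads column `a` with a single type. A sketch, assuming the files live under a `dfs`-mounted `myDir`:

```sql
-- Provision column `a` as DOUBLE so both files are read with one type
SELECT a, count(*)
FROM table(dfs.`myDir`(schema => 'inline=(a DOUBLE)'))
GROUP BY a
```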

Syntax

The schema can be provided using the Drill table function with the schema parameter. You can define the schema in a file and specify the path to that file in the table function.

For details on how to create a schema file, see https://drill.apache.org/docs/create-or-replace-schema/. Syntax:
table(<table_name>(schema => 'path=`<path to the schema>`'))

For example:

select * from table(dfs.tmp.`text_table`(schema => 'path=`/tmp/my_schema`'))
Or the schema can be defined inline, directly within the query:
<table_name>(schema => 'inline=(<column-name> <type> [NOT NULL] [default <value>] [properties {<col-prop>=<value>, ...}], ...) properties {<schema-prop>=<value>, ...}')
For example:
inline=(
    `user name` VARCHAR NOT NULL default `unknown`,
    `credit card number` VARCHAR properties {`drill.blank-as`=`null`, `drill.special`=`true`}
) properties {`drill.special`=`true`}
The general syntax for an inline schema is as follows:
inline=(<Column Definition>, ...) <Schema properties>

The inline schema is recognized based on the following ANTLR grammar: https://github.com/apache/drill/blob/master/exec/vector/src/main/antlr4/org/apache/drill/exec/record/metadata/schema/parser/SchemaParser.g4

Column Definition
The column definition follows this structure:
<column-name> <type> [NOT NULL] [default <value>] [properties {<key>=<value>, ...}]

Column Name (required)

The name of the column. It must be enclosed in backticks if it contains spaces or special characters. For example:
`user name`
email

Column Type (required)

Specifies the data type of the column (case-insensitive). Data types can be primitive or complex. Apache Drill supports most of the primitive data types. Use parentheses to provide the type length, precision, and scale. Indicating the length for a VARCHAR is optional.
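For instance, a minimal inline schema using both a length and a precision/scale declaration (the table and column names are hypothetical):

```sql
-- amount uses DECIMAL precision/scale; the VARCHAR length is optional
SELECT * FROM table(dfs.tmp.`text_table`(
    schema => 'inline=(amount DECIMAL(10,2) NOT NULL, name VARCHAR(30))'))
```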

Table 1. Simple types
SQL Data Type | Description | Example
UINT1 | Unsigned 1-byte integer | 255
UINT2 | Unsigned 2-byte integer | 65535
UINT4 | Unsigned 4-byte integer | 4294967295
UINT8 | Unsigned 8-byte integer | 18446744073709551615
TINYINT | 1-byte signed integer | 127
SMALLINT | 2-byte signed integer in the range -32,768 to 32,767 | 32767
INT, INTEGER | 4-byte signed integer in the range -2,147,483,648 to 2,147,483,647 | 2147483646
BIGINT | 8-byte signed integer in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 9223372036854775807
BINARY(p), VARBINARY(p) | Variable-length byte string (p is the precision) | B@e6d9eb7
BOOLEAN | True or false | true
DATE | Years, months, and days in YYYY-MM-DD format since 4713 BC | 2015-12-30
DEC(p,s), DECIMAL(p,s), NUMERIC(p,s) | 38-digit precision number (p is the precision, s is the scale) | DECIMAL(6,2) is 1234.56: 4 digits before and 2 digits after the decimal point
FLOAT | 4-byte floating point number | 0.456
DOUBLE | 8-byte floating point number, precision-scalable | 0.456
INTERVAL | A day-time or year-month interval | '1 10:20:30.123' (day-time) or '1-2' year to month (year-month)
TIME | 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss | 22:55:55.23
TIMESTAMP | JDBC timestamp in year, month, date, hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS | 2015-12-30 22:55:55.23
CHAR(length), CHARACTER(length), CHARACTER VARYING(length), VARCHAR(length) | UTF8-encoded variable-length string; the default limit is 1 character and the maximum is 2,147,483,647 characters | CHAR(30) casts data to a 30-character string maximum
LATE, DYNAMIC | An internal data type that may appear in schemas but is not intended for direct use by end users | -

When specifying complex types in a schema, the diamond (<>) syntax must be used.

Table 2. Complex types
SQL Data Type | Description | Example
ARRAY | A repeated list of values. Each element can be a scalar type, such as VARCHAR or INT, or a complex type, such as a MAP or another ARRAY. | ARRAY<INT>; ARRAY<VARCHAR>; ARRAY<ARRAY<INT>>; ARRAY<STRUCT<s1 INT NOT NULL, s2 INT>>
STRUCT | A set of named values, each with its own data type, similar to an object. Each field within a STRUCT can hold a simple or a complex type. | STRUCT<s1 INT, s2 INT>; STRUCT<s1 INT NOT NULL, s2 VARCHAR(10)>; STRUCT<s1 INT, s2 ARRAY<INT>>; STRUCT<s1 INT NOT NULL, s2 STRUCT<ss1 INT, ss2 INT>>
UNION | An internal data type that may appear in schemas but is not intended for direct use by end users | -
MAP | An internal data type that may appear in schemas but is not intended for direct use by end users | -

Nullability (optional)

The keyword NOT NULL indicates that the column cannot contain null values. If this keyword is not specified, the column is considered nullable by default.

Column properties (optional)

There are two built-in column properties: FORMAT and DEFAULT, which are defined using the corresponding case-insensitive keywords. The value for each property must be enclosed in single quotes. Both keywords are optional; however, when used together, their declaration order is strict. The FORMAT keyword must appear before DEFAULT, and both must be specified after the column definition. Examples:
col DATE NOT NULL FORMAT `yyyy-MM-dd`
col DATE NOT NULL DEFAULT `2019-01-01`
col DATE NOT NULL FORMAT `yyyy-MM-dd` DEFAULT `2019-01-01`

Other properties can be indicated in a PROPERTIES clause:

col DATE NOT NULL FORMAT `yyyy-MM-dd` DEFAULT `2019-01-01` PROPERTIES {`drill.blank-as` = `null`}
The parser places no limitation on which column types FORMAT, DEFAULT, and PROPERTIES can be defined for. If the reader implementation is able to use any of them, they are applied; otherwise, they are ignored.
Table 3. Column properties
Property name | Description | Example
FORMAT / drill.format | Specifies the date/time format pattern (Joda-Time syntax) used to convert string values to date/time types. Recommended for accurate parsing of date/time columns. If not set, Drill uses ISO date/time formats by default (for example, yyyy-MM-dd'T'HH:mm:ssZZ for timestamps). The format can be indicated in two ways: via the FORMAT keyword (before the DEFAULT keyword) or in the column properties section. | start_date DATE FORMAT `yyyy-MM-dd`; start_date DATE PROPERTIES {`drill.format` = `yyyy-MM-dd`}
DEFAULT / drill.default | Specifies a custom default value for non-nullable (required) columns that are missing in the data. Drill uses this value instead of the natural default (for example, 0 for numbers). Has no effect on nullable columns. Can be set using the DEFAULT keyword or in the column properties. | id INT NOT NULL DEFAULT `-1`; id INT NOT NULL PROPERTIES {`drill.default` = `-1`}
drill.special | Indicates whether to project the column in a wildcard (*) query. Special columns may be excluded from projection and available only when explicitly requested. Default: false. | col VARCHAR PROPERTIES {`drill.special` = `true`}
drill.blank-as | Indicates how to handle blanks. The value should be either null or 0. 0 converts blanks in numeric fields to 0; for non-numeric fields, blanks become null (nullable columns) or the default value (non-nullable columns). null converts blanks to null values (nullable columns) or fills them with the default value (non-nullable columns). | col VARCHAR PROPERTIES {`drill.blank-as` = `null`}

Schema properties (optional)

Schema properties are defined in an inline schema using the PROPERTIES clause, which appears after the column definition list enclosed in parentheses. The syntax is identical to the Column properties syntax. For example:
(col VARCHAR PROPERTIES {`drill.blank-as` = `null`}) PROPERTIES {`drill.strict` = `true`}
Table 4. Schema properties
Property name | Description | Example
drill.strict | If true, Drill enforces a strict schema: any columns present in the data but not defined in the schema are ignored, and queries for such columns return NULL. If false, Drill includes extra columns found in the data after the schema-defined columns. Default: false. | (col VARCHAR PROPERTIES {`drill.blank-as` = `null`}) PROPERTIES {`drill.strict` = `true`}

Schema provisioning for MapR-DB JSON tables

Starting with DEP10.0.0, Drill supports schema provisioning for MapR-DB JSON tables. Unlike the text and standard JSON formats, the Drill MapR-DB JSON reader uses the provided schema only when a column's type is ambiguous, for example, when all values in the first batch are missing or null. All other columns and schema properties are ignored.
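As a sketch (the table path and column name are hypothetical), a query that provisions a type hint for one potentially ambiguous column might look like:

```sql
-- Hint the type of `age` in case its values are all null or missing
-- in the first batch; other columns keep their MapR-DB inferred types
SELECT * FROM table(dfs.`/tables/users`(schema => 'inline=(age INT)'))
```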