Drill Table Function
Explains Apache Drill’s table functions (table-valued functions - TVFs).
Apache Drill supports table functions, also known as table-valued functions (TVFs). Functions return a table. You can use functions anywhere that you can use a table. Table functions behave similarly to views, but can take parameters.
Syntax
TABLE()
construct. This wrapper signals that the function returns a table rather than a scalar
value:table(table_function_name())This construct is case-insensitive.
Drill table functions
- Schema table function
- Format plugin table function
These functions are called using the target table name, but they define different parameters.
=>
syntax:table_name(parameter_name => some_value)Schema table function
A schema table function allows you to define a schema for an individual query. You can specify a path to the schema or provide the schema directly in the query. This is useful when the table’s schema, which normally resides in the table’s root folder, is located elsewhere, or when you want to reference an alternate schema or define the schema inline.
The schema table function works with tables from any data storage, but schema provisioning is supported only for text files.
Starting with DEP version 10.0.0, schema provisioning is also supported for JSON MapR-DB tables. However, the schema is used only as a hint to resolve ambiguous column types, such as columns that contain only null values. Schema provisioning for JSON MapRDB tables is not a replacement for the CAST function. If you have a column stored as STRING and you want it to be INT – use the CAST function: Data Type Conversion - Apache Drill.
The schema table function has one parameter: schema.
For more details about the schema provisioning feature and the schema table function syntax, see Schema provisioning using a table function.
schema- accepts a string value of two kinds:pathandinline.
path value is used to specify a path to the table
schema:table_name(schema => 'path=`/path/to/the/schema`')SELECT *
FROM TABLE(dfs.tmp.`myTable.cvs`(schema => 'path=`/tmp/schemas/myTableSchema`'))inline value is used to provide a schema directly in the
query:table_name(schema => 'inline=(column_name data_type [nullability] [format] [default] [properties {prop='val', ...})]')SELECT *
FROM TABLE(dfs.tmp.`text_table`(
schema => 'inline=(col1 date properties {`drill.format` = `yyyy-MM-dd`})
properties {`drill.strict` = `false`}'))Format plugin table function
The format plugin table function allows you to override the format options from the file system storage plugin or replace the table’s detected format with your own.
The format specified here takes precedence over the format automatically detected by Drill. If the defined format matches the format detected by Drill, the storage format options from the storage plugin are used as defaults and overridden by any parameters specified in the table function.
-
type- required.The file system table format. Valid values are the names of Drill format plugins. The values are case-sensitive and include:delta,spss,logregex,syslog,parquet,sequencefile,pcap,xml,iceberg,json,ltsv,text,maprdb,avro,image,sas,pcapng,streams,httpd,excel,pdf,shp,hdf5,msaccess. - dynamic parameters - optional. Parameters that correspond to the Drill format
plugin configuration variables, excluding
extensions(which do not apply in this context). schema- optional. The value corresponds to the value for the same option in the schema table function.
extractHeader option for a csv
file:SELECT *
FROM TABLE(dfs.tmp.`users.csv`(type => 'text', extractHeader => true))fieldDelimiter, extractHeader, and
schema for a csv file:SELECT *
FROM TABLE(dfs.tmp.`cars.csvh-test`(
type => 'text',
fieldDelimiter => ',',
extractHeader => true,
schema => 'inline=(col1 date)'))allTextMode for a json file:SELECT *
FROM TABLE(dfs.tmp.`students`(type => 'json', allTextMode => true))| Parameter name | Default value | Description |
|---|---|---|
lineDelimiter |
\n |
A string used as a line delimiter in the text file. |
fieldDelimiter |
, |
A character used as a field delimiter in the text file. |
quote |
" |
Defines the character used to enclose text fields that may contain delimiters,
line breaks, or other special characters. For example, when |
escape |
" |
Specifies the character used to escape a quotation mark inside a quoted field.
For example, with |
comment |
# |
Identifies the character that marks the beginning of a comment line. Any line
starting with this character is ignored during parsing. For example, if
|
skipFirstLine |
false |
A Boolean value that indicates whether the first line in the text file should be omitted. |
extractHeader |
false |
A Boolean value that indicates whether the first valid record parsed from the text file should be considered as the row containing the names of each column. |
| Parameter name | Default value | Description |
|---|---|---|
allTextMode |
The value defined in store.json.all_text_mode. Default value
is false. |
A Boolean option that forces all fields in JSON files to be read as
|
readNumbersAsDouble |
The value defined in
|
A Boolean value. If |
skipMalformedJSONRecords |
The value defined in
|
A Boolean value that enables the JSON reader to skip malformed or invalid records
in JSON files. When set to |
escapeAnyChar |
The value defined in
|
A Boolean value that enables the JSON record reader to escape any character when processing JSON data. |
nanInf |
The value defined in
|
A Boolean value that enables the JSON record reader to interpret
|
| Parameter name | Default value | Description |
|---|---|---|
allTextMode |
false |
A Boolean option that forces all fields in the MapR-DB table to be read as
|
enablePushdown |
true |
Enables query pushdown to the MapR-DB engine. When set to |
ignoreSchemaChange |
false |
Allows queries to continue even when schema changes are detected in MapR-DB tables. When enabled, Drill ignores schema inconsistencies instead of failing the query. |
|
|
false |
A Boolean value. If set to true, all numeric values are read as
|
|
|
false |
Disables the optimization that uses table metadata to compute
|
|
|
false |
When set to |
|
|
true |
Enables support for queries that reference fields not present in all documents.
When set to |
index |
- |
A string value that specifies a hint for the secondary index table to use when executing queries on MapR-DB. If not set, Drill automatically evaluates available indexes, computes their costs, and selects the most efficient one. |