Connecting Using Hive HPE Data Fabric Database JSON Connector
This section describes the Hive connector for HPE Data Fabric Database JSON table.
About this task
The Hive connector supports the creation of HPE Data Fabric Database based Hive tables. You can create a JSON table on HPE Data Fabric Database and load CSV data and/or JSON files to HPE Data Fabric Database using the connector. HPE Data Fabric Database based Hive tables can be:
- Queried just like file system based Hive tables.
- Combined with file system based Hive tables in joins and sub-queries.
The following table lists the Hive data type and the corresponding (supported) HPE Data Fabric Database OJAI type:
| Hive Type | HPE Data Fabric Database OJAI Type |
|---|---|
| BOOLEAN | BOOLEAN |
| BINARY | BINARY |
| TINYINT | BYTE |
| DATE | DATE |
| DOUBLE | DOUBLE |
| FLOAT | FLOAT |
| INT | INT |
| BIGINT | LONG |
| SMALLINT | SHORT |
| STRING | STRING |
| TIMESTAMP | TIMESTAMP |
- map
- array
- struct
Creating a HPE Data Fabric Database JSON Table and Hive Table Using Hive
Procedure
CREATE TABLE primitive_types (
id string,
bo boolean,
d double,
da date,
f double,
i int,
s string,
ts timestamp)
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/tbl","maprdb.column.id" = "id"); Here:- The
maprdb.table.name,maprdb.column.idandSTORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'are mandatory properties. - The value for
maprdb.column.idcolumn should be of type string or binary.
To create a Hive table that exists on HPE Data Fabric Database, specify EXTERNAL in
the table DDL. If the table created is EXTERNAL, when the table is
dropped, only its metadata is deleted; the underlying HPE Data Fabric Database data remains intact. On the other
hand, if the table is not EXTERNAL, dropping the table deletes both the
metadata associated with the table and the underlying HPE Data Fabric Database data.
For example, suppose a
JSON table named /apps/my_users with the following
values:
{"_id":"001","first_name":"John","last_name":"Doe","age":34}
{"_id":"002","first_name":"Jack","last_name":"Smith","age":26}
To create a Hive table over existing HPE Data Fabric Database
JSON table:
CREATE EXTERNAL TABLE primitive_types (
user_id string,
first_name string,
last_name string,
age int)
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/apps/my_users","maprdb.column.id" = "user_id"); Now,
because table primitive_types points to HPE Data Fabric Database table, you can perform ETL query similar
to file system based Hive
tables:SELECT COUNT(*) FROM test_external;
SELECT MAX(age) AS label FROM test_external;
...