Connecting Using Hive HPE Ezmeral Data Fabric Database JSON Connector
This section describes the Hive connector for HPE Ezmeral Data Fabric Database JSON table.
About this task
The Hive connector supports the creation of HPE Ezmeral Data Fabric Database based Hive tables. You can create a JSON table on HPE Ezmeral Data Fabric Database and load CSV data and/or JSON files to HPE Ezmeral Data Fabric Database using the connector. HPE Ezmeral Data Fabric Database based Hive tables can be:
- Queried just like file system based Hive tables.
- Combined with file system based Hive tables in joins and sub-queries.
The following table lists the Hive data type and the corresponding (supported) HPE Ezmeral Data Fabric Database OJAI type:
Hive Type | HPE Ezmeral Data Fabric Database OJAI Type |
---|---|
BOOLEAN | BOOLEAN |
BINARY | BINARY |
TINYINT | BYTE |
DATE | DATE |
DOUBLE | DOUBLE |
FLOAT | FLOAT |
INT | INT |
BIGINT | LONG |
SMALLINT | SHORT |
STRING | STRING |
TIMESTAMP | TIMESTAMP |
- map
- array
- struct
Creating a HPE Ezmeral Data Fabric Database JSON Table and Hive Table Using Hive
Procedure
CREATE TABLE primitive_types (
id string,
bo boolean,
d double,
da date,
f double,
i int,
s string,
ts timestamp)
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/tbl","maprdb.column.id" = "id");
Here:- The
maprdb.table.name
,maprdb.column.id
andSTORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
are mandatory properties. - The value for
maprdb.column.id
column should be of type string or binary.
To create a Hive table that exists on HPE Ezmeral Data Fabric Database, specify EXTERNAL
in
the table DDL. If the table created is EXTERNAL, when the table is
dropped, only its metadata is deleted; the underlying HPE Ezmeral Data Fabric Database data remains intact. On the other
hand, if the table is not EXTERNAL, dropping the table deletes both the
metadata associated with the table and the underlying HPE Ezmeral Data Fabric Database data.
For example, suppose a
JSON table named /apps/my_users
with the following
values:
{"_id":"001","first_name":"John","last_name":"Doe","age":34}
{"_id":"002","first_name":"Jack","last_name":"Smith","age":26}
To create a Hive table over existing HPE Ezmeral Data Fabric Database
JSON table:
CREATE EXTERNAL TABLE primitive_types (
user_id string,
first_name string,
last_name string,
age int)
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES("maprdb.table.name" = "/apps/my_users","maprdb.column.id" = "user_id");
Now,
because table primitive_types
points to HPE Ezmeral Data Fabric Database table, you can perform ETL query similar
to file system based Hive
tables:SELECT COUNT(*) FROM test_external;
SELECT MAX(age) AS label FROM test_external;
...
Loading CSV Data to HPE Ezmeral Data Fabric Database JSON Table
Procedure
-
Create intermediate table.
For example:
CREATE TABLE stage(id STRING, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-
Load data to table.
For example:
LOAD DATA INPATH '/data' into table stage;
-
Create HPE Ezmeral Data Fabric Database table in Hive.
For example:
CREATE TABLE users(id STRING, name STRING, age INT) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES("maprdb.table.name" = "/users","maprdb.column.id" = "id");
-
Insert data through stage table.
For example:
INSERT INTO TABLE users select id, name, age from stage;
Loading JSON Files to HPE Ezmeral Data Fabric Database JSON Table
Procedure
-
Add SerDe JAR for JSON.
For example:
add jar /opt/mapr/hive/hive-<version>/hcatalog/share/hcatalog/hive-hcatalog-core-<version>-mapr.jar
-
Create intermediate table.
For example:
CREATE EXTERNAL TABLE stage(id string, name string, age int) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS TEXTFILE;
-
Load data in stage table.
For example:
LOAD DATA INPATH '/data' into table stage;
NOTEIf there is a key in the JSON file that starts with "_" (for example, "_id"), then treat the names as literals upon creating the schema and query using the same literal syntax. For example, specify`_id` string
without any special serde properties. Then in the query, useselect `_id` from sometable;
. Alternatively, you can use 'org.openx.data.jsonserde.JsonSerDe' and add WITH SERDEPROPERTIES ("mapping.id" = "_id" ) to your table definition. -
Create HPE Ezmeral Data Fabric Database table in Hive.
For example:
CREATE TABLE users(id STRING, name STRING, age INT) STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler' TBLPROPERTIES("maprdb.table.name" = "/users","maprdb.column.id" = "id");
-
Insert data through stage table.
For example:
If there is a key in your JSON file that starts with "INSERT INTO TABLE users select id, name, age from stage;
_
" (for example, "_id
"), treat the names as literals upon creating the schema and also query using the same literal syntax. In the above example, it would look like`_id` string
without any special serde properties for it. Then, use again in query as shown below:
Alternatively, useselect `_id` from sometable;
org.openx.data.jsonserde.JsonSerDe
and addWITH SERDEPROPERTIES ("mapping.id" = "_id" )
to your table definition.