Understanding the MERGE Statement
You can use the MERGE statement to perform record-level
INSERT and UPDATE operations efficiently within Hive
tables.
The MERGE statement can be a key tool for Data Fabriccluster data management. It is based on ANSI-standard
SQL.
MERGE statement:- Simple merge.maprdb.column.id is the join key
- Simple merge.maprdb.column.id is not the join key
- DELETE syntax in the MERGE statement
- Multiple source rows match a given target row (cardinality violation)
- Merge on mixed data types
- Merge into external HPE Data Fabric Database JSON tables
- Merge into partitioned HPE Data Fabric Database JSON tables
- Merge into temporary HPE Data Fabric Database JSON tables
Simple merge.maprdb.column.id is the join
key
| id | first_name | last_name | age |
|---|---|---|---|
| 001 | Dorothi | Hogward | 7777 |
| 002 | Alex | Bowee | 7777 |
| 088 | Robert | Dowson | 25 |
| id | first_name | last_name | age |
|---|---|---|---|
| 001 | John | Smith | 45 |
| 002 | Michael | Watson | 27 |
| 003 | Den | Brown | 33 |
MERGE into customer_db_json_target trg
USING customer_source src
ON src.id = trg.id
WHEN MATCHED THEN UPDATE SET age = src.age
WHEN NOT MATCHED THEN
INSERT VALUES (src.id, src.first_name, src.last_name, src.age);| id | first_name | last_name | age |
|---|---|---|---|
| 001 | John | Smith | 7777 |
| 002 | Michael | Watson | 7777 |
| 003 | Den | Brown | 33 |
| 088 | Robert | Dowson | 25 |
id column is
inserted.Simple merge.maprdb.column.id is not the join key
Merging when merge.maprdb.column is not the join key is not
recommended.
DELETE syntax in the MERGE statement
This section describes how to use the DELETE syntax in the
MERGE statement for HPE Data Fabric Database JSON tables.
Included are examples of usage and limitations.
tgt which is the target table of the
MERGE statement, and src, which is the source table from
which data will be taken. Both tables use MapRDBJsonStorageHandler to store
data. The following table shows the initial contents of the tgt
table:| id | Value |
| 1 | AAA |
| 2 | BBB |
| 3 | CCC |
| 4 | DDD |
| 5 | EEE |
src
table:| id | Value |
| 1 | AAA |
| 222 | BBB--- |
| 3 | CCC |
| 444 | DDD--- |
| 5 | EEE |
WHEN MATCHED THEN DELETE
clause. It means that if the id from the tgt table equals
the id from the src table, the row is removed from the
tgt table. When the value of id does not match, a new
row is inserted into the tgt table:MERGE INTO tgt
USING src ON tgt.id=src.id
WHEN MATCHED THEN DELETE
WHEN NOT MATCHED THEN INSERT VALUES (src.id, src.value);The following table
shows the result of the merge:| id | Value |
| 2 | BBB |
| 222 | BBB--- |
| 4 | DDD |
| 444 | DDD--- |
id 1, 3, and 5 from the
tgt table because they existed in the src table, and
they matched values from the tgt table. We did not touch rows with
id 2 and 4, because there were no such values in the src
table. We inserted new rows with id values 222 and 444 because they existed
in the src table and did not exist in the tgt table.Limitations
- Subqueries are not supported as a source when
DELETEis used. - The source table should be a HPE Data Fabric Database JSON table when
deletion is used in a
MERGEoperator. - The
DELETEoperator is not supported with additional conditions afterWHEN MATCHED. Use either a singleUPDATEorDELETE.
MERGE INTO tgt
USING src
ON tgt._id = src._id
WHEN MATCHED AND [boolean expression1] THEN DELETE
WHEN MATCHED AND [boolean expression2] THEN UPDATE
WHEN NOT MATCHED THEN INSERTMultiple source rows match a given target row (cardinality violation)
customer_db_json and
customer_new: | id | first_name | last_name | age |
|---|---|---|---|
| 001 | John | Smith | 45 |
| 002 | Michael | Watson | 27 |
| 003 | Den | Brown | 33 |
| id | first_name | last_name | age |
|---|---|---|---|
| 001 | Dorothi | Hogward | 77 |
| 001 | Dorothi | Hogward | 77 |
| 088 | Robert | Dowson | 25 |
MERGE
customer_new and customer_db_json:
MERGE INTO customer_db_json trg
USING customer_new src ON src.id = trg.id
WHEN MATCHED THEN UPDATE
SET first_name = src.first_name,
last_name = src.last_name
WHEN NOT MATCHED THEN INSERT VALUES
(src.id, src.first_name, src.last_name, src.age);This example causes an exception because of duplicate values in the id
column in the customer_new table:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0)
To avoid cardinality violation, set hive.merge.cardinality.check=false,
but in this case the result is unpredictable because there is no rule that defines the order
of duplicated data that will be inserted by using the MERGE statement.
Merge on mixed data types
The merge operation also supports mixed data types, such as arrays, maps, and structures.
mixed_types_source and
mixed_types_target:| doc_id | user_info |
|---|---|
| 1 | {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"} |
| 2 | {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"} |
| 3 | {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"} |
| 4 | {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"} |
| id | user_info |
|---|---|
| 1 | {"name":"Lexx","surname":"Comfuzer","age":31,"gender":"MALE"} |
mixed_types_source and mixed_types_target:
MERGE INTO mixed_types_target trg
USING mixed_types_source src
ON src.doc_id = old.doc_id
WHEN MATCHED THEN UPDATE
SET user_info = src.user_info
WHEN NOT MATCHED THEN INSERT VALUES
(src.doc_id, src.user_info);| id | first_name |
|---|---|
| 1 | {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"} |
| 2 | {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"} |
| 3 | {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"} |
| 4 | {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"} |
{"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}You cannot update only the age field in the structure. You can only
replace all values of the structure with new ones. For details, see Understanding the UPDATE Statement.
Merge into external HPE Data Fabric Database JSON tables
The MERGE operator is also available for external HPE Data Fabric Database JSON tables. You can use the MERGE statement to insert and
update values in external Data Fabric database JSON table
targets.
Merge into partitioned HPE Data Fabric Database JSON tables
Partitioned HPE Data Fabric Database JSON tables are not supported.
Merge into temporary HPE Data Fabric Database JSON tables
The MERGE operator is also available for temporary HPE Data Fabric Database JSON tables. Use temporary tables as target tables for
merge. No additional syntax is needed.