Understanding the MERGE Statement
You can use the MERGE
statement to perform record-level
INSERT
and UPDATE
operations efficiently within Hive
tables.
The MERGE
statement can be a key tool for Data Fabriccluster data management. It is based on ANSI-standard
SQL.
MERGE
statement:- Simple merge.maprdb.column.id is the join key
- Simple merge.maprdb.column.id is not the join key
- DELETE syntax in the MERGE statement
- Multiple source rows match a given target row (cardinality violation)
- Merge on mixed data types
- Merge into external HPE Ezmeral Data Fabric Database JSON tables
- Merge into partitioned HPE Ezmeral Data Fabric Database JSON tables
- Merge into temporary HPE Ezmeral Data Fabric Database JSON tables
Simple merge.maprdb.column.id
is the join
key
id | first_name | last_name | age |
---|---|---|---|
001 | Dorothi | Hogward | 7777 |
002 | Alex | Bowee | 7777 |
088 | Robert | Dowson | 25 |
id | first_name | last_name | age |
---|---|---|---|
001 | John | Smith | 45 |
002 | Michael | Watson | 27 |
003 | Den | Brown | 33 |
MERGE into customer_db_json_target trg
USING customer_source src
ON src.id = trg.id
WHEN MATCHED THEN UPDATE SET age = src.age
WHEN NOT MATCHED THEN
INSERT VALUES (src.id, src.first_name, src.last_name, src.age);
id | first_name | last_name | age |
---|---|---|---|
001 | John | Smith | 7777 |
002 | Michael | Watson | 7777 |
003 | Den | Brown | 33 |
088 | Robert | Dowson | 25 |
id
column is
inserted.Simple merge.maprdb.column.id
is not the join key
Merging when merge.maprdb.column
is not the join key is not
recommended.
DELETE syntax in the MERGE statement
This section describes how to use the DELETE
syntax in the
MERGE
statement for HPE Ezmeral Data Fabric Database JSON tables.
Included are examples of usage and limitations.
tgt
which is the target table of the
MERGE
statement, and src
, which is the source table from
which data will be taken. Both tables use MapRDBJsonStorageHandler
to store
data. The following table shows the initial contents of the tgt
table:id | Value |
1 | AAA |
2 | BBB |
3 | CCC |
4 | DDD |
5 | EEE |
src
table:id | Value |
1 | AAA |
222 | BBB--- |
3 | CCC |
444 | DDD--- |
5 | EEE |
WHEN MATCHED THEN DELETE
clause. It means that if the id
from the tgt
table equals
the id
from the src
table, the row is removed from the
tgt
table. When the value of id
does not match, a new
row is inserted into the tgt
table:MERGE INTO tgt
USING src ON tgt.id=src.id
WHEN MATCHED THEN DELETE
WHEN NOT MATCHED THEN INSERT VALUES (src.id, src.value);
The following table
shows the result of the merge:id | Value |
2 | BBB |
222 | BBB--- |
4 | DDD |
444 | DDD--- |
id
1, 3, and 5 from the
tgt
table because they existed in the src
table, and
they matched values from the tgt
table. We did not touch rows with
id
2 and 4, because there were no such values in the src
table. We inserted new rows with id
values 222 and 444 because they existed
in the src
table and did not exist in the tgt
table.Limitations
- Subqueries are not supported as a source when
DELETE
is used. - The source table should be a HPE Ezmeral Data Fabric Database JSON table when
deletion is used in a
MERGE
operator. - The
DELETE
operator is not supported with additional conditions afterWHEN MATCHED
. Use either a singleUPDATE
orDELETE
.
MERGE INTO tgt
USING src
ON tgt._id = src._id
WHEN MATCHED AND [boolean expression1] THEN DELETE
WHEN MATCHED AND [boolean expression2] THEN UPDATE
WHEN NOT MATCHED THEN INSERT
Multiple source rows match a given target row (cardinality violation)
customer_db_json
and
customer_new
: id | first_name | last_name | age |
---|---|---|---|
001 | John | Smith | 45 |
002 | Michael | Watson | 27 |
003 | Den | Brown | 33 |
id | first_name | last_name | age |
---|---|---|---|
001 | Dorothi | Hogward | 77 |
001 | Dorothi | Hogward | 77 |
088 | Robert | Dowson | 25 |
MERGE
customer_new
and customer_db_json
:
MERGE INTO customer_db_json trg
USING customer_new src ON src.id = trg.id
WHEN MATCHED THEN UPDATE
SET first_name = src.first_name,
last_name = src.last_name
WHEN NOT MATCHED THEN INSERT VALUES
(src.id, src.first_name, src.last_name, src.age);
This example causes an exception because of duplicate values in the id
column in the customer_new
table:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating cardinality_violation(_col0)
To avoid cardinality violation, set hive.merge.cardinality.check=false
,
but in this case the result is unpredictable because there is no rule that defines the order
of duplicated data that will be inserted by using the MERGE
statement.
Merge on mixed data types
The merge operation also supports mixed data types, such as arrays, maps, and structures.
mixed_types_source
and
mixed_types_target
:doc_id | user_info |
---|---|
1 | {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"} |
2 | {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"} |
3 | {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"} |
4 | {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"} |
id | user_info |
---|---|
1 | {"name":"Lexx","surname":"Comfuzer","age":31,"gender":"MALE"} |
mixed_types_source
and mixed_types_target
:
MERGE INTO mixed_types_target trg
USING mixed_types_source src
ON src.doc_id = old.doc_id
WHEN MATCHED THEN UPDATE
SET user_info = src.user_info
WHEN NOT MATCHED THEN INSERT VALUES
(src.doc_id, src.user_info);
id | first_name |
---|---|
1 | {"name":"Brandon","surname":"Lee","age":31,"gender":"MALE"} |
2 | {"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"} |
3 | {"name":"Mary","surname":"Dowson","age":11,"gender":"FEMALE"} |
4 | {"name":"Paul","surname":"Rodgers","age":41,"gender":"MALE"} |
{"name":"Johnson","surname":"Fall","age":23,"gender":"MALE"}
You cannot update only the age
field in the structure. You can only
replace all values of the structure with new ones. For details, see Understanding the UPDATE Statement.
Merge into external HPE Ezmeral Data Fabric Database JSON tables
The MERGE
operator is also available for external HPE Ezmeral Data Fabric Database JSON tables. You can use the MERGE statement to insert and
update values in external Data Fabric database JSON table
targets.
Merge into partitioned HPE Ezmeral Data Fabric Database JSON tables
Partitioned HPE Ezmeral Data Fabric Database JSON tables are not supported.
Merge into temporary HPE Ezmeral Data Fabric Database JSON tables
The MERGE
operator is also available for temporary HPE Ezmeral Data Fabric Database JSON tables. Use temporary tables as target tables for
merge. No additional syntax is needed.