Merge operation - New Feature added in PostgreSQL 15

Swapnil Suryawanshi
Swapnil Suryawanshi

PostgreSQL, an open-source relational database management system, provides a comprehensive set of features for managing and manipulating data. One such feature which has been introduced in PostgreSQL version 15 is the SQL-standard MERGE statement. It is a powerful tool used for simultaneously inserting, updating and deleting data in a table based on specified conditions. This feature is especially useful when working with large amounts of data and allows to save programming effort on the application side.

This article dives into the merge feature, covering its usage and benefits, and provides a practical test case to demonstrate its capabilities.

Note: This feature is primarily suited for database use cases that follow an OLAP approach rather than an OLTP one. For OLTP, INSERT ON CONFLICT offers better control and performance, so if that is the use case you aim to resolve, we recommend refraining from attempting to use merge.

Prerequisites

To understand and utilize the merge feature in PostgreSQL, readers should have a basic understanding of SQL queries, and PostgreSQL syntax, and have PostgreSQL installed on their system. Familiarity with table structures and data manipulation operations will be beneficial.

MERGE provides a single SQL statement that can conditionally INSERT, UPDATE or DELETE rows in a target_table using a data_source, a task that would otherwise require multiple procedural language statements. First, the MERGE command performs a join from data_source to target_table producing zero or more candidate change rows. For each candidate change row, the status of MATCHED or NOT MATCHED is determined, after which the WHEN clauses for that status are evaluated in the order specified. For each candidate change row, the first clause to evaluate as true is executed. No more than one WHEN clause is executed for any candidate change row.

The syntax of the merge command in PostgreSQL is as follows:

[ WITH with_query [, ...] ]
MERGE INTO [ ONLY ] target_table [ * ] [ [ AS ] target_alias ]
USING data_source ON join_condition
when_clause [...]

The data_source can be another table, or it can be a subquery. This is the general form:

{ [ ONLY ] source_table_name [ * ] |
( source_query ) }
[ [ AS ] source_alias ]

The when_clause determines what action to execute for each row that is matched or not matched. These are the two possible forms:

WHEN MATCHED [ AND condition ] THEN { merge_update | DELETE | DO NOTHING } |

WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING }

When a source row is matched in the target table, the one in the target table can be either updated or deleted. To delete it, simply write delete. To update it, the merge_update clause is used, which looks like this:

UPDATE SET { column_name = { expression | DEFAULT } |
( column_name [, ...] ) = ( { expression | DEFAULT } [, ...] ) } [, ...]

When a source row is not matched in the target table, one can be inserted:

INSERT [( column_name [, ...] )] [ OVERRIDING { SYSTEM | USER } VALUE ]
{ VALUES ( { expression | DEFAULT } [, ...] ) | DEFAULT VALUES }

To know more about the parameters used in the merge command, refer to the MERGE reference page.

Note: MERGE does not currently support the case where target_table is a materialized view, foreign table, or if it has any rules defined on it.

Examples

Let's see a few examples to understand how merge works in PostgreSQL.

Example 1

Suppose we have two tables books_available, books_updated where books_available is the currently available number of books of particular genre, books_updated contains updated inventory and number of books sold.

-- Create the books_available table
CREATE TABLE books_available (
product_id INT primary key,
genre VARCHAR(50),
qty_in_stock INT
);

-- Create the updated books_updated table 
CREATE TABLE books_updated (
product_id INT primary key,
genre VARCHAR(50),
books_arrival_qty INT
);



-- Insert values into the books_available table
INSERT INTO books_available (product_id, genre, qty_in_stock)
VALUES
(1, 'Thriller', 10),
(2, 'Romance', 15),
(3, 'Fantasy', 12),
(4, 'Fiction', 10),
(5, 'Sci-fi', 6);

-- Insert values into the updated books_updated table
INSERT INTO books_updated (product_id, genre, books_arrival_qty)
VALUES
(1, 'Thriller', 7),
(2, 'Romance', 9),
(4, 'Fiction', 10),
(6, 'Contemporary', 2),
(7, 'Mystery', 4);

Now let's run a merge that reads a series of changes from books_updated and apply them to books_available:

MERGE INTO books_available AS b
USING books_updated AS u
ON b.product_id = u.product_id
WHEN MATCHED THEN
UPDATE SET b.qty_in_stock = b.qty_in_stock + u.books_arrival_qty
WHEN NOT MATCHED THEN
INSERT (product_id, genre, qty_in_stock)
VALUES (u.product_id, u.genre, u.books_arrival_qty);

edb=# SELECT * FROM books_available ORDER BY product_id ASC;
product_id | genre | qty_in_stock 
1 | Thriller | 17
2 | Romance | 24
3 | Fantasy | 12
4 | Fiction | 20
5 | Sci-fi | 6
6 | Contemporary | 2
7 | Mystery | 4
(7 rows)

As we can see, the books_available table has been updated with the data from the books_updated table, and a new row has been inserted for the products with genres "Contemporary" and "Mystery"

We will look into another example where all the conditions are utilised in the merge command.

Example 2

The modified values of the same tables are as below:

Before performing merge operation:

CREATE TABLE books_available (
product_id SERIAL PRIMARY KEY,
genre VARCHAR(255),
qty_in_stock INT,
avg_rating NUMERIC(3, 1), -- Assuming average rating is stored with one decimal place
price_per_unit DECIMAL(10, 2) -- Assuming price per unit is stored with two decimal places
);

CREATE TABLE books_updated (
product_id SERIAL PRIMARY KEY,
genre VARCHAR(255),
books_sold_qty INT,
books_arrival_qty INT,
average_rating_update NUMERIC(3, 1), -- Assuming average rating is stored with one decimal place
price_update DECIMAL(10, 2) -- Assuming price per unit is stored with two decimal places
);

INSERT INTO books_available (product_id, genre, qty_in_stock, avg_rating, price_per_unit)
VALUES
(1, 'Thriller', 10, 4.5, 20.0),
(2, 'Romance', 15, 4.2, 18.0),
(3, 'Fantasy', 12, 4.7, 22.0),
(4, 'Fiction', 10, 4.0, 16.0),
(5, 'Sci-fi', 6, 4.3, 24.0);

INSERT INTO books_updated (product_id, genre, books_sold_qty, books_arrival_qty, average_rating_update, price_update)
VALUES
(1, 'Thriller', 7, 6, 4.6, 21.0),
(2, 'Romance', 9, 5, 4.3, 19.0),
(4, 'Fiction', 10, 0, 4.1, 17.0),
(6, 'Contemporary', 0, 0, 0.0, 0.0),
(7, 'Mystery', 0, 10, 4.8, 25.0);

Merge operation:

MERGE INTO books_available A 
USING books_updated B 
ON B.product_id = A.product_id 
WHEN MATCHED AND qty_in_stock - books_sold_qty + books_arrival_qty > 0 THEN 
UPDATE SET 
qty_in_stock = qty_in_stock - books_sold_qty + books_arrival_qty,
avg_rating = B.average_rating_update,
price_per_unit = B.price_update
WHEN MATCHED THEN 
DELETE
WHEN NOT MATCHED AND books_arrival_qty > 0 THEN 
INSERT VALUES (
B.product_id,
B.genre,
B. books_arrival_qty,
B.average_rating_update,
B.price_update
)
WHEN NOT MATCHED THEN 
DO NOTHING; 

Updated values post-merge operation:

edb=# SELECT * FROM books_available ORDER BY product_id ASC;
product_id | genre | qty_in_stock | avg_rating | price_per_unit 
1 | Thriller | 9 | 4.6 | 21.00
2 | Romance | 11 | 4.3 | 19.00
3 | Fantasy | 12 | 4.7 | 22.00
5 | Sci-fi | 6 | 4.3 | 24.00
7 | Mystery | 10 | 4.8 | 25.00
(5 rows)

As we can see the table books_available, the genres "Thriller" and "Romance" got updated, genre "Fiction" got deleted as the updated quantity of books is 0, genre "Mystery" got inserted as its not available in the table books_available, genre "Contemporary" from books_updated does not get inserted as the books_arrival_qty is not greater than 0.

Example 3: WITH Clause

WITH provides a way to write auxiliary statements for use in a larger query. These statements, which are often referred to as Common Table Expressions or CTEs, can be thought of as defining temporary tables that exist just for one query. Each auxiliary statement in a WITH clause can be a SELECT, INSERT, UPDATE, or DELETE; and the WITH clause itself is attached to a primary statement that can be a SELECT, INSERT, UPDATE, DELETE, or MERGE.

Let's use the same above books_available and books_updated table with the below WITH clause.

Merge operation:

WITH updated_books AS (
SELECT
B.product_id AS updated_product_id,
B.genre AS updated_genre,
B. books_arrival_qty AS updated_ books_arrival_qty,
B.price_update AS updated_price,
B.average_rating_update AS updated_rating,
B. books_arrival_qty * B.price_update AS total_value
FROM books_updated B
WHERE B. books_arrival_qty > 0
)

MERGE INTO books_available A
USING updated_books U
ON A.product_id = U.updated_product_id
WHEN MATCHED THEN
UPDATE SET
qty_in_stock = A.qty_in_stock + U.updated_ books_arrival_qty,
price_per_unit = U.updated_price
WHEN NOT MATCHED THEN
INSERT (product_id, genre, qty_in_stock, avg_rating, price_per_unit)
VALUES (U.updated_product_id, U.updated_genre, U.updated_ books_arrival_qty, U.updated_rating, U.updated_price);

Updated values post merge operation:

SELECT * FROM books_available order by product_id ASC;
product_id | genre | qty_in_stock | avg_rating | price_per_unit 
1 | Thriller | 16 | 4.6 | 21.00
2 | Romance | 20 | 4.3 | 19.00
3 | Fantasy | 12 | 4.7 | 22.00
4 | Fiction | 10 | 4.0 | 16.00
5 | Sci-fi | 6 | 4.3 | 24.00
7 | Mystery | 10 | 4.8 | 25.00
(6 rows)

This CTE selects the product ID, genre, new book quantity, updated average rating, and updated price from the books_updated table but only includes rows where books_arrival_qty is greater than 0.

We then use the MERGE statement to match and update records in the books_available table based on the condition that the product ID matches and there are new books to add (where books_arrival_qty > 0). If a match is found, we update the quantity in stock, average rating, and price. If there's no match, we insert a new record into the books_available table.

If merge command was not in use, these above operations would take many lines of code with a lot of complexity involved which in return would increase the overhead (in terms of time) as data has to be shifted from database to application multiple times. From a data processing perspective, this command is quite helpful as it reduces the I/O operations from the disk for each of the three statements individually and now data is being read from the source only once.

Note: Also, the performance of the MERGE statement greatly depends on the proper indexes being used to match both the source and the target tables. Apart from indexes, it is also essential that the join conditions are optimized. We should also try to filter the source table so that only necessary rows are being fetched by the statement to do the necessary operations.

Concurrency

Initially, a determination of MATCHED or NOT MATCHED status is determined for each row. If a row is MATCHED and is modified, but still exists according to the join conditions, then all the WHEN MATCHED clauses are tested again. On the other hand, if it matches and is modified in a way that makes it no longer visible to the join, then the WHEN NOT MATCHED conditions are executed for it.

For WHEN NOT MATCHED, this does not occur. If a row is considered as NOT MATCHED and, because of a concurrent modification becomes visible (i.e. it becomes MATCHED), then the whole command is aborted with an error, which might be one of:

ERROR: duplicate key value violates unique constraint
ERROR: tuple to be locked was already moved to another partition due to a concurrent update

How to deal with concurrency:

  • The easiest is to not run two MERGEs concurrently targeting the same table •(perhaps: LOCK TABLE books_available IN SHARE MODE)

  • If you must have two, make them not have WHEN NOT MATCHED THEN INSERT clauses

  • If you must allow concurrent insertion, rewrite to INSERT ON CONFLICT UPDATE

  • If you do not lock the table, test extensively for concurrent scenarios •plan to spend much longer testing than developing

Cleaning up the source table

After running merge, the source data is presumably no longer useful; but we will want to run MERGE again in the future. To allow applications to write new data in the source table, several approaches are possible:

  1. Use a transaction in serializable or repeatable read isolation mode, and delete the rows in the source table after running merge in the same transaction. This will delete the rows that were seen by merge; any other rows that were inserted in the source table by transactions that were committed after MERGE read the data will persist.

  2. Have two "source" tables: one which is being inserted into by the application (source_active), and a separate one that is used by merge (source_merging). Once merge finishes, truncate source_merging, and then ALTER TABLE RENAME each table in place of the other. The application can be written continuously to source_active, which merge will not touch.

  3. Use a CTE that deletes the source data at the same time as it returns it.

WITH source_cte AS (DELETE FROM source RETURNING *) 
MERGE INTO target FROM source_cte ...
  1. As above, but use a flag that tells that the row has been processed, and ignore rows that have it:
WITH source_cte AS (
UPDATE source
SET row_merged=true
WHERE <some condition> AND NOT row_merged
RETURNING *
) MERGE INTO target_tbl FROM source_cte ...

Note that in this last case, history is preserved: rows that have been processed are still in the source table after merge is done.

As a more elaborate alternative of this last one, you might make row_merged a timestamp, which is initially NULL and this source_cte turns into current_timestamp (so the WHERE condition would be AND row_merged IS NULL). This case lets you know exactly when was each source row merged into the target table.

Triggers and MERGE

No separate triggers are defined for MERGE. Instead, statement-level and row-level UPDATE, DELETE, and INSERT triggers are fired depending on (for statement-level triggers) what actions are specified in the MERGE query and (for row-level triggers) what actions are performed.

While running a MERGE command, statement-level BEFORE and AFTER triggers are fired for events specified in the actions of the MERGE command, irrespective of whether or not the action is ultimately performed. This is the same as an UPDATE statement that updates no rows: statement-level triggers are fired for it, yet no row-level triggers are fired. The row-level triggers are fired only when a row is actually updated, inserted or deleted. So it's perfectly legal that while statement-level triggers are fired for certain types of action, no row-level triggers are fired for the same kind of action.

Trigger behaviour:

  • BEFORE EACH STATEMENT •INSERT •UPDATE •DELETE
  • BEFORE EACH ROW •Each row as scanned by the join
  • AFTER EACH ROW •Each row in the same order as above
  • AFTER EACH STATEMENT •DELETE •UPDATE

To know more about trigger behaviour, please refer to Overview of Trigger Behavior.

Example: Use the same tables from above (example 2) and create a function and trigger as below

-- Create an AFTER INSERT OR UPDATE trigger on books_available
CREATE OR REPLACE FUNCTION check_stock_trigger()
RETURNS TRIGGER AS $$
BEGIN
-- Check if the stock quantity is zero
IF NEW.qty_in_stock = 0 THEN
RAISE WARNING 'Call the editor to place an order for book %', NEW.product_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach the trigger to the books_available table
CREATE TRIGGER after_stock_update
AFTER INSERT OR UPDATE
ON books_available
FOR EACH ROW
EXECUTE FUNCTION check_stock_trigger();

Run the merge command:

edb=# MERGE INTO books_available AS A
USING books_updated AS B
ON A.product_id = B.product_id
WHEN MATCHED THEN
UPDATE SET qty_in_stock = B. books_arrival_qty;
WARNING: Call the editor to place an order for book 4
MERGE 3

edb=# select * from books_available order by product_id ASC;
product_id | genre | qty_in_stock | avg_rating | price_per_unit 
1 | Thriller | 6 | 4.5 | 20.00
2 | Romance | 5 | 4.2 | 18.00
3 | Fantasy | 12 | 4.7 | 22.00
4 | Fiction | 0 | 4.0 | 16.00
5 | Sci-fi | 6 | 4.3 | 24.00
(5 rows) 

Here a trigger is generated when the stock became '0' for a book with product 4

Conclusion

We have shown how the merge command in PostgreSQL is a powerful tool for modifying data into a table based on specified conditions. It simplifies complex data change operations and saves time and resources when working with large amounts of data. Database administrators and developers can leverage these features to improve the performance and scalability of their applications while making their application code easier to maintain.

Was this article helpful?

0 out of 0 found this helpful