The general approach followed by developers is to
pre-acquire the lock(FOR UPDATE)on the data sets that are being updated/deleted concurrently. This approach may be considered to solve the dead lock problem or to control the concurrent access to one or more rows, but could sometimes lead to an unnecessary locking behavior. In this article we shall discuss about SELECT FOR UPDATE in PostgreSQL and its locking behavior.
Are you aware that PostgreSQL 9.6 is about to have its final release on November 11, 2021 and then considered end-of-life ? Please take a look at this article by Avi Vallarapu, that can be used as a reference for the upgrade procedure.
To start with a simple scenario, let us consider the following example with two tables and let us assume that an application is running concurrent update operations on the
CREATE TABLE parent(id INT PRIMARY KEY, balance INT); CREATE TABLE child(id INT REFERENCES parent(id) ON DELETE CASCADE, trx_timestamp TIMESTAMP);
The following is an example transaction on the
parent table which is using the
FOR UPDATE clause to avoid any deadlock problems for this specific transaction.
BEGIN; SELECT * FROM parent WHERE id=10 FOR UPDATE; UPDATE parent SET balance=balance-2 WHERE id=10;
If we are just going to run the above statements concurrently, then there should not be any deadlocks among the transactions.
This is because, we have serialized the data access by using the
FOR UPDATE clause.
Now, consider that I have an another concurrent transaction which tries to insert the data into the
BEGIN; INSERT INTO child VALUES(10, now());
Following is the behavior observed when we run these two transactions concurrently.
SESSION 1 SESSION 2 postgres=# BEGIN; BEGIN postgres=*# SELECT * FROM parent WHERE id=10 FOR UPDATE; t ---- 10 (1 row) postgres=*# UPDATE parent SET balance=balance-2 WHERE id=10; UPDATE 1 postgres=# BEGIN; BEGIN postgres=*# INSERT INTO child VALUES(10, now()); .... waiting.... .... waiting....
As you see in the above output, the insert operation on the child table is in waiting state.
Even though the
FOR UPDATE clause does not exist in the
SESSION 2, the transaction would still remain in a waiting state. The reason for this waiting state is due to the PostgreSQL's
FOREIGN KEY data validation process between the
Now, let us discuss about the
FOR UPDATE behavior in PostgreSQL.
By default, when we use
FOR UPDATE, it means that it is going to return the rows which are retrieved by the select statement by acquiring the lock on
In the above example we are actually updating the non key column(
balance), by acquiring the lock on
As we hold the lock on primary key column
id, when we try to insert any value into the child table, then it will
try to acquire the
SHARE lock on the parent table's
id column. In this case, the
SESSION 2 transaction will wait until the
SESSION 1 transaction is complete.
In majority of the cases, we actually do not need to update any
rather we update only the
non key columns like salary, phone number, address, etc. While updating the
non key column values, we could force the behavior to not hold any lock on the
key columns. To achieve this with SELECT FOR UPDATE in PostgreSQL, we could use the
NO KEY option in the
FOR UPDATE clause.
Now, let try the same example with the
NO KEY option.
SESSION 1 SESSION 2 postgres=# BEGIN; BEGIN postgres=*# SELECT * FROM parent WHERE id=10 FOR NO KEY UPDATE; t ---- 10 (1 row) postgres=*# UPDATE parent SET balance=balance-2 WHERE id=10; UPDATE 1 postgres=# BEGIN; BEGIN postgres=*# INSERT INTO child VALUES(10, now()); INSERT 0 1 postgres=*# END; COMMIT postgres=*# END; COMMIT
From the above results, the
INSERT operation in the
SESSION 2 was much smooth and has completed it's activity without any waits.
Now, let us try whether the
FOR NO KEY UPDATE solves the deadlock issue as well.
SESSION 1 SESSION 2 postgres=# BEGIN; BEGIN postgres=*# SELECT * FROM parent WHERE id=10 FOR NO KEY UPDATE; t ---- 10 (1 row) postgres=# BEGIN; BEGIN postgres=*# SELECT * FROM parent WHERE id=10 FOR NO KEY UPDATE; .... waiting.... .... waiting.... postgres=*# UPDATE parent SET balance=balance-2 WHERE id=10; UPDATE 1 postgres=*# END; COMMIT postgres=*# UPDATE parent SET balance=balance-2 WHERE id=10; UPDATE 1 postgres=*# END; COMMIT
As you see,
FOR NO KEY UPDATE is also solving the dead lock problem.
In some cases, we may really require to hold a lock on the
key columns. Some use cases are like updating the employee ID/SSN/Passport number,
where we need to hold the lock on the
key value, which also requires a lock to be held on all of it's referenced column values. This is required to maintain the data consistency across the parent and child tables. This is what was happening in the plain
FOR UPDATE situation, where the
child table transaction has gone into a waiting state.
Now, imagine that we are using
FOR UPDATE in a view or in a humdinger SQL query to avoid a deadlock. Then, the
FOR UPDATE will may avoid the deadlock problems but it is going to block a huge number of
key values, which will make all the
child table transactions to go into the waiting state. By saying this, we would suggest you to use the
FOR UPDATE with the right usage and see the business logic before implementing it.
SELECT for UPDATE in PostgreSQL may solve some of the concurrency problems but also introduces additional challenges due to the locking. It may be a good approach for some of the application but when you see the locking causing more damages, we may need to approach this with pessimistic locking achieved through advisory locks. We shall look into advisory locks and how it can be efficiently as part of our next article. Meanwhile, if you have have any performance problems with PostgreSQL or need support in migrating to PostgreSQL, please contact us or you may fill the following form.