SELECT FOR UPDATE and its behavior with foreign keys in PostgreSQL

The general approach followed by developers is to pre-acquire the lock(FOR UPDATE)on the data sets that are being updated/deleted concurrently. This approach may be considered to solve the dead lock problem or to control the concurrent access to one or more rows, but could sometimes lead to an unnecessary locking behavior. In this article we shall discuss about SELECT FOR UPDATE in PostgreSQL and its locking behavior.

Are you aware that PostgreSQL 9.6 is about to have its final release on November 11, 2021 and then considered end-of-life ? Please take a look at this article by Avi Vallarapu, that can be used as a reference for the upgrade procedure.

Subscribe to Our Newsletters

To start with a simple scenario, let us consider the following example with two tables and let us assume that an application is running concurrent update operations on the parent table.

    CREATE TABLE parent(id INT PRIMARY KEY, balance INT);
    CREATE TABLE child(id INT REFERENCES parent(id) ON DELETE CASCADE, trx_timestamp TIMESTAMP);

The following is an example transaction on the parent table which is using the FOR UPDATE clause to avoid any deadlock problems for this specific transaction.

    BEGIN;
        SELECT * FROM parent WHERE id=10 FOR UPDATE;
        UPDATE parent SET balance=balance-2 WHERE id=10;

If we are just going to run the above statements concurrently, then there should not be any deadlocks among the transactions.
This is because, we have serialized the data access by using the FOR UPDATE clause.

Now, consider that I have an another concurrent transaction which tries to insert the data into the child table.

    BEGIN;
        INSERT INTO child VALUES(10, now());

Following is the behavior observed when we run these two transactions concurrently.

SESSION 1                                                       SESSION 2
postgres=# BEGIN;   
BEGIN   
postgres=*# SELECT * FROM parent WHERE id=10 FOR UPDATE;  
t   
----    
10  
(1 row) 
postgres=*# UPDATE parent SET balance=balance-2 WHERE id=10;    
UPDATE 1    

                                                                postgres=# BEGIN;
                                                                BEGIN
                                                                postgres=*# INSERT INTO child VALUES(10, now());
                                                                .... waiting....
                                                                .... waiting....

As you see in the above output, the insert operation on the child table is in waiting state.
Even though the FOR UPDATE clause does not exist in the SESSION 2, the transaction would still remain in a waiting state. The reason for this waiting state is due to the PostgreSQL's FOREIGN KEY data validation process between the child and parent tables.

Now, let us discuss about the FOR UPDATE behavior in PostgreSQL. Select for Update in PostgreSQL
By default, when we use FOR UPDATE, it means that it is going to return the rows which are retrieved by the select statement by acquiring the lock on key columns.

In the above example we are actually updating the non key column(balance), by acquiring the lock on id column.
As we hold the lock on primary key column id, when we try to insert any value into the child table, then it will
try to acquire the SHARE lock on the parent table's id column. In this case, the SESSION 2 transaction will wait until the SESSION 1 transaction is complete.

In majority of the cases, we actually do not need to update any key columns,
rather we update only the non key columns like salary, phone number, address, etc. While updating the non key column values, we could force the behavior to not hold any lock on the key columns. To achieve this with SELECT FOR UPDATE in PostgreSQL, we could use the NO KEY option in the FOR UPDATE clause.

Now, let try the same example with the NO KEY option.

SESSION 1                                                       SESSION 2
postgres=# BEGIN;   
BEGIN   
postgres=*# SELECT * FROM parent WHERE id=10 FOR NO KEY UPDATE;  
t   
----    
10  
(1 row) 
postgres=*# UPDATE parent SET balance=balance-2 WHERE id=10;    
UPDATE 1    

                                                                postgres=# BEGIN;
                                                                BEGIN
                                                                postgres=*# INSERT INTO child VALUES(10, now());
                                                                INSERT 0 1
                                                                postgres=*# END;
                                                                COMMIT
 postgres=*# END;
 COMMIT

From the above results, the INSERT operation in the SESSION 2 was much smooth and has completed it's activity without any waits.

Now, let us try whether the FOR NO KEY UPDATE solves the deadlock issue as well.

SESSION 1                                                       SESSION 2
postgres=# BEGIN;   
BEGIN   
postgres=*# SELECT * FROM parent WHERE id=10 FOR NO KEY UPDATE;  
t   
----    
10  
(1 row) 
                                                                postgres=# BEGIN;
                                                                BEGIN
                                                                postgres=*# SELECT * FROM parent WHERE id=10 FOR NO KEY UPDATE;
                                                                .... waiting....
                                                                .... waiting....
postgres=*# UPDATE parent SET balance=balance-2 WHERE id=10;    
UPDATE 1                                                
postgres=*# END;
COMMIT
                                                                postgres=*# UPDATE parent SET balance=balance-2 WHERE id=10;    
                                                                UPDATE 1
                                                                postgres=*# END;
                                                                COMMIT

As you see, FOR NO KEY UPDATE is also solving the dead lock problem.

In some cases, we may really require to hold a lock on the key columns. Some use cases are like updating the employee ID/SSN/Passport number,
where we need to hold the lock on the key value, which also requires a lock to be held on all of it's referenced column values. This is required to maintain the data consistency across the parent and child tables. This is what was happening in the plain FOR UPDATE situation, where the child table transaction has gone into a waiting state.

Now, imagine that we are using FOR UPDATE in a view or in a humdinger SQL query to avoid a deadlock. Then, the FOR UPDATE will may avoid the deadlock problems but it is going to block a huge number of key values, which will make all the child table transactions to go into the waiting state. By saying this, we would suggest you to use the FOR UPDATE with the right usage and see the business logic before implementing it. 

Conclusion

SELECT for UPDATE in PostgreSQL may solve some of the concurrency problems but also introduces additional challenges due to the locking. It may be a good approach for some of the application but when you see the locking causing more damages, we may need to approach this with pessimistic locking achieved through advisory locks. We shall look into advisory locks and how it can be efficiently as part of our next article. Meanwhile, if you have have any performance problems with PostgreSQL or need support in migrating to PostgreSQL, please contact us or you may fill the following form. 

 

3 thoughts on “SELECT FOR UPDATE and its behavior with foreign keys in PostgreSQL”

  1. Thanks for the article!
    You might have mentioned that the problem can be avoided by using FOR NO KEY UPDATE instead of FOR UPDATE. That would have been sufficient in your example, since you don’t update a key, and this lock does not conflict with the FOR KEY SHARE lock required by the foreign key constraint.
    FOR UPDATE is only appropriate if you intend to delete the row or modify a primary or unique key column.

    1. Laurenz,
      Thank you so much for your inputs.
      Yes, you are right. The FOR KEY SHARE do conflict with the FOR UPDATE lock.

      But, I see many developers are using the FOR UPDATE clause to avoid the deadlocks,
      and after that they complain about the slow application behavior.

      We see few customers are facing this kind of issue, where they are actually updating the non-key columns
      by acquiring lock on key column. In their use case, they no need to use the FOR UPDATE rather they can just go with the FOR NO KEY UPDATE,
      which solves their purpose of dead lock prevention, besides to not holding the lock on key column value.

      Hope this helps other customers to use the FOR UPDATE clause in the use cases as you suggested.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top