In one of our articles on pgBackRest, we have discussed how to setup a robust backup strategy using pgBackrest. However, it is also important to know how to restore and test our backups for the purpose of recoverability. This makes sure that we have proper backup and if a need arises, we can restore the backup as of a specific point in time. pgBackrest supports a handful of restore options that can be used to restore a specific backup bundle either on the same server or into a different server. In this article, we shall discuss all the steps involved in restoring a backup from a backup repository using pgBackRest.
Pre-requisites
Before we discuss about how to restore, we shall look into the prerequisites that we need to address to make the process seamless.
- pgBackrest must be installed on the server (or target server) to which the backup is being restored to.
- If the repository server and the target server are different, a trusted authentication using SSH keys between these 2 servers is required.
Types of restore being discussed in this article
Based on the generally observed restore requirements, I have demonstrated 3 mostly used restore methods in this article.
- Restoring a backup from a backup repository to a different data directory other than the actual data directory location.
- Restoring up to a specific Incremental backup.
- Point in time recovery or PITR to a different data directory.
Generate sample backups to test restore and point-in-time recovery methods.
In order to test the restore, let us use pgBackRest to take a few backups (Full and Incremental). I would create a table with a timestamp column followed by inserting a few records.
Following are the commands I have to generate backups. You can read my colleague's article on How to take backups using pgBackRest, to reproduce these example backups.
-- Create database : pgbench
postgres@pg1:~$ psql -c "create database pgbench”
-- Initialize pgbench to load sample data
postgres@pg1:~$ pgbench -i -s 100 pgbench
-- Take a Full Backup
$ pgbackrest --config=/etc/pgbackrest/backrest_standby.conf --stanza=backupstanza --type=full --archive-timeout=2000 --compress backup
-- Create database : pitr
postgres@pg1:~$ psql -c "create database pitr”
-- Connect to database : pitr and create a table and load sample data
postgres@pg1:~$ psql -d pitr
CREATE TABLE users(
id SERIAL PRIMARY KEY,
firstname VARCHAR(100) NOT NULL,
lastname VARCHAR(200) NOT NULL,
gender VARCHAR(6) NOT NULL CHECK (gender IN (‘male’, ‘female’)),
inserted_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Take an Incremental Backup
$ pgbackrest --config=/etc/pgbackrest/backrest_standby.conf --stanza=backupstanza --type=incr --archive-timeout=2000 --compress backup
-- Insert a few sample records to test PITR
INSERT INTO users (firstname, lastname, gender) VALUES ('larissa', 'jordan','female');
select pg_sleep(30);
INSERT INTO users (firstname, lastname, gender) VALUES ('steve', 'jordan','male');
Output of pgBackRest info command to list all backups appears as following upon executing the above commands.
pgbackrest@backupserver:~$ pgbackrest --config=/etc/pgbackrest/backrest_standby.conf --stanza=backupstanza info
stanza: backupstanza
status: ok
cipher: none
db (current)
wal archive min/max (13): 0000000300000002000000FD/0000000300000003000000AE
full backup: 20211011-083144F
timestamp start/stop: 2021-10-11 08:31:44 / 2021-10-11 08:31:56
wal start/stop: 0000000300000002000000FD / 000000030000000300000004
database size: 216.0MB, database backup size: 216.0MB
repo1: backup set size: 10.9MB, backup size: 10.9MB
incr backup: 20211011-083144F_20211011-094611I
timestamp start/stop: 2021-10-11 09:46:11 / 2021-10-11 09:47:12
wal start/stop: 0000000300000003000000A7 / 0000000300000003000000AE
database size: 1.8GB, database backup size: 1.7GB
repo1: backup set size: 181.2MB, backup size: 176.9MB
backup reference list: 20211011-083144F
Please Note : We are always restoring the backups to a different folder. This is because the recovery of a backup on to a live server overwriting the existing data directory may avoid the possibilities to rollback. Sometimes, we may also try to restore the backup to a different location to get the required data and then destroy the newly restored PostgreSQL instance.
Restoring a backup to a different data directory but not to the original data directory location.
This kind of restore is the easiest of all. It’s just a single command to restore the full backup along with the incremental backups and the archives that got generated after the last incremental backup. The following command would restore and recover the database until the latest point in time. It starts by applying the latest full backup followed up restoring all the incremental backups and then applies all the WAL segments generated after the last incremental backup until the last WAL segment archived
pgbackrest --log-level-console=info --log-level-file=info --archive-mode=off --repo1-path=/backupdir --repo1-host=backupserver --repo1-host-user=pgbackrest --stanza=backupstanza --pg1-path=/var/lib/postgresql/13_restore restore
When you have wal segments that were not yet archived, we need to copy those from the pg_wal directory of the Primary Instance to the corresponding pg_wal directory of the target.
The output of the above command looks like following.
postgres@pg2:~/13_restore$ pgbackrest --log-level-console=info --log-level-file=info --archive-mode=off --repo1-path=/backupdir --repo1-host=backupserver --repo1-host-user=pgbackrest --stanza=backupstanza --pg1-path=/var/lib/postgresql/13_restore restore
2021-10-11 12:24:37.506 P00 INFO: restore command begin 2.35: --archive-mode=off --exec-id=38673-ef1a350d --log-level-console=info --log-level-file=info --pg1-path=/var/lib/postgresql/13_restore --repo1-host=backupserver --repo1-host-user=pgbackrest --repo1-path=/backupdir --stanza=backupstanza
2021-10-11 12:24:37.960 P00 INFO: repo1: restore backup set 20211011-083144F_20211011-101232I
2021-10-11 12:24:37.961 P00 INFO: remap data directory to '/var/lib/postgresql/13_restore'
2021-10-11 12:24:52.248 P00 INFO: write updated /var/lib/postgresql/13_restore/postgresql.auto.conf
2021-10-11 12:24:52.251 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2021-10-11 12:24:52.252 P00 INFO: restore size = 1.7GB, file total = 1539
2021-10-11 12:24:52.252 P00 INFO: restore command end: completed successfully (14748ms)
$ pg_ctl -D /var/lib/postgresql/13_restore start
Please Note : It is very important to set the flag --archive-mode=off option to avoid starting the archiver process on the new Instance as soon as it is started. The archiver process could corrupt the previous taken backups as it starts pushing the newly generated WAL segments to the same repository.
The above command will restore the backup in the location "/var/lib/postgresql/13_restore". Once the restore is done, we may change the port from 5432 or the original port to a different port, if the intention is to start the new PostgreSQL Instance in the same server as the existing.
When we observe the Postgres logs, we could notice that it is restoring the newer archives from the backup repository server.
postgres@pg1:~/log$ cat postgresql-2021-10-11.log
2021-10-11 12:32:31 UTC [38708]: [3-1] user=,db=,app=,client=LOG: starting PostgreSQL 13.4 (Ubuntu 13.4-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-10-11 12:32:31 UTC [38708]: [4-1] user=,db=,app=,client=LOG: listening on IPv4 address "0.0.0.0", port 6432
..
..
2021-10-11 12:32:32 UTC [38711]: [2-1] user=,db=,app=,client=LOG: starting archive recovery
2021-10-11 12:32:33 UTC [38711]: [3-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000B1" from archive
2021-10-11 12:32:33 UTC [38711]: [4-1] user=,db=,app=,client=LOG: redo starts at 3/B1000028
2021-10-11 12:32:33 UTC [38711]: [5-1] user=,db=,app=,client=LOG: consistent recovery state reached at 3/B1002388
2021-10-11 12:32:33 UTC [38708]: [6-1] user=,db=,app=,client=LOG: database system is ready to accept read only connections
..
2021-10-11 12:32:33 UTC [38711]: [6-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000B2" from archive
2021-10-11 12:32:34 UTC [38711]: [7-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000B3" from archive
2021-10-11 12:32:34 UTC [38711]: [8-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000B4" from archive
..
..
2021-10-11 12:33:16 UTC [38711]: [76-1] user=,db=,app=,client=LOG: redo done at 3/F7BEFF20
2021-10-11 12:33:16 UTC [38711]: [77-1] user=,db=,app=,client=LOG: last completed transaction was at log time 2021-10-11 10:49:02.378996+00
2021-10-11 12:33:16 UTC [38711]: [78-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000F7" from archive
2021-10-11 12:33:17 UTC [38711]: [79-1] user=,db=,app=,client=LOG: selected new timeline ID: 4
2021-10-11 12:33:17 UTC [38711]: [80-1] user=,db=,app=,client=LOG: archive recovery complete
2021-10-11 12:33:17 UTC [38721]: [5-1] user=,db=,app=,client=LOG: checkpoint starting: end-of-recovery immediate wait
2021-10-11 12:33:17 UTC [38721]: [6-1] user=,db=,app=,client=LOG: checkpoint complete: wrote 15482 buffers (94.5%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.107 s, sync=0.004 s, total=0.113 s; sync files=41, longest=0.002 s, average=0.001 s; distance=517466 kB, estimate=632964 kB
2021-10-11 12:33:17 UTC [38708]: [7-1] user=,db=,app=,client=LOG: database system is ready to accept connections
Restoring up to a specific Incremental backup
In this method, we will see how to restore to a specific incremental backup (or a restore point) from the backup repository. To get the list of available restore points, we can use the following command.
$ pgbackrest --config=/etc/pgbackrest/backrest_standby.conf --stanza=backupstanza info
Following is the output of the above command in my setup.
$ pgbackrest --config=/etc/pgbackrest/backrest_standby.conf --stanza=backupstanza info
stanza: backupstanza
status: ok
cipher: none
db (current)
wal archive min/max (13): 0000000300000002000000FD/0000000300000003000000F7
full backup: 20211011-083144F
timestamp start/stop: 2021-10-11 08:31:44 / 2021-10-11 08:31:56
wal start/stop: 0000000300000002000000FD / 000000030000000300000004
database size: 216.0MB, database backup size: 216.0MB
repo1: backup set size: 10.9MB, backup size: 10.9MB
incr backup: 20211011-083144F_20211011-094611I
timestamp start/stop: 2021-10-11 09:46:11 / 2021-10-11 09:47:12
wal start/stop: 0000000300000003000000A7 / 0000000300000003000000AE
database size: 1.8GB, database backup size: 1.7GB
repo1: backup set size: 181.2MB, backup size: 176.9MB
backup reference list: 20211011-083144F
incr backup: 20211011-083144F_20211011-101232I
timestamp start/stop: 2021-10-11 10:12:32 / 2021-10-11 10:12:50
wal start/stop: 0000000300000003000000B1 / 0000000300000003000000B1
database size: 1.7GB, database backup size: 163.3MB
repo1: backup set size: 139.4MB, backup size: 54.5MB
backup reference list: 20211011-083144F, 20211011-083144F_20211011-094611I
As you could see in the above log, this repository contains 1 Full backup and 2 Incremental backups. Now let’s try to go to the restore point “20211011-083144F_20211011-094611I” which is up to the second incremental backup. The command to perform this restore appears like following in my setup.
$ pgbackrest --type=none --set=20211011-083144F_20211011-094611I --log-level-console=info --log-level-file=info --archive-mode=off --repo1-path=/backupdir --repo1-host=backupserver --repo1-host-user=pgbackrest --stanza=backupstanza --pg1-path=/var/lib/postgresql/13_restore restore
This time, postgres need not recovery WAL segments from archive as they are part of the incremental backup.
$ cat postgresql-2021-10-11.log
2021-10-11 22:02:38 UTC [44291]: [3-1] user=,db=,app=,client=LOG: starting PostgreSQL 13.4 (Ubuntu 13.4-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2021-10-11 22:02:38 UTC [44291]: [4-1] user=,db=,app=,client=LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-10-11 22:02:38 UTC [44291]: [5-1] user=,db=,app=,client=LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-10-11 22:02:38 UTC [44293]: [1-1] user=,db=,app=,client=LOG: database system was interrupted; last known up at 2021-10-11 09:46:11 UTC
2021-10-11 22:02:39 UTC [44300]: [1-1] user=,db=,app=,client=LOG: checkpoint starting: end-of-recovery immediate wait
2021-10-11 22:02:39 UTC [44300]: [2-1] user=,db=,app=,client=LOG: checkpoint complete: wrote 14303 buffers (87.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.102 s, sync=0.002 s, total=0.107 s; sync files=12, longest=0.002 s, average=0.001 s; distance=131072 kB, estimate=131072 kB
2021-10-11 22:02:39 UTC [44291]: [7-1] user=,db=,app=,client=LOG: database system is ready to accept connections
2021-10-11 22:04:02 UTC [44347]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=[local]LOG: connection received: host=[local]
Point in time recovery or PITR to a different data directory
To test PITR and ensure that it is restoring up to the desired point in time, I have inserted a few rows and also recorded the timestamps at insert time. Let us fetch the records of the table 'users' .
INSERT INTO users (firstname, lastname, gender) VALUES ('larissa', 'jordan','female');
select pg_sleep(30);
INSERT INTO users (firstname, lastname, gender) VALUES ('steve', 'jordan','male');
select * from users;
id | firstname | lastname | gender | inserted_at
----+-----------+----------+--------+-------------------------------
1 | larissa | jordan | female | 2021-10-11 08:18:24.280182-07
2 | steve | jordan | male | 2021-10-11 08:18:54.289022-07
Now let’s try to recover the database to the point "2022-10-11 08:18:30.289022-07". This way, we only see the record with id = 1 but not the one with id = 2. In order to do that we will use the following pgBackRest command.
$ pgbackrest --log-level-console=info --log-level-file=info --archive-mode=off --repo1-path=/backupdir --repo1-host=backupserver --repo1-host-user=pgbackrest --stanza=backupstanza --type=time --target="2021-10-11 08:18:30.289022-07" --pg1-path=/var/lib/postgresql/13_restore restore
The output of the above command should look like the below.
2021-10-11 22:37:55.866 P00 INFO: restore command begin 2.35: --archive-mode=off --exec-id=44848-4a043ae4 --log-level-console=info --log-level-file=info --pg1-path=/var/lib/postgresql/13_restore --repo1-host=backupserver --repo1-host-user=pgbackrest --repo1-path=/backupdir --stanza=backupstanza --target="2022-10-11 08:18:30.289022-07" --type=time
2021-10-11 22:37:56.305 P00 INFO: repo1: restore backup set 20211011-083144F_20211011-094611I
2021-10-11 22:37:56.305 P00 INFO: remap data directory to '/var/lib/postgresql/13_restore'
2021-10-11 22:38:10.362 P00 INFO: write updated /var/lib/postgresql/13_restore/postgresql.auto.conf
2021-10-11 22:38:10.364 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2021-10-11 22:38:10.364 P00 INFO: restore size = 1.8GB, file total = 1545
2021-10-11 22:38:10.365 P00 INFO: restore command end: completed successfully (14500ms)
As i have multiple Postgres instances running in the same server, i would change the "port" from "5432" to "6432" in the restored postgresql.conf file and start PostgreSQL.
$ pg_ctl -D /var/lib/postgresql/13_restore start
$ cat postgresql-2021-10-11.log
..
..
2021-10-11 22:44:04 UTC [44895]: [2-1] user=,db=,app=,client=LOG: starting point-in-time recovery to 2021-10-11 10:09:30.069741+00
2021-10-11 22:44:05 UTC [44895]: [3-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000A7" from archive
2021-10-11 22:44:05 UTC [44895]: [4-1] user=,db=,app=,client=LOG: redo starts at 3/A7000028
2021-10-11 22:44:06 UTC [44895]: [5-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000A8" from archive
..
2021-10-11 22:44:10 UTC [44895]: [12-1] user=,db=,app=,client=LOG: consistent recovery state reached at 3/AE1060A8
2021-10-11 22:44:10 UTC [44892]: [6-1] user=,db=,app=,client=LOG: database system is ready to accept read only connections
2021-10-11 22:44:11 UTC [44895]: [13-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000AF" from archive
2021-10-11 22:44:11 UTC [44895]: [14-1] user=,db=,app=,client=LOG: restored log file "0000000300000003000000B0" from archive
2021-10-11 22:44:11 UTC [44895]: [15-1] user=,db=,app=,client=LOG: recovery stopping before commit of transaction 602, time 2022-10-11 08:18:30.289022-07
2021-10-11 22:44:11 UTC [44895]: [16-1] user=,db=,app=,client=LOG: pausing at the end of recovery
2021-10-11 22:44:11 UTC [44895]: [17-1] user=,db=,app=,client=HINT: Execute pg_wal_replay_resume() to promote.
Now that the recovery is done, we can validate the data and then promote the database using select pg_wal_replay_resume();
select * from users;
id | firstname | lastname | gender | inserted_at
----+-----------+----------+--------+-------------------------------
1 | larissa | jordan | female | 2022-10-11 08:18:24.280182-07
As we have recovered the data until the specified timestamp, we can execute the pg_wal_replay_resume() command. After executing the resume command, database is open for connections.
Conclusion
We have covered the most common use cases for restoring a backup taken using pgBackRest in this article. In our future articles, we will cover some more topics on pgBackRest. This backup tool is highly advanced with numerous features that are not generally visible in other commercial backup tools. If you are looking to migrate to PostgreSQL or looking to implement a robust backup strategy or high availability in PostgreSQL, contact MigOps today.