A reliable backup tool with an ability to perform a point in time recovery (PITR) is a usual enterprise need. Postgres includes a backup tool called pg_basebackup for full backups. It is great for a database of a few hundreds of GBs in size but currently lacks the support for parallel, incremental and differential backups. This makes it slightly challenging for bigger postgres databases that are Terrabytes in size. pgBackRest is our savior in such a case. While migrating from Oracle to Postgres, users find pgBackRest as an Oracle RMAN equivalent for PostgreSQL with some more additional features found in pgBackRest only. This is because pgBackRest supports parallel, incremental and differential backups including the support for backup to Cloud. We will discuss many other features of pgBackRest that makes it the best backup tool for PostgreSQL.
Through this Article we would firstly like to thank the very active pgBackRest community who are of course contributors to PostgreSQL too. In one of our recent experiences, we created a bug about pgBackRest backups resulting into Segmentation fault when there are thousands of tablespaces involved (which is highly unusual for Postgres as it serves a different purpose when compared with Oracle).
David Steele from Crunchy Data and the Community maintaining pgBackRest were very prompt and submitted a patch for this unique feature request in no time. Adding such features may be pretty common for the Community. But, our Customers saved a lot of time and some thousands of dollars in building a backup tool that supports their unique requirement. This is an example of how Open Source and the PostgreSQL Community works. A transparent communication and discussion that makes it easier for a user to understand if the feature request sounds reasonable and if it can be targeted against a future release. A user gets the opportunity to discuss a feature, contribute to a feature or test and provide feedback on a feature. This is where Open Source always wins.
By the way, before falling into the trap of a Commercial backup tool, one should always try pgBackRest. None of the commercial backup tools can challenge the features and standards of pgBackRest today. Following is an Architecture diagram of pgBackRest and most of it will be understandable upon reading the rest of this article.
15 Advantages of using pgBackRest
For users who are new to PostgreSQL and thinking of migrating to Postgres, we would like to talk about some of the advantages in using pgBackRest as a preferred backup tool for postgres. These advantages should prove the reason why pgBackRest is the next generation backup tool that is totally unique from a several list of backup tools of different databases in the world today.
- pgBackRest is the most advanced Open Source backup tool that is also available in the PGDG (PostgreSQL Global Development Group) repository for downloads. No licensing or extra costs involved in using this tool.
- Supports Parallel backup that could stream compressed files to a local or a remote repository or to Cloud.
- Supports Incremental and Differential backups.
- Stream backups to Cloud - Supports AWS, Azure and GCS compatible object store.
- Supports encryption. The repository can be encrypted by pgBackRest to secure the backups.
- Does not need a local staging area to stream backups to a remote storage or cloud. Can directly stream backup to cloud or a remote repository without writing any files locally.
- Supports Millions or Billions of Objects in PostgreSQL. See this patch for more details.
- Highly preferred backup tool for postgres databases of Terrabytes in size.
- Simple commands for restore and recovery including point in time recovery. No manual intervention needed in pulling a certain WAL segemnts manually for a PITR.
- Serves the purpose of an Enterprise backup tool that can act as a Unified solution for maintaining backups of all the PostgreSQL databases across the Infrastructure from a single Backup server.
- Support backups from a Standby server (or read replica) that has been configured using Streaming Replication. See this article on how we can setup Streaming Replication in PostgreSQL.
- Ability to utilize the computing power of multiple standby servers in performing backups.
- Supports automatic retention of backups and archives based on the configuration.
- Supports Parallel asynchronous Archiving to improve the archiving speed.
- Supports backup of Postgres databases with thousands of tablespaces.
Steps to configure a PostgreSQL backup using pgBackRest
In this exercise, we shall see to setup a pgBackRest repository server as seen in the Architecture diagram above. We shall see how the archiving can be enabled and a full backup can be taken using pgBackRest. In this exercise, we shall see a postgres server (pg1), pgBackRest repository server (backupserver) on CentOS 7.8 Operating System.
Installing pgBackRest on the Backup Server
1. Following commands can be used to install pgBackRest on the backup server using the packages from the PGDG repository. This step (1) must be executed on the postgres server too.
$ sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm $ sudo yum install epel-release $ sudo yum install pgbackrest $ sudo pgbackrest version pgBackRest 2.33
2. Create the pgbackrest OS user. This OS user must have access to perform reads and writes to the repository directories.
$ sudo groupadd pgbackrest $ sudo useradd -r -g pgbackrest pgbackrest $ sudo mkdir -p /home/pgbackrest $ sudo chown pgbackrest:pgbackrest /home/pgbackrest
3. Make sure to also create the log directory to which the logs can be written by pgBackRest and set appropriate permissions for the same.
$ sudo mkdir -p /var/log/pgbackrest $ sudo chown -R pgbackrest:pgbackrest /var/log/pgbackrest $ sudo chmod -R 700 /var/log/pgbackrest
Installing pgBackRest on the Postgres Server
1. The installation steps are not different from how pgBackRest was installed on the backup server. The only difference is that the log directory will be owned by the postgres user in this case.
$ sudo mkdir -p /var/log/pgbackrest $ sudo chown -R postgres:postgres /var/log/pgbackrest $ sudo chmod -R 700 /var/log/pgbackrest
Setting Up the pgBackRest backup repository on the backup server
1. Create the backup directory where the database cluster specifc backups are stored.
$ sudo mkdir -p /backupdir/pg1 $ sudo chown -R pgbackrest:pgbackrest /backupdir/pg1 $ sudo chmod -R 700 /backupdir/pg1
2. Add the following settings to the pgBackRest configuration file. The configuration files defaults to /etc/pgbackrest.conf.
$ sudo cat /etc/pgbackrest.conf [global] repo1-path=/backupdir/pg1 repo1-retention-full=90 repo1-retention-full-type=time start-fast=y process-max=4 log-level-console=info log-level-file=info [pg1-server] pg1-path=/var/lib/pgsql/13/data pg1-host=pg1 pg1-port=5432
Under the [global] section, we must specify the directory on the backup server where the pgBackRest global repository can be maintained. Some more parameters that specify the retention period of backups, number of parallel processes and log levels are mentioned. More details on some more configuration parameters can found in the official documentation of pgBackRest.
We see another section [pg1-server] in the configuration file above. This includes the hostname or the IP address of the postgres server, its data directory location and the port. The section name can be renamed from [pg1-server] to any other name that can be used to uniquely identify the postgres server.
Adding the pgBackRest repository information on the postgres server
The Postgres cluster should know the pgBackRest repository to which it can stream the archives to. The same repository information can also be used to restore the backup or perform a point in time recovery. For this purpose, we need to create a pgBackRest configuration file and add some of the details to this file on the postgres server.
$ sudo vi /etc/pgbackrest.conf [global] repo1-path=/backupdir/pg1 repo1-host=backupserver [pg1-server] pg1-path=/var/lib/pgsql/13/data pg1-port=5432
Enable trusted authentication between the pgBackRest repository server and the PostgreSQL server
For the purpose of archiving, the postgres server must be able to stream archives to the backupserver. This is only possible with a trusted authentication between both the servers. Similarly, the backup server requires a trusted authentication to stream the backup from the postgres server. Following steps can be performed to enable the trusted authentication between the Postgres server and the pgBackRest backup server using SSH keys.
Step 1 : Generate ssh keys on the backup server and also the Postgres server.
pgbackrest@backupserver:~$ ssh-keygen -t rsa -b 2048 Generating public/private rsa key pair. .... .... postgres@pg1:~$ ssh-keygen -t rsa -b 2048 Generating public/private rsa key pair. .... ....
Step 2 : Copy the public key of pgbackrest server to the postgres server and from the postgres server to the pgbackrest server.
pgbackrest@backupserver:~$ ssh-copy-id postgres@pg1 postgres@pg1:~$ ssh-copy-id pgbackrest@backupserver
Step 3 : Validate the trusted authentication from both the servers.
pgbackrest@backupserver:~$ ssh postgres@pg1 'echo "Connected to $(hostname) successfully"' Connected to pg1 successfully postgres@pg1:~$ ssh pgbackrest@backupserver 'echo "Connected to $(hostname) successfully"' Connected to backupserver successfully
Configure the mandatory parameters on the PostgreSQL server
Following are the mandatory parameters to be configured on the postgres server to ensure that pgBackRest works successfully.
archive_mode : This parameter must be set to 'ON' to ensure that the archiving is enabled. pgBackRest would not work if archiving is not enabled.
$ psql -c "ALTER SYSTEM SET archive_mode TO 'ON'"
archive_command : The archive_command must be set to the following command. This would be handled by pgBackRest.
$ psql -c "ALTER SYSTEM SET archive_command TO 'pgbackrest --stanza=pg1-server archive-push %p'"
Restart of the PostgreSQL server may be if required if the archive_mode must be modified. Else, a reload is sufficient.
Following is the command to perform the reload or a SIGHUP.
$ pg_ctl -D /var/lib/pgsql/13/data reload
Following is the command to perform the restart, if required.
$ pg_ctl -D /var/lib/pgsql/13/data restart -mf
Creating a Stanza on the backup server
In order to maintain a repository for a specific postgres server, a stanza dedicated to that postgres server must be created. The stanza is same as the section name used in the pgbackrest.conf file for the postgres server. In this case, it is [pg1-server]. See the configuration file above.
1. Following command can be used to create the stanza on the backup server.
$ pgbackrest --stanza=pg1-server stanza-create
Output of the above command in this exercise.
pgbackrest@backupserver:~$ pgbackrest --stanza=pg1-server stanza-create 2021-04-09 20:29:42.939 P00 INFO: stanza-create command begin 2.33: --exec-id=23199-1c75331c --log-level-console=info --log-level-file=off --pg1-host=pg1 --pg1-path=/var/lib/pgsql/13/data --pg1-port=5432 --repo1-path=/backupdir/pg1 --stanza=pg1-server 2021-04-09 20:29:43.767 P00 INFO: stanza-create for stanza 'pg1-server' on repo1 2021-04-09 20:29:43.879 P00 INFO: stanza-create command end: completed successfully (941ms)
2. Perform validation using the check command. This would force the postgres server to switch a WAL segment and archive that WAL segment using the archive_command.
$ pgbackrest --stanza=pg1-server check
Output looks like following.
pgbackrest@backupserver:~$ pgbackrest --stanza=pg1-server check 2021-04-09 20:37:16.670 P00 INFO: check command begin 2.33: --exec-id=23252-187c27de --log-level-console=info --log-level-file=info --pg1-host=pg1 --pg1-path=/var/lib/pgsql/13/data --pg1-port=5432 --repo1-path=/backupdir/pg1 --stanza=pg1-server 2021-04-09 20:37:17.504 P00 INFO: check repo1 configuration (primary) 2021-04-09 20:37:17.712 P00 INFO: check repo1 archive for WAL (primary) 2021-04-09 20:37:19.743 P00 INFO: WAL segment 000000010000000000000004 successfully archived to '/backupdir/pg1/archive/pg1-server/13-1/0000000100000000/000000010000000000000004-810ca805a7b4f63e4efdebd3aa383edbfff15e8a.gz' on repo1 2021-04-09 20:37:19.849 P00 INFO: check command end: completed successfully (3180ms)
Taking a full backup of the remote PostgreSQL database using pgBackRest
1. Following command can be used on the backup server to take remote backup using 4 parallel processes.
$ pgbackrest --stanza=pg1-server backup --start-fast --compress --process-max=4
--start-fast : Triggers a fast checkpoint instead of waiting until the next timed checkpoint. This has been set in the pgBackRest configuration above. However, this is only being mentioned for the purpose of simplicity.
--compress : Enables compressed backups. Compression is on by default.
--process-max : Number of concurrent processes using which the backup can be initiated for a faster backup.
Output of the above backup command appears like following.
pgbackrest@backupserver:~$ pgbackrest --stanza=pg1-server backup --start-fast --compress --process-max=4 2021-04-09 20:43:30.143 P00 INFO: backup command begin 2.33: --compress --exec-id=23305-9bf6b99e --log-level-console=info --log-level-file=info --pg1-host=pg1 --pg1-path=/var/lib/pgsql/13/data --pg1-port=5432 --process-max=4 --repo1-path=/backupdir/pg1 --repo1-retention-full=90 --repo1-retention-full-type=time --stanza=pg1-server --start-fast WARN: no prior backup exists, incr backup has been changed to full 2021-04-09 20:43:31.087 P00 INFO: execute non-exclusive pg_start_backup(): backup begins after the requested immediate checkpoint completes 2021-04-09 20:43:31.504 P00 INFO: backup start archive = 000000010000000000000006, lsn = 0/6000028 2021-04-09 20:43:32.646 P03 INFO: backup file pg1:/var/lib/pgsql/13/data/base/1/1255 (648KB, 2%) checksum 99f750789c99416b8b47bcbfbc05e070eab32e89 2021-04-09 20:43:32.712 P02 INFO: backup file pg1:/var/lib/pgsql/13/data/base/13318/1255 (648KB, 5%) checksum 99f750789c99416b8b47bcbfbc05e070eab32e89 ... ... ... 2021-04-09 20:43:35.514 P01 INFO: backup file pg1:/var/lib/pgsql/13/data/base/1/13176 (0B, 100%) 2021-04-09 20:43:35.514 P04 INFO: backup file pg1:/var/lib/pgsql/13/data/base/1/13171 (0B, 100%) 2021-04-09 20:43:35.516 P00 INFO: full backup size = 23.1MB 2021-04-09 20:43:35.516 P00 INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive 2021-04-09 20:43:35.727 P00 INFO: backup stop archive = 000000010000000000000006, lsn = 0/6000138 2021-04-09 20:43:35.740 P00 INFO: check archive for segment(s) 000000010000000000000006:000000010000000000000006 2021-04-09 20:43:36.072 P00 INFO: new backup label = 20210409-204330F 2021-04-09 20:43:36.204 P00 INFO: backup command end: completed successfully (6063ms) 2021-04-09 20:43:36.205 P00 INFO: expire command begin 2.33: --exec-id=23305-9bf6b99e --log-level-console=info --log-level-file=info --repo1-path=/backupdir/pg1 --repo1-retention-full=90 --repo1-retention-full-type=time --stanza=pg1-server 2021-04-09 20:43:36.216 P00 INFO: repo1: time-based archive retention not met - archive logs will not be expired 2021-04-09 20:43:36.216 P00 INFO: expire command end: completed successfully (11ms)
2. Validation of the backup using the info command.
$ pgbackrest --stanza=pg1-server info
Output looks like following.
pgbackrest@backupserver:~$ pgbackrest --stanza=pg1-server info stanza: pg1-server status: ok cipher: none db (current) wal archive min/max (13): 000000010000000000000001/000000010000000000000006 full backup: 20210409-204330F timestamp start/stop: 2021-04-09 20:43:30 / 2021-04-09 20:43:35 wal start/stop: 000000010000000000000006 / 000000010000000000000006 database size: 23.1MB, database backup size: 23.1MB repo1: backup set size: 2.8MB, backup size: 2.8MB
Conclusion
In this article, we have seen the advantages of using pgBackRest and how it can be configured as a Unified backup solution for an Infrastructure. In our future articles, we shall discuss more about the restore and point in time recovery using pgBackRest. Please Contact Us to know more about the robust Open Source solutions available for PostgreSQL and how we can help you migrate from Oracle and other commercial databases to PostgreSQL.
To stay updated on the latest news on PostgreSQL and detailed articles on PostgreSQL and Migrations to PostgreSQL, please subscribe to our Newsletters.
This is really very nice document. It is easy to use.
In pg_basebackup after the restoration is completed it will set up streaming replication , will the pgBackRest does the same ..
Very nice to see how the community is working in find out and fixing even very edge case problems.
No doubt pgBackRest is the most reliable backup solution for PostgreSQL.
Very good post.