-
The core premise of Postgres DR/ PITR scenarios is combining physical base backups with WAL archiving.
-
When archiving WAL logs for potential DR/ PITR scenarios, there are many solutions available, which we will discuss in this article.
-
We highly recommend using a supported tool for a comprehensive physical backup solution. We recommend Barman or pgBackRest for their reliability and features.
-
While the core behavior of base backups and WAL archives is achieved in both Barman & pgBackRest, they each have various additional functionalities, discussed in later sections.
-
See our comparison of backup method features.
| Solution | WAL Archiving Durability | Complexity Level | Method |
| 1) Barman backup tool | High | Medium | Push and/or Pull |
| 2) pgBackRest backup tool | High | Medium | Push |
| 3) Native `archive_library` (pg15+) | High | Medium | Push |
| 4) Native `pg_receivewal`, replication slot, and `systemd` scheduler | High | High | Pull |
| 5) Native archive_command using `rsync` | Medium | Easy | Push |
| 6) Native archive_command using `cp` | Low | Easy | Push |
| 7) Native archive_command using `scp` | Low | Easy | Push |
-
Solution
column lists the different solutions that can be used for WAL archiving. -
WAL archiving durability
column describes how durable or safe the procedure of archiving the WALs is. Factors affecting this which are elaborated in the article include whether the WAL files arefsync
d to disk, and how networking issues are dealt with. -
Complexity level
describes how difficult the solution is to set up, and maintain a production system on. Even if a method is safer theoretically, if it is too complex for the needs of the user in practice and performed incorrectly, it can become unsafe. -
Method
column describes the broad principle of how the WAL logs are archived. -
Push
method means that postgres is sending WAL files to a location after it has created them. -
Pull
method means that postgres replication is being used to retrieve the WAL data from postgres when it creates it, and store it in a location.
Note - all commands mentioned in this article can have different nuances on different OSs, and should be tested in your testing environment thoroughly before applying to production.
Barman is one of our recommended physical backup tools. Its barman cron
function is very effective at managing the backup processes.
Barman includes lots of features such as:
- Incremental and Full Base backup creation.
- WAL archiving through archive_command and barman-wal-archive.
- Streaming WALs through pg_receivewal.
- Backups restoration.
- Retention policies.
- Parallel jobs.
- Geographical redundancy.
- Cloud snapshots.
Barman usage is documented on its website.
- Barman improves PostgreSQL's
archive_command
parameter through its barman-wal-archive command. -
barman-wal-archive
guarantees that the WAL file content isfsync
ed to disk. This avoids production issues created by the fragility ofrsync
,cp
orscp
. - Additionally,
barman-wal-archive
reduces the risk of copying a WAL file in to the wrong directory on the Barman host, as the only parameter to be used in thearchive_command
is the server's ID.
Example command:
archive_command = 'barman-wal-archive barmanserver db01 %p'
-
barmanserver
is the Barman hostname -
db01
is the database server db01 being backed up, specified as[db01]
in a barman configuration file, such as/etc/barman.d/db01.conf
. -
%p
here represents the pathname of each WAL file that postgres will create ($PGDATA/pg_wal/wal_file_name
). For each file created, Postgres re-runs the command with the new file name.
Archiving to multiple servers:
-
Some use cases involve archiving to multiple locations. Barman georedundancy should be used to achieve this.
-
With this, multiple Barman servers can be used, where Barman servers can use another Barman's backups as the primary data source.
-
Do not combine
rsync
/cp
/scp
withbarman-wal-archive
!
We have seen customers experience production issues due to archive_command
s such as:
#DO NOT DO THIS!
archive_command = test ! -f /archive_location/%f && cp %p /archive_location/%f && barman-wal-archive barmanserver db01 %p
The use of rsync
/cp
/scp
negates the durability that barman-wal-archive
provides.
In this particular case, if the cp
process fails due to the lack of durability, barman-wal-archive
is also prevented from running due to use of &&
.
Although in this example &&
could be replaced with ;
this is still inferior to Barman cascading backups, as the first archival would be using cp
and would not be as reliable.
Barman also supports streaming WAL archives through pg_receivewal
, which minimises the RPO. pg_receivewal
is discussed further in later section pg_receivewal background
.
pgBackRest is also one of our recommended physical backup tools. pgBackRest also uses file and directory level fsync to ensure durability.
pgBackRest includes lots of features, such as:
- Full/base backup, differential & incremental backups creation
- WAL archiving through archive_command and the
pgbackrest archive-push
command - Backups restoration
- Retention policies
- Parallel backup & restore
- Page checksums validation during backups
- Backup repository encryption
- Various compression types supported
- Parallel and asynchronous archiving
- Resuming a failed backup
- S3, Azure, and GCS compatible object store support
- Using multiple repositories at the same time
pgBackRest usage is documented on its website and in EDB docs.
If you are using this solution, you should use pgbackRest archive-push in your archive_command
.
Example command:
archive_command = 'pgbackrest --stanza=demo archive-push %p'
This is demonstrated in the quickstart page.
-
If you are on PG15+, and are unable to use Barman or pgBackRest, we recommend use of native archive_library and basic_archive.
-
This new postgres functionality fsyncs the WAL file, ensuring WAL archiving durability.
-
Additionally, the server will avoid recycling or removing these WAL files until the module indicates that the files were successfully archived.
-
Please note that after server crashes, temporary
archtemp
files should be deleted before starting the server. Please see the documentation notes.
Example setup (EPAS 15, but applies to Postgres 15):
edb=# show shared_preload_libraries;
shared_preload_libraries
$libdir/dbms_pipe,$libdir/edb_gen,$libdir/dbms_aq,$libdir/basic_archive
edb=# alter system set shared_preload_libraries='$libdir/dbms_pipe','$libdir/edb_gen','$libdir/dbms_aq','$libdir/basic_archive';
$ /usr/local/enterprisedb/bin/pg_ctl -D /var/lib/edb/as15 -l /var/lib/edb/as15/log/logfile restart
edb=# show archive_library;
edb=# show basic_archive.archive_directory
edb=# alter system set archive_library = 'basic_archive';
edb=# alter system set basic_archive.archive_directory = '/var/lib/edb/archive_test';
edb=# SELECT pg_reload_conf();
edb=# show archive_library;
basic_archive
edb=# show basic_archive.archive_directory
/var/lib/edb/archive_test
edb=# create table test( i int);
edb=# insert into test generate_series(1,1000);
edb=# select pg_switch_wal();
edb=# insert into test generate_series(1,1000);
edb=# select pg_switch_wal();
edb=# select * from pg_stat_archiver;
archived_count|last_archived_wal|last_archived_time| failed_count|last_failed_wal|last_failed_time|stats_reset
23|000000010000000000000019|09-FEB-23 17:26:38.722981 +00:00|0 ||| 07-FEB-23 16:28:15.290586 +00:00
$ ls -l /var/lib/edb/archive_test
-rw_______ 1 enterprisedb enterprisedb 16777216 Feb 9 17:26 000000010000000000000018
-rw_______ 1 enterprisedb enterprisedb 16777216 Feb 9 17:26 000000010000000000000019
$ ps waux | grep archive
postgres: archiver last was 000000010000000000000019
-
If you are on PG14 or older, and you have some constraint of being unable to use Barman or pgBackRest,
archive_command
can be replaced by the native postgres tool pg_receivewal which streams the WAL logs using replication to an archive location. -
pg_receivewal
WAL archiving is more durable than use ofrsync
,cp
, orscp
withinarchive_command
- When writing WALs to
pg_wal
, and sending via replication, postgres uses fsync to ensure that the updates are physically written to disk.
- When writing WALs to
-
Due to this,
pg_receivewal
lowers the RPO, sincepg_receivewal
streams the write-ahead log in real time instead of waiting for an entire WAL file to be populated. -
Synchronous archiving can be used using
pg_receivewal --synchronous
,synchronous_standby_names
, andsynchronous_commit
, to ensure each transaction is archived and have the RPO at 0, but this creates deteriorated performance. As noted by the documentation, this should not be done if you are runningsynchronous_commit=remote_apply
.
-
Due to needing an external scheduling system to maintain it, this is more complicated than using Barman, and would require a professional services engagement to evaluate whether this solution fits your use case, fully test and implement the solution. This is effectively replacing some of the functionality that
barman cron
offers. -
Examples of maintenance includes restarting the program when it is offline (e.g if postgres is restarted,
pg_receivewal
is killed, the archive server host is restarted). Therefore, a custom process e.g asystemd
process needs to be written and used. -
If using
pg_receivewal
as your only WAL archiving solution, it is highly recommended to use a physical replication slot, otherwise WAL data can be lost. Due to the replication slot, if thepg_receivewal
process stops (e.g networking issues, the program failing, the archive OS being down), the replication slot will be inactive, the primary will not recycle the WALs until the process resumes (assumingmax_slot_wal_keep_size=1
). -
However, if the primary continually holds the WAL logs, the server will eventually reach 100% disk usage and take the primary down. So in the event of a longer unavailability. A judgement call needs to be made, what is more important:
- reduce chance of backup data being lost but have the risk of the primary crashing with no disk space.
- or
- maintain the primary availability, but increase the risk losing WAL segments (with impact in backups and standbys).
-
Monitoring of the archiving is achieved through:
- Database replication -
select * from pg_stat_replication
,select * from pg_replication_slots
. - The scheduling process -
systemctl status systemd_name
.
- Database replication -
If you can not use a backup tool, an archive_command
using rsync can be used.
-
rsync
first creates a temporary copy of the file and then, only if the copy is successful, it renames it as the target filename. -
In the future, when customer environments widely support rsync 3.2.4+, we will discuss use of rsync --fsync.
Example command:
archive_command = 'test ! -f enter_backup_location/%f && rsync -a %p enter_backup_location/%f'
-
Here,
-
-a
runs rsync in archive mode (recursive, and preserving all but hard links). -
%p
here represents the pathname of each WAL file that postgres will create ($PGDATA/pg_wal/wal_file_name
). For each file created, Postgres re-runs the command with the new file name. -
%f
is a postgres parameter representing the name of the WAL segment file. -
test ! -f enter_backup_location location/%f
checks that the filename does not already exist, before writing. If existing, the WAL file will not copy and the command will fail to prevent potential loss of data.
-
-
test ! -f
is implemented in case a problem has occurred, and file with the same name already exists, and needs to be renamed with manual intervention, as the previous file may contain data that is needed.- At this point, postgres will retain it's WALs, until the archive location is fixed.
- If this
test
is not implemented, in this situation, the previous file would be overwritten.
-
rsync flag
--ignore-existing
, should not be used, as if a file with the same name already existed, Postgres will skip archival (even if the file has different contents), as the Linux exit status is 0. This would result in silently losing the backup data.
-
cp
is prone to networking issues. -
Additionally,
cp
is notENOSPC
safe.- If the disk becomes full, and postmaster hits
ENOSPC
, even after freeing disk space, thecp
process will become stuck, leaving a partial file, requiring manual removal.rsync
prevents this issue.
- If the disk becomes full, and postmaster hits
-
rsync
is recommended overcp
as it is more robust during networking issues, and prevents theENOSPC
issue. Therefore, it is less likely to hit a WAL archiving issue. -
If using
cp
inarchive_command
, changing torsync
is an easy change to test in your test environment, with lots of benefits.
scp
is also not recommended due to being inferior to barman-wal-archive
and rsync
, but also becoming deprecated.
- If using
scp
inarchive_command
, changing torsync
is an easy change to test in your test environment, with lots of benefits.
Related to