Incremental Backups in PostgreSQL 17

Afrah Razzak
Afrah Razzak

PostgreSQL 17 introduces native support for incremental backups, a new feature designed to optimize backup strategies for modern, high-performance, and high-availability environments. This new feature addresses the limitations of traditional backup approaches by enabling efficient storage use, faster backup times, and simplified recovery processes.

The Challenge with Traditional Backups

Database administrators have long relied on traditional full backups and WAL archiving to ensure data safety and recoverability in PostgreSQL. While this method is robust, it comes with significant challenges:

  • Storage Demands: Full backups of large databases consume massive amounts of storage, often requiring dedicated hardware.
  • Lengthy Backup Times: Taking a full backup involves scanning the entire database, resulting in extended backup windows and potential performance degradation.
  • Complex Recovery: Restoring from full backups often involves replaying a large number of WAL files, making the process time-consuming and prone to errors.

These limitations have made traditional backup strategies less than ideal for modern, high-availability, and high-performance environments.

Introducing Incremental Backups in PostgreSQL 17

PostgreSQL 17 significantly improves database backup and recovery with native incremental backup support. Incremental backups store only the changes made since the last backup (full or incremental). This innovation significantly reduces storage usage, backup time, and recovery complexity.

Key benefits of incremental backups:

  • Efficient Storage: Only modified data pages are stored, dramatically reducing backup sizes.
  • Faster Backups: Capturing just the changes minimizes the time needed for each backup.
  • Streamlined Recovery: Incremental backups simplify the restoration process by reducing dependency on extensive WAL replay.

Pre-requisites to Setting Up Incremental Backups

Before performing incremental backups, ensure the following:

  1. Enable Write-Ahead Logging (WAL): Incremental backups rely on WAL to track changes. Update the postgresql.conf file:
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/%f'

Restart the database to apply these changes:

/usr/pgsql-17/bin/pg_ctl -D /var/lib/pgsql/17/data restart
  1. Enable WAL Summarization: The WAL summarizer process generates summaries for incremental backups. Enable it:
ALTER SYSTEM SET summarize_wal = 'on';
SELECT pg_reload_conf();
  1. Prepare Storage: Ensure sufficient storage is available for both base and incremental backups, as well as WAL files.

Performing Backups

Incremental backups follow a simple workflow as described below.

Full Backup: Establishing the Baseline

The process begins with a full backup, which serves as the foundation for subsequent incremental backups. For instance, you might take a full backup every Sunday:

$ /usr/bin/pg_basebackup -D /var/lib/pgsql/17/backups/full -X stream -P -c fast
...
23189/23189 kB (100%), 1/1 tablespace

This full backup contains all the data in your database as of Sunday.

Incremental Backups: Capturing Daily Changes

Incremental backups store only the changes made since the last backup.

  1. Monday Backup:

Starting Monday, incremental backups capture only the changes since the Sunday full backup. Let's create a table and insert some data before taking the incremental backup:

postgres=# CREATE TABLE increment_test (id INT, name VARCHAR);
CREATE TABLE

postgres=# INSERT INTO increment_test VALUES (generate_series(1,10), 'monday_data');
INSERT 0 10

Now, take the incremental backup:

$ /usr/bin/pg_basebackup -D /var/lib/pgsql/17/backups/inc_mon --incremental /var/lib/pgsql/17/backups/full/backup_manifest -c fast -v

pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/28000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_21484"
pg_basebackup: write-ahead log end point: 0/28000120
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

Compare the sizes of the full and incremental backups:

$ du -sh /var/lib/pgsql/17/backups/full
39M /var/lib/pgsql/17/backups/full
$ du -sh /var/lib/pgsql/17/backups/inc_mon
22M /var/lib/pgsql/17/backups/inc_mon
  1. Tuesday Backup:

Let's insert additional data for Tuesday before taking another incremental backup:

postgres=# INSERT INTO increment_test VALUES (generate_series(11,20), 'tuesday_data');
INSERT 0 10

Take the incremental backup for Tuesday:

$ /usr/bin/pg_basebackup -D /var/lib/pgsql/17/backups/inc_tue --incremental /var/lib/pgsql/17/backups/inc_mon/backup_manifest -c fast -v

pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2A000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_21616"
pg_basebackup: write-ahead log end point: 0/2A000120
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

$ du -sh /var/lib/pgsql/17/backups/inc_tue
22M /var/lib/pgsql/17/backups/inc_tue

Similarly, the Tuesday backup captures changes made since Monday. Make sure to use the backup_manifest file of Monday's backup.

Restoring from Incremental Backups

Restoration is where the true power of incremental backups shines.

Imagine data is accidentally deleted on Wednesday. You can restore the database to its Tuesday state using the full and incremental backups.

postgres=# INSERT INTO increment_test VALUES (generate_series(21,30), 'wednesday_data');
INSERT 0 10
postgres=# DELETE FROM increment_test WHERE name = 'tuesday_data';
DELETE 10
postgres=# SELECT * FROM increment_test ;
  id |      name
----+----------------
  1 | monday_data
  2 | monday_data
  3 | monday_data
  4 | monday_data
  5 | monday_data
  6 | monday_data
  7 | monday_data
  8 | monday_data
  9 | monday_data
 10 | monday_data
 21 | wednesday_data
 22 | wednesday_data
 23 | wednesday_data
 24 | wednesday_data
 25 | wednesday_data
 26 | wednesday_data
 27 | wednesday_data
 28 | wednesday_data
 29 | wednesday_data
 30 | wednesday_data
(20 rows)
  1. Combine Backups: Use the new pg_combinebackup utility to merge backups into a new data directory:
pg_combinebackup -o /var/lib/pgsql/17/restore_data /var/lib/pgsql/17/backups/full /var/lib/pgsql/17/backups/inc_mon /var/lib/pgsql/17/backups/inc_tue 
  1. Configure the New Cluster: Change the port in the restored cluster's postgresql.conf file and start the new cluster:
$ cd /var/lib/pgsql/17/restore_data

$ ls
backup_label      global        pg_dynshmem    pg_multixact  pg_snapshots  pg_tblspc    pg_xact
backup_manifest   log           pg_hba.conf    pg_notify     pg_stat       pg_twophase  postgresql.auto.conf
base              logfile       pg_ident.conf  pg_replslot   pg_stat_tmp   PG_VERSION   postgresql.conf
current_logfiles  pg_commit_ts  pg_logical     pg_serial     pg_subtrans   pg_wal

$ echo "port=8888" >> postgresql.conf

$ /usr/pgsql-17/bin/pg_ctl -D /var/lib/pgsql/17/restore_data start
waiting for server to start....2024-11-28 00:36:14.791 UTC [23048] LOG:  redirecting log output to logging collector process
2024-11-28 00:36:14.791 UTC [23048] HINT:  Future log output will appear in directory "log".
 done
server started
  1. Verify Restored Data: Check the database to ensure the Tuesday data is intact:
$ psql -p 8888
psql (17.2)
Type "help" for help.

postgres=# SELECT * FROM increment_test WHERE name='tuesday_data';
 id |     name
----+--------------
 11 | tuesday_data
 12 | tuesday_data
 13 | tuesday_data
 14 | tuesday_data
 15 | tuesday_data
 16 | tuesday_data
 17 | tuesday_data
 18 | tuesday_data
 19 | tuesday_data
 20 | tuesday_data
(10 rows)

From here we can copy Tuesday data from the restored instance back into the production instance. The example is simple because there is only 1 table affected, but in some complex cases the number of tables affected and exact rows to be recovered is not known. In such cases, you can use EDB LiveCompare to compare and restore the deleted data.

Conclusion

PostgreSQL 17's incremental backup feature is a powerful addition that addresses the challenges of traditional backup strategies. By capturing only changes since the last backup, incremental backups reduce storage requirements, backup time, and recovery complexity. Adopting this feature will enhance the resilience and efficiency of your PostgreSQL environment, ensuring minimal downtime and robust data protection.

Was this article helpful?

0 out of 0 found this helpful