PostgreSQL 17 introduces native support for incremental backups, a new feature designed to optimize backup strategies for modern, high-performance, and high-availability environments. This new feature addresses the limitations of traditional backup approaches by enabling efficient storage use, faster backup times, and simplified recovery processes.
Database administrators have long relied on traditional full backups and WAL archiving to ensure data safety and recoverability in PostgreSQL. While this method is robust, it comes with significant challenges:
- Storage Demands: Full backups of large databases consume massive amounts of storage, often requiring dedicated hardware.
- Lengthy Backup Times: Taking a full backup involves scanning the entire database, resulting in extended backup windows and potential performance degradation.
- Complex Recovery: Restoring from full backups often involves replaying a large number of WAL files, making the process time-consuming and prone to errors.
These limitations have made traditional backup strategies less than ideal for modern, high-availability, and high-performance environments.
PostgreSQL 17 significantly improves database backup and recovery with native incremental backup support. Incremental backups store only the changes made since the last backup (full or incremental). This innovation significantly reduces storage usage, backup time, and recovery complexity.
Key benefits of incremental backups:
- Efficient Storage: Only modified data pages are stored, dramatically reducing backup sizes.
- Faster Backups: Capturing just the changes minimizes the time needed for each backup.
- Streamlined Recovery: Incremental backups simplify the restoration process by reducing dependency on extensive WAL replay.
Before performing incremental backups, ensure the following:
-
Enable Write-Ahead Logging (WAL): Incremental backups rely on WAL to track changes. Update the
postgresql.conf
file:
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/%f'
Restart the database to apply these changes:
/usr/pgsql-17/bin/pg_ctl -D /var/lib/pgsql/17/data restart
- Enable WAL Summarization: The WAL summarizer process generates summaries for incremental backups. Enable it:
ALTER SYSTEM SET summarize_wal = 'on';
SELECT pg_reload_conf();
- Prepare Storage: Ensure sufficient storage is available for both base and incremental backups, as well as WAL files.
Incremental backups follow a simple workflow as described below.
The process begins with a full backup, which serves as the foundation for subsequent incremental backups. For instance, you might take a full backup every Sunday:
$ /usr/bin/pg_basebackup -D /var/lib/pgsql/17/backups/full -X stream -P -c fast
...
23189/23189 kB (100%), 1/1 tablespace
This full backup contains all the data in your database as of Sunday.
Incremental backups store only the changes made since the last backup.
- Monday Backup:
Starting Monday, incremental backups capture only the changes since the Sunday full backup. Let's create a table and insert some data before taking the incremental backup:
postgres=# CREATE TABLE increment_test (id INT, name VARCHAR);
CREATE TABLE
postgres=# INSERT INTO increment_test VALUES (generate_series(1,10), 'monday_data');
INSERT 0 10
Now, take the incremental backup:
$ /usr/bin/pg_basebackup -D /var/lib/pgsql/17/backups/inc_mon --incremental /var/lib/pgsql/17/backups/full/backup_manifest -c fast -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/28000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_21484"
pg_basebackup: write-ahead log end point: 0/28000120
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
Compare the sizes of the full and incremental backups:
$ du -sh /var/lib/pgsql/17/backups/full
39M /var/lib/pgsql/17/backups/full
$ du -sh /var/lib/pgsql/17/backups/inc_mon
22M /var/lib/pgsql/17/backups/inc_mon
- Tuesday Backup:
Let's insert additional data for Tuesday before taking another incremental backup:
postgres=# INSERT INTO increment_test VALUES (generate_series(11,20), 'tuesday_data');
INSERT 0 10
Take the incremental backup for Tuesday:
$ /usr/bin/pg_basebackup -D /var/lib/pgsql/17/backups/inc_tue --incremental /var/lib/pgsql/17/backups/inc_mon/backup_manifest -c fast -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2A000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_21616"
pg_basebackup: write-ahead log end point: 0/2A000120
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
$ du -sh /var/lib/pgsql/17/backups/inc_tue
22M /var/lib/pgsql/17/backups/inc_tue
Similarly, the Tuesday backup captures changes made since Monday. Make sure to use the backup_manifest
file of Monday's backup.
Restoration is where the true power of incremental backups shines.
Imagine data is accidentally deleted on Wednesday. You can restore the database to its Tuesday state using the full and incremental backups.
postgres=# INSERT INTO increment_test VALUES (generate_series(21,30), 'wednesday_data');
INSERT 0 10
postgres=# DELETE FROM increment_test WHERE name = 'tuesday_data';
DELETE 10
postgres=# SELECT * FROM increment_test ;
id | name
----+----------------
1 | monday_data
2 | monday_data
3 | monday_data
4 | monday_data
5 | monday_data
6 | monday_data
7 | monday_data
8 | monday_data
9 | monday_data
10 | monday_data
21 | wednesday_data
22 | wednesday_data
23 | wednesday_data
24 | wednesday_data
25 | wednesday_data
26 | wednesday_data
27 | wednesday_data
28 | wednesday_data
29 | wednesday_data
30 | wednesday_data
(20 rows)
-
Combine Backups: Use the new
pg_combinebackup
utility to merge backups into a new data directory:
pg_combinebackup -o /var/lib/pgsql/17/restore_data /var/lib/pgsql/17/backups/full /var/lib/pgsql/17/backups/inc_mon /var/lib/pgsql/17/backups/inc_tue
-
Configure the New Cluster: Change the port in the restored cluster's
postgresql.conf
file and start the new cluster:
$ cd /var/lib/pgsql/17/restore_data
$ ls
backup_label global pg_dynshmem pg_multixact pg_snapshots pg_tblspc pg_xact
backup_manifest log pg_hba.conf pg_notify pg_stat pg_twophase postgresql.auto.conf
base logfile pg_ident.conf pg_replslot pg_stat_tmp PG_VERSION postgresql.conf
current_logfiles pg_commit_ts pg_logical pg_serial pg_subtrans pg_wal
$ echo "port=8888" >> postgresql.conf
$ /usr/pgsql-17/bin/pg_ctl -D /var/lib/pgsql/17/restore_data start
waiting for server to start....2024-11-28 00:36:14.791 UTC [23048] LOG: redirecting log output to logging collector process
2024-11-28 00:36:14.791 UTC [23048] HINT: Future log output will appear in directory "log".
done
server started
- Verify Restored Data: Check the database to ensure the Tuesday data is intact:
$ psql -p 8888
psql (17.2)
Type "help" for help.
postgres=# SELECT * FROM increment_test WHERE name='tuesday_data';
id | name
----+--------------
11 | tuesday_data
12 | tuesday_data
13 | tuesday_data
14 | tuesday_data
15 | tuesday_data
16 | tuesday_data
17 | tuesday_data
18 | tuesday_data
19 | tuesday_data
20 | tuesday_data
(10 rows)
From here we can copy Tuesday data from the restored instance back into the production instance. The example is simple because there is only 1 table affected, but in some complex cases the number of tables affected and exact rows to be recovered is not known. In such cases, you can use EDB LiveCompare to compare and restore the deleted data.
PostgreSQL 17's incremental backup feature is a powerful addition that addresses the challenges of traditional backup strategies. By capturing only changes since the last backup, incremental backups reduce storage requirements, backup time, and recovery complexity. Adopting this feature will enhance the resilience and efficiency of your PostgreSQL environment, ensuring minimal downtime and robust data protection.