Write Ahead Logs (WAL) are used by Postgres to achieve durability and consistency while writing data to storage system. So the retaining WAL is important in Postgres.
As a DBA we will likely see the following situations:
-
Archive destination having a high number of wal files.
-
pg_wal destination is filling fast due to archive failure.
-
Required WAL files are not preserved.
pg_archivecleanup
is a Postgres utility that comes with Postgres binaries. It is designed to be used to clean up WAL files in archived locations and can be executed in two different ways:
-
It can be manually invoked.
-
It can also be used in standby side for cleaning up WAL files from an archived location which are no longer required for Standby, using
archive_cleanup_command
.
Please do not use pg_archivecleanup
on WAL files within $PGDATA/pg_wal of postgres servers!
If a postgres server is full due to WAL files, please contact EDB support. The corrective actions will be decided depending on the scenario.
When used as a standalone program all WAL files logically preceding the oldestkeptwalfile
will be removed from the archive location. In this mode, if you specify a .partial
or .backup
file name, then only the file prefix will be used as the oldestkeptwalfile
. This treatment of the .backup
file name allows you to remove all WAL files archived prior to a specific base backup without error. For example, the following example will remove all files older than WAL file name 000000010000003700000010
:
pg_archivecleanup -d archive 000000010000003700000010.00000020.backup
pg_archivecleanup: keep WAL file "archive/000000010000003700000010" and later
pg_archivecleanup: removing file "archive/00000001000000370000000F"
pg_archivecleanup: removing file "archive/00000001000000370000000E"
Various options can be used with pg_archivecleanup,
Dryrun (-n) : Print the names of the files that would have been removed on stdout (performs a dry run). Following example is listing all WALs which are in the default WAL location (pg_wal) inside data directory and older than a specific WAL,
-bash-4.2$ ./pg_archivecleanup -n /var/lib/edb/as14/data/pg_wal/ 00000001000000000000001A
/var/lib/edb/as14/data/pg_wal//000000010000000000000016
/var/lib/edb/as14/data/pg_wal//000000010000000000000017
/var/lib/edb/as14/data/pg_wal//000000010000000000000018
/var/lib/edb/as14/data/pg_wal//000000010000000000000019
Following example is listing all WALs which are in archive location (/var/lib/edb/as14/archive) and older than a specific WAL,
-bash-4.2$ ./pg_archivecleanup -n /var/lib/edb/as14/archive/ 00000001000000000000001A
/var/lib/edb/as14/archive//000000010000000000000002
/var/lib/edb/as14/archive//000000010000000000000003
/var/lib/edb/as14/archive//000000010000000000000004
/var/lib/edb/as14/archive//000000010000000000000005
/var/lib/edb/as14/archive//000000010000000000000006
/var/lib/edb/as14/archive//000000010000000000000007
/var/lib/edb/as14/archive//000000010000000000000008
....
Debug Mode (-d) : Print lots of debug logging output on stderr.
In this mode, pg_archivecleanup does the cleanup by removing all the candidate WALs and printing which files are removed.
-bash-4.2$ ./pg_archivecleanup -d /var/lib/edb/as14/archive/ 00000001000000000000001A
pg_archivecleanup: keeping WAL file "/var/lib/edb/as14/archive//00000001000000000000001A" and later
pg_archivecleanup: removing file "/var/lib/edb/as14/archive//000000010000000000000002"
pg_archivecleanup: removing file "/var/lib/edb/as14/archive//000000010000000000000003"
pg_archivecleanup: removing file "/var/lib/edb/as14/archive//000000010000000000000004"
pg_archivecleanup: removing file "/var/lib/edb/as14/archive//000000010000000000000005"
pg_archivecleanup: removing file "/var/lib/edb/as14/archive//000000010000000000000006"
pg_archivecleanup: removing file "/var/lib/edb/as14/archive//000000010000000000000007"
pg_archivecleanup: removing file "/var/lib/edb/as14/archive//000000010000000000000008"
....
-V --version : Print the pg_archivecleanup version and exit.
-bash-4.2$ ./pg_archivecleanup -V
pg_archivecleanup (EnterpriseDB) 14.2.1
extension (-x) : Provide an extension that will be stripped from all file names before deciding if they should be deleted. This is typically useful for cleaning up archives that have been compressed during storage, and therefore have had an extension added by the compression program. For example: -x .gz
.
-? --help : Show help about pg_archivecleanup command line arguments, and exit.
In standby server all WALs from a point where standby can restart the recovery operation are required. It has to be taken care by Standby only because standby only knows about the current restart point. A shell script can be specified for parameter archive_cleanup_command
in recovery.conf
.
On Linux or Unix systems, you might use:
archive_cleanup_command = 'pg_archivecleanup -d /mnt/standby/archive %r 2>>cleanup.log'
Situation will get complicated when there are more than one standby server referring to the same archive location. Possibility that one standby might be lagging and some required WAL which is already removed by another standby. In such cases, we may have to use custom scripts/program.
We need to be sure that we are not deleting archived WAL files that are required by standby server. If we delete the archived WAL file which is required for the standby then replication will be broken and the standby will be out of sync.
There is a recommend that to check pg_controldata
outputs "Latest checkpoint’s REDO WAL file" which is used to decide which WAL files to remove from the archive directory so that it will not impact the standby.
You can execute pg_controldata
with below syntax,
pg_controldata [[ --pgdata | -D ] datadir]
-bash-4.2$ ./pg_controldata -D /var/lib/edb/as14/data/
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7085522549148311880
Database cluster state: in production
pg_control last modified: Sun 17 Apr 2022 06:06:52 AM PDT
Latest checkpoint location: 0/17000110
Latest checkpoint's REDO location: 0/170000D8
Latest checkpoint's REDO WAL file: 000000010000000000000017
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:1515
Latest checkpoint's NextOID: 16387
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 1497
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1515
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Sun 17 Apr 2022 06:06:52 AM PDT
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: eb85eca054463e01d0455d0e76dad1b1492b847814d60efc828607d54535c187