Vacuum and Autovacuum

Erik Jones

An overview of the purpose and duties of the VACUUM command and the Autovacuum daemon.

Introduction

VACUUM and its automated operation by the aptly named Autovacuum daemon are normal and necessary parts of administering any PostgreSQL server. Rather than going into the low-level mechanics of how VACUUM does what it does, this article describes at a high level the various duties performed by the VACUUM command and the automatic table vacuums run by the Autovacuum daemon, so that users and admins can better understand the issues that may arise both when vacuums run and when they do not. The low-level details are covered in the Routine Vacuuming chapter of the PostgreSQL online manual.

Manual VACUUM and ANALYZE commands

VACUUM and ANALYZE are PostgreSQL commands the user can execute manually in order to perform maintenance on tables. Despite the existence of the Autovacuum workers that run these commands automatically, a DBA might find it necessary to run them manually for various reasons. The purpose and actions performed by each of these commands, discussed below, depend on several aspects of the table in question as well as on any operations being concurrently performed on the table by other backends.
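
For reference, these commands can be run against a single table (here a hypothetical table named mytable) or, with no table given, against every table in the current database:

VACUUM mytable;              -- reclaim dead-tuple space in one table
VACUUM (VERBOSE) mytable;    -- the same, with a detailed report of what was done
ANALYZE mytable;             -- refresh planner statistics only
VACUUM (ANALYZE) mytable;    -- do both in a single command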

VACUUM: Tracking dead space for reuse and row visibility

Consider two transactions A and B, where A starts first and then B starts and deletes or updates a row. The transactional visibility requirements imposed by ACID compliance dictate that A should still be able to see the row deleted by B, or the old value of the row if B updated it. Strictly speaking this depends on A's transaction isolation level, but the default behaves as described and is the most common case.

Many databases will move the deleted row or old value of the row to an UNDO log and, in the case of an update, overwrite the existing row. Postgres does not do that: it simply marks the old row as deleted with the transaction id of the deleting/updating transaction and, for an update, inserts a new row into the table. This way, when any transaction reads the table, for any rows marked as deleted the server can simply compare the reading transaction's id against the id of the transaction that deleted the row to determine whether it is still visible to the reader. (Note: there is a bit more to this, such as tracking whether or not the deleting/updating transaction committed, but going into that isn't needed for this discussion.)
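
You can see these markers directly: every Postgres table has hidden system columns xmin (the id of the inserting transaction) and xmax (the id of the deleting/updating transaction, usually 0 for a live row). A quick illustrative query, again using a hypothetical table mytable:

SELECT xmin, xmax, ctid, * FROM mytable LIMIT 5;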

It then follows that at some point there will be no more open transactions that can "see" the old version of the row, and that row should then be known to be reusable, i.e. new rows should be allowed to overwrite it. That is VACUUM's primary job: finding "dead" tuples that are no longer visible to any open transaction and tracking them so that their space can be reused by future inserts or the insert portion of updates.

It also follows that if VACUUM is not run frequently enough for a given table, dead tuples will accumulate without being marked free for reuse. Similarly, if a transaction is held open for a very long time, tuples deleted in any table in the database after that transaction started cannot be marked as free for reuse, even if the long-running transaction hasn't touched those tables, because it still could before it eventually closes. When this happens, writes of new data must allocate more space instead of reusing the dead tuple space already in the table. If that keeps happening you get an accumulation of dead but non-removable rows and an ever-growing table. This is what is known as "table bloat", and in some cases it can result in frequently updated tables that are usually fairly small quickly growing on disk by large amounts.
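
To gauge whether a table is accumulating dead tuples, one option is the cumulative statistics views. A sketch of such a check (the columns shown are standard in pg_stat_user_tables):

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;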

VACUUM: Preventing transaction id wraparound failures

Postgres allocates transaction ids from a 4-byte integer counter. This means that there are only ~4 billion distinct transaction ids available (4 bytes -> 32 bits -> 2^32 = 4294967296). Since we can expect most active databases to process far more than 4B transactions over their lifetimes, the values pulled from that counter must wrap, i.e. once the counter reaches 4294967296 allocation starts over from the beginning (skipping a few low values that are reserved for internal use).

The immediate consequence of this is that determining whether a given transaction id is older than another cannot be a simple subtraction; it must account for the transaction id values wrapping. But then how much of the available 4B transaction id space should be considered in the future and how much in the past at any given time? The answer is half each. This also means that, with repeated wrapping, a given transaction id value will flip between "in the future" and "in the past" many times.

This is where "freezing" of tuples comes in. Essentially, once a live row's insertion transaction id is older than the current transaction id minus the server's configured vacuum_freeze_min_age, the row will be marked as "frozen" when vacuum next encounters it. After every vacuum operation on a table, if any rows were frozen, that table's pg_class.relfrozenxid value is updated with the oldest unfrozen transaction id still present in the table's rows. If a table's relfrozenxid is ever allowed to fall more than ~2B transactions (the past/future horizon) behind the current transaction id, because for some reason a vacuum cannot be completed on the table, then what is known as a transaction id wraparound failure will have occurred.
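
To see how close each table's relfrozenxid is to that horizon, you can compare it to the current transaction id with the age() function; a sketch:

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;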

So, if a VACUUM will freeze old tuples in the course of normal operations, what is the purpose of VACUUM FREEZE? It simply runs the vacuum operation with an effective vacuum_freeze_min_age of 0, so all non-frozen rows will be frozen rather than just those that have crossed the normal vacuum_freeze_min_age threshold.
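
For example, to aggressively freeze a single (hypothetical) table and get a report of what was done:

VACUUM (FREEZE, VERBOSE) mytable;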

For more details about transaction id wraparound and how to handle it, consider reading this KB article on the matter.

VACUUM: Updating the table visibility and free space maps

Each table has these two metadata files which are updated by vacuum jobs:

  • Visibility Map - A list of pages in the given table's data files that contain only visible tuples (i.e. no dead tuples). Any time vacuum finishes processing a page with no dead tuples remaining in it, that page is added to this list, and any time an update or delete of a tuple in a listed page happens, the page is removed from the list. This speeds up both index-only scans and later vacuums by allowing them to skip reading those pages' contents.
  • Free Space Map - A list tracking how much free/reusable space there is in each page of the given table's data files. It is used when inserting new rows to determine which existing pages, if any, have room for them. (Both maps can be inspected with contrib extensions; see the sketch below.)
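
Assuming the pg_visibility and pg_freespacemap contrib extensions are available in your installation, both maps can be inspected directly; a sketch against a hypothetical table mytable:

CREATE EXTENSION IF NOT EXISTS pg_visibility;
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- per-page all-visible/all-frozen flags from the visibility map
SELECT * FROM pg_visibility_map('mytable') LIMIT 5;

-- per-page free-space estimates from the free space map
SELECT * FROM pg_freespace('mytable') LIMIT 5;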

VACUUM: Reclaiming disk space

Per the above discussion, when tracking dead rows VACUUM isn't actually freeing any disk space used by them. In fact, a plain VACUUM operation will only ever free disk space under one condition: if the last page or pages in a table no longer contain any rows visible to any open transaction, it will attempt to truncate those pages from the end of the table. Doing so requires an exclusive lock on the table so that nothing can write to those pages while the truncation is in progress. The word "attempt" is used here because the operation will be skipped if the exclusive lock cannot be acquired quickly, in order not to cause new transactions to queue up behind it waiting on the lock.
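
You can observe this by checking a table's on-disk size before and after a vacuum; unless trailing pages happened to be empty of visible rows, the size will typically not shrink. A sketch with a hypothetical table:

SELECT pg_size_pretty(pg_relation_size('mytable'));  -- size before
VACUUM mytable;
SELECT pg_size_pretty(pg_relation_size('mytable'));  -- usually unchanged unless trailing pages were freed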

So what about VACUUM FULL? VACUUM FULL simply rewrites the entire table with just its live rows, marking all rows as frozen in the new table. This means that:

  • An AccessExclusive lock must be acquired to ensure that nothing else is reading or writing to the table.
  • Once it is complete the new table will be compacted with no dead rows in it.
  • Since this requires reading the entire existing table and writing out a new version with only the still live rows, this can take a long time for large tables.
  • When running, there must be enough free disk space for the new version being written, as this is not an "in-place" operation; the old table is only removed after the new table has been created and fully written.
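
A sketch of the operation and its effect on table size (mytable is hypothetical; remember the lock blocks all other access for the duration):

SELECT pg_size_pretty(pg_total_relation_size('mytable'));  -- before: table plus indexes
VACUUM FULL mytable;                                       -- rewrites the table, blocking all access
SELECT pg_size_pretty(pg_total_relation_size('mytable'));  -- after: compacted copy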

ANALYZE: Updating table statistics

The query planner uses statistics on the values and ranges of values contained in a table's columns to make decisions about how to plan queries, and it is ANALYZE that collects and stores these statistics. While this can be done on its own with the ANALYZE command, it can also be done by VACUUM jobs via VACUUM (ANALYZE), and the Autovacuum daemon runs automatic analyze jobs as well, so it is also considered a job under the vacuum-work umbrella.
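
As a sketch, after analyzing a hypothetical table you can inspect the gathered per-column statistics through the standard pg_stats view:

ANALYZE mytable;

SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'mytable';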

Autovacuum

The Autovacuum daemon is a background process that automatically runs VACUUM and ANALYZE operations on tables so that their various duties can be performed without regular work or intervention by an admin. Autovacuum will trigger a job on a table for any of the four reasons discussed below.

Automatic VACUUM triggered to prevent transaction id wraparound failure

Given that transaction id wraparound is catastrophic, to prevent it any table with a pg_class.relfrozenxid value older than the server's autovacuum_freeze_max_age setting or, if set, the table's autovacuum_freeze_max_age storage parameter, will have an automatic vacuum run on it to freeze rows, thereby advancing its pg_class.relfrozenxid. If that cannot complete for some reason, then the system will shut itself down once there are fewer than 1M transactions left before the wraparound point so that an admin can boot the system in single-user mode to get the work done. (Note: the 1M threshold was increased to 3M as of PG14.)
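
A common way to keep an eye on this is to check how old each database's oldest unfrozen transaction id is; if the ages here approach autovacuum_freeze_max_age, wraparound-prevention vacuums will be running:

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;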

Automatic VACUUM triggered by delete/update traffic to find and reclaim dead tuple space for reuse

Autovacuum uses this formula to compute the threshold at which a table is vacuumed to reclaim dead tuple space for reuse:

vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples

A vacuum is triggered once the number of dead tuples exceeds that threshold.

In terms of the actual configuration parameters and catalog values, that means a vacuum runs once:

pg_stat_all_tables.n_dead_tup > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * pg_class.reltuples
  • pg_stat_all_tables.n_dead_tup is updated by row updates and deletes and is cleared when a vacuum has run and successfully processed those rows.
  • pg_class.reltuples is updated by vacuum at the end of each job with an estimate of how many live tuples are present in the given table.
  • autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor are global parameter settings that can be overridden on a per-table basis via their Storage Parameters.

Let's look at an example with the following default settings in effect for a table:

autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2

Let's say a table has 100 tuples. That means that an autovacuum initiated vacuum to reclaim dead space will be triggered after 50 + 0.2 * 100 -> 70 deletes/updates.

If a table has 1M tuples it will take 50 + 0.2 * 1M -> 200050 (~200K) deletes/updates, and if it has 1B rows a new vacuum won't be triggered until 50 + 0.2 * 1B -> 200000050 (~200M) of them have occurred.

So now we can see that:

When the scale factor is non-zero, as tables grow in size it takes more and more write traffic against them to have a vacuum triggered to find no-longer-visible dead tuples and mark them for reuse. To increase the frequency at which those vacuums happen, you can lower the scale factor, either globally or on a per-table basis. Values of 0.02 or even 0.01 on large tables are not uncommon. Alternatively, if you want these vacuums triggered after a fixed number of deletes/updates, you can set the scale factor to 0 and the threshold to the fixed value. This should generally only be done on a per-table basis.
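
As a sketch, here is how such a per-table override might look for a hypothetical large table (the parameter names are the real storage parameters; the values are illustrative only):

ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);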

Automatic VACUUM triggered by insert traffic

This works with the same kind of formula as the delete/update trigger, but applied to insert traffic; a vacuum is triggered once:

pg_stat_all_tables.n_ins_since_vacuum > autovacuum_vacuum_insert_threshold + autovacuum_vacuum_insert_scale_factor * pg_class.reltuples
  • pg_stat_all_tables.n_ins_since_vacuum is updated by row inserts and is cleared when vacuum is run.
  • pg_class.reltuples is updated by vacuum at the end of each job with an estimate of how many live tuples are present in the given table.
  • autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor are global parameter settings that can be overridden on a per-table basis via their Storage Parameters.

This autovacuum trigger was added in PG13 to cover insert-only tables (including those that are never written to again after their initial load). Without it, those tables only ever had automatic vacuum jobs run by the transaction id wraparound trigger, and for large and/or numerous such tables that would often result in a lot of heavy vacuum jobs being triggered all at once. This trigger allows those tables to be vacuumed sooner rather than later.
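
On PG13 or later, the insert counter can be watched directly in the statistics views; a sketch:

SELECT relname, n_ins_since_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_ins_since_vacuum DESC
LIMIT 10;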

Automatic ANALYZE triggered to update table statistics

This is again triggered by the same kind of formula used for delete/update traffic; an analyze is triggered once:

pg_stat_all_tables.n_mod_since_analyze > autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * pg_class.reltuples
  • pg_stat_all_tables.n_mod_since_analyze is updated by insert, update and delete traffic and is cleared when analyze is run.
  • pg_class.reltuples is updated by vacuum at the end of each job with an estimate of how many live tuples are present in the given table.
  • autovacuum_analyze_threshold and autovacuum_analyze_scale_factor are global parameter settings that can be overridden on a per-table basis via their Storage Parameters.
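
As with the other triggers, the relevant counter and timestamps can be checked in the statistics views; a sketch:

SELECT relname, n_mod_since_analyze, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 10;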
