How to handle transaction wraparound

Israel Barth
Israel Barth
  • Updated

Questions such as:

  • How does PostgreSQL handle live tuples that were last modified more than 2 billion transactions behind the current transaction ID, or multixact ID?
  • How to diagnose wraparound issues?
  • What can potentially cause wraparound issues?
  • How to fix wraparound issues?

Are discussed and explained in this article.

Transaction ID (txid) wraparound

Introduction

Whenever a transaction starts in PostgreSQL, it is assigned with an unique identifier known as txid (transaction ID), which is always incremented 1 by 1. Any tuple that is modified by that transaction is marked with its ID in xmin and/or xmax fields, depending on the operation, and those fields are used for visibility purposes. The transaction ID is represented by a 32-bit unsigned integer, so it can hold values from 0 to 2^32 - 1 (4,294,967,295), and when it reaches the maximum value, it resets to 0 and start incrementing again.

PostgreSQL has a moving horizon which is based on the the current transaction ID. Roughly saying, transactions which ID are up to 2 billion behind the current transaction ID are considered visible, and transactions which ID are up to 2 billion ahead the current transaction ID are considered to be in the future. A transaction ID wraparound problem describes the scenario where old live tuples get passed by the moving horizon, in such a way PostgreSQL would not be able to differentiate tuples that are visible from tuples that are not visible, hence causing inconsistencies and data loss.

But then a question may arise: how does PostgreSQL handle live tuples that were last modified more than 2 billion transactions behind the current transaction ID, i.e., how does it handle the transaction ID wrap around problem? It handles that by continuously freezing tuples, i.e., it sets a control bit in the tuples that makes them visible to all transactions, independently of the moving horizon.

As further reading, you may be interested in:

Before anything, it is important to understand how age is measured in PostgreSQL:

  • Tuple age: the difference between the current transaction ID in the database and the transaction ID which created the tuple. Note that the transaction ID cycles roughly every 4 billion transactions. If this age reaches 2^31 (2147483648), then a wraparound would occur.
  • Table age: the age of the oldest tuple in the table
  • Database age: the age of the oldest table in the database
  • Database cluster age: the age of the oldest database in the database cluster

PostgreSQL tries to avoid the transaction ID wrap around problem by proactively freezing old tuples. The txid freezing behavior is controlled by 4 settings:

  • vacuum_freeze_min_age (default: 50M): is used by postgres to determine which tuples to freeze when cleaning pages of a table. With the default vacuum_freeze_min_age, for tuples in pages being cleaned by vacuum, if their transaction age is at least 50m older than the oldest running transaction, postgres will freeze them.

  • vacuum_freeze_table_age (default: 150M): when a table txid age exceeds this threshold, any pages within the table that have tuples with txid age vacuum_freeze_min_age older than the oldest running transaction, will be frozen.

  • autovacuum_freeze_max_age (default: 200M): the same behaviour as vacuum_freeze_table_age applies, with the difference that postgres starts the vacuum as an "aggressive" vacuum to prevent wraparound. It will appear in pg_stat_activity as autovacuum: VACUUM schema.table (to prevent wraparound). If ever attempted to be canceled (through pg_terminate_backend() for example), it will immediately start again. Additionally, vacuum will not be paused if it locks a user query. However, cost-based delays and index vacuuming are still present.

  • vacuum_failsafe_age PG14+, (default: 1.6B): the same behaviour as vacuum_freeze_table_age applies, with the difference that postgres starts the vacuum as an "extraordinarily aggressive" vacuum to prevent wraparound. It will appear in pg_stat_activity as autovacuum: VACUUM schema.table (to prevent wraparound). If ever attempted to be canceled (through pg_terminate_backend() for example), it will immediately start again. Additionally, vacuum will not be paused if it locks a user query, autovacuum cost-based delay is ignored, and further non-essential maintenance tasks (such as index vacuuming) are bypassed, in order to be as aggressive as possible.

Factors that prevent vacuum cleanup, and related log messages

While vacuum tries to freeze tuples, there are some factors that can prevent it from doing its job, in which case the cluster age will keep growing over time. Some of them are:

  • Long running transactions
  • Long running transactions in a HOT standby that has feedback enabled
  • Prepared transactions
  • Stale replication slots

These conditions prevent that autovacuum freezes old tuples because if there is something in the cluster that still depends on an older snapshot (visibility) of the tuples, then PostgreSQL cannot freeze the tuples that were modified after the snapshot being used, otherwise the tuples would start being visible in that old snapshot, hence also causing data inconsistencies.

Eventually, if the among conditions stay true for a long time, it might get near the 2 billion threshold. In this case you will start seeing messages like these in the PostgreSQL logs -- and eventually also in your PostgreSQL backends and vacuum output:

WARNING: oldest xmin is far in the past
WARNING: database "XXX" must be vacuumed within YYY transactions
WARNING: database with OID XXX must be vacuumed within YYY transactions

Indicating PostgreSQL has been having issues to freeze tuples in the tables, and the window between the moving horizon and the oldest live tuple that is not frozen is getting short, i.e., PostgreSQL is running out of available transactions before hitting the wraparound problem.

If the issues are still not resolved and the window keeps getting shorter, PostgreSQL will eventually perform an emergency shutdown to avoid catastrophic data loss, in which case it will only be able to be started again in single user mode, and restricted to perform read only queries and out of transaction blocks. You would see messages like this being issue in the logs if that condition is met:

ERROR: database is not accepting commands to avoid wraparound data loss in database "XXX"
ERROR: database is not accepting commands to avoid wraparound data loss in database with OID XXX

Solution and troubleshooting

The solution for the transaction wraparound issue consists basically of cleaning up the offending conditions and vacuuming the old objects.

We will split the solution according to the issue severity, meaning if the database cluster is still operable, in which case you are still receiving the WARNING messages that were listed before, or if it requires to be restart in single-user mode, in which case you are receiving the ERROR message listed above.

The steps in the following sections will suggest things like rolling transactions back, committing transactions, dropping objects, terminating backends. So please proceed with caution, analyze the output and proceed accordingly.

Database cluster still operable

If you are still able to use the database cluster in non single-user mode, please follow the instructions contained in this section.

The first thing we need to do is clean up the offending conditions, and later vacuum the old objects.

Cleaning offending conditions

In order to identify what is preventing vacuum from freezing the txids and performing its other functions, you can use a query like the one below:

SELECT (SELECT coalesce(max(age(backend_xmin)), 0)
FROM pg_stat_activity) AS 
oldest_xact_age,
(SELECT coalesce(max(age(backend_xmin)), 0)
FROM pg_stat_replication) AS oldest_standby_xact_age,
(SELECT coalesce(max(age(transaction)), 0)
FROM pg_prepared_xacts) AS oldest_prepared_xact_age,
(SELECT greatest(coalesce(max(age(xmin)), 0), coalesce(max(age(catalog_xmin)), 0))
FROM pg_replication_slots) AS oldest_replication_slot_age;

This is the description of the output columns:

  • oldest_xact_age: the oldest transaction age of all backends running on the server
  • oldest_standby_xact_age: the age of the oldest transaction in the HOT standby servers
  • oldest_prepared_xact_age: the age of the oldest prepared transaction in the server
  • oldest_replication_slot_age: the age of the oldest replication slot in the server

Based on the above output, you want to focus in the columns which values are nearer to 2 billion, and proceed through the corresponding following sub-section(s).

For example, let's say this was the output of the above query:

oldest_xact_age | oldest_standby_xact_age | oldest_prepared_xact_age | oldest_replication_slot_age
5000000 | 0 | 1870000000 | 0
(1 row)

You will be interested in checking old prepared transactions, which oldest age is 1870000000, because in the example it is what is preventing vacuum to perform its job. So, you would follow the instructions in the sub-section Handle offending prepared transactions.

Handle offending backends

Identify offending backends and terminate them, if any. Use the following query to list possible offending backends:

SELECT pid,
age(backend_xmin)
datname,
usename,
application_name,
client_addr,
state,
backend_type,
query
FROM pg_stat_activity
WHERE age(backend_xmin) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;

The above query will bring the top 10 backends with oldest transaction id in the database cluster which have crossed the value of autovacuum_freeze_max_age. The nearer the age is to 2 billion the badder.

If you are able to gracefully stop the above backends, like by committing or rolling them back through your application, that is ideal. Otherwise, you might want to terminate them using pg_cancel_backend or pg_terminate_backend based on the PIDs reported by the above query.

Handle offending HOT standbys

If you have a HOT standby attached to the server that is facing issues, and that standby has hot_standby_feedback enabled, identify offending backends and terminate them, if any. Use the following query to identify possibly offending standbys:

SELECT pid,
age(backend_xmin),
usename,
application_name,
client_addr,
state
FROM pg_stat_replication
WHERE age(backend_xmin) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;

The above query will bring the top 10 oldest replication backends in the database cluster which have crossed the value of autovacuum_freeze_max_age. The near the age is to 2 billion the badder.

If the above query returns any result, and the result refers to a HOT standby connection with hot_standby_feedback, you might be interested in applying the steps in the previous sub-section Handle offending backends, but this time in that offending HOT standby, so you terminate offending queries in the standby.

Handle offending prepared transactions

Identify old prepared transactions and commit or roll them back, if any. Use the following query to list possible offending prepared transactions:

SELECT transaction,
gid,
age(transaction),
prepared,
owner,
database
FROM pg_prepared_xacts
WHERE age(transaction) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 3 DESC
LIMIT 10;

The above query will bring the top 10 oldest prepared transactions in the database cluster which have crossed the value of autovacuum_freeze_max_age. The nearer the age is to 2 billion the worse the situation is.

You should study to either COMMIT or ROLLBACK those old living prepared transactions. You can issue the commands below to COMMIT or ROLLBACK, using the prepared transaction gid reported by the above query:

  • COMMIT:
COMMIT PREPARED '<gid>';
  • ROLLBACK:
ROLLBACK PREPARED '<gid>';

It is up to your business logic if you will need to commit or roll them back, but you should seek for freeing them either way.

Handle offending replication slots

Identify offending stale replication slots and drop them, if any. Use the following query to list possible offending replication slots:

SELECT slot_name,
age(xmin),
slot_type,
database,
temporary,
active,
active_pid
FROM pg_replication_slots
WHERE greatest(age(xmin), age(catalog_xmin)) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;

The above query will bring the top 10 oldest replication slots in the database cluster which have crossed the value of autovacuum_freeze_max_age. The nearer the age is to 2 billion the badder.

At this point you need to investigate what is causing the xmin of the replication slot to be held back. The is usually tied with the queries running in a HOT standby node, and tends to be cleared out once you clear the offending queries in the HOT standby, as mentioned in the section Handle offending HOT standbys.

As a last resource, you may choose to drop the replication slots returned by the above query, if you are not able to make its xmin advance. You can accomplish that by running a command like this:

SELECT pg_drop_replication_slot(slot_name := '<slot_name>');

Passing as argument the name of the offending slot.

Please keep in mind that by dropping the replication slot you may break your standbys, depending on how your setup was done, and if the standbys would be able to fetch the WALs from somewhere else or not, when needed.

Vacuuming old objects

Now that the offending conditions are not present anymore, you should focus on vacuuming old tables. You can check the oldest databases by running the following query:

SELECT datname,
age(datfrozenxid)
FROM pg_database
WHERE age(datfrozenxid) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;

The above query will bring the top 10 oldest databases in the database cluster which have crossed the value of autovacuum_freeze_max_age. The nearer the age is to 2 billion the worse the situation is.

Based on the query output, you should connect to the oldest databases and choose one between the following approaches 1) or 2). If close to wraparound, 2) should be chosen.

  1. Start a database-wide vacuum with a command like this:
VACUUM (INDEX_CLEANUP OFF);

NOTE: by disabling INDEX_CLEANUP we allow VACUUM to run faster, which is desired as we are dealing with a wraparound issue. If you are running Postgres 11 or older that option is not available though, and you should just run the VACUUM command.

IMPORTANT: please note that you do not need to run VACUUM FREEZE. A simple VACUUM will be faster and will perform the expected job to get over the wraparound issue. It is faster because it can skip data pages where it already knows that all tuples are frozen, thus avoiding excessive I/O if possible. VACUUM FREEZE is an aggressive option which is a regular VACUUM but with vacuum_freeze_min_age and vacuum_freeze_table_age both set to 0, meaning it will scan every single page of the tables, and will freeze every single tuple of the pages.

OR

  1. Identify the oldest tables in the database and vacuum them manually. You can use this query to identify the oldest tables:
SELECT c.oid::regclass AS table_name,
age(c.relfrozenxid) AS age,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_total_size
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')
AND age(c.relfrozenxid) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;

The above query will bring the top 10 oldest tables in the database which have crossed the value of autovacuum_freeze_max_age. The nearer the age is to 2 billion the worse the situation is.

Based on the query output, you should then run a command like this to vacuum the tables:

VACUUM (INDEX_CLEANUP OFF) <table_name>;

Replacing <table_name> with the name of the tables to be vacuumed.

NOTE: by disabling INDEX_CLEANUP we allow VACUUM to run faster, which is desired as we are dealing with an wraparound issue. If you are running Postgres 11 or older that option is not available though, and you should just run the VACUUM command.

IMPORTANT: please note that you do not need to run VACUUM FREEZE. A simple VACUUM will be faster and will perform the expected job to get over the wraparound issue. It is faster because it can skip data pages where it already knows that all tuples are frozen, thus avoiding excessive I/O if possible. VACUUM FREEZE is an aggressive option which is a regular VACUUM but with vacuum_freeze_min_age and vacuum_freeze_table_age both set to 0, meaning it will scan every single page of the tables, and will freeze every single tuple of the pages.

Database cluster requiring single-user mode

If the database cluster requires a shutdown and to be started single-user mode, please follow the instructions contained in this section.

Before anything you will need to start up PostgreSQL in single-user mode by running a command like this:

postgres --single -D <data_directory> <database_name>

Replacing <data_directory> with the path to your PostgreSQL data directory, and <database_name> with the name of the database reported in the error message that we described in the beginning of this article:

ERROR: database is not accepting commands to avoid wraparound data loss in database "XXX"

Then, the first thing we need to do is clean up the offending conditions, and later vacuum the old objects. Please refer to sections Cleaning offending conditions and Vacuuming old objects for the details about that.

Monitoring

It is important to monitor the transaction ID age in the database cluster, and to work on preventing the wraparound issue.

The following query could be used by a monitoring system to identify how many transactions you have left before hitting an emergency shutdown in the cluster:

SELECT 2^31 - max(x.age) AS txid_left
FROM (
SELECT coalesce(max(age(backend_xmin)), 0) AS age
FROM pg_stat_activity

UNION ALL

SELECT coalesce(max(age(backend_xmin)), 0) AS age
FROM pg_stat_replication

UNION ALL

SELECT coalesce(max(age(transaction)), 0) AS age
FROM pg_prepared_xacts

UNION ALL

SELECT coalesce(max(age(xmin)), 0) AS age
FROM pg_replication_slots

UNION ALL

SELECT coalesce(max(age(catalog_xmin)), 0) AS age
FROM pg_replication_slots
) x;

The above query monitors for a critical situation. You can also monitor tables that are near to get an aggressive vacuum (to prevent wraparound, triggered by autovacuum_freeze_max_age). You can check the top 10 worst tables which are closest to an autovacuum to prevent wraparound:

SELECT c.oid::regclass AS table_name,
c.relfrozenxid,
current_setting('autovacuum_freeze_max_age')::bigint - age(c.relfrozenxid) AS xid_left_to_antifreeze,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_total_size
FROM pg_class c
WHERE c.relkind IN ('r','m','t')
ORDER BY 3 ASC
LIMIT 10;

The xid_left_to_antifreeze column will show how many transactions the table is left until an autovacuum to prevent wraparound kicks in. If it shows a negative value, it is likely there is already an aggressive autovacuum running against that table.

NOTE: this query will only check tables in the database where it is connected to, meaning pg_class is not a shared catalog object.

Recommendations

If you have a very busy database cluster, meaning it has a high transaction ID consumption rate, and you want to avoid aggressive vacuums to prevent wraparound to kick in during your business hours, you might be interested in configuring our avoid_anti_wraparound_vacuums.sh script to run in your servers. The idea of this script is to bring the tables age down during low traffic hours. You can have more details about it in the article Instructions to set up the AAWV script.

As a last point, one of the most important advices for PostgreSQL in general is: keep transactions as short as possible. It is likely to save you from facing a lot of issues in PostgreSQL, among them the transaction wraparound issue.

MultiXact ID (mxid) wraparound

Introduction

Similar to transaction ID we also have the concept known as MultiXact ID (mxid). The latter is also represented by a 32-bit unsigned integer and can also face issues with wraparounds.

MultiXact IDs are consumed whenever you have 2 or more transactions holding locks on the very same tuple at the same time. This is usually accomplished by transactions that run SELECT ... FOR SHARE (or any SELECT ... FOR variant).

PostgreSQL has the following settings to control mxid freezing behavior:

  • vacuum_multixact_freeze_min_age (default: 5M): is used by postgres to determine which tuples to freeze when cleaning pages of a table. With the default vacuum_multixact_freeze_min_age, for tuples in pages being cleaned by vacuum, if their multixact age is at least 5M older than the oldest running transaction, postgres will freeze them.

  • vacuum_multixact_freeze_table_age (default 150M): when a table mxid age exceeds this threshold, any pages within the table that have tuples with multixact age vacuum_multixact_freeze_min_age older than the oldest running transaction, will be frozen.

  • autovacuum_multixact_freeze_max_age (default: 400M): The same behaviour as vacuum_multixact_freeze_table_age applies, with the difference that postgres starts the vacuum as an "aggressive" vacuum to prevent wraparound. It will appear in pg_stat_activity as autovacuum: VACUUM schema.table (to prevent wraparound). If ever attempted to be canceled (through pg_terminate_backend() for example), it will immediately start again. Additionally, vacuum will not be paused if it locks a user query. However, cost-based delays and index vacuuming are still present.

  • vacuum_multixact_failsafe_age (PG 14+) (default: 1.6B): The same behaviour as vacuum_multixact_freeze_table_age applies, with the difference that postgres starts the vacuum as an "extraordinarily aggressive" vacuum to prevent wraparound. It will appear in pg_stat_activity as autovacuum: VACUUM schema.table (to prevent wraparound). If ever attempted to be canceled (through pg_terminate_backend() for example), it will immediately start again. Additionally, vacuum will not be paused if it locks a user query, autovacuum cost-based delay is ignored, and further non-essential maintenance tasks (such as index vacuuming) are bypassed, in order to be as aggressive as possible.

It is important to note that the oldest not-frozen MultiXact ID used by a table is stored in pg_class.relminmxid, and the oldest not-frozen MultiXact ID used by a database is stored in pg_database.datminmxid.

Solution and troubleshooting

You can check the oldest databases by running the following query:

SELECT datname,
mxid_age(datminmxid)
FROM pg_database
WHERE mxid_age(datminmxid) > current_setting('autovacuum_multixact_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;

You can check the oldest tables in that database with the following query, noting down the returned table names:

SELECT c.oid::regclass AS table_name,
c.relminmxid as table_mxid,
mxid_age(c.relminmxid) as age_of_table_mxid,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_total_size
FROM pg_class c
WHERE c.relkind IN ('r','m','t')
ORDER BY 3 DESC
LIMIT 10;

In order to identify what is preventing vacuum from freezing the mxids and performing its other functions, you can use the query one below, inputting the table name from the above query:

SELECT s.pid,
s.backend_start,
s.query
FROM pg_stat_activity s
INNER JOIN pg_locks l
ON s.pid = l.pid
WHERE l.relation = 'tablename'::regclass
ORDER BY 2 DESC
LIMIT 10;

This looks for the long-running backends that are holding locks on an individual table. The locks are likely to be the ones causing the mxid to be held.

The query would bring the top 10 longest running backends that have attempted a lock on the problematic table.

VACUUM might be able to advance the relminmxid of the corresponding table once you terminate these likely offending backends.

Monitoring

You can check how far the database cluster is in terms of MultiXact ID wraparound by using a query like this:

SELECT 2^31 - max(mxid_age(datminmxid)) AS mxid_left
FROM pg_database;

You can also monitor tables that are near to get an aggressive vacuum (to prevent wraparound, triggered by autovacuum_multixact_freeze_max_age). You can check the top 10 worst tables in regards to autovacuum to prevent wraparound:

SELECT c.oid::regclass AS table_name,
c.relminmxid as table_mxid,
mxid_age(c.relminmxid) as age_of_table_mxid,
current_setting('autovacuum_multixact_freeze_max_age')::bigint - mxid_age(c.relminmxid) AS xid_left_to_antifreeze,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_total_size
FROM pg_class c
WHERE c.relkind IN ('r','m','t')
ORDER BY 3 DESC
LIMIT 10;

The query behaves similarly to what was described in the Monitoring section for regular transaction IDs.

If you ever identify tables that are getting old in terms of MultiXact ID age, you should look into cleaning the offending backends that might be preventing VACUUM to advance the MultiXact ID of such tables.

Was this article helpful?

0 out of 0 found this helpful