Questions such as:
- How does PostgreSQL handle live tuples that were last modified more than 2 billion transactions behind the current transaction ID, or multixact ID?
- How to diagnose wraparound issues?
- What can potentially cause wraparound issues?
- How to fix wraparound issues?
Are discussed and explained in this article.
Whenever a transaction starts in PostgreSQL, it is assigned with an unique identifier
known as txid
(transaction ID), which is always incremented 1 by 1. Any tuple
that is modified by that transaction is marked with its ID in xmin
and/or xmax
fields, depending on the operation, and those fields are used for visibility purposes.
The transaction ID is represented by a 32-bit unsigned integer, so it can hold values
from 0
to 2^32 - 1
(4,294,967,295
), and when it reaches the maximum value, it resets to
0 and start incrementing again.
PostgreSQL has a moving horizon which is based on the the current transaction ID. Roughly saying, transactions which ID are up to 2 billion behind the current transaction ID are considered visible, and transactions which ID are up to 2 billion ahead the current transaction ID are considered to be in the future. A transaction ID wraparound problem describes the scenario where old live tuples get passed by the moving horizon, in such a way PostgreSQL would not be able to differentiate tuples that are visible from tuples that are not visible, hence causing inconsistencies and data loss.
But then a question may arise: how does PostgreSQL handle live tuples that were last modified more than 2 billion transactions behind the current transaction ID, i.e., how does it handle the transaction ID wrap around problem? It handles that by continuously freezing tuples, i.e., it sets a control bit in the tuples that makes them visible to all transactions, independently of the moving horizon.
As further reading, you may be interested in:
Before anything, it is important to understand how age is measured in PostgreSQL:
-
Tuple age
: the difference between the current transaction ID in the database and the transaction ID which created the tuple. Note that the transaction ID cycles roughly every 4 billion transactions. If this age reaches 2^31 (2147483648), then a wraparound would occur. -
Table age
: the age of the oldest tuple in the table -
Database age
: the age of the oldest table in the database -
Database cluster age
: the age of the oldest database in the database cluster
PostgreSQL tries to avoid the transaction ID wrap around problem by proactively freezing old tuples. The txid freezing behavior is controlled by 4 settings:
-
vacuum_freeze_min_age
(default: 50M): is used by postgres to determine which tuples to freeze when cleaning pages of a table. With the defaultvacuum_freeze_min_age
, for tuples in pages being cleaned by vacuum, if their transaction age is at least 50m older than the oldest running transaction, postgres will freeze them. -
vacuum_freeze_table_age
(default: 150M): when a table txid age exceeds this threshold, any pages within the table that have tuples with txid agevacuum_freeze_min_age
older than the oldest running transaction, will be frozen. -
autovacuum_freeze_max_age
(default: 200M): the same behaviour asvacuum_freeze_table_age
applies, with the difference that postgres starts the vacuum as an "aggressive" vacuum to prevent wraparound. It will appear inpg_stat_activity
asautovacuum: VACUUM schema.table (to prevent wraparound)
. If ever attempted to be canceled (throughpg_terminate_backend()
for example), it will immediately start again. Additionally, vacuum will not be paused if it locks a user query. However, cost-based delays and index vacuuming are still present. -
vacuum_failsafe_age
PG14+, (default: 1.6B): the same behaviour asvacuum_freeze_table_age
applies, with the difference that postgres starts the vacuum as an "extraordinarily aggressive" vacuum to prevent wraparound. It will appear inpg_stat_activity
asautovacuum: VACUUM schema.table (to prevent wraparound)
. If ever attempted to be canceled (throughpg_terminate_backend()
for example), it will immediately start again. Additionally, vacuum will not be paused if it locks a user query, autovacuum cost-based delay is ignored, and further non-essential maintenance tasks (such as index vacuuming) are bypassed, in order to be as aggressive as possible.
While vacuum tries to freeze tuples, there are some factors that can prevent it from doing its job, in which case the cluster age will keep growing over time. Some of them are:
- Long running transactions
- Long running transactions in a HOT standby that has feedback enabled
- Prepared transactions
- Stale replication slots
These conditions prevent that autovacuum freezes old tuples because if there is something in the cluster that still depends on an older snapshot (visibility) of the tuples, then PostgreSQL cannot freeze the tuples that were modified after the snapshot being used, otherwise the tuples would start being visible in that old snapshot, hence also causing data inconsistencies.
Eventually, if the among conditions stay true for a long time, it might get near the 2 billion threshold. In this case you will start seeing messages like these in the PostgreSQL logs -- and eventually also in your PostgreSQL backends and vacuum output:
WARNING: oldest xmin is far in the past
WARNING: database "XXX" must be vacuumed within YYY transactions
WARNING: database with OID XXX must be vacuumed within YYY transactions
Indicating PostgreSQL has been having issues to freeze tuples in the tables, and the window between the moving horizon and the oldest live tuple that is not frozen is getting short, i.e., PostgreSQL is running out of available transactions before hitting the wraparound problem.
If the issues are still not resolved and the window keeps getting shorter, PostgreSQL will eventually perform an emergency shutdown to avoid catastrophic data loss, in which case it will only be able to be started again in single user mode, and restricted to perform read only queries and out of transaction blocks. You would see messages like this being issue in the logs if that condition is met:
ERROR: database is not accepting commands to avoid wraparound data loss in database "XXX"
ERROR: database is not accepting commands to avoid wraparound data loss in database with OID XXX
The solution for the transaction wraparound issue consists basically of cleaning up the offending conditions and vacuuming the old objects.
We will split the solution according to the issue severity, meaning if the database
cluster is still operable, in which case you are still receiving the WARNING
messages
that were listed before, or if it requires to be restart in single-user mode, in which
case you are receiving the ERROR
message listed above.
The steps in the following sections will suggest things like rolling transactions back, committing transactions, dropping objects, terminating backends. So please proceed with caution, analyze the output and proceed accordingly.
If you are still able to use the database cluster in non single-user mode, please follow the instructions contained in this section.
The first thing we need to do is clean up the offending conditions, and later vacuum the old objects.
In order to identify what is preventing vacuum from freezing the txids and performing its other functions, you can use a query like the one below:
SELECT (SELECT coalesce(max(age(backend_xmin)), 0)
FROM pg_stat_activity) AS
oldest_xact_age,
(SELECT coalesce(max(age(backend_xmin)), 0)
FROM pg_stat_replication) AS oldest_standby_xact_age,
(SELECT coalesce(max(age(transaction)), 0)
FROM pg_prepared_xacts) AS oldest_prepared_xact_age,
(SELECT greatest(coalesce(max(age(xmin)), 0), coalesce(max(age(catalog_xmin)), 0))
FROM pg_replication_slots) AS oldest_replication_slot_age;
This is the description of the output columns:
-
oldest_xact_age
: the oldest transaction age of all backends running on the server -
oldest_standby_xact_age
: the age of the oldest transaction in the HOT standby servers -
oldest_prepared_xact_age
: the age of the oldest prepared transaction in the server -
oldest_replication_slot_age
: the age of the oldest replication slot in the server
Based on the above output, you want to focus in the columns which values are nearer to 2 billion, and proceed through the corresponding following sub-section(s).
For example, let's say this was the output of the above query:
oldest_xact_age | oldest_standby_xact_age | oldest_prepared_xact_age | oldest_replication_slot_age
5000000 | 0 | 1870000000 | 0
(1 row)
You will be interested in checking old prepared transactions, which oldest age is
1870000000
, because in the example it is what is preventing vacuum to perform its job.
So, you would follow the instructions in the sub-section Handle offending prepared transactions
.
Identify offending backends and terminate them, if any. Use the following query to list possible offending backends:
SELECT pid,
age(backend_xmin)
datname,
usename,
application_name,
client_addr,
state,
backend_type,
query
FROM pg_stat_activity
WHERE age(backend_xmin) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;
The above query will bring the top 10 backends with oldest transaction id in the database
cluster which have crossed the value of autovacuum_freeze_max_age
. The nearer the age is
to 2 billion the badder.
If you are able to gracefully stop the above backends, like by committing or rolling them
back through your application, that is ideal. Otherwise, you might want to terminate them
using pg_cancel_backend
or pg_terminate_backend
based on the PIDs reported by the above
query.
If you have a HOT standby attached to the server that is facing issues, and that
standby has hot_standby_feedback
enabled, identify offending backends and terminate
them, if any. Use the following query to identify possibly offending standbys:
SELECT pid,
age(backend_xmin),
usename,
application_name,
client_addr,
state
FROM pg_stat_replication
WHERE age(backend_xmin) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;
The above query will bring the top 10 oldest replication backends in the database
cluster which have crossed the value of autovacuum_freeze_max_age
. The near the age is to
2 billion the badder.
If the above query returns any result, and the result refers to a HOT standby connection
with hot_standby_feedback
, you might be interested in applying the steps in the
previous sub-section Handle offending backends
, but this time in that offending
HOT standby, so you terminate offending queries in the standby.
Identify old prepared transactions and commit or roll them back, if any. Use the following query to list possible offending prepared transactions:
SELECT transaction,
gid,
age(transaction),
prepared,
owner,
database
FROM pg_prepared_xacts
WHERE age(transaction) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 3 DESC
LIMIT 10;
The above query will bring the top 10 oldest prepared transactions in the database
cluster which have crossed the value of autovacuum_freeze_max_age
. The nearer the
age is to 2 billion the worse the situation is.
You should study to either COMMIT
or ROLLBACK
those old living prepared transactions.
You can issue the commands below to COMMIT
or ROLLBACK
, using the prepared
transaction gid
reported by the above query:
-
COMMIT
:
COMMIT PREPARED '<gid>';
-
ROLLBACK
:
ROLLBACK PREPARED '<gid>';
It is up to your business logic if you will need to commit or roll them back, but you should seek for freeing them either way.
Identify offending stale replication slots and drop them, if any. Use the following query to list possible offending replication slots:
SELECT slot_name,
age(xmin),
slot_type,
database,
temporary,
active,
active_pid
FROM pg_replication_slots
WHERE greatest(age(xmin), age(catalog_xmin)) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;
The above query will bring the top 10 oldest replication slots in the database cluster
which have crossed the value of autovacuum_freeze_max_age
. The nearer the age
is to 2 billion the badder.
At this point you need to investigate what is causing the xmin
of the replication
slot to be held back. The is usually tied with the queries running in a HOT standby
node, and tends to be cleared out once you clear the offending queries in the HOT
standby, as mentioned in the section Handle offending HOT standbys
.
As a last resource, you may choose to drop the replication slots returned by the
above query, if you are not able to make its xmin
advance. You can accomplish
that by running a command like this:
SELECT pg_drop_replication_slot(slot_name := '<slot_name>');
Passing as argument the name of the offending slot.
Please keep in mind that by dropping the replication slot you may break your standbys, depending on how your setup was done, and if the standbys would be able to fetch the WALs from somewhere else or not, when needed.
Now that the offending conditions are not present anymore, you should focus on vacuuming old tables. You can check the oldest databases by running the following query:
SELECT datname,
age(datfrozenxid)
FROM pg_database
WHERE age(datfrozenxid) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;
The above query will bring the top 10 oldest databases in the database cluster
which have crossed the value of autovacuum_freeze_max_age
. The nearer the age
is to 2 billion the worse the situation is.
Based on the query output, you should connect to the oldest databases and choose one between the following approaches 1) or 2). If close to wraparound, 2) should be chosen.
- Start a database-wide vacuum with a command like this:
VACUUM (INDEX_CLEANUP OFF);
NOTE: by disabling INDEX_CLEANUP
we allow VACUUM
to run faster, which is
desired as we are dealing with a wraparound issue. If you are running Postgres
11 or older that option is not available though, and you should just run the
VACUUM
command.
IMPORTANT: please note that you do not need to run VACUUM FREEZE
. A simple
VACUUM
will be faster and will perform the expected job to get over the wraparound
issue. It is faster because it can skip data pages where it already knows that all
tuples are frozen, thus avoiding excessive I/O if possible. VACUUM FREEZE
is an
aggressive option which is a regular VACUUM
but with vacuum_freeze_min_age
and
vacuum_freeze_table_age
both set to 0
, meaning it will scan every single page of
the tables, and will freeze every single tuple of the pages.
OR
- Identify the oldest tables in the database and vacuum them manually. You can use this query to identify the oldest tables:
SELECT c.oid::regclass AS table_name,
age(c.relfrozenxid) AS age,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_total_size
FROM pg_class c
WHERE c.relkind IN ('r', 'm', 't')
AND age(c.relfrozenxid) > current_setting('autovacuum_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;
The above query will bring the top 10 oldest tables in the database which have crossed
the value of autovacuum_freeze_max_age
. The nearer the age is to 2 billion the worse the situation is.
Based on the query output, you should then run a command like this to vacuum the tables:
VACUUM (INDEX_CLEANUP OFF) <table_name>;
Replacing <table_name>
with the name of the tables to be vacuumed.
NOTE: by disabling INDEX_CLEANUP
we allow VACUUM
to run faster, which is
desired as we are dealing with an wraparound issue. If you are running Postgres
11 or older that option is not available though, and you should just run the
VACUUM
command.
IMPORTANT: please note that you do not need to run VACUUM FREEZE
. A simple
VACUUM
will be faster and will perform the expected job to get over the wraparound
issue. It is faster because it can skip data pages where it already knows that all
tuples are frozen, thus avoiding excessive I/O if possible. VACUUM FREEZE
is an
aggressive option which is a regular VACUUM
but with vacuum_freeze_min_age
and
vacuum_freeze_table_age
both set to 0, meaning it will scan every single page of the tables, and
will freeze every single tuple of the pages.
If the database cluster requires a shutdown and to be started single-user mode, please follow the instructions contained in this section.
Before anything you will need to start up PostgreSQL/EPAS in single-user mode by running one of the the following commands:
Postgres:
postgres --single -D <data_directory> <database_name>
EPAS:
edb-postgres --single -D <data_directory> <database_name>
Replacing <data_directory>
with the path to your PostgreSQL/EPAS data directory, and
<database_name>
with the name of the database reported in the error message that
we described in the beginning of this article:
ERROR: database is not accepting commands to avoid wraparound data loss in database "XXX"
postgres
and edb-postgres
are binaries. If the PATH
is not set for your user, you will need to specify the binary path in the command.
Then, the first thing we need to do is clean up the offending conditions, and later
vacuum the old objects. Please refer to sections Cleaning offending conditions
and Vacuuming old objects
for the details about that.
To exit single user mode, use Ctrl + D
(or other EOF character if used).
It is important to monitor the transaction ID age in the database cluster, and to work on preventing the wraparound issue.
The following query could be used by a monitoring system to identify how many transactions you have left before hitting an emergency shutdown in the cluster:
SELECT 2^31 - max(x.age) AS txid_left
FROM (
SELECT coalesce(max(age(backend_xmin)), 0) AS age
FROM pg_stat_activity
UNION ALL
SELECT coalesce(max(age(backend_xmin)), 0) AS age
FROM pg_stat_replication
UNION ALL
SELECT coalesce(max(age(transaction)), 0) AS age
FROM pg_prepared_xacts
UNION ALL
SELECT coalesce(max(age(xmin)), 0) AS age
FROM pg_replication_slots
UNION ALL
SELECT coalesce(max(age(catalog_xmin)), 0) AS age
FROM pg_replication_slots
) x;
The above query monitors for a critical situation. You can also monitor tables
that are near to get an aggressive vacuum (to prevent wraparound, triggered by
autovacuum_freeze_max_age
). You can check the top 10 worst tables which are closest to an
autovacuum to prevent wraparound:
SELECT c.oid::regclass AS table_name,
c.relfrozenxid,
current_setting('autovacuum_freeze_max_age')::bigint - age(c.relfrozenxid) AS xid_left_to_antifreeze,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_total_size
FROM pg_class c
WHERE c.relkind IN ('r','m','t')
ORDER BY 3 ASC
LIMIT 10;
The xid_left_to_antifreeze
column will show how many transactions the table is left
until an autovacuum to prevent wraparound kicks in. If it shows a negative value, it is
likely there is already an aggressive autovacuum running against that table.
NOTE: this query will only check tables in the database where it is connected to,
meaning pg_class
is not a shared catalog object.
If you have a very busy database cluster, meaning it has a high transaction ID consumption
rate, and you want to avoid aggressive vacuums to prevent wraparound to kick in
during your business hours, you might be interested in configuring our
avoid_anti_wraparound_vacuums.sh
script to run in your servers. The idea of this
script is to bring the tables age down during low traffic hours. You can have more
details about it in the article Instructions to set up the AAWV script.
As a last point, one of the most important advices for PostgreSQL in general is: keep transactions as short as possible. It is likely to save you from facing a lot of issues in PostgreSQL, among them the transaction wraparound issue.
Similar to transaction ID we also have the concept known as MultiXact ID (mxid). The latter is also represented by a 32-bit unsigned integer and can also face issues with wraparounds.
MultiXact IDs are consumed whenever you have 2 or more transactions holding locks
on the very same tuple at the same time. This is usually accomplished by
transactions that run SELECT ... FOR SHARE
(or any SELECT ... FOR
variant).
PostgreSQL has the following settings to control mxid freezing behavior:
-
vacuum_multixact_freeze_min_age
(default: 5M): is used by postgres to determine which tuples to freeze when cleaning pages of a table. With the defaultvacuum_multixact_freeze_min_age
, for tuples in pages being cleaned by vacuum, if their multixact age is at least 5M older than the oldest running transaction, postgres will freeze them. -
vacuum_multixact_freeze_table_age
(default 150M): when a table mxid age exceeds this threshold, any pages within the table that have tuples with multixact agevacuum_multixact_freeze_min_age
older than the oldest running transaction, will be frozen. -
autovacuum_multixact_freeze_max_age
(default: 400M): The same behaviour asvacuum_multixact_freeze_table_age
applies, with the difference that postgres starts the vacuum as an "aggressive" vacuum to prevent wraparound. It will appear inpg_stat_activity
asautovacuum: VACUUM schema.table (to prevent wraparound)
. If ever attempted to be canceled (throughpg_terminate_backend()
for example), it will immediately start again. Additionally, vacuum will not be paused if it locks a user query. However, cost-based delays and index vacuuming are still present. -
vacuum_multixact_failsafe_age
(PG 14+) (default: 1.6B): The same behaviour asvacuum_multixact_freeze_table_age
applies, with the difference that postgres starts the vacuum as an "extraordinarily aggressive" vacuum to prevent wraparound. It will appear inpg_stat_activity
asautovacuum: VACUUM schema.table (to prevent wraparound)
. If ever attempted to be canceled (throughpg_terminate_backend()
for example), it will immediately start again. Additionally, vacuum will not be paused if it locks a user query, autovacuum cost-based delay is ignored, and further non-essential maintenance tasks (such as index vacuuming) are bypassed, in order to be as aggressive as possible.
It is important to note that the oldest not-frozen MultiXact ID used by a table
is stored in pg_class.relminmxid
, and the oldest not-frozen MultiXact ID used
by a database is stored in pg_database.datminmxid
.
You can check the oldest databases by running the following query:
SELECT datname,
mxid_age(datminmxid)
FROM pg_database
WHERE mxid_age(datminmxid) > current_setting('autovacuum_multixact_freeze_max_age')::bigint
ORDER BY 2 DESC
LIMIT 10;
You can check the oldest tables in that database with the following query, noting down the returned table names:
SELECT c.oid::regclass AS table_name,
c.relminmxid as table_mxid,
mxid_age(c.relminmxid) as age_of_table_mxid,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_total_size
FROM pg_class c
WHERE c.relkind IN ('r','m','t')
ORDER BY 3 DESC
LIMIT 10;
In order to identify what is preventing vacuum from freezing the mxids and performing its other functions, you can use the query one below, inputting the table name from the above query:
SELECT s.pid,
s.backend_start,
s.query
FROM pg_stat_activity s
INNER JOIN pg_locks l
ON s.pid = l.pid
WHERE l.relation = 'tablename'::regclass
ORDER BY 2 DESC
LIMIT 10;
This looks for the long-running backends that are holding locks on an individual table. The locks are likely to be the ones causing the mxid to be held.
The query would bring the top 10 longest running backends that have attempted a lock on the problematic table.
VACUUM
might be able to advance the relminmxid
of the corresponding table once
you terminate these likely offending backends.
You can check how far the database cluster is in terms of MultiXact ID wraparound by using a query like this:
SELECT 2^31 - max(mxid_age(datminmxid)) AS mxid_left
FROM pg_database;
You can also monitor tables that are near to get an aggressive vacuum (to prevent
wraparound, triggered by autovacuum_multixact_freeze_max_age
). You can check the
top 10 worst tables in regards to autovacuum to prevent wraparound:
SELECT c.oid::regclass AS table_name,
c.relminmxid as table_mxid,
mxid_age(c.relminmxid) as age_of_table_mxid,
current_setting('autovacuum_multixact_freeze_max_age')::bigint - mxid_age(c.relminmxid) AS xid_left_to_antifreeze,
pg_size_pretty(pg_total_relation_size(c.oid)) AS table_total_size
FROM pg_class c
WHERE c.relkind IN ('r','m','t')
ORDER BY 3 DESC
LIMIT 10;
The query behaves similarly to what was described in the Monitoring
section
for regular transaction IDs.
If you ever identify tables that are getting old in terms of MultiXact ID age, you
should look into cleaning the offending backends that might be preventing VACUUM
to advance the MultiXact ID of such tables.