autovacuum configuration and tuning

Tomas Vondra

The default autovacuum configuration is suitable mostly for small, read-mostly databases. On larger databases the cleanup may be delayed for extended periods of time, hurting performance by first accumulating bloat and then performing all the cleanup at once. Furthermore, the default throttling limits are set too low for current hardware, so on busy databases the cleanup may not be able to keep up.

This internal KB has a customer-facing companion article that tries to keep things simpler and avoid going into the innards too much. Consider reading it first.

There's a lot to know about vacuum and MVCC. If this is overwhelming, read the summary at the end first. The critical point to take away is that you should not turn autovacuum off; in fact, you should usually tune it up, not down, when you think it's using too many resources.

Dead tuples

When PostgreSQL performs a DELETE, the row is not immediately removed from the data file. Instead it is only marked as deleted by setting a field in the tuple header. The same is true for UPDATE, which may be seen as a DELETE of the old row + an INSERT of the new row. This is one of the ideas behind Multi-Version Concurrency Control (MVCC), as it allows greater concurrency with only minimal locking between transactions. The downside of PostgreSQL's particular MVCC implementation is that it leaves behind the deleted rows, even after all transactions that might need them have finished.

If not cleaned up, those “dead tuples” (effectively invisible to any transaction) would stay in the data files forever, wasting disk space, and for tables with many DELETE and UPDATE commands, the dead rows might easily account for a large part of the disk space. Those dead tuples would also be referenced from indexes, further increasing the amount of wasted disk space. This is what we call “bloat”, and not only does it waste disk space, it also impacts performance: the more data queries have to process (even if 99% of it is immediately thrown away as “dead”), the slower the queries.

Transaction IDs and wrap-around

PostgreSQL uses a 32-bit transaction-id counter. The creating and (later) deleting transaction IDs are recorded with each tuple on disk, to allow PostgreSQL to skip over tuples that a given transaction should not "see". Many installations perform enough transactions that the counter hits its 2-billion transaction ID limit every few weeks or months; others take years. When the counter hits its limit it restarts at the beginning, re-using old transaction IDs.

If tuples with old transaction IDs still exist on disk when their IDs are re-used, PostgreSQL can no longer correctly filter them for visibility. So it must either vacuum them away if deleted, or "freeze" them as permanent tuples. Failure to do so can produce incorrect query results. PostgreSQL detects this and warns the user in the logs, forces vacuum to run, and may eventually stop accepting commands to protect against data loss.

VACUUM

The most straightforward way to reclaim space occupied by dead tuples (and make it available for new rows) is by manually running the VACUUM command. This maintenance command will scan the table and remove dead tuples from both the table and its indexes. It will not generally release the disk space back to the operating system, but it will make it reusable for new rows in the same table.
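
For example (the table name here is just a placeholder; VERBOSE prints a summary of what was removed):

-- my_table is a placeholder table name
VACUUM VERBOSE my_table;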

Note: VACUUM FULL does reclaim the space and return it to the OS, but it has a number of serious disadvantages. Firstly, it acquires an exclusive lock on the table, blocking all operations (including SELECT queries). Secondly, it essentially creates a copy of the table containing all the non-dead rows, temporarily nearly doubling the disk space needed, so it’s not very practical when you’re already running out of disk space.

The trouble with VACUUM is that it’s an entirely manual action – it only happens when you decide to run it, not when it’s needed. You might put it into cron and run it every 5 minutes on all tables, but the chances are most of the runs will not actually clean anything, and the only effect will be higher CPU and I/O usage on the system and more difficulty getting locks for schema changes. Or you may run it only once a day at night, in which case you will probably accumulate more dead tuples than you would like.

autovacuum

The database does know approximately how many dead tuples were produced over time, as each transaction reports the number of tuples it deleted and updated. This means the database can trigger cleanup when the table accumulates a certain number of dead tuples (by default this is 20% of the table). So the cleanup will be executed more often during busy periods, and less often when the database is mostly idle (or when it serves mostly read-only queries).

This is the primary purpose of the autovacuum daemon: do the cleanup as needed to keep the amount of wasted space under control.

It can be tuned to run at varying thresholds and intensities, on a system-wide and per-table basis, depending on the workload.

Monitoring

Before doing any sort of tuning, you need to be able to collect relevant data, so that you can evaluate the impact of configuration changes. You should have some basic monitoring in place, collecting basic metrics from the database. For cleanup, you should be looking at least at these values (see the example query after the list):

  • pg_stat_all_tables.n_dead_tup – number of dead rows in each table (both user tables and system catalogs)
  • n_dead_tup / (n_dead_tup + n_live_tup) – fraction of dead rows in each table
  • (pg_class.relpages * 8192 / pg_class.reltuples) – space “per row”
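
A minimal sketch of such a query, built only on pg_stat_all_tables (the ordering and LIMIT are arbitrary):

SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 20;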

If you already have a monitoring system deployed (and you should), chances are you’re already collecting such metrics. If you don't, now is a good time to start because you need something in place to alert you to disk space emergencies and other serious issues before they bring the database down. You also need to monitor for transaction ID wrap-around.
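
For wrap-around, a simple check is the age of the oldest transaction ID in each database, which autovacuum should keep well below the 2-billion limit:

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;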

Note: There is also a handy pgstattuple extension, allowing you to perform analysis on tables and indexes, including computing the amount of free space, dead tuples etc.
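
For example (the table and index names are placeholders; the extension has to be installed first, and pgstatindex only works on B-tree indexes):

CREATE EXTENSION pgstattuple;
-- my_table / my_table_pkey are placeholder names
SELECT * FROM pgstattuple('my_table');
SELECT * FROM pgstatindex('my_table_pkey');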

Tuning Goals

When tuning autovacuum parameters, we want to achieve two general goals:

  • timely cleanup of dead tuples – Keep the number of dead rows reasonably low, so as not to waste an unreasonable amount of disk space, to prevent index bloat, and to keep queries fast.
  • minimize impact on the database – Do not perform the cleanup too often, and do not consume too many resources (CPU, I/O and RAM), which might negatively impact the performance of user queries.

These two goals are somewhat contradictory - running the cleanup "immediately" would make the queries faster (by not having to process any extra rows) but it would also mean the cleanup is running all the time (which is fairly expensive). So autovacuum tuning is about finding the right balance for your system / workload, and it may take a few iterations to converge to good values.

Most default values in postgresql.conf are quite conservative, for two reasons. Firstly, the default values were decided a few years ago, based on the resources (CPU, RAM, …) common at that time. Secondly, the default configuration needs to work everywhere, including tiny machines like a Raspberry Pi or small VPS servers. For many deployments (particularly smaller ones and/or those handling read-mostly workloads) the default configuration parameters will nevertheless work just fine.

But as the database size and/or the amount of writes increase, problems start to appear. The typical issue is that the cleanup does not happen often enough, and then when it does happen it significantly disrupts performance, because it has to deal with a lot of garbage. In those cases you should follow this simple rule:

If it hurts, you’re not doing it often enough.

That is, tune the parameters so that the cleanup happens more often and processes a smaller number of dead tuples each time.

Don't disable autovacuum: People sometimes follow a different rule – If it hurts, don’t do it. – and disable autovacuum entirely. Please don’t do that unless you really (really really) know what you’re doing, and have a regular cleanup script in place. Otherwise you’re painting yourself into a corner, and instead of somewhat degraded performance you’ll have to deal with severely degraded performance or possibly even an outage.

Autovacuum is important even for read-only or append-only tables: it is responsible for freezing old tuples when PostgreSQL's transaction-id counter is approaching its 32-bit wrap-around point and needs to re-use old transaction IDs for new transactions. If you disable autovacuum entirely, PostgreSQL will actually re-enable it and force vacuuming of tables that are approaching this wrap-around threshold. This will be very disruptive if the tables are large, and if it takes too long or is repeatedly canceled, it may cause the database to stop accepting transactions entirely until the VACUUM(s) complete.
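
The forced anti-wraparound vacuum is driven mainly by autovacuum_freeze_max_age; once a table's oldest unfrozen transaction ID grows older than this, a vacuum is forced even if autovacuum is disabled. The default is:

autovacuum_freeze_max_age = 200000000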

Thresholds and Scale Factors

The first thing you may need to tweak is when a cleanup on a table gets triggered, which is affected by two parameters:

autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2

The cleanup on a table is triggered whenever the number of dead tuples (which you can see as pg_stat_all_tables.n_dead_tup) exceeds

threshold + pg_class.reltuples * scale_factor

This basically says that up to 20% of a table may be dead tuples before it gets cleaned up (the threshold of 50 rows is there to prevent very frequent cleanups of tiny tables).
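
For illustration, a query along these lines compares each table's dead-tuple count to its trigger point (a simplified sketch: it uses the global settings and ignores any per-table overrides):

SELECT s.relname,
       s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::bigint
         + current_setting('autovacuum_vacuum_scale_factor')::float8 * c.reltuples AS trigger_point
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 20;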

The default scale factor works fine for small and medium-sized tables, but not so much for very large tables. For a 10GB table this allows roughly 2GB of dead tuples, which seems like a reasonable amount, as it can be reused for new data after the cleanup. On a 1TB table the limit is ~200GB, which seems rather excessive. And once the cleanup triggers, it may need to do a lot of work (CPU, I/O) at once.

Per the "do it more often" rule mentioned above, the right solution is to do the cleanup more often by decreasing the scale factor, perhaps even like this:

autovacuum_vacuum_scale_factor = 0.01

which decreases the limit to only 1% of the table. An alternative solution is to abandon the scale factor entirely, and use the threshold instead:

autovacuum_vacuum_scale_factor = 0
autovacuum_vacuum_threshold = 10000

which should trigger the cleanup after generating 10000 dead tuples. However, we recommend sticking to lowering the scale factor in most cases, as it works for both large and small tables.

One possible issue is that these changes in postgresql.conf affect all tables (the whole cluster, in fact), and it may undesirably affect cleanups of small tables, including for example system catalogs. The easiest solution is to simply ignore the problem entirely. Cleanup of small tables is going to be fairly cheap, and the improvement on large tables is usually so significant that even if you ignore small inefficiency on small tables, the overall effect is still very positive. It's only likely to be an issue if you have vast numbers of small tables, such as in a schema-sharded system.

If you decide to change the configuration in a way that would significantly delay cleanup on small tables (as for example with setting scale_factor=0 and threshold=10000), it may be better to apply those changes only to particular tables using ALTER TABLE:

ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0);
ALTER TABLE t SET (autovacuum_vacuum_threshold = 10000);

Try to keep the configuration as simple as possible, and override the parameters for as few tables as possible. It’s also a good idea to include this in your internal documentation, including the reasoning for the particular values.
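
To keep track of which tables already carry per-table overrides, you can list them from pg_class.reloptions:

SELECT relname, reloptions
FROM pg_class
WHERE reloptions IS NOT NULL;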

Throttling

A nice feature built into autovacuum is throttling. The cleanup is meant to be a maintenance task running in the background, with minimal impact on user queries etc. In other words, it should not consume too many resources (CPU and disk I/O), and this is exactly the purpose of the throttling.

The cleanup process is fairly simple – it reads pages (8kB chunks of data) from data files, and checks if a cleanup is needed. If there are no dead tuples, the page is simply thrown away without any changes. Otherwise the page is cleaned up (dead tuples are removed), marked as “dirty” and written back to disk. The throttling is based on defining a cost for three basic operations:

vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20

That is, if the page is found in shared_buffers (so a read from data file is not needed), it counts as 1. If it’s not found in shared_buffers it needs to be read from the OS and counts as 10 (it might still be served from RAM, but we don’t know that). And finally, if the page is dirtied by the cleanup, it counts as 20. That allows us to compute “cost of work” done by autovacuum.

Whenever the autovacuum process performs a certain amount of work, it sleeps for a while. By default the amount of work in one iteration is limited to 200 and the sleep is 20ms:

autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 200

This "cost" and "delay" is somewhat abstract, so how much work does that actually mean? With 20ms delay, the cleanup can do 50 rounds per second, and with 200 tokens per round that means there are about 10000 tokens per second. That means, the autovacuum can do:

  • 4 MB/s writes (pages dirtied by the autovacuum process)
  • 8 MB/s reads from OS (possibly from disk)
  • 80 MB/s reads from shared_buffers (assuming it’s not dirtied)

Considering the capabilities of current hardware, and that the reads/writes are mostly sequential, those limits are fairly low. In fact, busy databases may generate dead tuples at higher rate, which means the autovacuum may not be able to keep up.

We recommend increasing the cost_limit parameter to better reflect the capabilities of your hardware. For example, increasing it to 1000 tokens increases the throughput 5x. You may of course tweak the other parameters (cost per page operation, sleep delay), but we do that only rarely – changing the cost limit works well enough.
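
In postgresql.conf that change is simply (the exact value is something to adjust for your hardware and workload):

autovacuum_vacuum_cost_limit = 1000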

Another throttling measure is that, unlike explicit VACUUM, autovacuum cancels itself when it detects conflicting lock attempts. This makes it easier to perform system maintenance operations, but it can also be a problem if the application routinely performs tasks that cancel vacuum. Enable autovacuum logging and monitor your logs and the autovacuum statistics to detect whether this becomes a problem. You'll usually find that it becomes obvious through growing bloat.
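
Autovacuum logging is controlled by log_autovacuum_min_duration; setting it to 0 logs every autovacuum (and autoanalyze) action, including ones skipped because of conflicting locks:

log_autovacuum_min_duration = 0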

Number of Workers

One configuration option not yet mentioned is autovacuum_max_workers, so what’s that about? Well, the cleanup does not happen in a single autovacuum process; instead, the database is allowed to start up to autovacuum_max_workers processes, which clean up different databases/tables. That is useful because, for example, you don’t want to postpone the cleanup of small tables until the cleanup of a single large table finishes (which may take quite a bit of time, because of the throttling).

The trouble is users often assume the number of workers is proportional to the amount of cleanup that can happen. If you bump the number of workers to 6, it’ll surely do twice as much work compared to the default 3 workers, right?

Unfortunately, that's not the case. The cost limit is global, shared by all autovacuum workers. Each worker process only gets about 1/autovacuum_max_workers of the total cost limit, so increasing the number of workers will simply make them go slower.

So if the cleanup on your database can’t keep up with user activity, increasing the number of workers is probably not a solution, unless you also tweak the other parameters.
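
If you do need more workers (for example, a cluster with many databases or many large tables), raise the cost limit along with the worker count. The values below are purely illustrative:

autovacuum_max_workers = 6
autovacuum_vacuum_cost_limit = 2000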

Per-table Throttling

Similarly to scale factor and threshold, it’s possible to set the cost limit and delay per table:

ALTER TABLE t SET (autovacuum_vacuum_cost_limit = 1000);
ALTER TABLE t SET (autovacuum_vacuum_cost_delay = 10);

Workers processing such tables are then not included in the global costing, and are throttled independently. This gives you quite a bit of flexibility and power, but it also makes the overall configuration quite a bit more complicated. It also does not guarantee availability of a worker (it may be busy processing other tables).

In practice, we almost never use this feature, for two basic reasons. Firstly, you generally do want to use a single global limit on the background cleanup. Secondly, having multiple workers that are sometimes throttled together and sometimes independently makes it much harder to monitor and analyze behavior of the system.
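
If you inherit such per-table settings and want to return to the global behavior, they can be removed again with RESET (using the same table t as above):

ALTER TABLE t RESET (autovacuum_vacuum_cost_limit, autovacuum_vacuum_cost_delay);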

autoanalyze

Cleanup of dead tuples and tuple freezing are not the only tasks of autovacuum. It’s also responsible for updating the data distribution statistics used by the optimizer when planning queries. You may collect those manually by running ANALYZE, but that suffers from the same issues as VACUUM – you’re likely to run it either too often or not often enough. The database, however, does know how many rows were modified in each table, and can run ANALYZE automatically. This is what autoanalyze is about.

Note: The cost of running ANALYZE unnecessarily is greater than that of VACUUM, because while the cost of VACUUM is mostly proportional to the amount of dead tuples (so fairly low when there are few or none), ANALYZE has to sample the table and rebuild the statistics from scratch on every execution. So the cost of running it too often is fairly high. On the other hand, not running it often enough may result in poor plan choices, which may be just as bad. PostgreSQL does not currently have any facility to incrementally update statistics.

The autoanalyze configuration is fairly similar to the cleanup: it follows roughly the same reasoning and uses pretty much the same formula (threshold + scale factor). The scale factor is set to 10% by default, which means ANALYZE will happen after ~10% of the table changes. That is usually sufficient for good planning, and collecting statistics is a fairly expensive process (so the cost of doing it unnecessarily often is high).
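
For reference, the corresponding parameters and their defaults are:

autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1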

Summary

  • Don’t disable autovacuum, unless you really know what you’re doing. It may temporarily help, but is likely to cause serious issues or an outage later.
  • On busy databases (doing a lot of UPDATE and DELETE), particularly large ones, you should probably decrease the scale factor, so that cleanup happens more frequently.
  • On reasonable hardware (good storage, multiple cores), you should probably increase the throttling parameters, so that the cleanup can keep up.
  • Increasing autovacuum_max_workers alone will not really help in most cases. You’ll get more processes that go slower.
  • You can set the parameters per table using ALTER TABLE, but think twice if you really need that. It makes the system more complex and more difficult to inspect.
  • Don't tweak the autoanalyze configuration, the defaults are usually good enough.
  • You need to (auto)vacuum even append-only or read-only tables.

Further reading

The PostgreSQL manual on vacuuming.
