Linux writeback configuration

Tomas Vondra

Proper kernel writeback configuration is important for eliminating latency spikes, particularly on PostgreSQL versions prior to 9.6.

PostgreSQL does not use direct I/O when writing changes to data files, and so relies on the page cache maintained by the kernel. With the default configuration, the kernel may easily accumulate large amounts of dirty data in the cache and then write it out all at once, saturating the storage devices and negatively affecting performance of user queries (usually in the form of COMMIT latency spikes). This partially defeats the idea that writes to data files happen in the background to minimize impact on user queries.

This is generally less of an issue on PostgreSQL 9.6 and newer due to the feature mentioned at the end of this article, but may still be a concern on busier systems.

The amount of dirty data in the page cache can be observed in /proc/meminfo:

$ cat /proc/meminfo | grep '\(Dirty\|Writeback\):'
Dirty: 546874 kB
Writeback: 47848 kB

The primary configuration parameters specify the limits on dirty data as a percentage of total available memory:

vm.dirty_background_ratio = 10
vm.dirty_ratio = 20

These default values say that when the amount of dirty data in page cache reaches 10% of RAM, the system will initiate writeback. And when the amount reaches 20%, the system will stop using page cache for writes (effectively making them synchronous).
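To see how close a system is to these thresholds, the Dirty value from /proc/meminfo can be compared with the total amount of memory. The following is a minimal sketch, not a definitive tool: the function name dirty_percent is made up for illustration, and it divides by MemTotal, whereas the kernel evaluates the limits against available (free plus reclaimable) memory, so the result is only a rough approximation.

```shell
# Print dirty page cache as a percentage of MemTotal (an approximation,
# the kernel itself uses available memory as the base).
# Usage: dirty_percent [meminfo-file]   (defaults to /proc/meminfo)
dirty_percent() {
    LC_ALL=C awk '
        $1 == "MemTotal:" { total = $2 }   # value in kB
        $1 == "Dirty:"    { dirty = $2 }   # value in kB
        END {
            if (total > 0)
                printf "%.2f\n", 100 * dirty / total
            else
                exit 1
        }
    ' "${1:-/proc/meminfo}"
}
```

Running dirty_percent without arguments reads the live /proc/meminfo. A value approaching 10 would suggest background writeback is about to start with the default settings.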

These defaults might have been appropriate when machines had a couple of gigabytes of RAM, but not for current systems with hundreds of GBs of RAM. For example, with 128GB of RAM, the default allows accumulating up to ~13GB of dirty data before initiating writeback. Writing large amounts of data is likely to be disruptive, so we need to apply the usual rule "If it hurts, you need to do it more often."
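The arithmetic behind that figure can be sketched as follows. The helper name ratio_to_bytes is hypothetical, and the calculation assumes the ratio applies to all of RAM, which slightly overstates the real threshold because the kernel uses available memory.

```shell
# Approximate dirty-data threshold in bytes for a given RAM size and ratio.
# Usage: ratio_to_bytes <ram-in-GiB> <ratio-percent>
ratio_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 * $2 / 100 ))
}

# Show how the default 10% background limit grows with memory size.
for ram_gb in 4 128 512; do
    echo "${ram_gb} GiB RAM: background writeback starts at about $(ratio_to_bytes "$ram_gb" 10) bytes"
done
```

For 128 GiB this gives roughly 13.7 billion bytes, in line with the ~13GB mentioned above, while a 4 GiB machine stays under half a gigabyte.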

What values are appropriate? For the background limit (the one controlled by vm.dirty_background_ratio), values between 64MB and 256MB are usually a good choice, particularly when they fit into the write cache on a RAID controller (or into the cache on the storage device). Unfortunately, the lowest possible value for the _ratio parameters is 1 (that is, 1% of memory), which is usually much higher than this (at least on large machines). But there are alternative parameters that allow specifying the amount in bytes. For example

vm.dirty_background_bytes = 134217728

sets the limit to 128MB. To set the configuration parameter, add it to /etc/sysctl.conf and force a reload by running sysctl -p. Or, on systemd-based systems, create /etc/sysctl.d/writeback.conf with the above entry, then run sudo systemctl restart systemd-sysctl.service to apply the changes.
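Since the _bytes parameters take a raw byte count, it may be convenient to generate the entry from a size in megabytes. This is only a sketch; background_limit_entry is a made-up helper name, not part of any tool.

```shell
# Print a sysctl entry for a background writeback limit given in MB.
# Usage: background_limit_entry <megabytes>
background_limit_entry() {
    echo "vm.dirty_background_bytes = $(( $1 * 1024 * 1024 ))"
}

# 128MB, the value used in the example above.
background_limit_entry 128
```

The output can be redirected into /etc/sysctl.d/writeback.conf (with root privileges), and after reloading, sysctl vm.dirty_background_bytes should report the new value. Once the bytes parameter is set, vm.dirty_background_ratio reads as 0, because only one of the two can be in effect at a time.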

For vm.dirty_ratio we can keep the default value (20%). It's not desirable to make the writes synchronous, so it's better to keep this limit high.

Writeback may also be triggered by age: dirty data that has been in the page cache for more than 30 seconds gets written out, as specified by this kernel parameter:

vm.dirty_expire_centisecs = 3000

We do not recommend changing this - the goal was to initiate the writeback in smaller chunks, and vm.dirty_background_bytes works well enough for that.

Note: Lowering the background limit may have a negative impact on short-lived temporary files, which would not trigger writeback with the default limit. Unfortunately, there's not much you can do about this; it's a trade-off. Large temporary files, however, tend to stick around for longer than 30 seconds, so they would be written out anyway because of vm.dirty_expire_centisecs.

PostgreSQL 9.6

One of the improvements in PostgreSQL 9.6 was the introduction of parameters that force the database to regularly flush data (by calling sync_file_range), precisely to prevent accumulation of large amounts of dirty data in the page cache. The amount of data written before a flush is requested depends on which process is performing the writes, and the default values are:

  • backend_flush_after = 0 (disabled)
  • bgwriter_flush_after = 512kB
  • checkpoint_flush_after = 256kB
  • wal_writer_flush_after = 1MB

In the case of checkpoints this is particularly efficient, as the checkpointer does various additional optimizations (e.g. sorting the writes to make them more sequential).
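To check which values a particular instance actually uses, the four parameters can be queried with SHOW. The sketch below only generates the statements; the function name flush_after_queries is hypothetical, and how you connect to the database is left to your environment.

```shell
# Emit one SHOW statement per flush-related parameter listed above.
flush_after_queries() {
    for name in backend_flush_after bgwriter_flush_after \
                checkpoint_flush_after wal_writer_flush_after; do
        echo "SHOW ${name};"
    done
}

flush_after_queries
```

Piping the output into psql (for example, flush_after_queries | psql -X -At) would print the current settings, assuming the usual connection settings are already in place.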

So if you're running PostgreSQL 9.6 (or a newer release), you may not need to modify the kernel writeback configuration to eliminate latency peaks, because the database already operates in a way that prevents accumulation of large amounts of dirty data in the page cache.
