This article describes issues relating to locale data updates in glibc.
The following information is for users of operating systems using the GNU C library (glibc), which includes most popular Linux distributions. Users of other operating systems such as Windows, FreeBSD, and macOS are not affected by this particular instance of the issue, but similar issues could exist in any operating system. All versions of PostgreSQL are affected.
PostgreSQL uses locale data provided by the operating system’s C library for sorting text. Sorting happens in a variety of contexts, including for user output, merge joins, B-tree indexes, and range partitions. In the latter two cases, sorted data is persisted to disk. If the locale data in the C library changes during the lifetime of a database, the persisted data may become inconsistent with the expected sort order, which could lead to erroneous query results and other incorrect behavior. For example, if an index is not sorted the way an index scan expects, a query could fail to find data that is actually there, and an update could insert duplicate data that should be disallowed. Similarly, in a partitioned table, a query could look in the wrong partition and an update could write to the wrong partition. Therefore, it is essential to the correct operation of a database that the locale definitions do not change incompatibly during the lifetime of a database.
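One way to look for a B-tree index whose on-disk order no longer matches the current locale data is the `amcheck` contrib extension, which verifies that index entries appear in the order the comparison function expects. The following is only a sketch: it assumes `amcheck` is installed on the server (this article does not otherwise rely on it), and `my_table_name_idx` is a hypothetical index name. A passing check is not a substitute for the reindexing described later in this article.

```sql
-- Assumption: the amcheck contrib module is available on this server.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- my_table_name_idx is a hypothetical name; substitute a real B-tree index.
-- Raises an error if index entries are out of order under the current locale data.
SELECT bt_index_check('my_table_name_idx'::regclass);
```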
Operating system vendors, and in particular the authors of the GNU C library, change the locale data from time to time in minor ways to correct mistakes or add support for more languages. While this in theory violates the above rule, it has historically affected few users and has not received wide attention. However, glibc version 2.28, released 2018-08-01, included a major update to the locale data that can potentially affect the data of many users. The update itself is legitimate, as it brings the locale data in line with current international standards, but problems are bound to happen if it is applied to an existing PostgreSQL system.
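An informal way to see whether two systems sort text alike is to run the same small probe on both and compare the output line by line. The sketch below sorts a handful of strings with the database's default collation. The probe strings are arbitrary choices for illustration, not an exhaustive test, so identical output on two systems does not prove that their locale data is the same.

```sql
-- Sort a few probe strings using the database's default collation.
-- Run on both systems (or before and after an upgrade) and compare the row order.
SELECT s
FROM (VALUES ('1-1'), ('11'), ('a b'), ('ab'), ('a-b'), ('A'), ('a')) AS probe(s)
ORDER BY s;
```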
The integration of glibc updates into Linux distributions is the domain of the operating system vendor. We expect that vendors of long-term support Linux distributions will not apply incompatible locale updates to their distribution within a given release, but this is only an expectation, as we cannot predict or influence future actions. Moreover, PostgreSQL currently has no way to detect an incompatible glibc update. Therefore, some manual care is required in planning any updates or upgrades.
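As a partial aid, the `pg_collation` catalog has a `collversion` column, and the function `pg_collation_actual_version()` reports the version the provider returns now. Whether these are populated for libc collations depends on the PostgreSQL release; the sketch below assumes a release newer than the ones this article was written for. Where they are not populated, the query returns no rows and proves nothing. It also does not cover the database's default collation.

```sql
-- List libc collations whose recorded version differs from what the C library reports now.
-- Assumption: this release records versions for libc collations; otherwise both values are NULL.
SELECT collname,
       collversion AS recorded_version,
       pg_collation_actual_version(oid) AS current_version
FROM pg_collation
WHERE collprovider = 'c'
  AND collversion IS DISTINCT FROM pg_collation_actual_version(oid);
```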
Situations that are potentially affected involve changed locale data being applied to an existing instance or binary-equivalent instance, in particular:
- Changing locale data on a running instance (even if restarted). This includes, in particular, upgrading the Linux distribution to a new major release while keeping the PostgreSQL data directory.
- Using streaming replication to a standby instance with different locale data. (The standby is then potentially corrupted, but the primary is fine.)
- Restoring a binary backup (pg_basebackup, Barman, etc.) on a system with different locale data.
Not affected are situations where the data is transported in a logical (not binary) way, including:
- Backups using pg_dump
- Logical replication (including pglogical, BDR)
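As an illustration of the logical route, the following sketch moves data from an instance on the old glibc to a freshly initialized instance on the new one using built-in logical replication. It assumes a PostgreSQL release that has the feature, `wal_level = logical` on the old instance, and a schema already created on the new instance. The publication, subscription, host, database, and user names are all hypothetical.

```sql
-- On the old instance (publisher). Assumes wal_level = logical.
CREATE PUBLICATION glibc_migration FOR ALL TABLES;

-- On the new instance (subscriber), after the schema has been created there,
-- for example from a schema-only dump. Connection values are placeholders.
CREATE SUBSCRIPTION glibc_migration_sub
    CONNECTION 'host=old-host dbname=appdb user=replicator'
    PUBLICATION glibc_migration;
```

Because rows arrive as logical changes, indexes on the new instance are built with the new locale data from the start.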
When an instance needs to be moved to a new glibc release, for example as part of an operating system upgrade, the following applies after the upgrade:
- All indexes involving columns of type text, varchar, char, and citext should be reindexed before the instance is put into production.
- Range-partitioned tables using those types in the partition key should be checked to verify that all rows are still in the correct partitions. (This is unlikely to be a problem except with particularly obscure partition bounds.)
- To avoid downtime due to reindexing or repartitioning, consider upgrading using logical replication.
- Databases or table columns using the “C” or “POSIX” locales are not affected. All other locales are potentially affected.
- Table columns using collations with the ICU provider are not affected.
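To narrow down which range-partitioned tables need the check described above, a query along the following lines can list those whose partition key uses a potentially affected collation. This is a sketch that assumes a release with declarative partitioning and the `pg_partitioned_table` catalog. It only identifies candidate tables; it does not verify row placement.

```sql
-- List range-partitioned tables whose partition key uses a libc or default collation
-- other than C or POSIX. Run in each database.
SELECT s.partrelid::regclass AS partitioned_table, c.collname
FROM (SELECT partrelid, partcollation[i] AS coll
      FROM pg_partitioned_table, generate_subscripts(partcollation, 1) AS g(i)
      WHERE partstrat = 'r') AS s
JOIN pg_collation c ON s.coll = c.oid
WHERE c.collprovider IN ('d', 'c') AND c.collname NOT IN ('C', 'POSIX');
```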
Use this SQL query in each database to find out which indexes are affected:
SELECT indrelid::regclass::text, indexrelid::regclass::text, collname, pg_get_indexdef(indexrelid)
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s
JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');
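The same catalog join can be turned into a list of `REINDEX` commands to review and run. This is a sketch rather than a vetted procedure. `REINDEX` blocks writes to the table while it runs, so the commands are best run before the instance is put back into production.

```sql
-- Generate one REINDEX command per potentially affected index.
-- DISTINCT collapses indexes that have several affected columns.
SELECT DISTINCT format('REINDEX INDEX %s;', s.indexrelid::regclass) AS reindex_command
FROM (SELECT indexrelid, indcollation[i] AS coll
      FROM pg_index, generate_subscripts(indcollation, 1) AS g(i)) AS s
JOIN pg_collation c ON s.coll = c.oid
WHERE c.collprovider IN ('d', 'c') AND c.collname NOT IN ('C', 'POSIX');
```

In psql, ending the query with `\gexec` instead of the semicolon executes each generated command directly.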
To help users assess the situation with their current operating system, we have assembled the following information for each group of Linux distributions that we support. Note again that this is only a report of the current situation and that we cannot influence what these vendors do or might do in the future.
- Red Hat Enterprise Linux and CentOS versions 6 and 7 use the old locale data. We don’t expect any incompatible changes within those major version branches. Upgrades from version 6 to 7 are safe.
- Version 8 uses the new locale data. Therefore, caution will be necessary when upgrading.
- Debian 8 (jessie) and 9 (stretch) use the old locale data. We don’t expect any incompatible changes inside those releases. Upgrades from version 8 to 9 are safe.
- Release 10 (buster) uses the new locale data. Therefore, caution will be necessary when upgrading.
- The current Ubuntu LTS versions 14.04, 16.04, and 18.04 use the old locale data. We don’t expect any incompatible changes inside those releases. Upgrades between those versions are safe.
- Releases 18.10 (not LTS) and newer use the new locale data. Therefore, caution is necessary when upgrading.
- SLES 11 and 12 use the old locale data. We don’t expect any incompatible changes inside those releases. Upgrades from version 11 to 12 are safe.
- We have not evaluated version 15 yet.
- We have learned that in Fedora, some of the locale data changes at issue have been backported to earlier glibc package versions. It is currently unclear which versions are affected.
- We do not recommend using Fedora or other Linux distributions with a short support window and aggressive package update policies for production databases.
We are in contact with packagers and other stakeholders to consider options for alleviating the impact of this issue. However, since we have little influence over the actions of Linux distribution vendors and other third parties, the available options are likely limited to further guidance and diagnostics.
In the medium term, we are working on enhancing PostgreSQL to be able to use the ICU library for locale data. ICU has a more precise update policy and better APIs for detecting changes.