Postgres major version upgrades using pglogical

William Ivanski
William Ivanski

The purpose of this article is to demonstrate using pglogical to perform a PostgreSQL major version upgrade with minimal downtime.

0. Requirements

The following settings are required for pglogical:

  • shared_preload_libraries should include pglogical.;
  • wal_level = 'logical' (note that if you changed this on the primary, you also need to change this on all physical standbys and on the subscriber);
  • track_commit_timestamp = on;
  • max_replication_slots high enough for one additional replication slot per database;
  • max_wal_senders high enough for one additional WAL sender process per database, on the provider.
  • max_worker_processes high enough for one additional WAL sender process per database (on the provider), and two additional processes on the subscriber: one pglogical receiver and one pglogical writer.

Changing the settings above requires a Postgres restart.

Another requirement is to configure pg_hba.conf on the provider, to allow regular and replication connections from the subscriber:

On each database being upgraded, you should also check for tables without a primary key, using the query below:

SELECT
n.nspname as schema,
c.relname as table,
pg_size_pretty(pg_relation_size(c.oid)) as size
FROM
pg_class c
JOIN
pg_namespace n
ON n.oid = c.relnamespace
WHERE
c.relkind = 'r'
AND NOT EXISTS (
SELECT 1
FROM pg_constraint con
WHERE con.conrelid = c.oid
AND con.contype = 'p'
)
AND n.nspname <> ALL (
ARRAY [
'pg_catalog',
'information_schema',
'pglogical'
]
);

For any table without a primary key, pglogical will be able to replicate INSERTs and TRUNCATEs, but not UPDATEs or DELETEs. Your goal during a major upgrade is to replicate all information, so for these tables, if they are important, you should consider creating a primary key.

Last but not least: please make sure you are installing the latest pglogical 2 versions on both provider and subscriber. Having also the latest PostgreSQL minor release versions on both is not required, but strongly recommended.

1. Configure the provider

On the provider, for each database you need to upgrade, first you need to create the pglogical extension:

CREATE EXTENSION pglogical;

Then you need to create the local node:

SELECT pglogical.create_node('provider', 'host=a dbname=testdb');

After that you need to add the tables to the default replication set. You can add all tables specifying each schema:

SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public','myschema']);

But please note that if any table doesn't have a primary key, you will see a warning, and the table will replicate only INSERTs and TRUNCATEs. Alternatively, you can add only the tables that have a primary key:

SELECT pglogical.replication_set_add_table(
set_name := 'default',
relation := c.oid,
synchronize_data := true
)
FROM
pg_class c
JOIN
pg_namespace n
ON n.oid = c.relnamespace
WHERE
c.relkind = 'r'
AND EXISTS (
SELECT 1
FROM pg_constraint con
WHERE con.conrelid = c.oid
AND con.contype = 'p'
)
AND n.nspname <> ALL (
ARRAY [
'pg_catalog',
'information_schema',
'pglogical'
]
);

You can optionally add the sequences to the default replication set, so pglogical will keep the sequences values up-to-date on the subscriber:

SELECT pglogical.replication_set_add_all_sequences('default', ARRAY['public','myschema']);

2. Configure the subscriber

On the subscriber, for each database being upgraded, you need to first create the pglogical extension:

CREATE EXTENSION pglogical;

Then you create the local node:

SELECT pglogical.create_node('subscriber', 'host=b dbname=testdb');

The next step is to create the complete schema, which includes the global objects (tablespaces, roles, privileges) and the database schema (empty tables, sequences, functions, etc).

# Gather global objects from the provider
pg_dumpall -h provider -g -f global_objects.sql
# Restore global objects on the subscriber
psql -f global_objects.sql
# For each database, gather database schema from the provider
pg_dump -h provider -d mydb -s -f mydb_schema.sql
# Create the database on the subscriber
psql -c 'CREATE DATABASE mydb'
# Restore the database schema from the provider into the subscriber
psql -d mydb -f mydb_schema.sql

Please note how we are copying the schema, and not the data.

Then you will finally be ready to create the subscription.

IMPORTANT: The step below will create the replication slot on the provider and start the initial data copy, which can take a long time depending on the size of the database.

The command is as follows:

SELECT pglogical.create_subscription(
subscription_name := 'upgradesub',
provider_dsn := 'host=a dbname=testdb',
replication_sets := ARRAY['default'],
synchronize_structure := false,
synchronize_data := true
);

synchronize_structure is set to false by default, and we don't want to change that, because we had already copied the schema from the provider in the previous step.

synchronize_data is set to true by default, we also don't want to change that, because we need all data to be copied from the provider, and then synced.

IMPORTANT: It's strongly recommended to have synchronize_data := true, because if you run with synchronize_data := false, then pglogical will start replicating only from that point on. Which means that all the data you have copied can be already old, and there might be some transactions happening in the provider after that, and these transactions won't be replicated to the subscriber. So setting synchronize_data := true ensures that the subscriber has all data.

IMPORTANT: If you run pglogical.create_subscription with the recommended default values for synchronize_structure := false and synchronize_data := true, but you didn't manually create the schema as instructed above, then you will receive an error:

initialization failed during nonrecoverable step (d), please try the setup again

This happens because the first data copy requires a table that's not available.

3. Monitoring and managing DDL

During the initial data copy, a large amount of data will be copied from the provider to the subscriber. This step is critical because pglogical will keep an inactive replication slot on the provider, which will cause WAL retention, so you need to watch for the pg_wal/pg_xlog directory size as this can potentially cause disk exhaustion.

During the initial data copy, please don't perform any DDL on the provider, as you won't be able to apply the same DDL on the subscriber at this point, and this can break the initial data copy.

On the subscriber, you can use check the subscription status with:

SELECT * FROM pglogical.show_subscription_status();

During initial sync, field status = initializing. Once the initial sync is completed, then status = replicating.

After the initial data copy is completed, then the replication slot that was kept inactive on the provider will turn active. All the WAL files that were kept on the provider will start to be transferred to the subscriber, and the subscriber will start to apply all those transaction backlog. Depending on the amount of WAL that was generated on the provider, this can take a long time. Over time, upon each automatic checkpoint happening on the provider, the size of the pg_wal/pg_xlog directory will start to decrease until it reaches regular levels.

At this point, DDL will then be allowed on the provider. However, from now on, for every DDL command you execute on the provider, please always use the pglogical.replicate_ddl_command function:

SELECT pglogical.replicate_ddl_command($$ DDL command here ... $$);

This function will execute the DDL on the provider and replicate it to the subscriber.

Provided that you are monitoring the replication and the disk usage, and also performing DDL only with pglogical.replicate_ddl_command, then you can keep the pglogical replication for as long as you want, until the scheduled cutover to the upgraded database.

4. Cutover

When the time of the cutover arrives, you will need to stop applications so they don't write to the provider anymore. This means downtime for as long as the cutover is happening. After that, wait for replication lag to drop to zero.

Next step is to synchronize the sequence values. If you had executed pglogical.replication_set_add_all_sequences as instructed above, then this step will be as simple as, on the provider, per database:

SELECT pglogical.synchronize_sequence(seqoid)
FROM pglogical.sequence_state;

Otherwise, you will need to run the following, per database, on the subscriber:

cat << EOF | psql -h provider -d mydb -At -o mydb_update_sequence_values.sql
SELECT format(
\$\$ALTER SEQUENCE %s START %s RESTART;
\$\$, x.sequence_name, x.next_sequence_value)
FROM (
SELECT quote_ident(n.nspname) || '.' ||
quote_ident(c.relname) AS sequence_name,
nextval(c.oid) AS next_sequence_value
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'S'
) x
EOF
psql -d mydb -f mydb_update_sequence_values.sql

The last step is to drop the subscription:

SELECT pglogical.drop_subscription('upgradesub');

Now you can redirect applications to the subscriber and stop the provider.

Was this article helpful?

0 out of 0 found this helpful