EDB Failover Manager (EFM) is one of the most reliable high-availability solutions for a PostgreSQL cluster. EFM supports both asynchronous and synchronous streaming replication, which makes it a good fit for a wide range of environments. Because handling a synchronous replica is already fairly complex, handling the replication slots that drive the replication becomes even more challenging.
This article is meant to help DBAs manage replication slots when they use EFM with synchronous streaming replication, because EFM does not automatically assist with recreating the slots on the new primary database after an automatic switchover.
In my use case, I am using EFM 4.7 with the initial setup shown below.
[root@5be68cd1216c /]# /usr/edb/efm-4.7/bin/efm cluster-status efm
Cluster Status: efm
Agent Type Address DB VIP
----------------------------------------------------------------
Primary 172.17.0.2 UP 172.17.0.5*
Standby 172.17.0.3 UP 172.17.0.5
Standby 172.17.0.4 UP 172.17.0.5
Allowed node host list:
172.17.0.2 172.17.0.3 172.17.0.4
Membership coordinator: 172.17.0.2
Standby priority host list:
172.17.0.3 172.17.0.4
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
---------------------------------------------------------------------------
Primary 172.17.0.2 0/A000380
Standby 172.17.0.4 0/A000380 0/A000380
Standby 172.17.0.3 0/A000380 0/A000380
Standby database(s) in sync with primary. It is safe to promote.
In this test, I am using a total of three nodes, efm1, efm2, and efm3, with three replication slots spread across them. The slot replica1 is created to handle the replication for node 172.17.0.2, replica2 for 172.17.0.3, and replica3 for 172.17.0.4. However, at any given point, the primary DB should only have two slots, one for each of the two replicas connected to it.
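The slot creation itself is not shown here; for reference, physical slots such as these are normally created on the current primary with pg_create_physical_replication_slot(). The following is only an illustrative sketch, not the exact commands used in this setup:
-- on the current primary (172.17.0.2), one slot per standby that will connect to it
SELECT pg_create_physical_replication_slot('replica2');
SELECT pg_create_physical_replication_slot('replica3');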
See the list of replicas.
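The listing below comes from the pg_stat_replication view on the primary, presumably with a query along these lines (the trimmed column list is my assumption):
-- one row per walsender, i.e. per connected replica
SELECT pid, usesysid, usename, application_name, client_addr, state,
       sent_lsn, write_lsn, flush_lsn, replay_lsn, sync_priority, sync_state
FROM pg_stat_replication;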
pid | usesysid | usename | application_name | client_addr | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | sync_priority | sync_state |
---|---|---|---|---|---|---|---|---|---|---|---|
3345 | 10 | postgres | efm2 | 172.17.0.3 | streaming | 0/A000380 | 0/A000380 | 0/A000380 | 0/A000380 | 0 | async |
3349 | 10 | postgres | efm3 | 172.17.0.4 | streaming | 0/A000380 | 0/A000380 | 0/A000380 | 0/A000380 | 0 | async |
Listing the slots available on the primary, we can see that two slots have been created: replica2 handling the replica efm2, and replica3 handling the replica efm3.
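The slot listings in this article can be reproduced from the pg_replication_slots view, for example:
-- shows every slot defined on this node and whether a walsender is currently using it
SELECT * FROM pg_replication_slots;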
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
replica3 | | physical | | | f | t | 3349 | | | 0/A000380 | | reserved | | f |
replica2 | | physical | | | f | t | 3345 | | | 0/A000380 | | reserved | | f |
The primary_slot_name configured on this node, which will only be used when it converts to a standby, is shown below.
Configuration
postgres=# show primary_slot_name ;
primary_slot_name
-------------------
replica1
(1 row)
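If primary_slot_name ever needs to be set or corrected by hand, one way to do it (my example, not something EFM does for you) is:
-- point this node at its own slot on the upstream primary
ALTER SYSTEM SET primary_slot_name = 'replica1';
-- a reload is enough on PostgreSQL 13 and later; older versions need a restart
SELECT pg_reload_conf();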
Configuration in the properties file.
[postgres@5be68cd1216c ~]$ grep ^application.name /etc/edb/efm-4.7/efm.properties
application.name=efm1
[postgres@5be68cd1216c ~]$ grep ^update.physical.slots.period /etc/edb/efm-4.7/efm.properties
update.physical.slots.period=5
Listing the slots available on the first standby, we can see that two slots have been created (but are in an inactive state): replica1 handling the replica efm1, and replica3 handling the replica efm3.
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
replica1 | | physical | | | f | f | | | | 0/A000380 | | reserved | | f |
replica3 | | physical | | | f | f | | | | 0/A000380 | | reserved | | f |
Configuration
postgres=# show primary_slot_name ;
primary_slot_name
-------------------
replica2
(1 row)
Configuration in the properties file.
[root@422306b020a9 /]# grep ^application.name /etc/edb/efm-4.7/efm.properties
application.name=efm2
[root@422306b020a9 /]# grep ^update.physical.slots.period /etc/edb/efm-4.7/efm.properties
update.physical.slots.period=5
Listing the slots available on the second standby, we again see that two slots have been created (but are in an inactive state): replica1 handling the replica efm1, and replica2 handling the replica efm2.
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
replica1 | | physical | | | f | f | | | | 0/A000380 | | reserved | | f |
replica2 | | physical | | | f | f | | | | 0/A000380 | | reserved | | f |
Configuration
postgres=# show primary_slot_name ;
primary_slot_name
-------------------
replica3
(1 row)
Configuration in the properties file.
[postgres@54cb634f7c9e ~]$ grep ^application.name /etc/edb/efm-4.7/efm.properties
application.name=efm3
[postgres@54cb634f7c9e ~]$ grep ^update.physical.slots.period /etc/edb/efm-4.7/efm.properties
update.physical.slots.period=5
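Before testing the switchover, it is worth confirming that every node carries the same slot-related EFM settings. A quick check over ssh could look like this (hypothetical; it assumes passwordless ssh to the node addresses):
# verify application.name and update.physical.slots.period on all three nodes
for h in 172.17.0.2 172.17.0.3 172.17.0.4; do
  echo "== $h =="
  ssh "$h" "grep -E '^(application.name|update.physical.slots.period)' /etc/edb/efm-4.7/efm.properties"
done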
With the above setup in place, if we perform a manual switchover using EFM, the inactive slots on the standby that gets promoted should change their state to active.
Performing a manual switchover.
[root@54cb634f7c9e /]# /usr/edb/efm-4.7/bin/efm promote efm -switchover
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original primary. Run the 'cluster-status' command for information about the new cluster state.
[root@54cb634f7c9e /]# /usr/edb/efm-4.7/bin/efm cluster-status efm
Cluster Status: efm
Agent Type Address DB VIP
----------------------------------------------------------------
Standby 172.17.0.2 UP 172.17.0.5
Primary 172.17.0.3 UP 172.17.0.5*
Standby 172.17.0.4 UP 172.17.0.5
Allowed node host list:
172.17.0.2 172.17.0.3 172.17.0.4
Membership coordinator: 172.17.0.2
Standby priority host list:
172.17.0.2 172.17.0.4
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
---------------------------------------------------------------------------
Primary 172.17.0.3 0/B0001B8
Standby 172.17.0.4 0/B0001B8 0/B0001B8
Standby 172.17.0.2 0/B0001B8 0/B0001B8
Standby database(s) in sync with primary. It is safe to promote.
The list of replicas on the new primary (172.17.0.3) now shows efm1 and efm3 connected.
pid | usesysid | usename | application_name | client_addr | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | sync_priority | sync_state |
---|---|---|---|---|---|---|---|---|---|---|---|
3476 | 10 | postgres | efm1 | 172.17.0.2 | streaming | 0/B0001B8 | 0/B0001B8 | 0/B0001B8 | 0/B0001B8 | 0 | async |
3481 | 10 | postgres | efm3 | 172.17.0.4 | streaming | 0/B0001B8 | 0/B0001B8 | 0/B0001B8 | 0/B0001B8 | 0 | async |
Checking the status of the slots on the new primary (172.17.0.3), we can see that replica1 and replica3 are now active.
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
replica1 | | physical | | | f | t | 3476 | | | 0/B0001B8 | | reserved | | f |
replica3 | | physical | | | f | t | 3481 | | | 0/B0001B8 | | reserved | | f |
On the old primary (172.17.0.2), which is now a standby, the slots remain but are inactive.
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
replica3 | | physical | | | f | f | | | | 0/B0001B8 | | reserved | | f |
replica2 | | physical | | | f | f | | | | 0/B0001B8 | | reserved | | f |
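As a final check, each standby can confirm that it is actually streaming through its expected slot by looking at the pg_stat_wal_receiver view, for example:
-- run on each standby; slot_name should match that node's primary_slot_name
SELECT status, sender_host, slot_name FROM pg_stat_wal_receiver;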