Handling Replication Slots With EFM

Vipul Shukla

EDB Failover Manager (EFM) is one of the most reliable high-availability solutions for a PostgreSQL cluster. EFM supports both asynchronous and synchronous replication, which makes it suitable for a wide range of environments. Because managing a synchronous replica is already fairly complex, managing the replication slots that drive the replication becomes even more challenging.

This article is designed to help DBAs manage replication slots when they use EFM with synchronous streaming replication, because EFM does not automatically recreate the slots on the new primary database after an automatic switchover.

In this example, I am using EFM 4.7 with the initial setup shown below.

Initial EFM cluster status

[root@5be68cd1216c /]# /usr/edb/efm-4.7/bin/efm cluster-status efm
Cluster Status: efm

	Agent Type  Address              DB       VIP
	----------------------------------------------------------------
	Primary     172.17.0.2           UP       172.17.0.5*
	Standby     172.17.0.3           UP       172.17.0.5
	Standby     172.17.0.4           UP       172.17.0.5

Allowed node host list:
	172.17.0.2 172.17.0.3 172.17.0.4

Membership coordinator: 172.17.0.2

Standby priority host list:
	172.17.0.3 172.17.0.4

Promote Status:

	DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
	---------------------------------------------------------------------------
	Primary     172.17.0.2                              0/A000380          
	Standby     172.17.0.4           0/A000380          0/A000380          
	Standby     172.17.0.3           0/A000380          0/A000380          

	Standby database(s) in sync with primary. It is safe to promote.

In this test, I am using three nodes, efm1, efm2, and efm3, with three replication slots spread across them. Slot replica1 handles replication for node 172.17.0.2, replica2 for 172.17.0.3, and replica3 for 172.17.0.4. However, at any given time, a primary database only needs two of these slots active, one for each of the two replicas connected to it.
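The slot creation step itself is not shown in this article; physical slots like these can be created on the node currently acting as primary with pg_create_physical_replication_slot(), for example (a minimal sketch, using the slot names from this setup):

postgres=# -- one physical slot per standby that will connect to this node
postgres=# SELECT pg_create_physical_replication_slot('replica2');
postgres=# SELECT pg_create_physical_replication_slot('replica3');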

On my primary node (172.17.0.2), I have two replication slots, one for each connected replica.

See the list of replicas.

 pid  | usesysid | usename  | application_name | client_addr |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | sync_priority | sync_state
------+----------+----------+------------------+-------------+-----------+-----------+-----------+-----------+------------+---------------+------------
 3345 |       10 | postgres | efm2             | 172.17.0.3  | streaming | 0/A000380 | 0/A000380 | 0/A000380 | 0/A000380  |             0 | async
 3349 |       10 | postgres | efm3             | 172.17.0.4  | streaming | 0/A000380 | 0/A000380 | 0/A000380 | 0/A000380  |             0 | async
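The query behind this listing is not shown in the article; output in this shape comes from the pg_stat_replication view, for example:

postgres=# SELECT * FROM pg_stat_replication;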

Listing the slots available on the primary, we can see that two slots have been created: replica2 handles the replica efm2, and replica3 handles the replica efm3.

 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 replica3  |        | physical  |        |          | f         | t      |       3349 |      |              | 0/A000380   |                     | reserved   |               | f
 replica2  |        | physical  |        |          | f         | t      |       3345 |      |              | 0/A000380   |                     | reserved   |               | f
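Similarly, the slot listings shown throughout this article can be reproduced with a query against the pg_replication_slots view:

postgres=# SELECT * FROM pg_replication_slots;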

The primary_slot_name configured on this node, which will only be used when the node is converted to a standby, is shown below.

Configuration

postgres=# show primary_slot_name ;
 primary_slot_name 
-------------------
 replica1
(1 row)
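The article does not show how primary_slot_name was set. Assuming PostgreSQL 13 or later (where this parameter is reloadable), it could be configured on each node with something like the following, run on the node itself with its own slot name:

postgres=# ALTER SYSTEM SET primary_slot_name = 'replica1';
postgres=# SELECT pg_reload_conf();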

Configuration in the properties file.

[postgres@5be68cd1216c ~]$ grep ^application.name /etc/edb/efm-4.7/efm.properties
application.name=efm1

[postgres@5be68cd1216c ~]$ grep ^update.physical.slots.period /etc/edb/efm-4.7/efm.properties
update.physical.slots.period=5

On the first standby (172.17.0.3)

Listing the slots available on the first standby, we can see that two slots have been created (but are in an inactive state): replica1 for the replica efm1 and replica3 for the replica efm3.

 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 replica1  |        | physical  |        |          | f         | f      |            |      |              | 0/A000380   |                     | reserved   |               | f
 replica3  |        | physical  |        |          | f         | f      |            |      |              | 0/A000380   |                     | reserved   |               | f

Configuration

postgres=# show primary_slot_name ;
 primary_slot_name 
-------------------
 replica2
(1 row)
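As an additional check not shown in the original output, a running standby can confirm which slot its WAL receiver is actually using on the upstream primary:

postgres=# SELECT status, slot_name, sender_host FROM pg_stat_wal_receiver;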

Configuration in the properties file.

[root@422306b020a9 /]# grep ^application.name /etc/edb/efm-4.7/efm.properties
application.name=efm2

[root@422306b020a9 /]# grep ^update.physical.slots.period /etc/edb/efm-4.7/efm.properties
update.physical.slots.period=5

On the second standby (172.17.0.4)

Listing the slots available on the second standby, we again see that two slots have been created (but are in an inactive state): replica1 for the replica efm1 and replica2 for the replica efm2.

 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 replica1  |        | physical  |        |          | f         | f      |            |      |              | 0/A000380   |                     | reserved   |               | f
 replica2  |        | physical  |        |          | f         | f      |            |      |              | 0/A000380   |                     | reserved   |               | f

Configuration

postgres=# show primary_slot_name ;
 primary_slot_name 
-------------------
 replica3
(1 row)

Configuration in the properties file.

[postgres@54cb634f7c9e ~]$ grep ^application.name /etc/edb/efm-4.7/efm.properties
application.name=efm3

[postgres@54cb634f7c9e ~]$ grep ^update.physical.slots.period /etc/edb/efm-4.7/efm.properties
update.physical.slots.period=5

With the above setup, if we perform a manual switchover using EFM, the inactive slots on the standby nodes should change their state to active. Performing a manual switchover:

[root@54cb634f7c9e /]# /usr/edb/efm-4.7/bin/efm promote efm -switchover 
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original primary. Run the 'cluster-status' command for information about the new cluster state.
[root@54cb634f7c9e /]# /usr/edb/efm-4.7/bin/efm cluster-status efm
Cluster Status: efm

	Agent Type  Address              DB       VIP
	----------------------------------------------------------------
	Standby     172.17.0.2           UP       172.17.0.5
	Primary     172.17.0.3           UP       172.17.0.5*
	Standby     172.17.0.4           UP       172.17.0.5

Allowed node host list:
	172.17.0.2 172.17.0.3 172.17.0.4

Membership coordinator: 172.17.0.2

Standby priority host list:
	172.17.0.2 172.17.0.4

Promote Status:

	DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
	---------------------------------------------------------------------------
	Primary     172.17.0.3                              0/B0001B8          
	Standby     172.17.0.4           0/B0001B8          0/B0001B8          
	Standby     172.17.0.2           0/B0001B8          0/B0001B8          

	Standby database(s) in sync with primary. It is safe to promote.

On the new primary (172.17.0.3)

The connected replicas are now efm1 (172.17.0.2) and efm3 (172.17.0.4):

 pid  | usesysid | usename  | application_name | client_addr |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | sync_priority | sync_state
------+----------+----------+------------------+-------------+-----------+-----------+-----------+-----------+------------+---------------+------------
 3476 |       10 | postgres | efm1             | 172.17.0.2  | streaming | 0/B0001B8 | 0/B0001B8 | 0/B0001B8 | 0/B0001B8  |             0 | async
 3481 |       10 | postgres | efm3             | 172.17.0.4  | streaming | 0/B0001B8 | 0/B0001B8 | 0/B0001B8 | 0/B0001B8  |             0 | async

Checking the status of the available slots, we can see that replica1 and replica3 are now active.

 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 replica1  |        | physical  |        |          | f         | t      |       3476 |      |              | 0/B0001B8   |                     | reserved   |               | f
 replica3  |        | physical  |        |          | f         | t      |       3481 |      |              | 0/B0001B8   |                     | reserved   |               | f

On the old primary (now a standby), the status of the slots looks like this:

 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 replica3  |        | physical  |        |          | f         | f      |            |      |              | 0/B0001B8   |                     | reserved   |               | f
 replica2  |        | physical  |        |          | f         | f      |            |      |              | 0/B0001B8   |                     | reserved   |               | f
