EFM For Cluster With Synchronous Replicas

Vipul Shukla

EDB Failover Manager (EFM) is a tool for managing Postgres database clusters. It provides high availability for primary-standby deployment architectures that use streaming replication. Using EFM with asynchronous standbys is common practice and fairly straightforward, but you can also use EFM effectively in an environment with multiple synchronous standby servers, without having to intervene manually every time a failover occurs.

Using EFM with synchronous standbys

Over the past few years, EFM has gained many features that let it maintain synchronous streaming replicas through a failover or a manual switchover. When you perform a manual switchover with efm promote efm -switchover, EFM reconfigures the old primary and any other standbys in the EFM cluster to reconnect to the new primary as synchronous replicas, using configuration parameters such as synchronous_standby_names on the primary and application_name on the standbys. After a failover, however, you still need to rebuild the failed node manually (as in an automatic failover with asynchronous standbys), because the old primary's state may have diverged from that of the promoted standby, in which case it cannot simply resume replication.

The following demonstrates a simple switchover scenario with synchronous replicas configured.

In my test case, I am using EFM 4.7, with PostgreSQL 15, running on Ubuntu 22.04. The EFM cluster has 3 nodes (1 primary and 2 synchronous standbys).

root@ip-172-31-1-92:~# /usr/edb/efm-4.7/bin/efm cluster-status efm
Cluster Status: efm

Agent Type  Address        DB  VIP
Standby     172.31.0.255   UP
Primary     172.31.1.92    UP
Standby     172.31.14.240  UP

Allowed node host list:
172.31.14.240 172.31.1.92 172.31.0.255

Membership coordinator: 172.31.14.240

Standby priority host list:
172.31.14.240 172.31.0.255

Promote Status:

DB Type  Address        WAL Received LSN  WAL Replayed LSN  Info
Primary  172.31.1.92                      0/11014FF8
Standby  172.31.0.255   0/11014FF8        0/11014FF8
Standby  172.31.14.240  0/11014FF8        0/11014FF8

Standby database(s) in sync with primary. It is safe to promote.
root@ip-172-31-1-92:~# psql postgres postgres -c "show synchronous_standby_names;"
synchronous_standby_names 
ANY 2(pg1, pg2, pg3)
(1 row)
root@ip-172-31-1-92:~# psql postgres postgres -c "select pid, usename, application_name, client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, sync_priority, sync_state FROM pg_stat_replication;"
pid | usename | application_name | client_addr | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | sync_priority | sync_state 
4100 | postgres | pg2 | 172.31.14.240 | streaming | 0/11014FF8 | 0/11014FF8 | 0/11014FF8 | 0/11014FF8 | 1 | quorum
4102 | postgres | pg3 | 172.31.0.255 | streaming | 0/11014FF8 | 0/11014FF8 | 0/11014FF8 | 0/11014FF8 | 1 | quorum
(2 rows)

With this configuration, a commit on the primary waits until at least 2 of the 3 listed standby names have acknowledged it. In the current case, pg2 and pg3 are connected as synchronous standbys, so both must acknowledge each commit. pg1 is the name of the primary node itself, as configured in the EFM configuration file.

root@ip-172-31-1-92:~# grep ^application.name /etc/edb/efm-4.7/efm.properties
application.name=pg1
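
To make the quorum rule described above concrete, here is a minimal sketch in POSIX shell. The quorum_met function is a hypothetical helper of mine, not part of EFM or PostgreSQL, and it only understands the "ANY n (name, ...)" form. It counts how many of the currently connected application names appear in the list and compares that with n.

```shell
#!/bin/sh
# Hypothetical helper (not an EFM or PostgreSQL command).
# Usage: quorum_met "ANY 2(pg1, pg2, pg3)" pg2 pg3
# Prints "yes" if enough listed standbys are connected, otherwise "no".
# Assumes each connected name is passed only once.
quorum_met() {
    setting=$1
    shift
    # The n in "ANY n (...)": how many acknowledgments a commit needs.
    required=$(printf '%s\n' "$setting" |
        sed -n 's/^ *[Aa][Nn][Yy] *\([0-9][0-9]*\) *(.*) *$/\1/p')
    # The names inside the parentheses, one per line, without spaces.
    listed=$(printf '%s\n' "$setting" |
        sed -n 's/^[^(]*(\(.*\)) *$/\1/p' | tr ',' '\n' | tr -d ' ')
    if [ -z "$required" ]; then
        echo "unsupported"
        return 1
    fi
    matched=0
    for name in "$@"; do
        # Count a connected standby only if its name is in the list.
        if printf '%s\n' "$listed" | grep -qxF -- "$name"; then
            matched=$((matched + 1))
        fi
    done
    if [ "$matched" -ge "$required" ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

For the cluster above, quorum_met 'ANY 2(pg1, pg2, pg3)' pg2 pg3 prints yes, while passing only pg2 prints no, which corresponds to the situation where commits on the primary would wait for a second standby.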

When the roles change and this node reconnects to the new primary as a standby, it uses this application name, which the new primary already expects in its synchronous_standby_names.

When a switchover occurs and the auto.reconfigure parameter is set to true, EFM reconfigures the standby servers using parameters such as application_name and restore_command, based on the values provided in the EFM configuration file.

root@ip-172-31-1-92:~# grep ^restore.command /etc/edb/efm-4.7/efm.properties
restore.command=scp postgres@%h:/var/lib/postgresql/15/archive/%f %p

root@ip-172-31-1-92:~# grep ^application.name /etc/edb/efm-4.7/efm.properties
application.name=pg1

root@ip-172-31-1-92:~# grep ^auto.reconfigure /etc/edb/efm-4.7/efm.properties
auto.reconfigure=true

The other important parameter is synchronous_standby_names, which must be set in the PostgreSQL configuration file.

root@ip-172-31-1-92:~# grep ^synchronous_standby_names /etc/postgresql/15/main/postgresql.auto.conf
synchronous_standby_names = 'ANY 2(pg1, pg2, pg3)'
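
The settings shown above only work together if each node's application.name also appears in synchronous_standby_names. As a rough sketch of checking that by hand, the following POSIX shell functions read a properties file and compare it with a synchronous_standby_names value. Both get_prop and check_sync_config are hypothetical helpers I made up for illustration, they are not EFM commands.

```shell
#!/bin/sh
# Hypothetical helpers (not part of EFM).

# get_prop FILE KEY: print the value of the first KEY=value line in FILE.
get_prop() {
    awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# check_sync_config PROPS_FILE SYNC_NAMES
# Prints "ok" when auto.reconfigure is true and application.name is one of
# the names in SYNC_NAMES; otherwise prints the reason and returns 1.
check_sync_config() {
    props=$1
    sync_names=$2
    app=$(get_prop "$props" application.name)
    auto=$(get_prop "$props" auto.reconfigure)
    if [ "$auto" != "true" ]; then
        echo "auto.reconfigure is not true"
        return 1
    fi
    if [ -z "$app" ]; then
        echo "application.name is empty"
        return 1
    fi
    # Reduce "ANY 2(pg1, pg2, pg3)" to one name per line, then match exactly.
    if printf '%s\n' "$sync_names" |
        sed 's/^[^(]*(//; s/).*$//' | tr ',' '\n' | tr -d ' ' |
        grep -qxF -- "$app"; then
        echo "ok"
    else
        echo "$app is not listed in synchronous_standby_names"
        return 1
    fi
}
```

On the node above, this could be called as check_sync_config /etc/edb/efm-4.7/efm.properties 'ANY 2(pg1, pg2, pg3)', passing the value reported by show synchronous_standby_names.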

Invoking a manual switchover.

root@ip-172-31-1-92:~# /usr/edb/efm-4.7/bin/efm promote efm -switchover
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original primary. Run the 'cluster-status' command for information about the new cluster state.
root@ip-172-31-1-92:~# /usr/edb/efm-4.7/bin/efm cluster-status efm
Cluster Status: efm

Agent Type  Address        DB  VIP
Primary     172.31.0.255   UP
Standby     172.31.1.92    UP
Standby     172.31.14.240  UP

Allowed node host list:
172.31.1.92 172.31.0.255 172.31.14.240

Membership coordinator: 172.31.1.92

Standby priority host list:
172.31.1.92 172.31.14.240

Promote Status:

DB Type  Address        WAL Received LSN  WAL Replayed LSN  Info
Primary  172.31.0.255                     0/200001B8
Standby  172.31.1.92    0/200001B8        0/200001B8
Standby  172.31.14.240  0/200001B8        0/200001B8

Standby database(s) in sync with primary. It is safe to promote.
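
In the status output above, all three nodes report the same LSN after the switchover. If you want to compare LSN values by hand, here is a minimal sketch in POSIX shell. An LSN such as 0/200001B8 is two hexadecimal numbers, where the part before the slash holds the upper 32 bits of a 64-bit position. The lsn_to_int and lsn_lag functions are hypothetical helpers of mine, and I am not claiming this is how EFM itself decides that standbys are in sync.

```shell
#!/bin/sh
# Hypothetical helpers for comparing PostgreSQL LSNs by hand.

# lsn_to_int "0/200001B8": print the LSN as a decimal byte position.
lsn_to_int() {
    hi=${1%%/*}   # hex digits before the slash (upper 32 bits)
    lo=${1##*/}   # hex digits after the slash (lower 32 bits)
    echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# lsn_lag PRIMARY_LSN STANDBY_LSN: print how many bytes the standby is behind.
lsn_lag() {
    a=$(lsn_to_int "$1")
    b=$(lsn_to_int "$2")
    echo $(( a - b ))
}
```

For example, lsn_lag 0/200001B8 0/200001B8 prints 0. Inside the database, the built-in pg_wal_lsn_diff function does the same subtraction.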

For details on how to set up a new EFM cluster, please refer to the document here. For details on how to configure a synchronous standby replica, refer to the document here.
