One of the common problems with using replication slots is that it needs to be reconfigured after a failover and switchover. Replication Slot doesn’t resume at the same restart LSN.
This problem doesn’t exist with EFM v3.10 and later using database version 12 or later.
You need to use the update.physical.slots.period property to define the slot advance frequency for database version 12 and later. When update.physical.slots.period is set to a positive integer value, the primary agent reads the current restart_lsn of the physical replication slots after every update.physical.slots.period seconds and sends this information with its pg_current_wal_lsn and primary_slot_name (if it is set in the postgresql.conf file) to the standbys. The physical slots must already exist on the primary for the agent to find them. If physical slots do not already exist on the standbys, standby agents create the slots and then update restart_lsn parameter for these slots. A non-promotable standby doesn't create new slots but updates them if they exist. Note: all slot names, including one set on the current primary if desired, must be unique.
Period in seconds between having the primary agent update promotable standbys with physical replication slot information so that the cluster will continue to use replication slots after a failover. This parameter value can be set > 0 to enable it.
Set to zero to turn off.
[enterprisedb@localhost ~]# /usr/edb/as14/bin/psql -p 5468 edb -U enterprisedb
psql (14.2.1, server 14.2.1)
Type "help" for help.
edb=# select pg_create_physical_replication_slot('replica');
pg_create_physical_replication_slot
(replica,)
(1 row)
edb=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_l
sn | wal_status | safe_wal_size | two_phase
replica | | physical | | | f | t | 26830 | | | 0/4000148 |
| reserved | | f
(1 row)
edb=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backe
nd_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |
reply_time
26830 | 10 | enterprisedb | walreceiver | 172.19.12.61 | | 53709 | 21-APR-22 04:01:23.599989 -07:00 |
| streaming | 0/4000148 | 0/4000148 | 0/4000148 | 0/4000148 | | | | 0 | async | 21-A
PR-22 04:01:43.770416 -07:00
(1 row)
[enterprisedb@localhost ~]$ /usr/edb/as14/bin/psql -p 5468 edb
psql (14.2.1, server 14.2.1)
Type "help" for help.
edb=# select pg_create_physical_replication_slot('replica');
pg_create_physical_replication_slot
(replica,)
(1 row)
edb=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_l
sn | wal_status | safe_wal_size | two_phase
replica | | physical | | | f | f | | | | |
| | | f
(1 row)
[root@localhost efm-4.4]# /usr/edb/efm-4.4/bin/efm cluster-status efm
Cluster Status: efm
Agent Type Address DB VIP
Witness 172.19.10.189 N/A
Primary 172.19.11.73 UP
Standby 172.19.12.61 UP
Allowed node host list:
172.19.11.73 172.19.12.61 172.19.10.189
Membership coordinator: 172.19.11.73
Standby priority host list:
172.19.12.61
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
Primary 172.19.11.73 0/4000148
Standby 172.19.12.61 0/4000148 0/4000148
Standby database(s) in sync with primary. It is safe to promote.
[root@localhost efm-4.4]# /usr/edb/efm-4.4/bin/efm promote efm -switchover
Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original primary. Run the 'cluster-status' command for information about the new cluster state.
[root@localhost efm-4.4]# /usr/edb/efm-4.4/bin/efm cluster-status efm
Cluster Status: efm
Agent Type Address DB VIP
Witness 172.19.10.189 N/A
Standby 172.19.11.73 UP
Primary 172.19.12.61 UP
Allowed node host list:
172.19.11.73 172.19.12.61 172.19.10.189
Membership coordinator: 172.19.11.73
Standby priority host list:
172.19.11.73
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
Primary 172.19.12.61 0/40002D8
Standby 172.19.11.73 0/40002D8 0/40002D8
Standby database(s) in sync with primary. It is safe to promote.
[enterprisedb@localhost ~]$ /usr/edb/as14/bin/psql -p 5468 edb
psql (14.2.1, server 14.2.1)
Type "help" for help.
edb=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_l
sn | wal_status | safe_wal_size | two_phase
replica | | physical | | | f | t | 106061 | | | 0/40002D8 |
| reserved | | f
(1 row)
edb=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | back
end_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |
reply_time
106061 | 10 | enterprisedb | walreceiver | 172.19.11.73 | | 35062 | 21-APR-22 04:28:42.667902 -07:00 |
| streaming | 0/40002D8 | 0/40002D8 | 0/40002D8 | 0/40002D8 | | | | 0 | async | 21-
APR-22 04:29:22.783566 -07:00
(1 row)
[root@localhost efm-4.4]# /usr/edb/as14/bin/psql -p 5468 edb -U enterprisedb
psql (14.2.1, server 14.2.1)
Type "help" for help.
edb=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn
| write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
(0 rows)
edb=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_l
sn | wal_status | safe_wal_size | two_phase
replica | | physical | | | f | f | | | | 0/40002D8 |
| reserved | | f
(1 row)
Primary:
2022-04-21 04:26:08 com.enterprisedb.efm.utils.PhysicalSlotMonitor stopMonitoring INFO: Stopped the physical slot monitor
2022-04-21 04:26:11 com.enterprisedb.efm.utils.ClusterUtils getObjectFromNode INFO: Sending request physical_slot_info_request to node localhost-2166(172.19.12.61)
2022-04-21 04:26:13 com.enterprisedb.efm.nodes.EfmAgent advancePhysicalReplicationSlots INFO: slot info to advance: {replica=0/40002D8}
i. Shutdown Primary database
[root@localhost ~]# /usr/edb/efm-4.4/bin/efm cluster-status efm
Cluster Status: efm
Agent Type Address DB VIP
Witness 172.19.10.189 N/A
Primary 172.19.11.73 UP
Standby 172.19.12.61 UP
Allowed node host list:
172.19.11.73 172.19.10.189 172.19.12.61
Membership coordinator: 172.19.11.73
Standby priority host list:
172.19.12.61
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
Standby 172.19.12.61 0/4000518 0/4000518
UNKNOWN 172.19.11.73 UNKNOWN UNKNOWN Connection to 172.19.11.73:5468 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
No primary database was found.
[root@localhost ~]# /usr/edb/efm-4.4/bin/efm cluster-status efm
Cluster Status: efm
Agent Type Address DB VIP
Witness 172.19.10.189 N/A
Idle 172.19.11.73 UNKNOWN
Primary 172.19.12.61 UP
Allowed node host list:
172.19.11.73 172.19.10.189 172.19.12.61
Membership coordinator: 172.19.11.73
Standby priority host list:
(List is empty.)
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
Primary 172.19.12.61 0/4000630
No standby databases were found.
Idle Node Status (idle nodes ignored in WAL LSN comparisons):
Address WAL Received LSN WAL Replayed LSN Info
172.19.11.73 UNKNOWN UNKNOWN Connection to 172.19.11.73:5468 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
ii. Slot is inactive at the moment:
edb=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn |
wal_status | safe_wal_size | two_phase
replica | | physical | | | f | f | | | | 0/40002D8 | |
reserved | | f
(1 row)
-bash-4.2$ /usr/edb/as14/bin/pg_basebackup -D /var/lib/edb/as14/data/ -h 172.19.12.61 -p 5468
-bash-4.2$ touch /var/lib/edb/as14/data/standby.signal
[root@localhost ~]# /usr/edb/efm-4.4/bin/efm cluster-status efm
Cluster Status: efm
Agent Type Address DB VIP
Witness 172.19.10.189 N/A
Standby 172.19.11.73 UP
Primary 172.19.12.61 UP
Allowed node host list:
172.19.11.73 172.19.10.189 172.19.12.61
Membership coordinator: 172.19.11.73
Standby priority host list:
172.19.11.73
Promote Status:
DB Type Address WAL Received LSN WAL Replayed LSN Info
Primary 172.19.12.61 0/6000060
Standby 172.19.11.73 0/6000060 0/6000060
Standby database(s) in sync with primary. It is safe to promote.
edb=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn |
wal_status | safe_wal_size | two_phase
replica | | physical | | | f | t | 114527 | | | 0/6000060 | |
reserved | | f
(1 row)
edb=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_x
min | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | re
ply_time
114527 | 10 | enterprisedb | walreceiver | 172.19.11.73 | | 35290 | 21-APR-22 22:15:45.318252 -07:00 |
| streaming | 0/6000060 | 0/6000060 | 0/6000060 | 0/6000060 | | | | 0 | async | 21-APR-22 22:
16:15.407942 -07:00
(1 row)