This article is based on the premise of a two-node cluster managed by repmgr, without EFM or PgBouncer, whilst also maintaining a virtual IP address on whichever node is the primary. We will be using NetworkManager to manage the network devices.
Starting off with the two nodes:
- repmgr-node1
  - IP address: 192.168.0.51
  - Physical network device: enp0s3
- repmgr-node2
  - IP address: 192.168.0.52
  - Physical network device: enp0s3
- Virtual IP setup
  - IP address: 192.168.0.54
  - Network alias: postgres-vip
We will also use the host/IP address of a witness or application server: app-server, with the IP address 192.168.0.50.
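Throughout, the hostnames are assumed to resolve on every node. If you are not using DNS, a minimal /etc/hosts sketch matching the addresses above would be:
192.168.0.50 app-server
192.168.0.51 repmgr-node1
192.168.0.52 repmgr-node2
192.168.0.54 postgres-vip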
Let's check the current network configuration of the servers using ip a, as shown below with the generated output. Any changes must be applied to both nodes.
-bash-4.2$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:d5:ed:cf brd ff:ff:ff:ff:ff:ff
inet 192.168.0.52/24 brd 192.168.0.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fed5:edcf/64 scope link
valid_lft forever preferred_lft forever
Log in as root. With NetworkManager, we clone physical devices using the nmcli command, as opposed to creating aliases with the old ifconfig approach:
- Clone the physical device to our named device
nmcli c clone enp0s3 postgres-vip
- Assign an IP address to this device as follows
nmcli c mod postgres-vip +ipv4.addresses 192.168.0.54/24
- To start this device, simply bring it up
nmcli c up postgres-vip
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/26)
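Note that the cloned connection inherits autoconnect from the original profile, so it could activate at boot on both nodes at once. It is likely worth disabling that, so only repmgr (via the scripts below) or an administrator brings the virtual IP up:
nmcli c mod postgres-vip connection.autoconnect no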
To check that it is up, use ip a once again
-bash-4.2$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:f5:75:45 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.51/24 brd 192.168.0.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.0.54/24 brd 192.168.0.255 scope global secondary noprefixroute enp0s3
valid_lft forever preferred_lft forever
As can be seen, the address 192.168.0.54/24 is now available.
Or you can use ip addr show, of which ip a is simply the abbreviated form:
ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:f5:75:45 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.51/24 brd 192.168.0.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.0.54/24 brd 192.168.0.255 scope global secondary noprefixroute enp0s3
valid_lft forever preferred_lft forever
- Stopping or downing the virtual interface. Stopping the interface is straightforward, using the nmcli command once again:
nmcli c down postgres-vip
Connection 'postgres-vip' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/26)
Checking, again using ip addr show:
ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:f5:75:45 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.51/24 brd 192.168.0.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
You can now see that the virtual IP is no longer shown.
Please review the official repmgr documentation available at the following link:
https://www.enterprisedb.com/postgres-tutorials/how-implement-repmgr-postgresql-automatic-failover
A summary of the configuration for use with this document is as follows, using PostgreSQL 15.
Install the repmgr package matching your Postgres version (the repo RPM shown here is for EL 7; choose the one for your distribution):
dnf install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
dnf install postgresql15-server postgresql15-contrib repmgr_15
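If this is a fresh installation, the database cluster needs initializing and starting before continuing; a minimal sketch using the PGDG defaults (adjust PGDATA first if, like the repmgr.conf below, you use a custom data directory):
/usr/pgsql-15/bin/postgresql-15-setup initdb
systemctl enable postgresql-15
systemctl start postgresql-15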
For repmgr to be used, add the following to the postgresql.conf of the primary server:
listen_addresses = '*' # what IP address(es) to listen on;
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'replica'
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
shared_preload_libraries = 'repmgr'
Obviously those values can be tweaked to suit.
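Some of these settings (notably wal_level, archive_mode and shared_preload_libraries) only take effect after a restart rather than a reload. To apply and then verify, something along these lines, as root and then as the postgres user:
systemctl restart postgresql-15
psql -c "SELECT name, setting FROM pg_settings WHERE name IN ('wal_level','max_wal_senders','shared_preload_libraries','archive_mode');"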
With the database already created, create the repmgr user (this has to be a superuser) and a database for the repmgr metadata.
create user repmgr with superuser;
create database repmgr with owner repmgr;
The following changes will need applying to pg_hba.conf:
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr ::1/128 trust
host replication repmgr 192.168.0.0/24 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr ::1/128 trust
host repmgr repmgr 192.168.0.0/24 trust
Again, adjust to suit your environment.
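Changes to pg_hba.conf take effect on a reload rather than a restart; for example, as root:
systemctl reload postgresql-15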
It makes sense to allow members of a dbadmin operating system group to stop and start the PostgreSQL systemd service, along with the ability to bring up or take down the virtual IP address that should be associated with the primary.
As root execute the following:
groupadd dbadmin
usermod -a -G dbadmin postgres
Again, the sudo setup must be applied to both nodes. Add the following using visudo; do not edit the sudoers file directly with vi or any other text editor:
Defaults exempt_group=dbadmin
%dbadmin ALL = NOPASSWD: /usr/bin/systemctl [stop|start|restart|reload|status]* [postgres|edb|repmgr]*,! /usr/bin/systemctl *able*, /bin/nmcli c [down|up]* postgres-vip
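To confirm the rules are in place, list the allowed commands as the postgres user:
sudo -l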
We are going to want to bring up the virtual IP address when a node is promoted. Create a script, e.g. /var/lib/pgsql/repmgr/promote.sh, and make it executable. For this example, its contents would be as shown below:
#!/bin/sh
# Promote this standby to primary
/usr/pgsql-15/bin/repmgr standby promote -f /etc/repmgr/15/repmgr.conf
# Identify the failed node (the old primary) from the cluster status
failed_node=`/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show --verbose|grep failed|awk '{print $3}'`
# Take the virtual IP down on the failed node, if it is still reachable...
ssh -o ConnectTimeout=2 -o ConnectionAttempts=1 ${failed_node} sudo nmcli c down postgres-vip
# ...then bring it up on this node, the new primary
sudo nmcli c up postgres-vip
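The script needs to be owned by and executable for postgres, and the ssh step assumes passwordless SSH between the postgres accounts on the two nodes; a minimal sketch, the first two commands as root, the key setup as postgres (repeat on the other node, copying the key to repmgr-node1):
chown postgres:postgres /var/lib/pgsql/repmgr/promote.sh
chmod 750 /var/lib/pgsql/repmgr/promote.sh
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
ssh-copy-id repmgr-node2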
This script will be utilized in the repmgr.conf file, which has to be edited by root, or via sudo if configured:
node_id=1
node_name=repmgr-node1
pg_bindir='/usr/pgsql-15/bin'
conninfo='host=repmgr-node1 user=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/15/5432_data'
failover=automatic
promote_command='/var/lib/pgsql/repmgr/promote.sh'
follow_command='/usr/pgsql-15/bin/repmgr standby follow -f /etc/repmgr/15/repmgr.conf'
primary_visibility_consensus=true
The above is the config for node 1. Node 2 should match, with the relevant values substituted accordingly. You can see the script that will also bring up the network address:
promote_command='/var/lib/pgsql/repmgr/promote.sh'
Again, that script would ideally be placed in the same location on each node.
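For reference, the node 2 version of /etc/repmgr/15/repmgr.conf would differ only in its node-specific values:
node_id=2
node_name=repmgr-node2
pg_bindir='/usr/pgsql-15/bin'
conninfo='host=repmgr-node2 user=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/15/5432_data'
failover=automatic
promote_command='/var/lib/pgsql/repmgr/promote.sh'
follow_command='/usr/pgsql-15/bin/repmgr standby follow -f /etc/repmgr/15/repmgr.conf'
primary_visibility_consensus=true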
Once the above configuration has been completed, we can now register the primary server as follows:
Log in as postgres and execute the following:
/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf primary register
Once the primary server has been registered, we can clone the primary database to the secondary server. /etc/repmgr/15/repmgr.conf on the secondary node has to be in place and configured, and any related scripts should likewise be in their relevant locations.
Again as the postgres user, on the secondary node repmgr-node2, execute the following:
/usr/pgsql-15/bin/repmgr -F -h repmgr-node1 -U repmgr -d repmgr -f /etc/repmgr/15/repmgr.conf standby clone
Once this process has completed, start PostgreSQL as the root user:
systemctl start postgresql-15
Register the standby as follows, once again as the postgres user:
/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf standby register --force
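With failover=automatic set, the repmgrd daemon must also be running on each node to monitor the cluster and invoke the promote script. The PGDG packages ship a systemd unit for this (named repmgr-15 here; check the name your package provides), started as root:
systemctl enable repmgr-15
systemctl start repmgr-15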
Finally, confirm all is well by checking the cluster status
-bash-4.2$ /usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show --verbose
NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
1 | repmgr-node1 | primary | * running | | default | 100 | 25 | host=repmgr-node1 user=repmgr connect_timeout=2
2 | repmgr-node2 | standby | running | repmgr-node1 | default | 100 | 25 | host=repmgr-node2 user=repmgr connect_timeout=2
Shutting down Postgres normally makes for a simple and controlled failover. Log in as postgres.
Show the current status
/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show --verbose
NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
1 | repmgr-node1 | primary | * running | | default | 100 | 13 | host=repmgr-node1 user=repmgr connect_timeout=2
2 | repmgr-node2 | standby | running | repmgr-node1 | default | 100 | 13 | host=repmgr-node2 user=repmgr connect_timeout=2
As can be seen, repmgr-node1 is our primary. Also, using ip a we can see the virtual ip address is also active on repmgr-node1.
Shut the primary down on repmgr-node1
sudo systemctl stop postgresql-15
Now on node 2 check the cluster status
/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show --verbose
NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
INFO: connecting to database
ERROR: connection to database failed
DETAIL:
connection to server at "repmgr-node1" (192.168.0.51), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
DETAIL: attempted to connect using:
user=repmgr connect_timeout=2 host=repmgr-node1 fallback_application_name=repmgr options=-csearch_path=
ERROR: connection to database failed
DETAIL:
connection to server at "repmgr-node1" (192.168.0.51), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
DETAIL: attempted to connect using:
user=repmgr connect_timeout=2 host=repmgr-node1 fallback_application_name=repmgr options=-csearch_path=
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
1 | repmgr-node1 | primary | ? unreachable | ? | default | 100 | | host=repmgr-node1 user=repmgr connect_timeout=2
2 | repmgr-node2 | standby | running | ? repmgr-node1 | default | 100 | 13 | host=repmgr-node2 user=repmgr connect_timeout=2
WARNING: following issues were detected
- when attempting to connect to node "repmgr-node1" (ID: 1), following error encountered :
"connection to server at "repmgr-node1" (192.168.0.51), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?"
- node "repmgr-node1" (ID: 1) is registered as an active primary but is unreachable
- unable to connect to node "repmgr-node2" (ID: 2)'s upstream node "repmgr-node1" (ID: 1)
- unable to determine if node "repmgr-node2" (ID: 2) is attached to its upstream node "repmgr-node1" (ID: 1)
After a short while, we can check once again
/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf cluster show --verbose
NOTICE: using provided configuration file "/etc/repmgr/15/repmgr.conf"
WARNING: the following problems were found in the configuration file:
parameter "cluster" is deprecated and will be ignored
INFO: connecting to database
ERROR: connection to database failed
DETAIL:
connection to server at "repmgr-node1" (192.168.0.51), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?
DETAIL: attempted to connect using:
user=repmgr connect_timeout=2 host=repmgr-node1 fallback_application_name=repmgr options=-csearch_path=
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
1 | repmgr-node1 | primary | - failed | ? | default | 100 | | host=repmgr-node1 user=repmgr connect_timeout=2
2 | repmgr-node2 | primary | * running | | default | 100 | 14 | host=repmgr-node2 user=repmgr connect_timeout=2
WARNING: following issues were detected
- when attempting to connect to node "repmgr-node1" (ID: 1), following error encountered :
"connection to server at "repmgr-node1" (192.168.0.51), port 5432 failed: Connection refused
Is the server running on that host and accepting TCP/IP connections?"
As can be seen, repmgr-node2 is now running as the primary. What about that virtual IP address?
On repmgr-node1, use ip a again:
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:f5:75:45 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.51/24 brd 192.168.0.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
As can be seen, the virtual IP address has been removed. On repmgr-node2:
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:d5:ed:cf brd ff:ff:ff:ff:ff:ff
inet 192.168.0.52/24 brd 192.168.0.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.0.54/24 brd 192.168.0.255 scope global secondary noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fed5:edcf/64 scope link
valid_lft forever preferred_lft forever
The virtual IP address has followed to the new primary.
By default, if a server is running fine but loses its network connection due to an error on the network, then once connectivity is restored any virtual IP address that was active before the outage will still be there. This gives rise to the possibility of a split-brain scenario, as the second node may have become the primary in the meantime.

This is not a fault of repmgr; it is behaving exactly as expected. On the primary node, if the network is lost but the database is fine, repmgr will simply be unable to see the standby node. This will not cause the primary to become the secondary node, nor should it, and in any case repmgr does not control the virtual IP address. On the standby node, however, repmgr will see that the primary node is unreachable, promote the standby and, given the relevant scripts, start the virtual IP interface. This is where the risk of a split arises: once the network is restored, the virtual IP interface will be active on both servers.
To help manage this risk we can create a script that is called by cron every 30 seconds. This script pings another server, be it the witness server or the application server. If the ping fails repeatedly (four attempts here, plus a double-check against a second host), the virtual network interface is brought down and, as a catch-all, so is the PostgreSQL cluster, thus avoiding split brain.
The script has the following contents:
#!/bin/bash
# Primary host to ping - the witness/application server
site='app-server'
# Second host used to double-check before taking any action (adjust to suit)
sitedblchk='repmgr-node3'
#DEFAULT_ROUTE=$(ip route show default | awk '/default/ {print $3}')
#site=${DEFAULT_ROUTE}
chk_count=0
until ping -q -c1 ${site} > /dev/null 2>&1
do
    echo "${site} is unreachable. Retrying"
    chk_count=$((chk_count + 1))
    if [[ ${chk_count} -eq 4 ]] ; then
        # Double-check against a second host before assuming we are isolated
        if ping -q -c 1 -W 1 ${sitedblchk} >/dev/null 2>&1; then
            echo "IPv4 is up"
            exit 0
        else
            echo "This IP is also unavailable"
        fi
        # We appear to be isolated: release the virtual IP and stop
        # PostgreSQL to avoid split brain when connectivity returns
        sudo nmcli c down postgres-vip
        sudo systemctl stop postgresql-15
        exit 1
    fi
    echo ${chk_count}
done
echo "online"
echo "online"
Because sudo has been configured, this script can be called from the postgres user's cron. Using crontab -e, add the following to the cron file on both servers. Obviously, the number of network checks can be increased by modifying the script above.
* * * * * /var/lib/pgsql/repmgr/nw_check.sh 2>&1
* * * * * sleep 30 && /var/lib/pgsql/repmgr/nw_check.sh 2>&1
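cron only fires at one-minute granularity, hence the second entry with sleep 30 to give a 30-second cadence. If you want a record of the checks, the output could instead be appended to a log file, e.g.:
* * * * * /var/lib/pgsql/repmgr/nw_check.sh >> /var/lib/pgsql/repmgr/nw_check.log 2>&1
* * * * * sleep 30 && /var/lib/pgsql/repmgr/nw_check.sh >> /var/lib/pgsql/repmgr/nw_check.log 2>&1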
Let's test this. The current primary is repmgr-node2:
INFO: connecting to database
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
1 | repmgr-node1 | standby | running | repmgr-node2 | default | 100 | 20 | host=repmgr-node1 user=repmgr connect_timeout=2
2 | repmgr-node2 | primary | * running | | default | 100 | 20 | host=repmgr-node2 user=repmgr connect_timeout=2
Network connectivity has now been removed from repmgr-node2, and pinging the virtual IP address no longer works:
64 bytes from 192.168.0.54: icmp_seq=14084 ttl=64 time=0.411 ms
From 192.168.0.110 icmp_seq=14093 Destination Host Unreachable
From 192.168.0.110 icmp_seq=14094 Destination Host Unreachable
After a little while, ping starts working again for the virtual IP address:
From 192.168.0.110 icmp_seq=14155 Destination Host Unreachable
64 bytes from 192.168.0.54: icmp_seq=14156 ttl=64 time=7.20 ms
64 bytes from 192.168.0.54: icmp_seq=14157 ttl=64 time=0.557 ms
On repmgr-node1 the virtual IP address interface has been brought up:
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:f5:75:45 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.51/24 brd 192.168.0.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.0.54/24 brd 192.168.0.255 scope global secondary noprefixroute enp0s3
valid_lft forever preferred_lft forever
repmgr-node2 itself is still unreachable:
ping 192.168.0.52
PING 192.168.0.52 (192.168.0.52) 56(84) bytes of data.
From 192.168.0.110 icmp_seq=10 Destination Host Unreachable
From 192.168.0.110 icmp_seq=11 Destination Host Unreachable
From 192.168.0.110 icmp_seq=12 Destination Host Unreachable
From 192.168.0.110 icmp_seq=13 Destination Host Unreachable
From 192.168.0.110 icmp_seq=17 Destination Host Unreachable
From 192.168.0.110 icmp_seq=18 Destination Host Unreachable
From 192.168.0.110 icmp_seq=19 Destination Host Unreachable
ping: sendmsg: No route to host
From 192.168.0.110 icmp_seq=20 Destination Host Unreachable
From 192.168.0.110 icmp_seq=21 Destination Host Unreachable
From 192.168.0.110 icmp_seq=23 Destination Host Unreachable
From 192.168.0.110 icmp_seq=24 Destination Host Unreachable
Cluster status shows that the running primary is now repmgr-node1:
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
1 | repmgr-node1 | primary | * running | | default | 100 | 21 | host=repmgr-node1 user=repmgr connect_timeout=2
2 | repmgr-node2 | primary | - failed | ? | default | 100 | | host=repmgr-node2 user=repmgr connect_timeout=2
Now restore network connectivity to repmgr-node2. Ping now works:
ping 192.168.0.52
PING 192.168.0.52 (192.168.0.52) 56(84) bytes of data.
64 bytes from 192.168.0.52: icmp_seq=1 ttl=64 time=0.275 ms
64 bytes from 192.168.0.52: icmp_seq=2 ttl=64 time=0.317 ms
64 bytes from 192.168.0.52: icmp_seq=3 ttl=64 time=0.566 ms
Log in to repmgr-node2 and check for the virtual ip address.
ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:d5:ed:cf brd ff:ff:ff:ff:ff:ff
inet 192.168.0.52/24 brd 192.168.0.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fed5:edcf/64 scope link
valid_lft forever preferred_lft forever
As can be seen, the virtual IP address is no longer active on repmgr-node2. Now for the database cluster:
sudo systemctl status postgresql-15
● postgresql-15.service - PostgreSQL 15 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-15.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2023-05-04 16:00:50 BST; 7min ago
Docs: https://www.postgresql.org/docs/15/static/
Process: 14745 ExecStart=/usr/pgsql-15/bin/postmaster -D ${PGDATA} (code=exited, status=0/SUCCESS)
Process: 14739 ExecStartPre=/usr/pgsql-15/bin/postgresql-15-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 14745 (code=exited, status=0/SUCCESS)
May 04 15:38:27 repmgr-node2 systemd[1]: Starting PostgreSQL 15 database server...
May 04 15:38:27 repmgr-node2 postmaster[14745]: 2023-05-04 15:38:27.884 BST [14745] LOG: redirecting log output to logging collector process
May 04 15:38:27 repmgr-node2 postmaster[14745]: 2023-05-04 15:38:27.884 BST [14745] HINT: Future log output will appear in directory "log".
May 04 15:38:28 repmgr-node2 systemd[1]: Started PostgreSQL 15 database server.
May 04 16:00:44 repmgr-node2 systemd[1]: Stopping PostgreSQL 15 database server...
May 04 16:00:50 repmgr-node2 systemd[1]: Stopped PostgreSQL 15 database server.
As can be seen, the database cluster has also been shut down.
This node can then be rebuilt as normal, and a split-brain scenario has been avoided.
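To rebuild, the clone and register steps from earlier can be reused; a sketch, run on repmgr-node2 as postgres and assuming repmgr-node1 is now the primary (the -F flag overwrites the existing data directory):
/usr/pgsql-15/bin/repmgr -F -h repmgr-node1 -U repmgr -d repmgr -f /etc/repmgr/15/repmgr.conf standby clone
sudo systemctl start postgresql-15
/usr/pgsql-15/bin/repmgr -f /etc/repmgr/15/repmgr.conf standby register --force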