BDR
is a tool that provides multi-master replication between PostgreSQL
nodes using logical replication.
The cluster can eventually face data divergence among nodes, usually caused by how users or applications perform things in the database. For example: if a user executes some non-replicated DML, it is likely that there will be data divergence among nodes as a consequence.
In that case, LiveCompare
can be used to compare the state of data among nodes, identify which node is divergent and suggest DML commands to make the divergent nodes compliant with the others.
In this article we will explain how to set up a simple BDR
test cluster, then introduce some divergence, and lastly how to execute LiveCompare
to compare and later fix the divergences.
In this section we will share the steps to set up the test cluster.
At the time this article was written, the latest version of TPAexec
, which was the tool used to deploy the test cluster, was v23.6
. LiveCompare
on its turn, was on version 2.3.0
.
The test cluster was deployed using Docker
containers, and looks like this:
-
BDR
version:3.7
-
BDR
database name:bdrdb
- 2
BDR
nodes: -
ts-54-bdr-node-1
: BDR master node -
ts-54-bdr-node-2
: BDR master node
Note: having a BDR
cluster with an even number of nodes is not ideal. We have configured it that way to show the behavior of difference_tie_breakers
setting in LiveCompare
later in this article.
LiveCompare
was also installed through TPAexec
. We only need to install LiveCompare
in one of the nodes -- it could actually even be installed in a node that is not part of this BDR
cluster. We chose to install it in ts-54-bdr-node-1
. The steps are:
1- Create a TPAexec
cluster configuration. In our example we are storing the cluster configuration under ~/Clusters/ts-54
:
tpaexec configure ~/Clusters/ts-54 --architecture BDR-Always-ON --layout bronze --platform docker --harp-consensus-protocol bdr --use-volatile-subscriptions
2- The previous command will create a template configuration for the cluster. However, we would like to simplify the architecture a bit by removing some nodes, so the cluster has only 2 BDR nodes, and installing LiveCompare as an additional package, as described before. So we change the file ~/Clusters/ts-54/config.yml
to look like this:
--
architecture: BDR-Always-ON
cluster_name: ts-54
cluster_tags: {}
cluster_vars:
bdr_database: bdrdb
bdr_node_group: bdrgroup
bdr_version: '3'
enable_pg_backup_api: false
extra_postgres_extensions:
- pglogical
postgres_coredump_filter: '0xff'
postgres_version: '13'
postgresql_flavour: postgresql
preferred_python_version: python3
tpa_2q_repositories:
- products/bdr3_7/release
- products/pglogical3_7/release
- products/livecompare/release
use_volatile_subscriptions: true
instance_defaults:
image: tpa/rocky:8
platform: docker
vars:
ansible_user: root
instances:
- Name: ts-54-bdr-node-1
node: 1
role:
- bdr
vars:
packages:
common:
- edb-livecompare
- Name: ts-54-bdr-node-2
node: 2
role:
- bdr
3- Ask TPAexec
to provision the cluster:
tpaexec provision ~/Clusters/ts-54
4- Let TPAexec
deploy the cluster. This step usually take 6 minutes or so:
tpaexec deploy ~/Clusters/ts-54
When the deploy finishes, you will have a functional BDR
cluster with the aforementioned characteristics.
In this section we will create a simple test table named test
, fill it with some data and then cause nodes to be divergent.
1- In node ts-54-bdr-node-1
, create the test
table:
CREATE TABLE test (
id integer PRIMARY KEY,
value text NOT NULL
);
2- In node ts-54-bdr-node-1
, fill the test
table with data:
INSERT INTO test (
id,
value
) VALUES (
1,
'One'
), (
2,
'Two'
), (
3,
'Three'
), (
4,
'Four'
), (
5,
'Five'
), (
6,
'Six'
), (
7,
'Seven'
), (
8,
'Eight'
), (
9,
'Nine'
), (
10,
'Ten'
);
At this point both nodes ts-54-bdr-node-1
and ts-54-bdr-node-2
will have the table test
filled with the 10 tuples that were INSERT
ed through node ts-54-bdr-node-1
.
3- In node ts-54-bdr-node-2
, create divergences by running non-replicated transactions:
BEGIN;
SET LOCAL bdr.xact_replication TO off;
DELETE
FROM test
WHERE id = 1;
UPDATE test
SET value = 'random'
WHERE id = 4;
INSERT INTO test (
id,
value
) VALUES (
11,
'Eleven'
);
COMMIT;
At this point, if you check the test
table state in both nodes, you will see they diverge to each other. Node ts-54-bdr-node-1
will show the original data:
bdrdb=# TABLE test;
id | value
1 | One
2 | Two
3 | Three
4 | Four
5 | Five
6 | Six
7 | Seven
8 | Eight
9 | Nine
10 | Ten
(10 rows)
And node ts-54-bdr-node-2
will show the new version of the data, which was not replicated to the other node:
bdrdb=# TABLE test;
id | value
2 | Two
3 | Three
5 | Five
6 | Six
7 | Seven
8 | Eight
9 | Nine
10 | Ten
4 | random
11 | Eleven
(10 rows)
Let's say we did not know that there was data divergence among BDR nodes, but that there was a suspicion that it was the case.
We could run LiveCompare
to compare the data state among nodes, and base on its consensus algorithm it would be able to spot the possibly divergent node and propose DML commands to fix the issues.
As mentioned before, in the test cluster LiveCompare
was installed in the node ts-54-bdr-node-1
. Again, that was made that way for sake of simplicity, and LiveCompare
could have been installed in any server, even in one that is not part of the cluster.
In order to use LiveCompare
, we will proceed as follows:
1- In node ts-54-bdr-node-1
, create the database liveoutput
for LiveCompare
metadata:
CREATE DATABASE liveoutput;
That database will be used by LiveCompare
as its output connection, where it will store its metadata about the comparison sessions. That database could be created with a name of your choice, and in any PostgreSQL
cluster, even in a node that is not part of the BDR
cluster. For sake of simplicity, we created that database in node ts-54-bdr-node-1
, and named the database liveoutput
.
2- In node ts-54-bdr-node-1
, create a simple LiveCompare
configuration file, named livecompare.ini
, with the following contents:
[General Settings]
logical_replication_mode = bdr
max_parallel_workers = 1
all_bdr_nodes = on
difference_tie_breakers = ts-54-bdr-node-2
[Initial Connection]
dsn = dbname=bdrdb user=postgres
[Output Connection]
dsn = dbname=liveoutput user=postgres
[Table Filter]
replication_sets = set_name = 'bdrgroup'
The file is written in INI
format, and is a very simple configuration file for LiveCompare
. These are the details:
-
logical_replication_mode = bdr
: informsLiveCompare
that we are comparing aBDR
cluster, so the tool can provide some extended facilities when connecting to nodes, filtering objects and generating output DML scripts. -
max_parallel_workers = 1
: specify thatLiveCompare
can spawn only 1 process to compare data.LiveCompare
is able to compare multiple tables in parallel, or to split a given table comparison between parallel processes. In this case, as we have only a simple table in the BDR cluster, we would not benefit from that kind of parallelism. It is worth noting that LiveCompare is also able to benefit from parallelism by splitting a table comparison among workers. That would requireparallel_chunk_rows
to be set to the maximum number of tuples a worker is able to compare. As it is not specified, it defaults to0
, which disables table splitting. -
all_bdr_nodes = on
: it is a facility inLiveCompare
that instructs it to fetch the connection strings of allBDR
nodes from theBDR
metadata found in the node specified in the[Initial Connection]
section. By doing that, the user will not need to manually specify a data connection section for each of the BDR nodes. -
difference_tie_breakers = ts-54-bdr-node-2
: our cluster is composed of 2BDR
master nodes. As a consequence,LiveCompare
will never be able to find a winning data state in the comparison, as it would have 1BDR
node saying "state is A", and anotherBDR
node saying "state is B". This setting instructsLiveCompare
to use theBDR
node namedts-54-bdr-node-2
(as perbdr.node_summary.node_name
) as a tie breaker. -
[Initial Connection]
: as mentioned in the previous item, this connection is used byLiveCompare
to find the connection strings to all otherBDR
nodes that are in the cluster. Thepostgres
user was used only for sake of simplicity. You can use any role that is able to access and fetch information fromBDR
metadata. -
[Output Connection]
: points to the output database that will be used byLiveCompare
to store its metadata about the comparison sessions. Thepostgres
user was used only for sake of simplicity. You can use any role that is able to access and create objects in the output database. -
[Table Filter]
: we used this section to specify thatLiveCompare
should only focus on tables that are part of thebdrgroup
replication set. In this case we only have that replication set in ourBDR
cluster, and the tabletest
is subscribed to that replication set. However, in a production environment it is possible to find clusters using multiple replication sets.
Please note that LiveCompare
has a lot of settings that can be used in the INI
file. All settings that are not specified will assume a default value. For example: consensus_mode
will be by default simple_majority
, which will struct LiveCompare
to determine that the desired data state is the one that is found in the majority of the nodes. You can have more details about all LiveCompare
settings in its documentation.
3- In node ts-54-bdr-node-1
, run LiveCompare
based on the above INI
file:
livecompare livecompare.ini
The console output of the tool will look like this:
postgres@ts-54-bdr-node-1:~ $ livecompare livecompare.ini
LiveCompare session 1, compare mode, successfully finished.
Outstanding differences:
| table_name | elapsed_time | num_total_rows | num_processed_rows | num_differences | max_num_ignored_columns |
| public.test | 0:00:00.031073 | 11 | 11 | 3 | 0 |
Saved file ./lc_session_1/summary_20221101.out with the complete table summary.
You can also get the table summary by connecting to the output database and executing:
select * from livecompare.vw_table_summary where session_id = 1;
Elapsed time: 0:00:00.787843
Processed 11 rows in 1 tables from 2 connections using 1 workers.
Found 3 inconsistent rows in 1 tables.
Saved file ./lc_session_1/differences_20221101.out with the list of differences per table.
You can also get a list of differences per table with:
select * from livecompare.vw_open_differences where session_id = 1;
Too see more details on how LiveCompare determined the differences:
select * from livecompare.vw_consensus where session_id = 1;
Script ./lc_session_1/apply_on_the_ts-54-bdr-node-1_20221101.sql was generated, which can be applied to the ts-54-bdr-node-1 connection and make it consistent with the majority of connections.
You can also get this script with:
select difference_fix_dml from livecompare.vw_difference_fix where session_id = 1 and connection_id = 'ts-54-bdr-node-1' and difference_status = 'D' order by difference_fix_order;
Copyright EnterpriseDB UK Limited 2019-2021 - All rights reserved.
As we can see above, LiveCompare
ran the comparison session 1
, compared a single table public.test
and found 3
divergences in that table among the BDR nodes.
We can also see that LiveCompare
generated an output script containing DML commands to fix the node ts-54-bdr-node-1
:
postgres@ts-54-bdr-node-1:~ $ cat lc_session_1/apply_on_the_ts-54-bdr-node-1_20221101.sql
BEGIN;
SET LOCAL bdr.xact_replication = off;
SELECT pg_replication_origin_session_setup('bdr_local_only_origin');
SELECT pg_replication_origin_xact_setup('0/0', '2010-01-01'::timestamptz);;
SET LOCAL ROLE postgres;
DELETE FROM public.test WHERE (id) = (1);
UPDATE public.test SET value = $livecompare49fa974a7ce6849d500f4f5b47bf31781$random$livecompare49fa974a7ce6849d500f4f5b47bf31781$ WHERE (id) = (4);
INSERT INTO public.test (id, value) VALUES (11, $livecompare49fa974a7ce6849d500f4f5b47bf31781$Eleven$livecompare49fa974a7ce6849d500f4f5b47bf31781$);
COMMIT;
As we have earlier specified logical_replication_mode = bdr
, we can see the apply script contains some initial commands to turn off transaction replication and setting up a replication origin when fixing node ts-54-bdr-node-1
. Then, for each table (in this case only public.test
), the commands instruct to change the role to the owner of the table, and perform the DML
operations to make the node consistent with the winner nodes (in this case ts-54-bdr-node-2
).
4- In node ts-54-bdr-node-1
, apply the DML script generated by LiveCompare
:
psql -d bdrdb -f lc_session_1/apply_on_the_ts-54-bdr-node-1_*.sql
The output will look like this:
postgres@ts-54-bdr-node-1:~ $ psql -d bdrdb -f lc_session_1/apply_on_the_ts-54-bdr-node-1_20221101.sql
BEGIN
psql:lc_session_1/apply_on_the_ts-54-bdr-node-1_20221101.sql:3: WARNING: setting bdr.xact_replication = 'false' (previously 'true')
SET
pg_replication_origin_session_setup
(1 row)
pg_replication_origin_xact_setup
(1 row)
SET
DELETE 1
UPDATE 1
INSERT 0 1
COMMIT
And after that the data of the table test
will look like the following in both nodes:
bdrdb=# TABLE test ;
id | value
2 | Two
3 | Three
5 | Five
6 | Six
7 | Seven
8 | Eight
9 | Nine
10 | Ten
4 | random
11 | Eleven
(10 rows)
You are able to confirm that by running LiveCompare
in --recheck
mode, too. That comparison mode basically rechecks the state of data that was previously detected as divergent by LiveCompare
.
5- In node ts-54-bdr-node-1
, re-run LiveCompare
in --recheck
mode to confirm the previously reported divergences are gone:
livecompare livecompare.ini 1 --recheck
That means we are rechecking the divergences reported by session 1
, using the same INI
file livecompare.ini
.
The output will look like this:
postgres@ts-54-bdr-node-1:~ $ livecompare livecompare.ini 1 --recheck
LiveCompare session 1, recheck mode, successfully finished.
Saved file ./lc_session_1/summary_20221101.out with the complete table summary.
You can also get the table summary by connecting to the output database and executing:
select * from livecompare.vw_table_summary where session_id = 1;
Elapsed time: 0:00:00.618702
Processed 3 differences from 2 connections.
No remaining differences.
Copyright EnterpriseDB UK Limited 2019-2021 - All rights reserved.
Indicating that the divergences are gone.
It is important to note that the --recheck
mode can also be useful in case LiveCompare
default comparison mode (--compare
) was run while there was replication lag among the BDR
nodes. Keep in mind that replication lag can cause LiveCompare
to throw false positives, in which case --recheck
node can also be useful to check if the divergences are permanent or if the data eventually got consistent after the replication queue has progressed.