Quickstart with LiveCompare in a BDR cluster

Israel Barth
Israel Barth

BDR is a tool that provides multi-master replication between PostgreSQL nodes using logical replication.

The cluster can eventually face data divergence among nodes, usually caused by how users or applications perform things in the database. For example: if a user executes some non-replicated DML, it is likely that there will be data divergence among nodes as a consequence.

In that case, LiveCompare can be used to compare the state of data among nodes, identify which node is divergent and suggest DML commands to make the divergent nodes compliant with the others.

In this article we will explain how to set up a simple BDR test cluster, then introduce some divergence, and lastly how to execute LiveCompare to compare and later fix the divergences.

Set up a BDR test cluster

In this section we will share the steps to set up the test cluster.

At the time this article was written, the latest version of TPAexec, which was the tool used to deploy the test cluster, was v23.6. LiveCompare on its turn, was on version 2.3.0.

The test cluster was deployed using Docker containers, and looks like this:

  • BDR version: 3.7
  • BDR database name: bdrdb
  • 2 BDR nodes:
  • ts-54-bdr-node-1: BDR master node
  • ts-54-bdr-node-2: BDR master node

Note: having a BDR cluster with an even number of nodes is not ideal. We have configured it that way to show the behavior of difference_tie_breakers setting in LiveCompare later in this article.

LiveCompare was also installed through TPAexec. We only need to install LiveCompare in one of the nodes -- it could actually even be installed in a node that is not part of this BDR cluster. We chose to install it in ts-54-bdr-node-1. The steps are:

1- Create a TPAexec cluster configuration. In our example we are storing the cluster configuration under ~/Clusters/ts-54:

tpaexec configure ~/Clusters/ts-54 --architecture BDR-Always-ON --layout bronze --platform docker --harp-consensus-protocol bdr --use-volatile-subscriptions

2- The previous command will create a template configuration for the cluster. However, we would like to simplify the architecture a bit by removing some nodes, so the cluster has only 2 BDR nodes, and installing LiveCompare as an additional package, as described before. So we change the file ~/Clusters/ts-54/config.yml to look like this:

--
architecture: BDR-Always-ON
cluster_name: ts-54
cluster_tags: {}

cluster_vars:
bdr_database: bdrdb
bdr_node_group: bdrgroup
bdr_version: '3'
enable_pg_backup_api: false
extra_postgres_extensions:
- pglogical
postgres_coredump_filter: '0xff'
postgres_version: '13'
postgresql_flavour: postgresql
preferred_python_version: python3
tpa_2q_repositories:
- products/bdr3_7/release
- products/pglogical3_7/release
- products/livecompare/release
use_volatile_subscriptions: true

instance_defaults:
image: tpa/rocky:8
platform: docker
vars:
ansible_user: root

instances:
- Name: ts-54-bdr-node-1
node: 1
role:
- bdr
vars:
packages:
common:
- edb-livecompare
- Name: ts-54-bdr-node-2
node: 2
role:
- bdr

3- Ask TPAexec to provision the cluster:

tpaexec provision ~/Clusters/ts-54

4- Let TPAexec deploy the cluster. This step usually take 6 minutes or so:

tpaexec deploy ~/Clusters/ts-54

When the deploy finishes, you will have a functional BDR cluster with the aforementioned characteristics.

Create a test table

In this section we will create a simple test table named test, fill it with some data and then cause nodes to be divergent.

1- In node ts-54-bdr-node-1, create the test table:

CREATE TABLE test (
id integer PRIMARY KEY,
value text NOT NULL
);

2- In node ts-54-bdr-node-1, fill the test table with data:

INSERT INTO test (
id,
value
) VALUES (
1,
'One'
), (
2,
'Two'
), (
3,
'Three'
), (
4,
'Four'
), (
5,
'Five'
), (
6,
'Six'
), (
7,
'Seven'
), (
8,
'Eight'
), (
9,
'Nine'
), (
10,
'Ten'
);

At this point both nodes ts-54-bdr-node-1 and ts-54-bdr-node-2 will have the table test filled with the 10 tuples that were INSERTed through node ts-54-bdr-node-1.

3- In node ts-54-bdr-node-2, create divergences by running non-replicated transactions:

BEGIN;

SET LOCAL bdr.xact_replication TO off;

DELETE
FROM test
WHERE id = 1;

UPDATE test
SET value = 'random'
WHERE id = 4;

INSERT INTO test (
id,
value
) VALUES (
11,
'Eleven'
);

COMMIT;

At this point, if you check the test table state in both nodes, you will see they diverge to each other. Node ts-54-bdr-node-1 will show the original data:

bdrdb=# TABLE test;
id | value
1 | One
2 | Two
3 | Three
4 | Four
5 | Five
6 | Six
7 | Seven
8 | Eight
9 | Nine
10 | Ten
(10 rows)

And node ts-54-bdr-node-2 will show the new version of the data, which was not replicated to the other node:

bdrdb=# TABLE test;
id | value
2 | Two
3 | Three
5 | Five
6 | Six
7 | Seven
8 | Eight
9 | Nine
10 | Ten
4 | random
11 | Eleven
(10 rows)

Run LiveCompare to detect and fix divergences

Let's say we did not know that there was data divergence among BDR nodes, but that there was a suspicion that it was the case.

We could run LiveCompare to compare the data state among nodes, and base on its consensus algorithm it would be able to spot the possibly divergent node and propose DML commands to fix the issues.

As mentioned before, in the test cluster LiveCompare was installed in the node ts-54-bdr-node-1. Again, that was made that way for sake of simplicity, and LiveCompare could have been installed in any server, even in one that is not part of the cluster.

In order to use LiveCompare, we will proceed as follows:

1- In node ts-54-bdr-node-1, create the database liveoutput for LiveCompare metadata:

CREATE DATABASE liveoutput;

That database will be used by LiveCompare as its output connection, where it will store its metadata about the comparison sessions. That database could be created with a name of your choice, and in any PostgreSQL cluster, even in a node that is not part of the BDR cluster. For sake of simplicity, we created that database in node ts-54-bdr-node-1, and named the database liveoutput.

2- In node ts-54-bdr-node-1, create a simple LiveCompare configuration file, named livecompare.ini, with the following contents:

[General Settings]
logical_replication_mode = bdr
max_parallel_workers = 1
all_bdr_nodes = on
difference_tie_breakers = ts-54-bdr-node-2

[Initial Connection]
dsn = dbname=bdrdb user=postgres

[Output Connection]
dsn = dbname=liveoutput user=postgres

[Table Filter]
replication_sets = set_name = 'bdrgroup'

The file is written in INI format, and is a very simple configuration file for LiveCompare. These are the details:

  • logical_replication_mode = bdr: informs LiveCompare that we are comparing a BDR cluster, so the tool can provide some extended facilities when connecting to nodes, filtering objects and generating output DML scripts.
  • max_parallel_workers = 1: specify that LiveCompare can spawn only 1 process to compare data. LiveCompare is able to compare multiple tables in parallel, or to split a given table comparison between parallel processes. In this case, as we have only a simple table in the BDR cluster, we would not benefit from that kind of parallelism. It is worth noting that LiveCompare is also able to benefit from parallelism by splitting a table comparison among workers. That would require parallel_chunk_rows to be set to the maximum number of tuples a worker is able to compare. As it is not specified, it defaults to 0, which disables table splitting.
  • all_bdr_nodes = on: it is a facility in LiveCompare that instructs it to fetch the connection strings of all BDR nodes from the BDR metadata found in the node specified in the [Initial Connection] section. By doing that, the user will not need to manually specify a data connection section for each of the BDR nodes.
  • difference_tie_breakers = ts-54-bdr-node-2: our cluster is composed of 2 BDR master nodes. As a consequence, LiveCompare will never be able to find a winning data state in the comparison, as it would have 1 BDR node saying "state is A", and another BDR node saying "state is B". This setting instructs LiveCompare to use the BDR node named ts-54-bdr-node-2 (as per bdr.node_summary.node_name) as a tie breaker.
  • [Initial Connection]: as mentioned in the previous item, this connection is used by LiveCompare to find the connection strings to all other BDR nodes that are in the cluster. The postgres user was used only for sake of simplicity. You can use any role that is able to access and fetch information from BDR metadata.
  • [Output Connection]: points to the output database that will be used by LiveCompare to store its metadata about the comparison sessions. The postgres user was used only for sake of simplicity. You can use any role that is able to access and create objects in the output database.
  • [Table Filter]: we used this section to specify that LiveCompare should only focus on tables that are part of the bdrgroup replication set. In this case we only have that replication set in our BDR cluster, and the table test is subscribed to that replication set. However, in a production environment it is possible to find clusters using multiple replication sets.

Please note that LiveCompare has a lot of settings that can be used in the INI file. All settings that are not specified will assume a default value. For example: consensus_mode will be by default simple_majority, which will struct LiveCompare to determine that the desired data state is the one that is found in the majority of the nodes. You can have more details about all LiveCompare settings in its documentation.

3- In node ts-54-bdr-node-1, run LiveCompare based on the above INI file:

livecompare livecompare.ini

The console output of the tool will look like this:

postgres@ts-54-bdr-node-1:~ $ livecompare livecompare.ini
LiveCompare session 1, compare mode, successfully finished.

Outstanding differences:

| table_name | elapsed_time | num_total_rows | num_processed_rows | num_differences | max_num_ignored_columns |
| public.test | 0:00:00.031073 | 11 | 11 | 3 | 0 |

Saved file ./lc_session_1/summary_20221101.out with the complete table summary.
You can also get the table summary by connecting to the output database and executing:
select * from livecompare.vw_table_summary where session_id = 1;

Elapsed time: 0:00:00.787843
Processed 11 rows in 1 tables from 2 connections using 1 workers.
Found 3 inconsistent rows in 1 tables.

Saved file ./lc_session_1/differences_20221101.out with the list of differences per table.
You can also get a list of differences per table with:
select * from livecompare.vw_open_differences where session_id = 1;
Too see more details on how LiveCompare determined the differences:
select * from livecompare.vw_consensus where session_id = 1;

Script ./lc_session_1/apply_on_the_ts-54-bdr-node-1_20221101.sql was generated, which can be applied to the ts-54-bdr-node-1 connection and make it consistent with the majority of connections.
You can also get this script with:
select difference_fix_dml from livecompare.vw_difference_fix where session_id = 1 and connection_id = 'ts-54-bdr-node-1' and difference_status = 'D' order by difference_fix_order;


Copyright EnterpriseDB UK Limited 2019-2021 - All rights reserved.

As we can see above, LiveCompare ran the comparison session 1, compared a single table public.test and found 3 divergences in that table among the BDR nodes.

We can also see that LiveCompare generated an output script containing DML commands to fix the node ts-54-bdr-node-1:

postgres@ts-54-bdr-node-1:~ $ cat lc_session_1/apply_on_the_ts-54-bdr-node-1_20221101.sql
BEGIN;

SET LOCAL bdr.xact_replication = off;
SELECT pg_replication_origin_session_setup('bdr_local_only_origin');
SELECT pg_replication_origin_xact_setup('0/0', '2010-01-01'::timestamptz);;

SET LOCAL ROLE postgres;
DELETE FROM public.test WHERE (id) = (1);
UPDATE public.test SET value = $livecompare49fa974a7ce6849d500f4f5b47bf31781$random$livecompare49fa974a7ce6849d500f4f5b47bf31781$ WHERE (id) = (4);
INSERT INTO public.test (id, value) VALUES (11, $livecompare49fa974a7ce6849d500f4f5b47bf31781$Eleven$livecompare49fa974a7ce6849d500f4f5b47bf31781$);

COMMIT;

As we have earlier specified logical_replication_mode = bdr, we can see the apply script contains some initial commands to turn off transaction replication and setting up a replication origin when fixing node ts-54-bdr-node-1. Then, for each table (in this case only public.test), the commands instruct to change the role to the owner of the table, and perform the DML operations to make the node consistent with the winner nodes (in this case ts-54-bdr-node-2).

4- In node ts-54-bdr-node-1, apply the DML script generated by LiveCompare:

psql -d bdrdb -f lc_session_1/apply_on_the_ts-54-bdr-node-1_*.sql

The output will look like this:

postgres@ts-54-bdr-node-1:~ $ psql -d bdrdb -f lc_session_1/apply_on_the_ts-54-bdr-node-1_20221101.sql
BEGIN
psql:lc_session_1/apply_on_the_ts-54-bdr-node-1_20221101.sql:3: WARNING: setting bdr.xact_replication = 'false' (previously 'true')
SET
pg_replication_origin_session_setup

(1 row)

pg_replication_origin_xact_setup

(1 row)

SET
DELETE 1
UPDATE 1
INSERT 0 1
COMMIT

And after that the data of the table test will look like the following in both nodes:

bdrdb=# TABLE test ;
id | value
2 | Two
3 | Three
5 | Five
6 | Six
7 | Seven
8 | Eight
9 | Nine
10 | Ten
4 | random
11 | Eleven
(10 rows)

You are able to confirm that by running LiveCompare in --recheck mode, too. That comparison mode basically rechecks the state of data that was previously detected as divergent by LiveCompare.

5- In node ts-54-bdr-node-1, re-run LiveCompare in --recheck mode to confirm the previously reported divergences are gone:

livecompare livecompare.ini 1 --recheck

That means we are rechecking the divergences reported by session 1, using the same INI file livecompare.ini.

The output will look like this:

postgres@ts-54-bdr-node-1:~ $ livecompare livecompare.ini 1 --recheck
LiveCompare session 1, recheck mode, successfully finished.

Saved file ./lc_session_1/summary_20221101.out with the complete table summary.
You can also get the table summary by connecting to the output database and executing:
select * from livecompare.vw_table_summary where session_id = 1;

Elapsed time: 0:00:00.618702
Processed 3 differences from 2 connections.

No remaining differences.

Copyright EnterpriseDB UK Limited 2019-2021 - All rights reserved.

Indicating that the divergences are gone.

It is important to note that the --recheck mode can also be useful in case LiveCompare default comparison mode (--compare) was run while there was replication lag among the BDR nodes. Keep in mind that replication lag can cause LiveCompare to throw false positives, in which case --recheck node can also be useful to check if the divergences are permanent or if the data eventually got consistent after the replication queue has progressed.

Was this article helpful?

0 out of 0 found this helpful