Among other useful information, Lasso gathers PGD metadata from PostgreSQL instances into a tarball. Having multiple Lasso tarballs (one per node from a PGD cluster) downloaded locally, it's possible to run some commands to quickly check the PGD cluster state and health.
The following instructions are useful for BDR 3.7 and newer, but we also include the command for BDR 3.6.
In our example case, the customer has a 4-node BDR cluster, so they sent 4 Lasso tarballs. The first step is to download them all and extract them in the same directory, for example:
$ ls -lh
total 1.6M
drwxr-xr-x 8 william.ivanski 256 Oct 10 2022 edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948
-rw-r--r-- 1 william.ivanski 423K Nov 7 2022 edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948-bba36c2c201309f348d34aef10e5f1e2.tar.bz2
drwxr-xr-x 9 william.ivanski 288 Nov 7 2022 edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413
-rw-r--r-- 1 william.ivanski 418K Nov 7 2022 edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413-8382b7797ca90629cb162082f096ec5d.tar.bz2
drwxr-xr-x 8 william.ivanski 256 Oct 10 2022 edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836
-rw-r--r-- 1 william.ivanski 386K Nov 7 2022 edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836-d7c39d35160faa6f45c6ca23e0c61156.tar.bz2
drwxr-xr-x 9 william.ivanski 288 Nov 7 2022 edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842
-rw-r--r-- 1 william.ivanski 380K Nov 7 2022 edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842-d291d0b52c845d8c9ef206a938a59015.tar.bz2
Each Postgres instance can have multiple databases, but we need to identify the databases where the bdr
extension is installed. When Lasso detects the bdr
extension installed, then it creates a bdr
directory under that database directory, so the command to locate the BDR-enabled databases in all Lasso reports is as simple as:
find . -type d -name bdr
Output will be, for example:
$ find . -type d -name bdr
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr
So we know that the database name is paysourcing
in all 4 nodes.
Knowing the operating system version is important to provide targeted instructions, as well as for the developers to generate hotfix packages for the customer system if needed. The command to check that is:
find . -type f -name release.data -print -exec bash -c "cat {}; echo" \;
Sample output:
$ find . -type f -name release.data -print -exec bash -c "cat {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/linux/release.data
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/linux/release.data
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/linux/release.data
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/linux/release.data
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
So we know the customer is running Ubuntu 20.04 on all nodes.
Another crucial piece of information in terms of Support ticket metadata is the BDR version and edition.
- If the customer is running BDR 3.7 or newer, the command to check this information is:
find . -type f -name group_versions_details.data -print -exec bash -c "cat {}; echo" \;
Sample output:
$ find . -type f -name group_versions_details.data -print -exec bash -c "cat {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr/group_versions_details.data
node_id node_name postgres_version pglogical_version bdr_version bdr_edition
285269477 ist02_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
835501687 ist01_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
3115261467 ist01_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
1600578929 ist02_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr/group_versions_details.data
node_id node_name postgres_version pglogical_version bdr_version bdr_edition
285269477 ist02_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
835501687 ist01_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
3115261467 ist01_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
1600578929 ist02_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr/group_versions_details.data
node_id node_name postgres_version pglogical_version bdr_version bdr_edition
3115261467 ist01_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
835501687 ist01_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
1600578929 ist02_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
285269477 ist02_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr/group_versions_details.data
node_id node_name postgres_version pglogical_version bdr_version bdr_edition
1600578929 ist02_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
3115261467 ist01_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
835501687 ist01_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
285269477 ist02_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
So we know the customer is running BDR 3.7.17, Enterprise Edition, on top of Postgres Extended (formerly known as 2ndQPG).
- If the customer is on BDR 3.6, then the command would be this one instead:
find . -type f -name monitor_group_versions_details.data -print -exec bash -c "cat {}; echo" \;
In a PGD cluster, Raft Consensus needs to be working at all times so checking this is one of the first steps when assessing a PGD cluster health. The command is:
find . -type f -name bdr_monitor_group_raft.out -print -exec bash -c "cat {}; echo" \;
We know Raft Consensus is working when the output is:
$ find . -type f -name bdr_monitor_group_raft.out -print -exec bash -c "cat {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr_monitor_group_raft.out
monitor_group_raft
"(OK,""Raft Consensus is working correctly"")"
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr_monitor_group_raft.out
monitor_group_raft
"(OK,""Raft Consensus is working correctly"")"
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr_monitor_group_raft.out
monitor_group_raft
"(OK,""Raft Consensus is working correctly"")"
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr_monitor_group_raft.out
monitor_group_raft
"(OK,""Raft Consensus is working correctly"")"
When there are any issues with BDR replication, then we will look if the replication slots are inactive, or replication lag is increasing.
- If running BDR 3.7 or newer, we will be performing two commands to:
- a) Check replication slots between the nodes (not including the group slot)
- b) Check the group slot, which is important to check in case it is causing instances to hold WAL
a) Check replication slots between the nodes (not including the group slot):
find . -type f -name group_replslots_details.data -print -exec bash -c "awk -F'\t' '{print \$2,\$3,\$4,\$5,\$7,\$8,\$9}' {}; echo" \;
Sample output (only one node output shown):
$ find . -type f -name group_replslots_details.data -print -exec bash -c "awk -F'\t' '{print \$2,\$3,\$4,\$5,\$7,\$8,\$9}' {}; echo" \;./edb-lasso-report-dc2_node2-5432-20250210-165631-8eb404a39ea206f9591c63fa58e2e3fc/edb-lasso-report-dc2_node2-5432-20250210-165631/postgresql/dbs/bdrdb/bdr/group_replslots_details.data
origin_name target_name slot_name active write_lag flush_lag replay_lag
dc1_node2 dc1_node3 bdr_bdrdb_bdrgroup_dc1_node3 t 00:00:00.000403 00:00:00.000403 00:00:00.000403
dc1_node2 dc1_node4 bdr_bdrdb_bdrgroup_dc1_node4 t 00:00:00.000322 00:00:00.00041 00:00:00.00041
dc1_node2 dc2_node2 bdr_bdrdb_bdrgroup_dc2_node2 t 00:00:00.007075 00:00:00.007101 00:00:00.007101
dc1_node2 dc2_node3 bdr_bdrdb_bdrgroup_dc2_node3 t 00:00:00.007152 00:00:00.007614 00:00:00.007614
dc1_node2 dc2_node4 bdr_bdrdb_bdrgroup_dc2_node4 t 00:00:00.007017 00:00:00.007774 00:00:00.007774
dc1_node3 dc1_node2 bdr_bdrdb_bdrgroup_dc1_node2 t 00:00:00.000235 00:00:00.000253 00:00:00.000253
dc1_node3 dc1_node4 bdr_bdrdb_bdrgroup_dc1_node4 t 00:00:00 00:00:00 00:00:00
dc1_node3 dc2_node2 bdr_bdrdb_bdrgroup_dc2_node2 t 00:00:00 00:00:00 00:00:00
dc1_node3 dc2_node3 bdr_bdrdb_bdrgroup_dc2_node3 f 1 day 10:32:59.35964 1 day 10:32:59.35964 1 day 10:32:59.35964
dc1_node3 dc2_node4 bdr_bdrdb_bdrgroup_dc2_node4 f 1 day 10:32:58.30663 1 day 10:32:58.30663 1 day 10:32:58.30663
dc1_node4 dc1_node2 bdr_bdrdb_bdrgroup_dc1_node2 t 00:00:00 00:00:00 00:00:00
dc1_node4 dc1_node3 bdr_bdrdb_bdrgroup_dc1_node3 t 00:00:00 00:00:00 00:00:00
dc1_node4 dc2_node2 bdr_bdrdb_bdrgroup_dc2_node2 t 00:00:00 00:00:00 00:00:00
dc1_node4 dc2_node3 bdr_bdrdb_bdrgroup_dc2_node3 t 00:00:00 00:00:00 00:00:00
dc1_node4 dc2_node4 bdr_bdrdb_bdrgroup_dc2_node4 t 00:00:00.006861 00:00:00.006928 00:00:00.006928
dc2_node2 dc1_node2 bdr_bdrdb_bdrgroup_dc1_node2 t 00:00:00.007072 00:00:00.007078 00:00:00.007078
dc2_node2 dc1_node3 bdr_bdrdb_bdrgroup_dc1_node3 t 00:00:00.006991 00:00:00.007084 00:00:00.007084
dc2_node2 dc1_node4 bdr_bdrdb_bdrgroup_dc1_node4 t 00:00:00.007161 00:00:00.007367 00:00:00.007367
dc2_node2 dc2_node3 bdr_bdrdb_bdrgroup_dc2_node3 t 00:00:00.000488 00:00:00.000488 00:00:00.000488
dc2_node2 dc2_node4 bdr_bdrdb_bdrgroup_dc2_node4 t 00:00:00.000482 00:00:00.00054 00:00:00.00054
dc2_node3 dc1_node2 bdr_bdrdb_bdrgroup_dc1_node2 t 00:00:00 00:00:00 00:00:00
dc2_node3 dc1_node3 bdr_bdrdb_bdrgroup_dc1_node3 t 00:00:00 00:00:00 00:00:00
dc2_node3 dc1_node4 bdr_bdrdb_bdrgroup_dc1_node4 t 00:00:00 00:00:00 00:00:00
dc2_node3 dc2_node2 bdr_bdrdb_bdrgroup_dc2_node2 t 00:00:00 00:00:00 00:00:00
dc2_node3 dc2_node4 bdr_bdrdb_bdrgroup_dc2_node4 t 00:00:00 00:00:00 00:00:00
dc2_node4 dc1_node2 bdr_bdrdb_bdrgroup_dc1_node2 t 00:00:00.012962 00:00:00.012962 00:00:00.012962
dc2_node4 dc1_node3 bdr_bdrdb_bdrgroup_dc1_node3 t 00:00:00.012903 00:00:00.012927 00:00:00.012927
dc2_node4 dc1_node4 bdr_bdrdb_bdrgroup_dc1_node4 t 00:00:00.016112 00:00:00.016132 00:00:00.016132
dc2_node4 dc2_node2 bdr_bdrdb_bdrgroup_dc2_node2 t 00:00:00 00:00:00 00:00:00
dc2_node4 dc2_node3 bdr_bdrdb_bdrgroup_dc2_node3 t 00:00:00.000616 00:00:00.209223 00:00:00.209223
In this case, two replication slots are inactive, and replication needs to be fixed. We have to investigate further replication between these 2 nodes.
b) Check the group slot:
find . -type f -name node_slots.data -print -exec bash -c 'awk -F"\t" "NR == 1 || (\$4 == \"\\N\" && \$5 == \"\\N\") {print \$4, \$5, \$9, \$17, \$42, \$43, \$44, \$45}" {} && echo' \;
Sample output (only one node output shown):
$ find . -type f -name node_slots.data -print -exec bash -c 'awk -F"\t" "NR == 1 || (\$4 == \"\\N\" && \$5 == \"\\N\") {print \$4, \$5, \$9, \$17, \$42, \$43, \$44, \$45}" {} && echo' \;./edb-lasso-report-dc2_node3-5432-20250210-165631-8eb404a39ea206f9591c63fa58e2e3fc/edb-lasso-report-dc2_node3-5432-20250210-165631/postgresql/dbs/bdrdb/bdr/node_slots.data
origin_name target_name slot_name active sent_lag_size write_lag_size flush_lag_size replay_lag_size
\N \N bdr_bdrdb_bdrgroup f 19 MB 19 MB 19 MB 19 MB
- If the customer is running BDR 3.6, then we will use only one command for both slots between nodes, and the group slot:
find . -type f -name node_slots.data -print -exec bash -c "awk -F'\t' '{print \$4,\$5,\$9,\$15,\$40,\$41,\$42,\$43}' {}; echo" \;
In most of the cases, an inactive replication slot and/or replication lag can be explained by checking the BDR worker errors on the target. The following command shows worker errors in all nodes:
find . -type f -name worker_errors.data -print -exec bash -c "cat {}; echo" \;
Sample output:
$ find . -type f -name worker_errors.data -print -exec bash -c "cat {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr/worker_errors.data
node_group_name origin_name source_name target_name sub_name worker_role worker_role_name worker_pid error_time error_ageerror_message error_context_message remoterelid subwriter_id subwriter_name
bdr_paysourcing ist01_vm_db_ps02 ist01_vm_db_ps02 ist01_vm_db_ps01 bdr_paysourcing_bdr_paysourcing_ist0b9af1e1_ist031ccbe7 3 writer 596548 2022-10-10 07:59:40.404119+00 01:31:01.129739 pglogical worker received fast finish request, exiting "while handling global lock GLOBAL_LOCK_DDL in database 16384 requested by node-id 3115261467 in local acquire stage acquiring_local
for ddl_epoch 94500184816528, ddl_epoch_lsn 55F2/8982A4C0
during apply of message in xact with commit-end lsn 0/0 xid 0 committs <unset> (action #0) (effective sess origin id=2 lsn=293/B2388601)
while consuming 'M' message from receiver for subscription bdr_paysourcing_bdr_paysourcing_ist0b9af1e1_ist031ccbe7 (id=2759508289) on node ist01_vm_db_ps01 (id=835501687) from upstream node ist01_vm_db_ps02 (id=3115261467, reporiginid=2)" 0 2617452294 bdr_paysourcing_bdr_paysourcing_ist0b9af1e1_ist031ccbe7
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr/worker_errors.data
node_group_name origin_name source_name target_name sub_name worker_role worker_role_name worker_pid error_time error_ageerror_message error_context_message remoterelid subwriter_id subwriter_name
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr/worker_errors.data
node_group_name origin_name source_name target_name sub_name worker_role worker_role_name worker_pid error_time error_ageerror_message error_context_message remoterelid subwriter_id subwriter_name
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr/worker_errors.data
node_group_name origin_name source_name target_name sub_name worker_role worker_role_name worker_pid error_time error_ageerror_message error_context_message remoterelid subwriter_id subwriter_name
The full PostgreSQL configuration (which includes PGD configuration) can be found in the configuration.out
file in each Lasso report. However, in some cases we want to compare a single setting among the nodes. It's possible to do this with the following command, for example comparing setting *writers_per_subscription
:
find . -type f -name configuration.out -print -exec bash -c "grep writers_per_subscription {}; echo" \;
Sample output:
$ find . -type f -name configuration.out -print -exec bash -c "grep writers_per_subscription {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/configuration.out
pglogical.max_writers_per_subscription 16 configuration file 16 8 16 /data/postgres/conf.d/bdr_configuration.conf 11 f
pglogical.writers_per_subscription 4 configuration file 4 1 4 /data/postgres/conf.d/bdr_configuration.conf 12 f
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/configuration.out
pglogical.max_writers_per_subscription 16 configuration file 16 8 16 /data/postgres/conf.d/bdr_configuration.conf 11 f
pglogical.writers_per_subscription 4 configuration file 4 1 4 /data/postgres/conf.d/bdr_configuration.conf 12 f
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/configuration.out
pglogical.max_writers_per_subscription 16 configuration file 16 8 16 /data/postgres/conf.d/bdr_configuration.conf 11 f
pglogical.writers_per_subscription 1 configuration file 1 1 1 /data/postgres/conf.d/bdr_configuration.conf 12 f
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/configuration.out
pglogical.max_writers_per_subscription 16 configuration file 16 8 16 /data/postgres/conf.d/bdr_configuration.conf 11 f
pglogical.writers_per_subscription 1 configuration file 1 1 1 /data/postgres/conf.d/bdr_configuration.conf 12 f
We can clearly see the difference (4 in half of the nodes and 1 in the other half).