Among other useful information, Lasso gathers PGD metadata from PostgreSQL instances into a tarball. Having multiple Lasso tarballs (one per node from a PGD cluster) downloaded locally, it's possible to run some commands to quickly check the PGD cluster state and health.
The following instructions are useful for BDR 3.7 and newer, but we also include the command for BDR 3.6.
In our example case, the customer has a 4-node BDR cluster, so they sent 4 Lasso tarballs. The first step is to download them all and extract them in the same directory, for example:
$ ls -lh
total 1.6M
drwxr-xr-x 8 william.ivanski 256 Oct 10 2022 edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948
-rw-r--r-- 1 william.ivanski 423K Nov 7 2022 edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948-bba36c2c201309f348d34aef10e5f1e2.tar.bz2
drwxr-xr-x 9 william.ivanski 288 Nov 7 2022 edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413
-rw-r--r-- 1 william.ivanski 418K Nov 7 2022 edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413-8382b7797ca90629cb162082f096ec5d.tar.bz2
drwxr-xr-x 8 william.ivanski 256 Oct 10 2022 edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836
-rw-r--r-- 1 william.ivanski 386K Nov 7 2022 edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836-d7c39d35160faa6f45c6ca23e0c61156.tar.bz2
drwxr-xr-x 9 william.ivanski 288 Nov 7 2022 edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842
-rw-r--r-- 1 william.ivanski 380K Nov 7 2022 edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842-d291d0b52c845d8c9ef206a938a59015.tar.bz2
Each Postgres instance can have multiple databases, but we need to identify the databases where the bdr
extension is installed. When Lasso detects the bdr
extension installed, then it creates a bdr
directory under that database directory, so the command to locate the BDR-enabled databases in all Lasso reports is as simple as:
find . -type d -name bdr
Output will be, for example:
$ find . -type d -name bdr
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr
So we know that the database name is paysourcing
in all 4 nodes.
Knowing the operating system version is important to provide targeted instructions, as well as for the developers to generate hotfix packages for the customer system if needed. The command to check that is:
find . -type f -name release.data -print -exec bash -c "cat {}; echo" \;
Sample output:
$ find . -type f -name release.data -print -exec bash -c "cat {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/linux/release.data
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/linux/release.data
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/linux/release.data
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/linux/release.data
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal
So we know the customer is running Ubuntu 20.04 on all nodes.
Another crucial piece of information in terms of Support ticket metadata is the BDR version and edition. If running BDR 3.7 or newer, the command to check this information is:
find . -type f -name group_versions_details.data -print -exec bash -c "cat {}; echo" \;
Sample output:
$ find . -type f -name group_versions_details.data -print -exec bash -c "cat {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr/group_versions_details.data
node_id node_name postgres_version pglogical_version bdr_version bdr_edition
285269477 ist02_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
835501687 ist01_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
3115261467 ist01_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
1600578929 ist02_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr/group_versions_details.data
node_id node_name postgres_version pglogical_version bdr_version bdr_edition
285269477 ist02_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
835501687 ist01_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
3115261467 ist01_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
1600578929 ist02_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr/group_versions_details.data
node_id node_name postgres_version pglogical_version bdr_version bdr_edition
3115261467 ist01_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
835501687 ist01_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
1600578929 ist02_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
285269477 ist02_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr/group_versions_details.data
node_id node_name postgres_version pglogical_version bdr_version bdr_edition
1600578929 ist02_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
3115261467 ist01_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
835501687 ist01_vm_db_ps01 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
285269477 ist02_vm_db_ps02 13.8 (2ndQPG 13.8r1.1.10) (Ubuntu 2:13.8r2ndq1.1.10-1.focal+1) 3.7.17 3.7.17 EE
So we know the customer is running BDR 3.7.17, Enterprise Edition, on top of Postgres Extended (formerly known as 2ndQPG).
If the customer was on BDR 3.6, then the command would be this one instead:
find . -type f -name monitor_group_versions_details.data -print -exec bash -c "cat {}; echo" \;
In a PGD cluster, Raft Consensus needs to be working at all times so checking this is one of the first steps when assessing a PGD cluster health. The command is:
find . -type f -name bdr_monitor_group_raft.out -print -exec bash -c "cat {}; echo" \;
We know Raft Consensus is working when the output is:
$ find . -type f -name bdr_monitor_group_raft.out -print -exec bash -c "cat {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr_monitor_group_raft.out
monitor_group_raft
"(OK,""Raft Consensus is working correctly"")"
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr_monitor_group_raft.out
monitor_group_raft
"(OK,""Raft Consensus is working correctly"")"
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr_monitor_group_raft.out
monitor_group_raft
"(OK,""Raft Consensus is working correctly"")"
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr_monitor_group_raft.out
monitor_group_raft
"(OK,""Raft Consensus is working correctly"")"
When there are any issues with BDR replication, then we will be able to see the replication slots inactive or replication lag increasing. The command to check this is:
find . -type f -name node_slots.data -print -exec bash -c "awk -F'\t' '{print \$4,\$5,\$9,\$17,\$42,\$43,\$44,\$45}' {}; echo" \;
Sample output:
$ find . -type f -name node_slots.data -print -exec bash -c "awk -F'\t' '{print \$4,\$5,\$9,\$17,\$42,\$43,\$44,\$45}' {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr/node_slots.data
origin_name target_name slot_name active sent_lag_size write_lag_size flush_lag_size replay_lag_size
ist01_vm_db_ps01 ist02_vm_db_ps02 bdr_paysourcing_bdr_paysourcing_ist02_vm_db_ps02 t 60 MB 107 MB 108 MB 108 MB
ist01_vm_db_ps01 ist01_vm_db_ps02 bdr_paysourcing_bdr_paysourcing_ist01_vm_db_ps02 t 0 bytes 0 bytes 0 bytes 0 bytes
ist01_vm_db_ps01 ist02_vm_db_ps01 bdr_paysourcing_bdr_paysourcing_ist02_vm_db_ps01 t 0 bytes 0 bytes 0 bytes 0 bytes
\N \N bdr_paysourcing_bdr_paysourcing f 108 MB 108 MB 108 MB 108 MB
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr/node_slots.data
origin_name target_name slot_name active sent_lag_size write_lag_size flush_lag_size replay_lag_size
ist01_vm_db_ps02 ist02_vm_db_ps02 bdr_paysourcing_bdr_paysourcing_ist02_vm_db_ps02 t 0 bytes 0 bytes 0 bytes 0 bytes
ist01_vm_db_ps02 ist01_vm_db_ps01 bdr_paysourcing_bdr_paysourcing_ist01_vm_db_ps01 t 0 bytes 0 bytes 0 bytes 0 bytes
ist01_vm_db_ps02 ist02_vm_db_ps01 bdr_paysourcing_bdr_paysourcing_ist02_vm_db_ps01 t 0 bytes 0 bytes 0 bytes 0 bytes
\N \N bdr_paysourcing_bdr_paysourcing f 4745 kB 4745 kB 4745 kB 4745 kB
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr/node_slots.data
origin_name target_name slot_name active sent_lag_size write_lag_size flush_lag_size replay_lag_size
ist02_vm_db_ps01 ist01_vm_db_ps02 bdr_paysourcing_bdr_paysourcing_ist01_vm_db_ps02 t 5608 bytes 5608 bytes 5608 bytes 5608 bytes
ist02_vm_db_ps01 ist01_vm_db_ps01 bdr_paysourcing_bdr_paysourcing_ist01_vm_db_ps01 t 5608 bytes 5608 bytes 5608 bytes 5608 bytes
ist02_vm_db_ps01 ist02_vm_db_ps02 bdr_paysourcing_bdr_paysourcing_ist02_vm_db_ps02 t 5608 bytes 5608 bytes 5608 bytes 5608 bytes
\N \N bdr_paysourcing_bdr_paysourcing f 6674 kB 6674 kB 6674 kB 6674 kB
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr/node_slots.data
origin_name target_name slot_name active sent_lag_size write_lag_size flush_lag_size replay_lag_size
ist02_vm_db_ps02 ist02_vm_db_ps01 bdr_paysourcing_bdr_paysourcing_ist02_vm_db_ps01 t 0 bytes 0 bytes 0 bytes 0 bytes
ist02_vm_db_ps02 ist01_vm_db_ps02 bdr_paysourcing_bdr_paysourcing_ist01_vm_db_ps02 t 0 bytes 0 bytes 0 bytes 0 bytes
ist02_vm_db_ps02 ist01_vm_db_ps01 bdr_paysourcing_bdr_paysourcing_ist01_vm_db_ps01 t 0 bytes 0 bytes 0 bytes 0 bytes
\N \N bdr_paysourcing_bdr_paysourcing f 3463 kB 3463 kB 3463 kB 3463 kB
In this case, all replication slots are active (except the BDR group slot, which is always inactive). However, there is one peer replication slot with considerable replication lag:
origin_name target_name slot_name active sent_lag_size write_lag_size flush_lag_size replay_lag_size
ist01_vm_db_ps01 ist02_vm_db_ps02 bdr_paysourcing_bdr_paysourcing_ist02_vm_db_ps02 t 60 MB 107 MB 108 MB 108 MB
So we can try to investigate further replication between these 2 nodes.
If the customer was running BDR 3.6, then the command would be the following instead:
find . -type f -name node_slots.data -print -exec bash -c "awk -F'\t' '{print \$4,\$5,\$9,\$15,\$40,\$41,\$42,\$43}' {}; echo" \;
In most of the cases, an inactive replication slot and/or replication lag can be explained by checking the BDR worker errors on the target. The following command shows worker errors in all nodes:
find . -type f -name worker_errors.data -print -exec bash -c "cat {}; echo" \;
Sample output:
$ find . -type f -name worker_errors.data -print -exec bash -c "cat {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/dbs/paysourcing/bdr/worker_errors.data
node_group_name origin_name source_name target_name sub_name worker_role worker_role_name worker_pid error_time error_ageerror_message error_context_message remoterelid subwriter_id subwriter_name
bdr_paysourcing ist01_vm_db_ps02 ist01_vm_db_ps02 ist01_vm_db_ps01 bdr_paysourcing_bdr_paysourcing_ist0b9af1e1_ist031ccbe7 3 writer 596548 2022-10-10 07:59:40.404119+00 01:31:01.129739 pglogical worker received fast finish request, exiting "while handling global lock GLOBAL_LOCK_DDL in database 16384 requested by node-id 3115261467 in local acquire stage acquiring_local
for ddl_epoch 94500184816528, ddl_epoch_lsn 55F2/8982A4C0
during apply of message in xact with commit-end lsn 0/0 xid 0 committs <unset> (action #0) (effective sess origin id=2 lsn=293/B2388601)
while consuming 'M' message from receiver for subscription bdr_paysourcing_bdr_paysourcing_ist0b9af1e1_ist031ccbe7 (id=2759508289) on node ist01_vm_db_ps01 (id=835501687) from upstream node ist01_vm_db_ps02 (id=3115261467, reporiginid=2)" 0 2617452294 bdr_paysourcing_bdr_paysourcing_ist0b9af1e1_ist031ccbe7
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/dbs/paysourcing/bdr/worker_errors.data
node_group_name origin_name source_name target_name sub_name worker_role worker_role_name worker_pid error_time error_ageerror_message error_context_message remoterelid subwriter_id subwriter_name
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/dbs/paysourcing/bdr/worker_errors.data
node_group_name origin_name source_name target_name sub_name worker_role worker_role_name worker_pid error_time error_ageerror_message error_context_message remoterelid subwriter_id subwriter_name
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/dbs/paysourcing/bdr/worker_errors.data
node_group_name origin_name source_name target_name sub_name worker_role worker_role_name worker_pid error_time error_ageerror_message error_context_message remoterelid subwriter_id subwriter_name
The full PostgreSQL configuration (which includes PGD configuration) can be found in the configuration.out
file in each Lasso report. However, in some cases we want to compare a single setting among the nodes. It's possible to do this with the following command, for example comparing setting *writers_per_subscription
:
find . -type f -name configuration.out -print -exec bash -c "grep writers_per_subscription {}; echo" \;
Sample output:
$ find . -type f -name configuration.out -print -exec bash -c "grep writers_per_subscription {}; echo" \;
./edb-lasso-report-ist01-vm-db-ps01-5432-20221010-092948/postgresql/configuration.out
pglogical.max_writers_per_subscription 16 configuration file 16 8 16 /data/postgres/conf.d/bdr_configuration.conf 11 f
pglogical.writers_per_subscription 4 configuration file 4 1 4 /data/postgres/conf.d/bdr_configuration.conf 12 f
./edb-lasso-report-ist01-vm-db-ps02-5432-20221010-093413/postgresql/configuration.out
pglogical.max_writers_per_subscription 16 configuration file 16 8 16 /data/postgres/conf.d/bdr_configuration.conf 11 f
pglogical.writers_per_subscription 4 configuration file 4 1 4 /data/postgres/conf.d/bdr_configuration.conf 12 f
./edb-lasso-report-ist02-vm-db-ps01-5432-20221010-092836/postgresql/configuration.out
pglogical.max_writers_per_subscription 16 configuration file 16 8 16 /data/postgres/conf.d/bdr_configuration.conf 11 f
pglogical.writers_per_subscription 1 configuration file 1 1 1 /data/postgres/conf.d/bdr_configuration.conf 12 f
./edb-lasso-report-ist02-vm-db-ps02-5432-20221010-092842/postgresql/configuration.out
pglogical.max_writers_per_subscription 16 configuration file 16 8 16 /data/postgres/conf.d/bdr_configuration.conf 11 f
pglogical.writers_per_subscription 1 configuration file 1 1 1 /data/postgres/conf.d/bdr_configuration.conf 12 f
We can clearly see the difference (4 in half of the nodes and 1 in the other half).