Enable PGD debug log level and gather stack trace from PGD workers

William Ivanski

This article explains how to diagnose PGD issues by increasing the log verbosity and gathering stack traces from any BDR worker. It assumes you were unable to diagnose the issue using the Lasso report or the Postgres logs.

In the example below, we are diagnosing an issue with the AutoPartition worker, but the same steps can be applied to any BDR worker.

Make sure debuginfo and gdb packages are installed

If PostgreSQL was installed from packages, the debuginfo packages must be installed. See the article Installing debuginfo packages. The gdb package also needs to be installed. For more details about the requirements for taking a stack trace, please check the article How to use debugger to get a stack trace of a running process.
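
For example, on a RHEL-compatible system the installation might look like the commands below. These package names are illustrative and depend on the distribution, the PostgreSQL major version, and whether community or EDB packages are in use; see the articles above for the exact names:

sudo yum install -y gdb
sudo debuginfo-install -y postgresql15-server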

Grab a stack trace from the worker

If the worker is stuck or erroring out, then we need to grab a stack trace of the worker in its current state.

First, we need to confirm the PIDs of the workers to grab information from. As the AutoPartition worker process is named autopart, the command to get its PID is:

ps -ef | grep [a]utopart | awk '{print $2}'
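
Alternatively, the PID can be cross-checked from SQL. The sketch below assumes the worker is visible in pg_stat_activity with a backend_type or application_name containing "autopart"; the exact labels vary by PGD version:

SELECT pid, backend_type, application_name
  FROM pg_stat_activity
 WHERE backend_type ILIKE '%autopart%'
    OR application_name ILIKE '%autopart%';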

Knowing the PID of the worker, we can get a stack trace with the following command:

pid=PID_RETURNED_ABOVE
gdb -p ${pid} -ex 'set pagination off' -ex "set logging file gdb_${pid}.txt" -ex 'set logging on' -ex 'set confirm off' -ex 'bt' -ex 'bt full' -ex 'shell sleep 5' -ex 'bt full' -ex 'shell sleep 5' -ex 'bt full' -ex 'info threads' -ex 'quit'

Replace PID_RETURNED_ABOVE with the value returned by the ps command above, on each node. The command will generate files named gdb_<PID>.txt, which the customer needs to attach to the Support ticket and which might be useful for L2 to review.
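
If the ps command returns more than one PID (for example, one AutoPartition worker per BDR-enabled database), a small wrapper loop such as the sketch below captures one trace per PID; it is the same gdb invocation as above, just repeated:

for pid in $(ps -ef | grep [a]utopart | awk '{print $2}'); do
    gdb -p ${pid} -ex 'set pagination off' -ex "set logging file gdb_${pid}.txt" \
        -ex 'set logging on' -ex 'set confirm off' \
        -ex 'bt' -ex 'bt full' -ex 'shell sleep 5' -ex 'bt full' \
        -ex 'shell sleep 5' -ex 'bt full' -ex 'info threads' -ex 'quit'
done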

Change Postgres and BDR log levels

Run the following while connected to Postgres as a Postgres superuser, on a BDR-enabled database, on any PGD data node:

SELECT bdr.run_on_all_nodes($$ALTER SYSTEM SET bdr.debug_level = 'log'$$);
SELECT bdr.run_on_all_nodes($$ALTER SYSTEM SET bdr.trace_level = 'log'$$);
SELECT bdr.run_on_all_nodes($$ALTER SYSTEM SET log_line_prefix = '%m [%p]: u=[%u] db=[%d] app=[%a] c=[%h] s=[%c:%l] tx=[%v:%x] '$$);
SELECT bdr.run_on_all_nodes($$ALTER SYSTEM SET log_min_messages = 'debug3'$$);
SELECT bdr.run_on_all_nodes($$SELECT pg_reload_conf()$$);
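
Optionally, confirm that the new values took effect on every node. The sketch below uses the standard current_setting() function through bdr.run_on_all_nodes(), which returns the per-node results as JSON:

SELECT bdr.run_on_all_nodes($$SELECT current_setting('bdr.debug_level'),
                                     current_setting('log_min_messages')$$);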

The settings above will make Postgres generate a very large log volume, so Postgres should not run for long with them enabled. Instruct the customer to monitor disk usage on the log partition.
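
To keep an eye on disk usage while the debug settings are active, commands like the ones below can be run periodically. The paths are placeholders; the actual location depends on the log_directory setting and on how Postgres was installed:

df -h /path/to/log/partition        # free space on the partition holding the logs
du -sh /path/to/log_directory       # size of the Postgres log directory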

Re-create the issue

If possible and applicable, now that debug log verbosity is enabled, re-create the issue so that the L2 engineers can investigate further from the logs.

In the example used in this article, the issue occurred in the AutoPartition worker when configuring AutoPartition for a specific table; the steps for re-creating the issue you are seeing depend on the specific situation you are facing.
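
For illustration only, the kind of step that triggered the AutoPartition issue in this example was a call along the lines of the sketch below. The table name and parameter values are hypothetical, and the exact bdr.autopartition() signature depends on your PGD version:

SELECT bdr.autopartition(
    relation := 'public.measurements',        -- hypothetical table
    partition_increment := '1 day',
    data_retention_period := '30 days'
);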

In other cases, the user doesn't need to run any specific step, and the worker might reach the defective state on its own after some time.

Last but not least, re-creating the issue sometimes requires restarting Postgres first.
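
For example, on a systemd-based host the restart might look like the command below; the actual service name depends on the distribution and on how Postgres/PGD was installed:

sudo systemctl restart postgresql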

After re-creating the issue, it might also be useful to grab another stack trace from the defective worker as instructed above.

Reset Postgres and BDR log levels

Once you have collected the logs with the increased verbosity (and re-created the issue, if applicable), run the following while connected to Postgres as a Postgres superuser, on a BDR-enabled database, on any PGD data node:

SELECT bdr.run_on_all_nodes($$ALTER SYSTEM RESET bdr.debug_level$$);
SELECT bdr.run_on_all_nodes($$ALTER SYSTEM RESET bdr.trace_level$$);
SELECT bdr.run_on_all_nodes($$ALTER SYSTEM RESET log_line_prefix$$);
SELECT bdr.run_on_all_nodes($$ALTER SYSTEM RESET log_min_messages$$);
SELECT bdr.run_on_all_nodes($$SELECT pg_reload_conf()$$);
