Delaying a Postgres process to collect a stack trace

Luciano Botti

In some cases you need the stack trace of a postgres process that raises an error shortly after connecting, so quickly that there is no time to attach gdb to it.

If you are facing this situation, you can follow this guide to delay the start of the process, which gives you time to get the PID and set the gdb options.

Pre-requisites

For better debugging information, the PostgreSQL server should be installed with the *-debuginfo packages (for Red Hat / CentOS / Fedora) or the *-dbgsym / *-dbg packages (for Debian / Ubuntu). If you also need to debug extensions, their debuginfo packages should be installed as well. See related Knowledge Base articles for more information.

You will also need the gdb package installed.
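Before starting, it can help to confirm that the tools used in this guide are on the PATH. The following is a minimal sketch; require_cmds is a hypothetical helper name, not part of PostgreSQL or gdb. It only checks that the commands exist and does not verify that the debuginfo packages are installed.

```shell
# Hypothetical helper: report every command that is not found on the PATH.
# Returns 0 when all commands exist, 1 otherwise.
require_cmds() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call for this guide (left commented out so the block has no side effects):
# require_cmds gdb pgrep reindexdb
```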

Collecting debug data

Once gdb is installed, run the following steps, replacing the command with the one whose stack trace you want to collect. For example, let's say we want to debug the command reindexdb -d somedb -t sometable

1) Preparing gdb commands

First we need to create a cmds file with some gdb commands that will be used later:

cat > cmds << EOF
set logging file gdb.txt
set logging on
set pagination off
set confirm off
break errfinish
commands 1
print errordata[0]
bt
cont
end
cont
quit
EOF

2) Delaying start of the postgres process

In a shell run the following and send it to background:

PGOPTIONS="-W10" reindexdb -d somedb -t sometable &

Note that with -W10 the new postgres backend process pauses for 10 seconds after authentication, before it runs your command. Prepare anything your command needs beforehand, and be ready to run gdb within that window.
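If you need to repeat this step with different commands or delays, the invocation can be wrapped in a small function. This is only a sketch: run_delayed is a hypothetical name, and it assumes that any value already present in PGOPTIONS should be kept and the -W switch appended to it.

```shell
# Hypothetical helper: run a client command in the background with a
# post-authentication delay passed to the backend through PGOPTIONS.
# Usage: run_delayed SECONDS COMMAND [ARGS...]
run_delayed() {
    local delay="$1"
    shift
    # Accept only a whole number of seconds.
    case "$delay" in
        ''|*[!0-9]*)
            echo "delay must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    # Keep any existing PGOPTIONS value and append the -W switch.
    PGOPTIONS="${PGOPTIONS:+$PGOPTIONS }-W${delay}" "$@" &
}

# Example (commented out): same effect as the command shown in this step.
# run_delayed 10 reindexdb -d somedb -t sometable
```

A longer delay gives more time to attach gdb, at the cost of a slower run.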

3) Collecting with gdb

Use the PID of the PostgreSQL backend that is in the 'startup' phase as the process parameter for gdb:

gdb -p $(pgrep -f 'postgres:.*startup') -x cmds
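The one-liner above assumes that exactly one backend is in the 'startup' phase at the moment pgrep runs. On a busy server, or if the backend has not been forked yet, that may not hold. The sketch below polls until a single match appears; wait_for_backend_pid is a hypothetical helper name and the one-second polling interval is an arbitrary choice.

```shell
# Hypothetical helper: poll until exactly one process matches PATTERN,
# then print its PID. Returns 1 on timeout, 2 if several processes match.
# Usage: wait_for_backend_pid PATTERN [TIMEOUT_SECONDS]
wait_for_backend_pid() {
    local pattern="$1" timeout="${2:-10}" elapsed=0 pids count
    while [ "$elapsed" -lt "$timeout" ]; do
        # pgrep exits non-zero when nothing matches; that is not an error here.
        pids=$(pgrep -f "$pattern" || true)
        # Count the non-empty lines, i.e. the number of matching PIDs.
        count=$(printf '%s\n' "$pids" | grep -c . || true)
        if [ "$count" -eq 1 ]; then
            printf '%s\n' "$pids"
            return 0
        elif [ "$count" -gt 1 ]; then
            echo "more than one matching process: $pids" >&2
            return 2
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "no matching process after ${timeout}s" >&2
    return 1
}

# Example (commented out): attach only when a single startup backend is found.
# pid=$(wait_for_backend_pid 'postgres:.*startup' 10) && gdb -p "$pid" -x cmds
```

Keep the timeout below the delay given with -W, otherwise the backend may have left the 'startup' phase before gdb attaches.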

Note that in this example gdb will break every time the errfinish function is reached. You can change this behaviour according to your needs.
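One possible variation is to make the breakpoint conditional, so that only messages at or above a given severity are logged. The sketch below writes a separate cmds-errors file (a hypothetical name). It assumes that the static variable errordata_stack_depth from elog.c is visible to gdb through the debuginfo packages, and that ERROR has the numeric value 20, as in the PostgreSQL 10 elog.h. The value may differ in other releases, so check the elog.h of your server version before relying on it.

```shell
# Assumption: 20 is ERROR in this server's elog.h (true for PostgreSQL 10,
# the version in the sample output; other releases may use another value).
min_elevel=20

# Same commands as the cmds file, but the breakpoint only fires when the
# current error stack entry has elevel >= min_elevel.
cat > cmds-errors << EOF
set logging file gdb.txt
set logging on
set pagination off
set confirm off
break errfinish if errordata[errordata_stack_depth].elevel >= ${min_elevel}
commands 1
print errordata[errordata_stack_depth]
bt
cont
end
cont
quit
EOF
```

With this file, pass -x cmds-errors to gdb instead of -x cmds.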

This last step produces a gdb.txt file containing the collected stack trace of your command. For example:

Breakpoint 1, errfinish (dummy=dummy@entry=16779816) at elog.c:414

$1 = {elevel = 19, output_to_server = 1 '\001', output_to_client = 1 '\001', show_funcname = 0 '\000', hide_stmt = 0 '\000', hide_ctx = 0 '\000', filename = 0x989240 "bufpage.c", lineno = 152, funcname = 0x989514 <__func__.7628> "PageIsVerified", domain = 0x964bc9 "postgres-10", context_domain = 0x964bc9 "postgres-10", sqlerrcode = 64, message = 0x28c9528 "page verification failed, calculated checksum 3487 but expected 17674", detail = 0x0, detail_log = 0x0, hint = 0x0, context = 0x0, message_id = 0x989280 "page verification failed, calculated checksum %u but expected %u", schema_name = 0x0, table_name = 0x0, column_name = 0x0, datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0, internalquery = 0x0, saved_errno = 0, assoc_context = 0x28c6c78}
#0 errfinish (dummy=dummy@entry=16779816) at elog.c:414
#1 0x0000000000713b41 in PageIsVerified (page=page@entry=0x7f566b47d000 "ERR", blkno=blkno@entry=0) at bufpage.c:149
#2 0x00000000006f0020 in ReadBuffer_common (smgr=0x29a62e8, relpersistence=<optimized out>, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=0, mode=RBM_NORMAL, strategy=0x0, hit=hit@entry=0x7ffe42754e17 "") at bufmgr.c:901
#3 0x00000000006f0b40 in ReadBufferExtended (reln=0x7f56c9401068, forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=0, mode=mode@entry=RBM_NORMAL, strategy=<optimized out>) at bufmgr.c:664
#4 0x00000000004b1bf3 in heapgetpage (scan=scan@entry=0x2921c98, page=page@entry=0) at heapam.c:373
#5 0x00000000004b2986 in heapgettup (key=0x0, nkeys=0, dir=ForwardScanDirection, scan=0x2921c98) at heapam.c:525
#6 heap_getnext (scan=scan@entry=0x2921c98, direction=direction@entry=ForwardScanDirection) at heapam.c:1804
#7 0x00000000005125d5 in IndexBuildHeapRangeScan (heapRelation=heapRelation@entry=0x7f56c9401068, indexRelation=indexRelation@entry=0x7f56c9401838, indexInfo=indexInfo@entry=0x29220a8, allow_sync=allow_sync@entry=1 '\001', anyvisible=anyvisible@entry=0 '\000', start_blockno=start_blockno@entry=0, numblocks=numblocks@entry=4294967295, callback=callback@entry=0x4c8b50 <btbuildCallback>, callback_state=callback_state@entry=0x7ffe427553e0) at index.c:2311
#8 0x0000000000512b73 in IndexBuildHeapScan (heapRelation=heapRelation@entry=0x7f56c9401068, indexRelation=indexRelation@entry=0x7f56c9401838, indexInfo=indexInfo@entry=0x29220a8, allow_sync=allow_sync@entry=1 '\001', callback=callback@entry=0x4c8b50 <btbuildCallback>, callback_state=callback_state@entry=0x7ffe427553e0) at index.c:2191
#9 0x00000000004c8a67 in btbuild (heap=0x7f56c9401068, index=0x7f56c9401838, indexInfo=0x29220a8) at nbtree.c:209
[...]
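When the breakpoint is hit many times, gdb.txt can grow large. The following sketch gives a quick overview of the file by counting the errfinish hits and listing the distinct error messages. summarize_gdb_log is a hypothetical helper name, and the sed expression assumes the output format shown above, with messages that contain no embedded double quotes.

```shell
# Hypothetical helper: summarize a gdb log produced by the cmds file.
# Usage: summarize_gdb_log [FILE]   (FILE defaults to gdb.txt)
summarize_gdb_log() {
    local log="${1:-gdb.txt}"
    # Number of times the errfinish breakpoint was reached.
    printf 'errfinish hits: %s\n' "$(grep -c '^Breakpoint [0-9]*, errfinish' "$log")"
    # Distinct values of the "message" field, each with its occurrence count.
    sed -n 's/.* message = 0x[0-9a-f]* "\([^"]*\)".*/\1/p' "$log" | sort | uniq -c
}

# Example (commented out):
# summarize_gdb_log gdb.txt
```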
