Tracking PostgreSQL syscall waits using strace

Niccolò Fei
Niccolò Fei

Strace Overview

strace is a tool that shows a trace of all the system calls a process is requesting. It could be useful for instance when there's a process that suddenly requests lots of CPU or I/O, or if a process seem to be hanging.

To record the duration of each system call run:

strace -tt -T -p <PID>

The -tt option prints the timestamp including microseconds when the syscall was invoked. The -T option prints the time spent in the syscall. The -p option specify the PID of the process strace will be attached to.

Use Ctrl+C to interrupt the tracing.

The strace output will be something similar to the following:

10:45:53.069105 select(7, [3 4 5 6], NULL, NULL, {56, 70357}) = 1 (in [5], left {53, 19165}) <3.051227>
10:45:56.121108 rt_sigprocmask(SIG_SETMASK, ~[ILL TRAP ABRT BUS FPE SEGV CONT SYS RTMIN RT_1], NULL, 8) = 0 <0.000023>
10:45:56.121358 accept(5, {sa_family=AF_LOCAL, NULL}, [2]) = 11 <0.000028>
10:45:56.121442 getsockname(11, {sa_family=AF_LOCAL, sun_path="/var/run/postgresql/.s.PGSQL.5432"}, [36]) = 0 <0.000013>
10:45:56.121512 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f00177c9b90) = 4458 <0.000347>
10:45:56.121957 close(11) = 0 <0.000020>
10:45:56.122576 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000020>

The output format is:

timestamp syscall(arguments) = return value <duration>

Each line contains the timestamp, the syscall name followed by its arguments within parentheses, its return value, and its duration in seconds. To have more details on each syscall consult the syscalls manpage.

Collecting statistics on syscall waits using strace

To collect statistics about which syscalls took the longest time to be executed, run:

strace -c -p <PID>

The -c option suppresses the regular strace output and produces a summary on the exit, reporting the total time spent for any type of syscall. This shows the type of operation in which the process has spent most of the time.

The strace output will be something similar to the following:

% time seconds usecs/call calls errors syscall
76.41 0.006522 384 17 clone
9.13 0.000779 24 32 wait4
7.30 0.000623 18 34 23 select
2.68 0.000229 2 116 rt_sigprocmask
2.23 0.000190 17 11 accept
1.03 0.000088 4 24 24 rt_sigreturn
0.56 0.000048 3 18 18 stat
0.25 0.000021 2 12 close
0.18 0.000015 1 11 getsockname
0.12 0.000010 3 3 kill
0.06 0.000005 5 1 read
0.06 0.000005 5 1 open
100.00 0.008535 280 65 total

On PostgreSQL 9.6 or higher

Since PostgreSQL 9.6 new columns have been introduced in the pg_stat_activity view to show which kind of wait events a backend is waiting on.

List of current processes that are waiting for an action:

SELECT pid, state, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event is NOT NULL;

which output will be something similar to the following:

pid | state | wait_event_type | wait_event
3091 | | Activity | AutoVacuumMain
3094 | | Activity | LogicalLauncherMain
4459 | active | Lock | transactionid
4460 | idle in transaction | IO | WALWrite
4461 | active | Lock | transactionid
4462 | active | Lock | transactionid
4463 | active | Lock | transactionid
4464 | active | Lock | transactionid
4465 | idle in transaction | Client | ClientRead
4466 | active | Lock | transactionid
4467 | idle in transaction | Client | ClientRead
4468 | active | Lock | transactionid
3089 | | Activity | BgWriterHibernate
3088 | | Activity | CheckpointerMain
3090 | | Activity | WalWriterMain
(15 rows)

The state column shows the current state of the selected backend.

The wait_event is the event name which the backend is currently waiting on. To have a detailed list of the possible events consult this table in the official documentation.

The wait_event_type is the specific type of event or operation which the backend is currently waiting on. A list of possible values is available in the documentation.

Was this article helpful?

0 out of 0 found this helpful