How to find active blocking sessions

Shaun Thomas
Shaun Thomas

Issue

There are cases when one or more DDL (Data Definition Language) actions are blocked by a write activity that is currently "idle in transaction".

This DDL can start a lock cascade as UPDATE events pile up behind it. Common techniques for finding locks may not recognize this situation because it is a lock chain and may not be immediately obvious.

Resolution

Idle in Transaction

Connections which are "idle in transaction" should be traced to their source. If possible, the offending transaction should either be committed or canceled on the client side. If this is not possible, pg_terminate_backend can be used on the PID of the session to clear the block. Using pg_cancel_backend is not possible, as that only cancels any executing queries and will not reset the transaction state.

Active Query

If the problem is an active query, the offending session should be traced back to the client side to determine the importance of the query. Sometimes the issue is an ad-hoc query that can easily be canceled or redirected to a non-production or reporting system. If the owner of the remote query is not responding to inquiries, the query can be canceled with pg_cancel_backend.

Stuck Connection

In very rare edge cases, a TCP stack error can interrupt Postgres while it is transmitting to a client session. These connections will respond to neither pg_cancel_backend, nor pg_terminate_backend. In many cases, they can even prevent restarting the Postgres daemon. The only way to clear these is to use the tcpkill command as such:

sudo tcpkill -i eth0 -9 port [client_port]

If eth0 is not the primary network interface for the server, the correct interface should be substituted. It may take a while (up to a minute) before the network stack notices, but eventually the process will lose its network connection, and then Postgres will cleanly abort the session without taking down the whole database.

Diagnostic Steps

The easiest way to recognize this situation is to determine all active sessions in order of the query start time.

The following queries will display all active Postgres connections based on their last session activity. This way, it is possible to determine blockers whether they are "idle in transaction" or simply a very long-running action. Ideally, one or more queries toward the top of the list will be followed by one or more sessions that are in some kind of wait state.

Postgres 9.6+

These versions of Postgres have access to the wait_event column instead of waiting. As such, it's actually possible to see more detail at what may be blocking each query.

SELECT pid, datname, usename, state, client_addr, client_port, wait_event,
now() - query_start AS duration,
SUBSTRING(query, 1, 30) AS query_part
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY now() - state_change DESC
LIMIT 10;

Postgres 9.2-9.5

These versions of Postgres only have access to the waiting column, so it's only possible to verify that the query is being stopped by something. Still, result order will aid in cause analysis.

SELECT pid, datname, usename, state, client_addr, client_port, waiting,
now() - query_start AS duration,
SUBSTRING(query, 1, 30) AS query_part
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY now() - state_change DESC
LIMIT 10;

Root Cause

Root cause is almost always a session marked "idle in transaction" at the top of the result list.

In rare cases, it may be a long-running query, or even a session that is locked in an invalid TCP state. In each case, the resolution is different.

Was this article helpful?

0 out of 0 found this helpful