There are cases where one or more DDL (Data Definition Language) statements are blocked by a write transaction that is currently "idle in transaction". That blocked DDL can then start a lock cascade as UPDATE events pile up behind it. Common techniques for finding locks may not recognize this situation, because it is a lock chain and the root blocker may not be immediately obvious.
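One way to expose such a chain directly, on Postgres 9.6 and above, is the pg_blocking_pids() function, which reports the sessions a given backend is waiting on. A minimal sketch:

```sql
-- List every waiting session together with the PIDs blocking it.
-- Requires Postgres 9.6 or newer for pg_blocking_pids().
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       SUBSTRING(query, 1, 30) AS query_part
  FROM pg_stat_activity
 WHERE CARDINALITY(pg_blocking_pids(pid)) > 0;
```

Following the blocked_by PIDs back through the output reconstructs the chain, ending at the session holding the original lock.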
Connections that are "idle in transaction" should be traced to their source. If possible, the offending transaction should either be committed or canceled on the client side. If this is not possible, pg_terminate_backend can be used on the PID of the session to clear the block. Using pg_cancel_backend will not help, as that only cancels any executing query and will not reset the transaction state.
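Assuming the blocker's PID has already been identified (12345 below is a placeholder), clearing it might look like this:

```sql
-- Double-check the session really is idle in transaction...
SELECT pid, state, now() - state_change AS stuck_for
  FROM pg_stat_activity
 WHERE pid = 12345;

-- ...then forcibly disconnect it. Returns true on success.
SELECT pg_terminate_backend(12345);
```

The terminated client will see its connection dropped, so this should be a last resort after client-side resolution has failed.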
If the problem is an active query, the offending session should be traced back to the client side to determine the importance of the query. Sometimes the issue is an ad-hoc query that can easily be canceled or redirected to a non-production or reporting system. If the owner of the remote query is not responding to inquiries, the query can be canceled with pg_cancel_backend.
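For an active query confirmed to be expendable, cancellation leaves the connection itself intact (again, 12345 is a placeholder PID):

```sql
-- Cancel only the running query; the session and its
-- connection survive, so the client can retry later.
SELECT pg_cancel_backend(12345);
```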
In very rare edge cases, a TCP stack error can interrupt Postgres while it is transmitting to a client session. These connections respond to neither pg_cancel_backend nor pg_terminate_backend. In many cases, they can even prevent restarting the Postgres daemon. The only way to clear these is to use the tcpkill command, as such:
sudo tcpkill -i eth0 -9 port [client_port]
If eth0 is not the primary network interface for the server, the correct interface should be substituted. It may take a while (up to a minute) before the network stack notices, but eventually the process will lose its network connection, and then Postgres will cleanly abort the session without taking down the whole database.
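The client port that tcpkill needs can be read from pg_stat_activity for the stuck backend (PID 12345 is a placeholder):

```sql
-- client_addr and client_port identify the remote end of the
-- TCP connection, which is what tcpkill matches on.
SELECT client_addr, client_port
  FROM pg_stat_activity
 WHERE pid = 12345;
```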
The easiest way to recognize this situation is to list all active sessions in order of their last state change. The following queries display all active Postgres connections based on their last session activity. This way, it is possible to spot blockers whether they are "idle in transaction" or simply a very long-running action. Ideally, one or more queries toward the top of the list will be followed by one or more sessions that are in some kind of wait state.
Postgres 9.6 and above have access to the wait_event column instead of the older waiting column. As such, it's possible to see in more detail what may be blocking each query.
SELECT pid, datname, usename, state, client_addr, client_port, wait_event,
now() - query_start AS duration,
SUBSTRING(query, 1, 30) AS query_part
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY now() - state_change DESC
LIMIT 10;
Versions older than 9.6 only have access to the waiting column, so it's only possible to verify that the query is being blocked by something. Still, the result order will aid in root-cause analysis.
SELECT pid, datname, usename, state, client_addr, client_port, waiting,
now() - query_start AS duration,
SUBSTRING(query, 1, 30) AS query_part
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY now() - state_change DESC
LIMIT 10;
The root cause is almost always a session marked "idle in transaction" at the top of the result list. In rare cases, it may be a long-running query, or even a session locked in an invalid TCP state. In each case, the resolution is different.
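When "idle in transaction" sessions recur and tracing each one individually is impractical, a manual sweep along the following lines is one option. This is a sketch only: the ten-minute threshold is an arbitrary example and should be tuned to local tolerance before use.

```sql
-- Terminate every session that has sat idle in a transaction
-- for more than ten minutes. Use with care: the affected
-- clients will see their connections dropped.
SELECT pid, pg_terminate_backend(pid) AS terminated
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND now() - state_change > INTERVAL '10 minutes';
```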