Idle and Idle in transaction connections and impact on performance and ways to handle it

Amit Wakchaure
Amit Wakchaure

Idle sessions in Postgres are not as "free" as is often assumed. In this article we'll discuss the types of idle sessions that can happen, their effects, and their causes.

Introduction

In PostgreSQL parlance a session is synonymous with a database connection. Sessions have state in the form of various resources such as changeable parameters, prepared transactions, transactional state, and locks that can be acquired executing a transaction. "state" also has specific meaning pertaining to what the session is currently doing, which can be seen for all sessions (viewable by the current user) in pg_stat_activity's state column. Here I've added some extra notes to the manual's description of those states:

  • active: The backend is executing a query. This also necessarily implies that the session is currently executing a transaction since all queries in Postgres are run inside of transactions.
  • idle: The backend is waiting for a new client command. In this case the session does not have an active transaction running.
  • idle in transaction: The backend is in a transaction, but is not currently executing a query.
  • idle in transaction (aborted): This state is similar to idle in transaction, except one of the statements in the transaction caused an error. In this case the session does not actually still have a transaction running since that was aborted when the error was encountered, but the client needs to acknowledge that with a ROLLBACK before it can a run any further commands.
  • fastpath function call: The backend is executing a fast-path function. This is a largely deprecated function execution interface that we won't discuss further here.
  • disabled: This state is reported if track_activities is disabled in this backend. This effectively means that the session's state is unknown, hence EDB does not recommend ever running your servers with track_activities disabled.

This article focuses on the idle and idle in transaction states.

idle Session State

Users and admins alike often assume that sessions that aren't doing anything, i.e. they have open transactions running, incur no costs, but that is not true.

max_connections

The server's max_connections setting imposes a hard-limit on the number of connections (sessions) that may be running at any given point in time. Further, the server pre-allocates some data structures in shared memory for each potential connection at server startup time, meaning that changes to max_connections require a server restart and, thus, downtime. Should you find your server exhausting the available connections with your current max_connections limit but many of them are regularly in idle state then those idle sessions are effectively blocking those new connections that want to do work and, since increasing max_connections requires restarting your should look to limiting your connection lifetimes and/or using connection pooling to allow connection sharing by client processes, thereby reducing session idle time, before increasing max_connections.

Memory Allocation

In the course of doing work sessions will make many memory allocations, both to the Postgres-managed shared (memory) buffers pool and direct from the operating system. While many memory blocks allocated by sessions can be re-used by Postgres or the OS once no longer needed, many cannot and the act of paging to find those previously allocated memory segments that can be re-used is not free, especially if the server is already swapping, and on very busy systems can result in a non-insignificant affect on the server's load that can be hard to quantify in the moment. This is yet another reason imposing connection lifetimes is considered good connection management hygiene.

Even outside of allocations made directly by the sessions, each incurs a ~2MB (or more) memory overhead. That is hard to see in the various system and process memory metrics shown by operating systems, but, noting it here, consider that 500 running sessions will incur a 1GB+ memory footprint even if many are idle.

Transaction Snapshot Creation

A transaction's "snapshot" refers to the transactional state across all connections when the current transaction started, i.e. when in the global transaction stream each started, which is needed to implement and maintain ACID compliance by maintaining Concurrency Control. The work needed to determine the transactional state requires that the server communicate with each session's backend process such that even idle sessions are regularly consuming CPU as other's are doing work. If you've ever heard that PostgreSQL doesn't scale well beyond 400-500 (or even lower) connections this is where that comes from. The work needed for transaction snapshot creation from other sessions was greatly reduced with improvements in PostgreSQL 14, there is still some per-session work that needs doing such that idle sessions are not truly idle from a process perspective but the point remains: Large numbers of idle sessions on the server can slow down the transaction startup work done by active sessions.

idle in transaction Session state

The idle in transaction session state is distinct from the idle state. Where session in idle state is one that is waiting for a new transaction to be started, sessions in idle in transaction state have an open, active transaction started that has not yet committed or rolled back but also does not currently have an query running. While it is entirely normal when examining pg_stat_activity to see sessions in this state for very short, transient times (e.g. while the connected client does client-side work between queries), session that are in this state for long periods can cause a number of problems on your server that can degrade performance or outright block activity by other sessions.

Locks

Locks acquired by queries are held by the running transaction until it finishes by either committing or rolling back. This means that an active transaction that goes idle after acquiring locks on objects (e.g. tables, rows, etc.) will block other transactions that need conflicting locks on the same objects. When the held locks are commonly needed by application sessions, such as for updates to a commonly updated table, this can quickly create a "snowball" effect where the number of blocked sessions continues to grow like a snowball rolling down a hill, continually driving up the server load as more sessions block and even exhaust all of the available connections.

Should you find that you have sessions waiting on locks being held by sessions in idle in transaction state you can use these excellent queries from the PostgreSQL Wiki to get more details but, ultimately, the idle in transaction session will need to either end itself by committing or rolling back or be killed to release the locks it holds.

VACUUM - Table Bloat and Transaction ID Wraparound

As discussed earlier in this article, when a transaction begins it builds snapshot view of the database. That snapshot is used in conjunction with its transaction mode to determine whether or not it can see rows written and committed anywhere in the database after the transaction begins. If it cannot see updates or deletes to previously existing rows then those previously existing row versions must be kept intact in case the running transaction needs to read them. This then means that when a VACUUM is run on a table holding onto those previously existing rows it can not mark those rows as free for reuse. Those rows are what is referred to as "bloat" as they are effectively dead space that cannot be re-used until the idle transaction finishes, and when a transaction is left open for long periods while other sessions continue on with their UPDATE and DELETE queries that dead space bloat can grow to become a significant percentage of tables which then slows down all usage of them.

Another extreme failure that can result from transactions being left in idle state for long periods is Transaction ID Wraparound, which we describe in our How to handle transaction ID wraparound.

Monitoring

Monitoring of connections is primarily done via the system pg_stat_activity view mentioned previously in this article's introduction. Here are a few example queries to check for some of the conditions you'd want to watch for mentioned in this article.

Here are a couple queries that will return information for all and counts on all connections in idle state for more than 30 seconds:

SELECT
datid,
datname,
pid,
usesysid,
usename,
application_name,
client_addr,
client_port,
backend_start,
state, 
state_change AS state_change,
clock_timestamp() - state_change AS idle_duration,
query AS last_query,
query_start AS time_of_last_query,
now()
FROM pg_catalog.pg_stat_activity
WHERE state = 'idle'
AND now() - state_change > '30 seconds'::interval;

SELECT count(*)
FROM pg_catalog.pg_stat_activity
WHERE state = 'idle'
AND now() - state_change > '30 seconds'::interval;

Here is a similar query for connections in idle in transaction state, but returning the transaction-specific information from pg_stat_activity as well:

SELECT
datid,
datname,
pid,
usesysid,
usename,
application_name,
client_addr,
client_port,
backend_start,
state,
state_change AS state_change,
now() - state_change AS idle_in_txn_duration,
query AS last_query,
query_start AS time_of_last_query,
xact_start as transaction_start_time,
now() - xact_start AS transaction_duration,
backend_xmin xact_xmin,
backend_xid xact_xid,
now()
FROM pg_catalog.pg_stat_activity
WHERE state = 'idle in transaction'
AND now() - state_change > '30 seconds'::interval;

SELECT count(*)
FROM pg_catalog.pg_stat_activity
WHERE state = 'idle in transaction'
AND clock_timestamp() - state_change > '30 seconds'::interval;

Most PostgreSQL monitoring tools, such as our EDB's Postgres Enterprise Manager, will also have built-in checks and alerts for these.

Mitigation

Should you find yourself needing to kill a session in idle or idle in transaction state you can use pg_terminate_backend() with the sessions pg_stat_activity.pid value:

SELECT pg_terminate_backend(<pid>);

Or perhaps you wish to terminate all sessions that have been in idle state for more than 10 minutes:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND now() - state_change > '10 minutes'::interval;

Or all idle in transaction sessions that have been in that state for more than 30 seconds:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transcation'
AND now() - state_change > '30 seconds'::interval;

If your server is on a PG or EPAS version >= 14 you can use the server idle_session_timeout setting to automatically terminate sessions that stay in idle state for longer than the setting's configured value, and for servers on versions >= 9.6 (which includes all currently supported versions) you can do the same for sessions in idle in transaction state with the idle_in_transaction_timeout setting. Alternatively, when using a connection pooler, such as PgBouncer, those will typically provide connection lifetime settings that can do the same things, and often with more configurability.

Consider, though, that if you have an issue with application connections frequently staying in idle in transaction state for long periods you should fix the application code responsible so that is not happening. If you automatically kill them it may be harder to track down where the issue is in the application code versus being able to examine things while they are present. For example, while you can certainly ensure that the originating client hostname and port are logged for all connections using the %r value in your server's log_line_prefix setting, those values appear in pg_stat_activity's client_addr or client_hostname and client_port fields for active sessions and can then be used to track back sessions to their originating client processes, which may help to narrow down which section of your application is responsible.

Was this article helpful?

0 out of 0 found this helpful