Using txid_status() to make applications resilient against connection loss

Craig Ringer
Craig Ringer
  • Updated

It can happen that the application issues a COMMIT, and while it waits for the server's answer the connection is interrupted.

In that case, the application does not know whether the transaction actually committed, or if it should be retried.

The application can use the new txid_status() PostgreSQL function to solve this problem.

This is a light-weight alternative to the classical approach of using Two Phase Commit (2PC) and a transaction resolver.

The Problem

This is the normal workflow when an application commits a transaction:

  1. Application sends Query message with content COMMIT
  2. Server receives Query message
  3. Server performs COMMIT
  4. Server sends the outcome of the transaction
  5. Application receives the outcome of the transaction

If the connection fails before step 1, or after step 5, then the application knows the outcome of the transaction and can decide whether to retry it or not, based on the application model.

Conversely, if the connection fails in the "grey zone" between step 1 and step 5 then the application doesn't receive a clear answer on whether the transaction was actually committed or not (step 3).

The Solution

The application must read the id of the current transaction before attempting COMMIT. Should the connection fail without a clear outcome, the application can reconnect and verify the outcome of that specific transaction id using txid_status().

Application Requirements and Changes

The application:

  • must use explicit transactions, i.e. BEGIN and COMMIT statements; otherwise the transaction id cannot be read before attempting COMMIT.

  • should be able to retry a transaction in case it fails. This implies remembering all the changes applied within that transaction, until the transaction outcome is confirmed. Some applications choose instead to handle transaction failure by reporting the error to the user.

Worked Example

In this example we show a workflow that complies with the above conditions:

BEGIN;
-- (perform the writes)
SELECT txid_current_if_assigned();

The application will then receive the id of the pending transaction from the server.

The application must store locally that id, together with all the information required to repeat the writes, until it is eventually committed, aborted, or even abandoned (i.e. the application could also decide that the transaction has become obsolete in the meantime due to other events).

At this point, the application issues COMMIT. If the connection fails without returning a clear outcome of the transaction, then the application can operate as follows:

  1. Reconnect
  2. Run the following query:
SELECT txid_status(9999999999);

where 9999999999 must be replaced by the id of the transaction that we are checking, which was received and stored before attempting COMMIT.

The result, as documented in the PostgreSQL manual, can be one of the following four values:

  • 'committed' if the transaction was actually committed;
  • 'aborted' if the transaction was actually aborted (rolled back);
  • 'in progress' if the transaction is still ongoing;
  • NULL if the transaction is too old and PostgreSQL does not remember its outcome any more.

The application can use this result to decide whether to re-execute the transaction, or consider it completed and remove it from the local list of pending transactions.

The in-progress result needs special handling, because it means that the transaction is still ongoing, for instance if the network disconnection was not complete and the server session was left open.

The application could decide to wait until the transaction completes, but this could take a long time due to network and server-side timeout. In most cases it is preferrable to terminate the hung session as follows:

SELECT pid, pg_terminate_backend(pid)
FROM pg_catalog.pg_stat_activity
WHERE backend_xid = CAST('9999999999' AS xid);

Here, as above, 9999999999 must be replaced by the id of the transaction that we are checking.

The reason for writing the transaction id as a string casted to another type (xid) is that backend_xid exposes the transaction id as a value of xid type, which is different from the bigint that we got initially, so we need to convert our bigint to a xid if we want our query to work.

Note that a xid is essentially a bigint truncated to a 32 bit unsigned integer, which reflects the way transaction ids are handled internally by PostgreSQL.

Related to

Was this article helpful?

0 out of 0 found this helpful