This article gathers the most common configuration and tools that need to be enabled on a database system so support engineers are able to properly troubleshoot performance issues in PostgreSQL.
Ensure that standardized PostgreSQL logging configuration is in place to be used by EDB support tools:
ALTER SYSTEM SET log_line_prefix = '%m [%p]: u=[%u] db=[%d] app=[%a] c=[%h] s=[%c:%l] tx=[%v:%x] ';
ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
ALTER SYSTEM SET log_statement = 'ddl';
ALTER SYSTEM SET log_checkpoints = on;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_min_duration_statement = '3s';
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
-- restore the below ones to default to avoid skipping some events
ALTER SYSTEM RESET log_min_duration_sample;
ALTER SYSTEM RESET log_statement_sample_rate;
ALTER SYSTEM RESET log_transaction_sample_rate;
ALTER SYSTEM RESET log_temp_files;
select pg_reload_conf();
Be sure to also check standby and other nodes, so that the same settings are effective.
WARNING: when the above settings (especially the log_connections
and log_disconnections
) are in effect, please double-check your filesystem space usage and proper log rotation configuration. Please ensure you have filesystem monitoring in place to avoid causing downtime because of full filesystem.
It is absolutely critical when dealing with performance issues to enable sysstat
/SAR
OS data collections. On every host, ensure that sysstat
package is installed. Sysstat
/SAR
is going to collect OS metrics over time, and it is very lightweight utility. See How to install and configure SAR for more details.
(Optional) Autoexplain configuration, in case of dealing with nondeterministic SQL performance behaviour (suspected SQL plan shifts)
If dealing with sporadic and nondeterministic slowdowns of just certain SQLs and not whole OS/DB, where SQL plan shifts are suspected, then it is recommended to deploy auto_explain
to gather more logs for that. See Automatically logging query plans with auto_explain for more details.
In case of dealing with extended-protocol (e.g. used by pgJDBC) and statement_timeout
errors it is advisable to set log_parameter_max_length_on_error
. The default log settings do not log exact values query values, often hindering the troubleshooting effort, example:
postgres@postgres # set statement_timeout to '1s';
SET
postgres@postgres # select pg_sleep($1) \bind 2 \g
ERROR: canceling statement due to statement timeout
In logfile one would find just (generic) value $1:
2024-03-01 12:14:03.321 CET [10036]: u=[postgres] db=[postgres] app=[psql] c=[[local]] s=[65e1b862.2734:4] tx=[3/410:0] ERROR: canceling statement due to statement timeout
2024-03-01 12:14:03.321 CET [10036]: u=[postgres] db=[postgres] app=[psql] c=[[local]] s=[65e1b862.2734:5] tx=[3/410:0] STATEMENT: select pg_sleep($1)
In order to troubleshoot for which bind variables (parameters) it is timing out, it might sense to set log_parameter_max_length_on_error
to display exact binds on (all) errors:
ALTER SYSTEM SET log_parameter_max_length_on_error='-1';
SELECT pg_reload_conf();
With such setting in effect, this produces the following effect:
postgres@postgres # set statement_timeout to '1s';
SET
postgres@postgres # select pg_sleep($1) \bind 2 \g
ERROR: canceling statement due to statement timeout
CONTEXT: unnamed portal with parameters: $1 = '2'
And in logfile one would find the exact parameter used too:
2024-03-01 12:18:46.565 CET [10134]: u=[postgres] db=[postgres] app=[psql] c=[[local]] s=[65e1b915.2796:5] tx=[3/428:0] ERROR: canceling statement due to statement timeout
2024-03-01 12:18:46.565 CET [10134]: u=[postgres] db=[postgres] app=[psql] c=[[local]] s=[65e1b915.2796:6] tx=[3/428:0] CONTEXT: unnamed portal with parameters: $1 = '2'
2024-03-01 12:18:46.565 CET [10134]: u=[postgres] db=[postgres] app=[psql] c=[[local]] s=[65e1b915.2796:7] tx=[3/428:0] STATEMENT: select pg_sleep($1)
Related to