Thank you to Tomas Vondra and Jakub Wartak for their technical contributions to this article.
As a database grows and changes over time, queries that once performed well can become slow as the data set changes. To make such a query performant again, we need to investigate several factors, e.g.:
- The query plan.
- Ensuring statistics are updated for the planner.
- Ensuring too much bloat isn't present in the relations.
- Reviewing PostgreSQL parameters.
- Whether any indexes can be added.
- Whether hardware improvements are needed.
- Whether the query needs to be re-written.
Most of the time, the planner is clever enough to create a new, performant plan as the data set changes. In a very small percentage of cases, a specific plan path needs to be forced (until running on a newer PostgreSQL version where the planner can handle that edge case).
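As a starting point, the plan of a suspect query can be inspected manually; a minimal sketch (the table and filter here are hypothetical):
-- Hypothetical slow query: show the actual plan, row estimates, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;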
auto_explain provides the ability to automatically log the query plans of slow queries through the use of EXPLAIN and its various flags (ANALYZE, etc.).
This allows us to view the query plan, check for misestimates, and analyze the plan path, the individual operations performed, and their timings. It also shows whether disk reads and writes occurred.
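With auto_explain enabled, a statement exceeding the threshold produces a server log entry along these lines (illustrative output, modeled on the example in the PostgreSQL documentation):
LOG:  duration: 3.651 ms  plan:
  Query Text: SELECT * FROM pg_class;
  Seq Scan on pg_class  (cost=0.00..33.30 rows=830 width=242) (actual time=0.021..0.442 rows=830 loops=1)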
We can configure auto_explain to be enabled either at server start (the change requires downtime) or at session start (the change does not require downtime).
We generally recommend changing shared_preload_libraries, which requires a restart, as it guarantees that plan logging is in effect for all sessions from the start of the PostgreSQL cluster.
The auto_explain module can be appended to the list of libraries in the shared_preload_libraries setting in the postgresql.conf file (if the parameter is not already set in postgresql.auto.conf). For example:
# main parameters
shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = '60s'
auto_explain.log_analyze = on
# supplementary parameters for more information
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_triggers = on
auto_explain.log_nested_statements = on
auto_explain.log_verbose = on
The value auto_explain.log_min_duration = '60s' above is just an example; please adjust it to suit your use case.
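After restarting the server, the new settings can be verified from any session:
SHOW shared_preload_libraries;
SHOW auto_explain.log_min_duration;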
Alternatively, we can use ALTER SYSTEM to append it to the current configuration list in postgresql.auto.conf:
SHOW shared_preload_libraries;
-- main parameters
ALTER SYSTEM SET shared_preload_libraries = '<previous_values>,auto_explain';
ALTER SYSTEM SET auto_explain.log_min_duration = '60s';
ALTER SYSTEM SET auto_explain.log_analyze = on;
-- supplementary parameters for more information
ALTER SYSTEM SET auto_explain.log_buffers = on;
ALTER SYSTEM SET auto_explain.log_timing = on;
ALTER SYSTEM SET auto_explain.log_triggers = on;
ALTER SYSTEM SET auto_explain.log_nested_statements = on;
ALTER SYSTEM SET auto_explain.log_verbose = on;
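Even when set via ALTER SYSTEM, shared_preload_libraries only takes effect after a full restart; pg_reload_conf() is not enough for this parameter. Whether a restart is still pending can be checked with:
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name = 'shared_preload_libraries';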
Additionally, we recommend setting identical settings for both primary and standby databases, so that after potential failover(s), the effective logging settings will be the same.
If you cannot have downtime, you can use session_preload_libraries instead. This can be applied either to all sessions or only to the sessions of chosen users.
If you are not using a connection pooler, session_preload_libraries can decrease performance more than shared_preload_libraries. This is because with session_preload_libraries the library has to be loaded for each new connection, whereas with shared_preload_libraries it has already been loaded into the postmaster. If connection pooling is used, connections are re-used and this is less of a factor.
If you are using a connection pooler and only wish to track a subset of users, session_preload_libraries may perform better, as the library is not loaded into sessions that do not need it.
To apply to all sessions:
SHOW session_preload_libraries;
-- main parameters
ALTER SYSTEM SET session_preload_libraries = '<previous_values>,auto_explain';
ALTER SYSTEM SET auto_explain.log_min_duration = '60s';
ALTER SYSTEM SET auto_explain.log_analyze = on;
-- supplementary parameters for more information
ALTER SYSTEM SET auto_explain.log_buffers = on;
ALTER SYSTEM SET auto_explain.log_timing = on;
ALTER SYSTEM SET auto_explain.log_triggers = on;
ALTER SYSTEM SET auto_explain.log_nested_statements = on;
ALTER SYSTEM SET auto_explain.log_verbose = on;
SELECT pg_reload_conf();
Again, auto_explain.log_min_duration = '60s' is just an example; adjust the value to suit your use case.
Or, to apply to only sessions for chosen users:
Check current settings:
SELECT rolname, rolconfig FROM pg_roles WHERE rolconfig IS NOT NULL;
Or as the specific user, run:
SHOW session_preload_libraries;
SHOW auto_explain.log_min_duration;
...
Set per user:
-- main parameters
ALTER ROLE <username> SET session_preload_libraries = '<previous_values>,auto_explain';
ALTER ROLE <username> SET auto_explain.log_min_duration = '60s';
ALTER ROLE <username> SET auto_explain.log_analyze = on;
--supplementary parameters for more information
ALTER ROLE <username> SET auto_explain.log_buffers = on;
ALTER ROLE <username> SET auto_explain.log_timing = on;
ALTER ROLE <username> SET auto_explain.log_triggers = on;
ALTER ROLE <username> SET auto_explain.log_nested_statements = on;
ALTER ROLE <username> SET auto_explain.log_verbose = on;
If no value of session_preload_libraries was previously set for the role, we can simply run ALTER ROLE appuser SET session_preload_libraries = 'auto_explain'; to enable it, and ALTER ROLE appuser RESET session_preload_libraries; to disable it.
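The auto_explain.* overrides set above persist for the role as well; once the investigation is complete, they can be cleared too (appuser is a hypothetical role name):
-- Remove the library and the individual auto_explain overrides
ALTER ROLE appuser RESET session_preload_libraries;
ALTER ROLE appuser RESET auto_explain.log_min_duration;
-- ...or clear all per-role parameter overrides at once
ALTER ROLE appuser RESET ALL;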
auto_explain affects the performance of all queries, regardless of log_min_duration, because the plan information must be collected during execution, and auto_explain does not know ahead of time whether a query will exceed log_min_duration.
Our development team estimates the overhead to generally be small, at 1-5%, but on weaker hardware it can be as high as 200-300%. The nature of the query also affects how much its performance is impacted by auto_explain.
In high-OLTP scenarios (thousands of queries per second or more), one should set auto_explain.sample_rate, which causes only a fraction of the statements in each session to be explained. The default is 1 (100%), meaning all queries above auto_explain.log_min_duration are explained. A general starting point could be 0.1, to log every tenth statement, adjusted upwards or downwards depending on whether the plans of the different query types are being captured over time. This reduces CPU, RAM, and disk consumption.
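For example, to explain roughly every tenth qualifying statement:
ALTER SYSTEM SET auto_explain.sample_rate = 0.1;
SELECT pg_reload_conf();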
Another way to reduce CPU and RAM consumption, if exact timings are not needed and row counts are sufficient, is to set auto_explain.log_timing = off, which avoids system clock reads.
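For example, to disable per-node timing while keeping row counts:
ALTER SYSTEM SET auto_explain.log_timing = off;
SELECT pg_reload_conf();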
One should ensure that additional disk space is available, as auto_explain can be verbose in its logging, especially during overall system performance deterioration, when the database server is overloaded and everything slows down. To reduce unnecessary logging and save disk space, auto_explain.log_min_duration should be set above the expected average execution time of the queries run against the system. For example, if all queries are expected to complete within 5 seconds, then 6s would be a reasonable threshold to set before plans are logged.
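Following that example, the threshold could be set like so:
ALTER SYSTEM SET auto_explain.log_min_duration = '6s';
SELECT pg_reload_conf();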