Troubleshooting takes less time when you have a detailed PostgreSQL log in a comprehensive format. This article explains how to achieve that.
It provides best practices and suggestions for configuring PostgreSQL log settings so that they record useful information about database activity. The PostgreSQL configuration file offers many parameters in its ERROR REPORTING AND LOGGING section, but only a few of them need to be changed, as the default values are good enough for most.
An example of the suggested logging configuration in the postgresql.conf file is:
logging_collector = on
log_destination = 'stderr'
log_checkpoints = on
log_lock_waits = on
log_min_duration_statement = 1000
log_line_prefix = '%m [%p]: u=[%u] db=[%d] app=[%a] c=[%h] s=[%c:%l] tx=[%v:%x] '
log_statement = 'ddl'
log_temp_files = 1024
log_autovacuum_min_duration = '1s'
This is not a one-size-fits-all recipe; it is a starting point. Different workloads benefit from different duration thresholds, and may or may not want more detailed statement logging.
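Before changing anything, it can help to see which values are currently in effect; the pg_settings catalog view shows them, for example:
SELECT name, setting, source FROM pg_settings WHERE name LIKE 'log%';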
The following settings can make a real difference during troubleshooting.
The log_destination parameter configures where to write the log output. We suggest either stderr (with logging_collector enabled) or syslog. The syslog method is more efficient than stderr, but could lose messages under high load.
- With stderr, output is sent to the standard error file descriptor managed by the operating system. Remember to enable the logging_collector parameter when using stderr, because the logging collector background process is designed to never lose messages.
- With syslog, output is sent to the syslog process. Remember to also set the parameters syslog_facility = 'LOCAL0' and syslog_ident = 'postgres', and adjust the syslog configuration file to match, for example:
local0.* /var/log/postgresql
- With csvlog, output is written in CSV format, which is less human-readable but much easier for any parser to analyze. Note that csvlog also requires logging_collector to be enabled. It's typical to configure this alongside a more human-readable format, e.g.:
log_destination = 'stderr,csvlog'
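If you enable csvlog, the CSV logging section of the official documentation provides a postgres_log table definition whose columns match the CSV fields; once that table exists, a log file can be loaded for SQL analysis with something like (the file path below is illustrative):
COPY postgres_log FROM '/var/log/postgresql/postgresql.csv' WITH csv;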
The log_line_prefix parameter describes the log line format. The format codes are well explained in the Error Reporting and Logging chapter of the official documentation. Keep in mind that this setting should be tailored to your needs, so adjust it accordingly.
The format you use is largely up to you, but we find this a useful starting point:
log_line_prefix = '%m [%p]: u=[%u] db=[%d] app=[%a] c=[%h] s=[%c:%l] tx=[%v:%x] '
as it logs:
- timestamp (%m)
- process ID (%p)
- user and database
- application_name (including for BDR and pglogical apply workers, etc.)
- client address
- session ID and per-session log line number, which are useful for reassembling session activity
- virtual and real transaction IDs, which are useful for tracing transaction boundaries
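For illustration, a log entry produced with this prefix looks roughly like the following (all values are made up):
2023-03-17 10:15:32.123 UTC [12345]: u=[app_user] db=[appdb] app=[psql] c=[192.168.0.10] s=[6413f1a7.3039:7] tx=[3/1234:567890] LOG:  duration: 1350.465 ms  statement: SELECT pg_sleep(1.35);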
If you use syslog as a log_destination, you may omit the %m [%p] from the start of the log format, as syslog adds this information automatically.
We also recommend adding a short fixed host name identifier to your log_line_prefix. If you do this, you will need to update it after cloning a PostgreSQL instance with e.g. pg_basebackup, as the logging configuration is copied too. Done in a disciplined way, this makes analyzing logs across various replicas a lot easier; done carelessly, it just becomes confusing. It is very strongly recommended for all BDR and pglogical deployments. In the example below, mydb1 has been used as the node identifier; adjust yours accordingly.
log_line_prefix = '%m [%p]: mydb1 u=[%u] ..... '
Setting log_statement = 'ddl' keeps track of every change to the schema that could be relevant for later investigation work, for example when DROP or ALTER commands are involved.
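With log_statement = 'ddl', a schema change shows up as a log entry like this (prefix omitted for brevity, statement illustrative):
LOG:  statement: ALTER TABLE accounts ADD COLUMN note text;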
We suggest not logging all statements unless it's strictly necessary. Setting log_statement to either mod or all increases the log size and floods it with so much information that the useful entries can get buried.
Rather than using log_statement = 'all' and possibly a tool like pgBadger for log analysis, you should usually prefer the pg_stat_statements extension. However, fine-grained logging with all is extremely useful to enable when analysing specific application or server problems, where you want to be able to reassemble everything that a transaction did leading up to a problem, or see how multiple transactions interact.
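For reference, a minimal pg_stat_statements setup looks like the following sketch; the extension ships with PostgreSQL, loading the library requires a server restart, and the total_exec_time column is called total_time on versions before 13:
shared_preload_libraries = 'pg_stat_statements'   # in postgresql.conf; restart required
-- then, connected to the database:
CREATE EXTENSION pg_stat_statements;
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;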
Functions, including pglogical and BDR configuration functions, will not be logged at log_statement = 'ddl'.
It's useful to keep track of checkpoints triggered either by checkpoint_timeout or max_wal_size. Collecting them helps you understand which of the two is triggering checkpoints, and how frequently. A checkpoint executed because max_wal_size was reached may impact disk activity more, causing latency for other I/O. Logging checkpoints helps when tuning checkpoint_timeout, max_wal_size and checkpoint_completion_target to deliver smoother I/O.
(Note that max_wal_size does not exist on PostgreSQL 9.4 and older; the corresponding setting there was checkpoint_segments, which worked somewhat differently.)
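With log_checkpoints enabled, the reason after "starting" tells you what triggered the checkpoint: time for checkpoint_timeout, wal (xlog on older versions) for max_wal_size. The entries look roughly like:
LOG:  checkpoint starting: time
LOG:  checkpoint complete: wrote 3214 buffers (19.6%); write=269.904 s, sync=0.012 s, total=269.978 s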
With log_temp_files = 1024, every temporary file bigger than 1024 kB written to disk is logged. This is useful to understand which SQL statements need more memory than work_mem allows, and helps guide tuning decisions such as raising work_mem locally for specific statements.
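Each entry reports the file path and its size in bytes, along with the statement that produced it, roughly like:
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp12345.0", size 2097152
STATEMENT:  SELECT * FROM big_table ORDER BY some_column;
A targeted remedy could then be a session-local SET work_mem = '64MB'; for the offending statement, rather than a global increase.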
log_lock_waits is useful in determining whether lock waits are causing poor performance, long pauses, and so on.
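A wait is reported once it exceeds deadlock_timeout (1 second by default), with an entry roughly like (values illustrative):
LOG:  process 12345 still waiting for ShareLock on transaction 567890 after 1000.123 ms
STATEMENT:  UPDATE accounts SET balance = balance - 100 WHERE id = 1;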
With log_min_duration_statement = 1000, every statement that takes more than one second to execute is logged, which is useful for tracking down unoptimized queries. You will want to adjust this value based on your workload. Rather than setting it to very low values, prefer pg_stat_statements for statistical analysis of queries and workloads.
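A statement exceeding the threshold is logged together with its execution time, for example:
LOG:  duration: 2543.565 ms  statement: SELECT count(*) FROM orders WHERE status = 'pending';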
Another important point to plan is the rotation of the logs. It is recommended to keep a certain number of previous log files on the server to help with troubleshooting, but as these files can become voluminous, old log files have to be removed after a certain period of time and the remaining ones compressed if possible.
PostgreSQL has a built-in solution for log rotation, but we found that the logrotate application, which is installed by default on most Linux distributions, offers a powerful solution that already covers many aspects of log rotation, such as compression and automatic removal of obsolete log files.
To let logrotate manage the PostgreSQL logs, just create the file /etc/logrotate.d/postgres.
An example of a logrotate configuration is shown below; adjust it to suit your database characteristics.
/var/log/postgres.log {
daily
rotate 14
compress
create 0640 postgres postgres
}
The configuration above manages a PostgreSQL log file named postgres.log in the /var/log directory (the create directive recreates the file after each rotation). Writing the log under /var/log is a safe measure to avoid disk space problems on $PGDATA.
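For this to work, PostgreSQL itself must write its log to that file; with the logging collector, settings along these lines would match (assuming the postgres user can write to /var/log):
log_directory = '/var/log'
log_filename = 'postgres.log'
Note that, because the logging collector keeps the file open, logrotate's copytruncate directive is often used in place of create, so that writes keep landing in the same file after rotation.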
It is strongly recommended to mount /var/log on a disk partition separate from the one where $PGDATA is located.
The rest of the configuration sets the rotation to happen daily, compresses the old log files, and keeps a maximum of 14 of them, which gives a log history of two weeks.
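Before relying on the rotation, you can dry-run it; logrotate's debug flag prints the actions it would take without modifying any file:
logrotate -d /etc/logrotate.d/postgres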