Core dumps are often used to diagnose or debug errors in Linux or UNIX programs. They are a useful debugging aid for sysadmins who need to find out why an application or any other program crashed. This article covers the basics of enabling core dumps on the main Linux distributions and of generating a core dump of a process.
A core dump is a file containing a process's address space (memory) when the process terminates unexpectedly. Core dumps may be produced on-demand (such as by a debugger), or automatically upon termination. Support engineers may ask for a core dump after a PostgreSQL (or any other application) crash to understand the state of PostgreSQL during the failure.
Below are the instructions for both systemd-based and non-systemd-based distributions.
In a modern Linux distribution you will most likely have systemd enabled.
systemd does not use /etc/security/limits.conf for the services it manages; the limits are defined in the service unit or in the global systemd configuration.
- First create the directory in which the core dumps will be stored and change kernel.core_pattern to store the dumps in that directory. You can choose a different location, as long as it is on a partition with enough free space (at least a couple of GB). Keep in mind that the users running the postmaster service (i.e., postgres and/or enterprisedb) must be able to write their core dumps to that directory.
# mkdir -p /var/coredumps
- To enable core dump collection and make the new setting persist across reboots, execute:
# chmod 1777 /var/coredumps
# echo 'kernel.core_pattern=/var/coredumps/core-%e-%p' >> /etc/sysctl.conf
# sysctl -p
- Then, you can override the PostgreSQL service unit to define the core limit or, alternatively, specify a global default limit for all the services.
- To set the limit only for the PostgreSQL service (substitute postgresql-16.service with your actual service name), run:
# export SYSTEMD_EDITOR=vim # or pick any other editor you want
# systemctl edit postgresql-16.service
and paste the following snippet:
[Service]
LimitCORE=infinity
Then reload the service configuration:
# systemctl daemon-reload
- To change the global default, edit /etc/systemd/system.conf and set:
DefaultLimitCORE=infinity
and then re-execute systemd:
# systemctl daemon-reexec
- Restart PostgreSQL:
# systemctl stop postgresql-16
# systemctl start postgresql-16
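To get a feel for the file names that the kernel.core_pattern value configured above will produce, the expansion of its specifiers can be imitated in the shell. This is only an illustrative sketch: the function name expand_core_pattern is hypothetical, and it handles just the %e (executable name), %p (PID), and %% specifiers described in core(5).

```shell
#!/bin/sh
# Illustrative sketch: imitate how the kernel expands a core_pattern.
# expand_core_pattern is a hypothetical helper, not a system utility.
# Only %e (executable name), %p (PID) and %% are expanded; any other
# specifier is copied through unchanged.
expand_core_pattern() {
    awk -v pat="$1" -v exe="$2" -v pid="$3" 'BEGIN {
        out = ""
        n = length(pat)
        for (i = 1; i <= n; i++) {
            ch = substr(pat, i, 1)
            if (ch != "%") { out = out ch; continue }
            i++                          # look at the specifier letter
            sp = substr(pat, i, 1)
            if (sp == "e")      out = out exe
            else if (sp == "p") out = out pid
            else if (sp == "%") out = out "%"
            else                out = out "%" sp
        }
        print out
    }'
}
```

For example, `expand_core_pattern '/var/coredumps/core-%e-%p' postmaster 6839` prints the path a crashing postmaster with PID 6839 would be written to under that pattern.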
At this point core dumps should be enabled correctly. If you hit the crash and core dumps are still not generated, continue with the next section.
In order to ensure that core dumps are configured properly (or if you are getting crashes and no core dumps are being generated), please check the following:
- The output of the sysctl kernel.core_pattern command needs to point directly to the /var/coredumps directory. Because PostgreSQL core dumps are often big and problematic, EDB does not support any other way of saving core dumps. Please see Appendix B for an explanation of various other core patterns.
- Validate that the sysctl kernel.core_pipe_limit command returns 0 (no limit, just in case).
- The ls -ld /var/coredumps command should show that the directory is world-writable and has the sticky bit set (drwxrwxrwt). The sticky bit lets every user create files in the directory, while only a file's owner (or root) can delete or rename it.
- Locate the PostgreSQL postmaster PID and verify that the new limits are in effect:
ps auxw | grep postmaster
grep -E '^Limit|^Max core file size' /proc/<POSTMASTER_PID>/limits
The line we want to verify is Max core file size, where the Soft Limit value must be high enough to generate a core file:
Limit                     Soft Limit           Hard Limit           Units
Max core file size        unlimited            unlimited            bytes
Both columns (soft/hard) should be unlimited.
On hosts using the systemd service controller, the limit can also be verified using systemctl show -p LimitCORE <servicename>.service.
- Locate the PostgreSQL postmaster PID again and check that it has the proper coredump_filter:
ps auxw | grep postmaster
cat /proc/<POSTMASTER_PID>/coredump_filter
The Linux kernel default is coredump_filter = 0x33 (00110011). Sometimes the value is 0xff (11111111), probably due to the TPAexec YAML config. This non-standard value may cause issues. See man 5 core for documentation.
- A coredump_filter value of 0x31 tells the kernel NOT to dump anonymous shared mappings (shared_buffers), so the core dump should be much smaller.
- A coredump_filter value of 0x33 (default) might include partial (and big) anonymous shared memory mappings and thus truncate the core dump. This can be a problem when big core dumps fill up the available space.
- If you are affected by a nonstandard coredump_filter, check the output of systemctl show postgres.service | grep ExecStartPost for any line like ExecStartPost=/bin/bash -c 'echo 0xff > /proc/$MAINPID/coredump_filter' and change the value to 0x31 or 0x33.
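The two /proc checks in the list above can be sketched as small shell helpers. This is a minimal sketch under stated assumptions, not an EDB tool: the function names core_limit_is_unlimited and describe_coredump_filter are hypothetical, and the meaning of each bit is taken from core(5).

```shell
#!/bin/sh
# Hypothetical helpers (not part of PostgreSQL or EDB tooling).

# Succeeds only when both the soft and the hard "Max core file size"
# limits in the given file (e.g. /proc/<PID>/limits) are "unlimited".
core_limit_is_unlimited() {
    awk '/^Max core file size/ {
             found = 1
             ok = ($5 == "unlimited" && $6 == "unlimited")
         }
         END { exit !(found && ok) }' "$1"
}

# Prints one line per bit set in a coredump_filter value, given either
# as 0x33 or as 00000033 (the form shown by /proc/<PID>/coredump_filter).
# Runs in a subshell so its variables do not leak into the caller.
describe_coredump_filter() (
    hex=${1#0x}
    value=$((0x$hex))
    bit=0
    for name in \
        "anonymous private mappings" \
        "anonymous shared mappings" \
        "file-backed private mappings" \
        "file-backed shared mappings" \
        "ELF headers" \
        "private huge pages" \
        "shared huge pages" \
        "private DAX pages" \
        "shared DAX pages"
    do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $name"
        fi
        bit=$((bit + 1))
    done
)
```

A possible use is `core_limit_is_unlimited /proc/<POSTMASTER_PID>/limits && echo OK`, followed by `describe_coredump_filter "$(cat /proc/<POSTMASTER_PID>/coredump_filter)"`. With 0x31 the "anonymous shared mappings" line is absent, which is the difference from the default 0x33 that the list describes.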
From there, the next step is to verify what the operating system does when a crash occurs.
WARNING: The steps below will cause an outage!
To be sure core dumps are correctly generated, you can verify that a core dump exists after sending a SIGSEGV signal to PostgreSQL:
- List PostgreSQL processes and their PIDs:
$ ps auxf | grep postgres
postgres 6839 0.0 6.7 356708 16440 ? S 09:39 0:00 /usr/pgsql-16/bin/postmaster -D /var/lib/pgsql/16/data
postgres 6841 0.0 0.7 211628 1792 ? Ss 09:39 0:00 \_ postgres: logger
postgres 6843 0.0 0.7 356708 1944 ? Ss 09:39 0:00 \_ postgres: checkpointer
postgres 6844 0.0 0.9 356708 2364 ? Ss 09:39 0:00 \_ postgres: background writer
postgres 6845 0.0 2.5 356708 6116 ? Ss 09:39 0:00 \_ postgres: walwriter
postgres 6846 0.0 1.1 357124 2856 ? Ss 09:39 0:00 \_ postgres: autovacuum launcher
postgres 6847 0.0 0.7 211624 1832 ? Ss 09:39 0:00 \_ postgres: stats collector
postgres 6848 0.0 1.0 357124 2604 ? Ss 09:39 0:00 \_ postgres: logical replication launcher
- Kill the postmaster process with the SIGSEGV signal:
# kill -SIGSEGV 6839
- Check the directory with core dumps. The files will be named like:
# ls -ltrh /var/coredumps/
-rw------- 1 root root 244K Feb 26 09:05 core-postmaster-6839
-rw------- 1 root root 244K Feb 26 09:08 core-postmaster-6840
-rw------- 1 root root 244K Feb 26 09:10 core-postmaster-6845
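Checking whether the SIGSEGV test actually left a core file behind can also be scripted. The sketch below assumes the core-%e-%p naming used in this article; find_core_for_pid is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints every core file in directory $1 whose name
# follows the core-<executable>-<PID> scheme for PID $2, and fails when
# no such file exists. Runs in a subshell, so "exit" only sets the
# function's return status.
find_core_for_pid() (
    dir=$1
    pid=$2
    status=1
    for f in "$dir"/core-*-"$pid"; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            status=0
        fi
    done
    exit "$status"
)
```

For the example above, `find_core_for_pid /var/coredumps 6839` would print the path of the core file written for the killed postmaster, or return a non-zero status if the kernel did not write one.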
Once you've enabled core dumps, if you experience a backend crash, the operating system will generate a core dump, and you'll be able to use gdb on it to collect useful debugging information.
This information can be passed to the support and development team(s) for detailed analysis.
You will need gdb and debug symbols installed to be able to correctly read the core dump.
- On RHEL-like systems, you can install the debug symbols using the debuginfo-install utility included in the yum-utils package.
# yum install gdb yum-utils
# debuginfo-install postgresql16-server
- On Debian/Ubuntu-like distributions, you can install the debug packages:
# apt-get install gdb postgresql-16-dbgsym
or:
# apt-get install gdb postgresql-16-dbg
NOTE: We performed this test with various PostgreSQL versions. Please check the names of the debug packages first.
NOTE: The installed versions of the server runtime package and its debug symbols package must match exactly, down to the last digit. For example, this is a good situation:
$ dpkg -l | grep postgresql-15
ii postgresql-15 15.5-1.pgdg110+1 amd64 The World's Most Advanced Open Source Relational Database
ii postgresql-15-dbgsym 15.5-1.pgdg110+1 amd64 debug symbols for postgresql-15
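The version comparison shown in the listing above can be automated. This is a minimal sketch, assuming the usual dpkg -l column layout (status, package name, version, architecture, description); packages_same_version is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `dpkg -l` output on stdin and succeeds only
# when both named packages are listed with exactly the same version.
packages_same_version() {
    awk -v server="$1" -v debug="$2" '
        $2 == server { v_server = $3 }
        $2 == debug  { v_debug  = $3 }
        # Appending "" forces a string comparison of the two versions.
        END { exit !(v_server != "" && (v_server "") == (v_debug "")) }
    '
}
```

One way to use it is `dpkg -l | packages_same_version postgresql-15 postgresql-15-dbgsym && echo "versions match"`; substitute the package names that apply to your installation.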
Once the packages are properly installed, run gdb, specifying the location of the postgres binary and the core dump file, as shown below:
# ls -ltrh /var/coredumps/ # this will sort by time, select the most recent file below
# gdb -q --batch -ex "bt full" /usr/pgsql-16/bin/postgres /var/coredumps/core-postmaster-6839 > /tmp/stacktrace.core-postmaster-6839.txt 2>&1
Attach the resulting (small) text file /tmp/stacktrace.core-postmaster-XXX.txt to the case. Please do not delete the (often big) core dump, as the development team might have further questions related to it.
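Selecting the most recent core file and deriving the matching stack trace file name can be scripted rather than done by eye. The helpers below are a hedged sketch: newest_core and stacktrace_file are hypothetical names, and the output path simply follows the /tmp/stacktrace.<core file name>.txt convention used in this article.

```shell
#!/bin/sh
# Hypothetical helpers for picking a core file and naming the gdb output.

# Prints the most recently modified core-* file in directory $1 and
# fails if there is none. Runs in a subshell to keep variables local.
newest_core() (
    newest=""
    for f in "$1"/core-*; do
        [ -f "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
)

# Prints the stack trace file name used in this article for a core file.
stacktrace_file() {
    printf '/tmp/stacktrace.%s.txt\n' "$(basename "$1")"
}

# Possible usage, with the gdb invocation shown in this article:
#   core=$(newest_core /var/coredumps)
#   gdb -q --batch -ex "bt full" /usr/pgsql-16/bin/postgres "$core" \
#       > "$(stacktrace_file "$core")" 2>&1
```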
The contents of core_pattern can vary depending on the distribution used. If the value is:
- 'core' - The core file will be created in the PGDATA directory. There are variations of this that put metadata in the filename, or that use an absolute path instead of a relative path based on the working directory.
- |/usr/share/apport/apport %p %s %c - apport is an Ubuntu application that typically saves core files in /var/crash.
- |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h - abrtd is a Red Hat application for processing crashes. The default location for core files is /var/spool/abrt. In order to get core dumps you might need to add one or both of these settings to /etc/abrt/abrt-action-save-package-data.conf:
ProcessUnpackaged = yes
OpenGPGCheck = no
- /var/local/dumps/core.%e.%p - SLES dump location.
- Another option is having the core dumps passed through a UNIX pipe to a command. For instance, on systemd systems, kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h will place the core dumps in /var/lib/systemd/coredump. From there they can be listed, copied to stdout, and even opened directly in gdb using coredumpctl.
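The patterns listed above fall into three groups, which a short shell function can tell apart. This is an illustrative sketch only; classify_core_pattern is a hypothetical name, and the three categories follow the rules in core(5): a leading | pipes the dump to a program, a leading / is an absolute path, and anything else is relative to the crashing process's working directory.

```shell
#!/bin/sh
# Hypothetical helper: report how the kernel will treat a core_pattern.
classify_core_pattern() {
    case $1 in
        "|"*) echo "pipe" ;;       # dump is piped to a helper program
        /*)   echo "absolute" ;;   # dump is written to a fixed location
        *)    echo "relative" ;;   # dump lands in the working directory
    esac
}
```

For instance, `classify_core_pattern "$(sysctl -n kernel.core_pattern)"` should print absolute on a host configured as described in this article.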
To expedite and standardize the support process, EDB endorses only gathering core dumps via the kernel.core_pattern=/var/coredumps/core-%e-%p pattern.