How to enable core dumps and get stack traces from them

Raphael Vieira

Core dumps are often used to diagnose or debug errors in Linux or UNIX programs, and they help sysadmins find out why an application or any other program crashed. This article covers the basics of enabling core dumps on the main Linux distributions and of producing a core dump from a process.

Introduction

A core dump is a file containing a process's address space (memory) when the process terminates unexpectedly. Core dumps may be produced on-demand (such as by a debugger), or automatically upon termination. Support engineers may ask for a core dump after a PostgreSQL (or any other application) crash to understand the state of PostgreSQL during the failure.

Below are the instructions for both systemd-based and non-systemd-based distributions.

Proper core dump configuration

A modern Linux distribution will most likely have systemd enabled. Services started by systemd do not use /etc/security/limits.conf; their limits are defined in the service unit or in the global systemd configuration.

  1. First create the directory in which the core dumps will be stored and make it world-writable with the sticky bit set. You can choose a different location, as long as the partition has enough free space (at least a couple of GB). Keep in mind that the users running the postmaster service (i.e., postgres and/or enterprisedb) must be able to write their core dumps to that directory.
# mkdir -p /var/coredumps
# chmod 1777 /var/coredumps
  1. To point kernel.core_pattern at that directory and make the new setting persist across reboots, execute:
# echo 'kernel.core_pattern=/var/coredumps/core-%e-%p' >> /etc/sysctl.conf
# sysctl -p
  1. Then, you can override the PostgreSQL service unit to define the core limit or, alternatively, specify a global default limit for all the services.
  • To set the limit only for the PostgreSQL service (substitute postgresql-16.service with the name of your service), run:
# export SYSTEMD_EDITOR=vim # or pick any other editor you want 
# systemctl edit postgresql-16.service

and paste the following snippet:

[Service]
LimitCORE=infinity

Then reload the service configuration:

# systemctl daemon-reload
  • To change the global default, edit /etc/systemd/system.conf and set:
DefaultLimitCORE=infinity

and then re-execute systemd:

# systemctl daemon-reexec
  1. Restart PostgreSQL:
# systemctl stop postgresql-16
# systemctl start postgresql-16

At this point core dumps should be enabled correctly. If PostgreSQL crashes and core dumps are still not generated, continue with the next section.
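The interactive systemctl edit step can also be scripted. The sketch below assumes the default behaviour of systemctl edit, which stores the snippet as override.conf in /etc/systemd/system/<unit>.d/. The helper name write_core_limit_dropin and its optional root-prefix argument are hypothetical and exist only so that the function can be tried in a scratch directory first.

```shell
#!/bin/sh
# Sketch: create the LimitCORE drop-in without opening an editor.
# write_core_limit_dropin is a hypothetical helper, not part of systemd.
#   $1 = unit name, $2 = optional root prefix (empty means the real /etc)
write_core_limit_dropin() {
    unit="$1"
    root="${2:-}"
    dir="$root/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    # Same two lines that the article pastes into the editor.
    printf '[Service]\nLimitCORE=infinity\n' > "$dir/override.conf"
    # Print the path of the file that was written.
    printf '%s\n' "$dir/override.conf"
}

# Dry run against a scratch directory (no root privileges needed).
scratch=$(mktemp -d)
write_core_limit_dropin postgresql-16.service "$scratch"
cat "$scratch/etc/systemd/system/postgresql-16.service.d/override.conf"
rm -rf "$scratch"
```

On a real host the function would be called as root without the second argument, followed by systemctl daemon-reload and a restart of the service as described above.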

Basic core dump verifications (troubleshooting)

In order to ensure that core dumps are configured properly (or if you are getting crashes and no core dumps are being generated), please check the following:

  1. The sysctl kernel.core_pattern command needs to return a pattern that points directly to the /var/coredumps directory. Because PostgreSQL core dumps are often large and problematic, EDB does not support any other way of saving core dumps. See Appendix A for an explanation of various other core patterns.

  2. Validate that the sysctl kernel.core_pipe_limit command returns 0 (no limit), just in case.

  3. The ls -ld /var/coredumps command should show the directory as world-writable with the sticky bit set (drwxrwxrwt). The sticky bit lets every user create files in the directory while allowing them to delete or rename only their own files.

  4. Locate PostgreSQL postmaster PID and verify new limits are in effect:

ps auxw|grep postmaster
grep -E '^Limit|^Max core file size' /proc/<POSTMASTER_PID>/limits

The line we want to verify is Max core file size where the Soft Limit value is high enough to generate a core file:

Limit                     Soft Limit           Hard Limit           Units
Max core file size        unlimited            unlimited            bytes

Both columns (soft/hard) should be unlimited. On hosts using the systemd service controller, the limit can also be verified using systemctl show -p LimitCORE <servicename>.service.

  5. Locate the PostgreSQL postmaster PID again and check that it has a proper coredump_filter:
ps auxw|grep postmaster
cat /proc/<POSTMASTER_PID>/coredump_filter

The Linux kernel default is coredump_filter = 0x33 (00110011). Sometimes the value is 0xff (11111111), probably due to a TPAexec YAML config. This non-standard value may cause issues. See core(5) for documentation on this.

  • A coredump_filter value of 0x31 tells the kernel NOT to dump anonymous shared mappings (shared_buffers), so the core dump should be much smaller.
  • A coredump_filter value of 0x33 (the default) might include partial (and large) anonymous shared memory mappings and thus truncate the core dump. This can be a problem when big core dumps fill up the available space.
  • If you are affected by a nonstandard coredump_filter, run systemctl show postgres.service | grep ExecStartPost and look for any line like ExecStartPost=/bin/bash -c 'echo 0xff > /proc/$MAINPID/coredump_filter', then change the value to 0x31 or 0x33.
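Checks 4 and 5 above can be scripted roughly as follows. This is a minimal sketch: core_limits and dumps_anon_shared are hypothetical helper names, the column positions assume the usual /proc/<PID>/limits layout shown above, and bit 1 (0x2) is the "anonymous shared mappings" bit described in core(5).

```shell
#!/bin/sh
# core_limits: print the soft and hard "Max core file size" limits
# from a limits file such as /proc/<PID>/limits.
core_limits() {
    awk '/^Max core file size/ { print $5, $6 }' "$1"
}

# dumps_anon_shared: succeed (exit 0) when bit 1 of the given
# coredump_filter value is set, i.e. anonymous shared mappings
# (such as shared_buffers) are included in the dump.
# Accepts the value with or without a leading 0x.
dumps_anon_shared() {
    hex="${1#0x}"
    [ $(( 0x$hex & 0x2 )) -ne 0 ]
}

describe_filter() {
    if dumps_anon_shared "$1"; then
        echo "$1: anonymous shared mappings are dumped"
    else
        echo "$1: anonymous shared mappings are skipped"
    fi
}

# The three values discussed above:
describe_filter 0x31
describe_filter 0x33
describe_filter 0xff

# On a live system (PID is a placeholder):
#   core_limits /proc/<POSTMASTER_PID>/limits
#   describe_filter "$(cat /proc/<POSTMASTER_PID>/coredump_filter)"
```

The kernel prints coredump_filter as plain hexadecimal digits without the 0x prefix, which is why the helper accepts both forms.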

From there, the next step is to verify what the operating system does when a crash occurs.

Testing the core dump configuration

WARNING: the steps below will cause an outage!

To be sure core dumps are correctly generated, you can verify if a core dump exists after sending a SIGSEGV signal to PostgreSQL:

  1. List PostgreSQL processes and their PIDs:
$ ps auxf | grep postgres

postgres 6839 0.0 6.7 356708 16440 ? S 09:39 0:00 /usr/pgsql-16/bin/postmaster -D /var/lib/pgsql/16/data
postgres 6841 0.0 0.7 211628 1792 ? Ss 09:39 0:00 \_ postgres: logger
postgres 6843 0.0 0.7 356708 1944 ? Ss 09:39 0:00 \_ postgres: checkpointer
postgres 6844 0.0 0.9 356708 2364 ? Ss 09:39 0:00 \_ postgres: background writer
postgres 6845 0.0 2.5 356708 6116 ? Ss 09:39 0:00 \_ postgres: walwriter
postgres 6846 0.0 1.1 357124 2856 ? Ss 09:39 0:00 \_ postgres: autovacuum launcher
postgres 6847 0.0 0.7 211624 1832 ? Ss 09:39 0:00 \_ postgres: stats collector
postgres 6848 0.0 1.0 357124 2604 ? Ss 09:39 0:00 \_ postgres: logical replication launcher
  1. Kill the postmaster process with the SIGSEGV signal:
# kill -SIGSEGV 6839
  1. Check the directory with core dumps. The files will be named like:
# ls -ltrh /var/coredumps/
-rw------- 1 root root 244K Feb 26 09:05 core-postmaster-6839
-rw------- 1 root root 244K Feb 26 09:08 core-postmaster-6840
-rw------- 1 root root 244K Feb 26 09:10 core-postmaster-6845
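The file names in this listing follow from the configured pattern: the kernel replaces %e with the executable name and %p with the PID of the dumped process. As a rough illustration, the hypothetical helper expand_core_pattern below mimics that substitution for these two specifiers only, so you can predict which file to look for after the test.

```shell
#!/bin/sh
# expand_core_pattern: illustrative only, handles just %e and %p.
#   $1 = core_pattern value, $2 = executable name, $3 = PID
# Assumes the executable name contains no "/" or "&" characters,
# which would confuse the sed replacement.
expand_core_pattern() {
    pattern="$1"
    exe="$2"
    pid="$3"
    printf '%s\n' "$pattern" | sed -e "s/%e/$exe/g" -e "s/%p/$pid/g"
}

# Pattern from this article, applied to the postmaster killed above.
expand_core_pattern '/var/coredumps/core-%e-%p' postmaster 6839
```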

Generating backtraces (stack traces) from core dumps

Once you've enabled core dumps, if you experience a backend crash, a core dump will be generated by the operating system, and you'll be able to use gdb on it to collect useful debugging information. This information can be passed to the support and development team(s) for detailed analysis.

You will need gdb and debug symbols installed to be able to correctly read the core dump.

  • On RHEL-like systems, you can install the debug symbols using the debuginfo-install utility included in the yum-utils package.
# yum install gdb yum-utils
# debuginfo-install postgresql16-server
  • On Debian/Ubuntu-like distributions, install the debug packages:
# apt-get install gdb postgresql-16-dbgsym

or:

# apt-get install gdb postgresql-16-dbg

NOTE: We performed this test with various PostgreSQL versions. Please check the name of the debug packages for your version first.

NOTE: The installed versions of the server runtime package and its debug symbols package need to match exactly, down to the last digit. For example, this is a good situation:

$ dpkg -l | grep postgresql-15
ii postgresql-15 15.5-1.pgdg110+1 amd64 The World's Most Advanced Open Source Relational Database
ii postgresql-15-dbgsym 15.5-1.pgdg110+1 amd64 debug symbols for postgresql-15
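The comparison can be automated along these lines. The sketch assumes dpkg -l style input with the version in the third column, and versions_match is a hypothetical helper name. It only checks that all installed ("ii") lines passed to it carry the same version string.

```shell
#!/bin/sh
# versions_match: read `dpkg -l`-style lines on stdin and succeed
# (exit 0) only if every installed ("ii") line has the same version.
versions_match() {
    awk '$1 == "ii" { seen[$3] = 1 }
         END {
             n = 0
             for (v in seen) n++
             exit (n == 1 ? 0 : 1)
         }'
}

# The "good situation" shown above:
versions_match <<'EOF' && echo "versions match"
ii postgresql-15 15.5-1.pgdg110+1 amd64 The World's Most Advanced Open Source Relational Database
ii postgresql-15-dbgsym 15.5-1.pgdg110+1 amd64 debug symbols for postgresql-15
EOF
```

On a live system the input would come from something like dpkg -l | grep postgresql-15. Make sure the grep expression selects only the server package and its debug symbols package.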

Once the packages are properly installed, execute gdb, specifying the location of the postgres binary and the core dump file, as shown below:

# ls -ltrh /var/coredumps/ # this will sort by time, select the most recent file below 
# gdb -q --batch -ex "bt full" /usr/pgsql-16/bin/postgres /var/coredumps/core-postmaster-6839 > /tmp/stacktrace.core-postmaster-6839.txt 2>&1

Attach the resulting (small) text file /tmp/stacktrace.core-postmaster-XXX.txt to the case. Please do not delete the (often large) core dump, as the development team might have further questions related to it.
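If several core files have accumulated, picking the most recent one and naming the report consistently can be scripted. The helpers newest_core and stacktrace_path below are hypothetical names. The sketch assumes the core-%e-%p naming used in this article and file names without whitespace.

```shell
#!/bin/sh
# newest_core: print the most recently modified core-* file in a
# directory (default: /var/coredumps). Prints nothing if none exist.
newest_core() {
    ls -t "${1:-/var/coredumps}"/core-* 2>/dev/null | head -n 1
}

# stacktrace_path: map a core file to the report name used above.
stacktrace_path() {
    printf '/tmp/stacktrace.%s.txt\n' "$(basename "$1")"
}

# Shows the report name for the core file from the earlier example.
stacktrace_path /var/coredumps/core-postmaster-6839

# Possible use on a live system (binary path as in the example above):
#   core=$(newest_core)
#   [ -n "$core" ] && gdb -q --batch -ex "bt full" \
#       /usr/pgsql-16/bin/postgres "$core" > "$(stacktrace_path "$core")" 2>&1
```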

Appendix A: discussion of various core_pattern defaults

The contents of core_pattern can vary depending on the distribution used. If the value is:

  • 'core' - The core file will be created in the PGDATA directory. There are variations of this that add metadata to the filename, or that use an absolute path instead of a path relative to the working directory.
  • |/usr/share/apport/apport %p %s %c - apport is an Ubuntu application that typically saves core files in /var/crash.
  • |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h - abrtd is a Red Hat application for processing crashes. The default location for core files is /var/spool/abrt. In order to get core dumps you might need to add one or both of these settings to /etc/abrt/abrt-action-save-package-data.conf:
      • ProcessUnpackaged = yes
      • OpenGPGCheck = no
  • /var/local/dumps/core.%e.%p - SLES dump location.
  • Another option is having the core dumps passed through a UNIX pipe to a command. For instance, on systemd systems kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h will place the core dumps in /var/lib/systemd/coredump. From there they can be listed, copied to stdout, or even opened directly in gdb using coredumpctl.
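The patterns above fall into three groups, which core(5) distinguishes by the first character: a leading | pipes the dump to a program, a leading / is an absolute path, and anything else is relative to the working directory of the crashing process. The hypothetical helper classify_core_pattern sketches that distinction.

```shell
#!/bin/sh
# classify_core_pattern: report how the kernel will treat a
# core_pattern value, based on its first character.
classify_core_pattern() {
    case "$1" in
        '|'*) echo "pipe" ;;      # handed to a helper program
        /*)   echo "absolute" ;;  # written to a fixed directory
        *)    echo "relative" ;;  # written relative to the process cwd
    esac
}

classify_core_pattern 'core'
classify_core_pattern '|/usr/share/apport/apport %p %s %c'
classify_core_pattern '/var/coredumps/core-%e-%p'

# On a live system:
#   classify_core_pattern "$(sysctl -n kernel.core_pattern)"
```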

To expedite and standardize the support process, EDB endorses only gathering core dumps via kernel.core_pattern=/var/coredumps/core-%e-%p pattern.
