Analyzing memory usage of processes using /proc

Koichi Suzuki

Lasso already provides global memory status from the /proc/meminfo file, but in some cases we need to analyze memory usage and allocation for specific processes, such as database backends. For this purpose, we need other files under the /proc filesystem. This article explains which files to use and how to analyze per-process memory usage with them.

What information to use

The /proc filesystem contains information about processes. This information is stored under the /proc/<pid> directory, where <pid> is the ID of the process. Among the files in this directory, we can use the maps file, which is a text representation of the memory mappings of the process. Each line represents one contiguous memory chunk, for example:

56241c54a000-56241c614000 r--p 00000000 103:06 8133918                   /home/edb/pg15/bin/postgres
56241c614000-56241cb0b000 r-xp 000ca000 103:06 8133918                   /home/edb/pg15/bin/postgres
...(snip)...
56241d1c1000-56241d2b0000 rw-p 00000000 00:00 0                          [heap]
...(snip)...
7f9ed4e29000-7f9ed4e2c000 rw-p 00000000 00:00 0
7ffc2d844000-7ffc2d865000 rw-p 00000000 00:00 0                          [stack]
7ffc2d8d9000-7ffc2d8dd000 r--p 00000000 00:00 0                          [vvar]
7ffc2d8dd000-7ffc2d8df000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

Each field from left to right in a line describes:

  • Allocated address range
  • Permission
  • Offset
  • Device
  • Inode
  • Path

A line without a path is typically an anonymous memory chunk allocated with mmap().

We can use this information to analyze the memory status of each process.
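As a minimal sketch of how these fields can be pulled apart, the following bash function splits one maps line and derives the chunk size from the address range. The function name parse_maps_line and the <anonymous> placeholder are illustrative choices, not part of any existing tool.

```shell
#!/bin/bash
# Sketch: split one maps line into its fields and compute the chunk size.
# parse_maps_line is a hypothetical helper name.
parse_maps_line() {
    local range perms offset dev inode path
    # "read" stores everything after the fifth field (the path, if any) in "path".
    read -r range perms offset dev inode path <<< "$1"
    local start=${range%-*} end=${range#*-}
    # The 16# prefix makes bash arithmetic read the digits as hexadecimal.
    local size=$(( 16#$end - 16#$start ))
    printf '%s %s %d %s\n' "$start" "$end" "$size" "${path:-<anonymous>}"
}

parse_maps_line "56241d1c1000-56241d2b0000 rw-p 00000000 00:00 0                          [heap]"
```

For the [heap] line from the example above, this prints the start address, the end address, the size in bytes (978944), and the path.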

Memory allocation in Linux and Database

In Linux, we typically use the glibc functions malloc() and free() to allocate and de-allocate (free) memory. In the database engine, these functions are wrapped by palloc() and pfree(), which provide MemoryContext information. All the memory chunks allocated in a MemoryContext can be freed at once when the context is no longer needed, which works very well to avoid memory leaks.

On the other hand, glibc malloc() allocates the memory in two ways:

  • Extend the heap area. This is done when the requested size is less than a threshold (glibc's M_MMAP_THRESHOLD parameter). The threshold may differ depending on the Linux distribution and glibc settings. On Ubuntu 22.04, for example, the observed threshold is 128 KB.
  • Allocate using mmap(). This is done if the requested size is equal to or larger than the threshold above.

The behavior of free() depends on how the memory was allocated:

  • If the memory was allocated from the heap, free() generally does not return it to the operating system. The freed chunk is marked as available inside glibc and can be reused by future malloc() calls. This area typically remains in the heap until the process terminates.
  • If the memory was allocated using mmap(), free() de-allocates the area and returns it to the operating system.

In this way, actual memory usage from the operating system's point of view can differ from the application's point of view.
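To look at the operating system's side of this, the mappings of a process can be summed by kind. The sketch below is one possible way to do it in bash; summarize_maps is a hypothetical helper name. It counts [heap] separately from lines without a path. Note that anonymous lines include, but are not limited to, the chunks that malloc() obtained with mmap().

```shell
#!/bin/bash
# Sketch: sum mapping sizes by kind from maps-format lines on stdin.
# summarize_maps is a hypothetical helper name.
summarize_maps() {
    local range perms offset dev inode path size
    local heap=0 anon=0 other=0
    while read -r range perms offset dev inode path; do
        [ -n "$range" ] || continue                # skip blank lines
        # End address minus start address, both parsed as hexadecimal.
        size=$(( 16#${range#*-} - 16#${range%-*} ))
        case "$path" in
            "[heap]") heap=$(( heap + size )) ;;   # the heap area
            "")       anon=$(( anon + size )) ;;   # no path: anonymous mapping
            *)        other=$(( other + size )) ;; # files, [stack], [vdso], ...
        esac
    done
    printf 'heap=%d anonymous=%d other=%d\n' "$heap" "$anon" "$other"
}

# Typical use against a live process (needs permission to read the file):
#   summarize_maps < /proc/<pid>/maps
summarize_maps <<'EOF'
56241c54a000-56241c614000 r--p 00000000 103:06 8133918    /home/edb/pg15/bin/postgres
56241d1c1000-56241d2b0000 rw-p 00000000 00:00 0           [heap]
7f9ed4e29000-7f9ed4e2c000 rw-p 00000000 00:00 0
EOF
```

Running this before and after a memory-intensive operation gives a rough view of whether the growth went to the heap or to anonymous mappings.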

Collecting maps files

When we need to look into each process's memory usage, we need to analyze its maps file. Because the files under /proc are virtual files generated by the kernel rather than regular files on disk, we cannot simply collect them into a tarball. We need to copy the maps files to regular files before archiving them.

Here's an example script to collect maps files as regular files:

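#!/bin/bash
# Usage: backup_maps.sh <destination-directory>
# Run as root to read the maps files of processes owned by other users.
DEST=${1:?usage: backup_maps.sh <destination-directory>}

mkdir -p "$DEST" || exit 1
for dir in /proc/[0-9]*
do
    pid=${dir#/proc/}
    # A process may exit while we iterate, so ignore copy failures.
    cp "$dir/maps" "$DEST/maps-$pid" 2>/dev/null
done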

Save the content above as backup_maps.sh. Then we can archive the $DEST directory into a tarball.
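The archiving step can be sketched as follows. The function name archive_maps_dir and the directory name maps_backup are hypothetical; the argument is the destination directory that was passed to backup_maps.sh.

```shell
#!/bin/bash
# Sketch: pack the directory of copied maps files into <directory>.tar.gz.
# archive_maps_dir is a hypothetical helper name.
archive_maps_dir() {
    local dest=${1%/}    # drop a trailing slash, if any
    # -C changes to the parent directory so the archive holds relative paths.
    tar -czf "$dest.tar.gz" -C "$(dirname "$dest")" "$(basename "$dest")"
}

# Example (maps_backup is an assumed directory name):
#   archive_maps_dir maps_backup
```

The resulting tarball contains regular files named maps-<pid>, so it can be unpacked and analyzed on another host.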

Loading maps into a table

The number of lines in the maps files can be on the order of 300,000 or more, so analyzing them manually is not practical. The following shows how to load these lines into a database table for analysis.

The table structure can be like:

CREATE TABLE maps
(
    test_label  text,    /* Label for a test case */
    hostname    text,    /* Hostname */
    pid         int,     /* pid */
    chunk_size  bigint,  /* Memory chunk size calculated from the address range */
    start_addr  text,    /* Start address of the chunk */
    end_addr    text,    /* End address of the chunk */
    perms       text,    /* Permission */
    offset_val  text,    /* Offset */
    dev         text,    /* Device */
    inode       text,    /* Inode */
    pathname    text     /* Path to the module. No path when it is allocated by mmap(). */
                         /* Heap and stack are indicated by [heap] and [stack]. */
);

The first three columns are not part of the maps file, so we need to supply their values from outside. The chunk_size column is calculated from the address range.

For each maps file, the SQL statements to load its contents can be generated with the following awk script:

#!/usr/bin/gawk -f
# Call sequence:
#
# gawk -f maps2table.awk test_label=xxx host_name=yyy pid=ppp <infile>
#
# This script reads a /proc/*/maps file and converts it into SQL statements
# that import it into the "maps" database table.
#
BEGIN {
	print "BEGIN;"
}
{
	chunk_region = $1;
	perms = $2;
	offset = $3;
	dev = $4;
	inode = $5;
	if (NF >= 6) {
		# A path can contain spaces (for example a " (deleted)" suffix),
		# so join every field from the sixth one onward.
		pathname = $6;
		for (i = 7; i <= NF; i++)
			pathname = pathname " " $i;
		gsub(/'/, "''", pathname);	# Escape single quotes for SQL
		pathname = "'" pathname "'";
	} else
		pathname = "NULL";
	split(chunk_region, aa, "-");
	start_addr = aa[1];
	end_addr = aa[2];
	# strtonum() is a gawk extension; it parses the "0x"-prefixed hex address.
	chunk_size = strtonum("0x" end_addr) - strtonum("0x" start_addr);

	printf "INSERT INTO maps VALUES ('%s', '%s', %d, %d, '%s', '%s', '%s', '%s', '%s', '%s', %s);\n",
		test_label, host_name, pid, chunk_size, start_addr, end_addr, perms, offset, dev, inode, pathname;
}
END {
	print "COMMIT;"
}

Save the content above as maps2table.awk.

Then we can use the following script to generate the SQL for all the collected maps files:

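#!/bin/bash
# Arguments: <test_label> <hostname>
# Run in the directory that holds the maps-<pid> files.
# maps2table.awk must be executable and in PATH so that "which" can find it.
id=$1
host=$2
for i in maps-*
do
    # gawk expects its options (-f) before the variable assignments and the input file.
    gawk -f "$(which maps2table.awk)" test_label="$id" host_name="$host" pid="${i##maps-}" "$i"
done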

Save the content above as maps2table_all.awk.

Here we assume that the maps files were copied using the backup_maps.sh script above, which names each copy maps-<pid>, where <pid> is the PID of the process to which the maps file belongs.

Loading to the table

With the SQL commands generated by maps2table.awk and maps2table_all.awk saved to a file, we can load the maps information into the database table like:

psql -f <sqlfile>
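When several SQL files are loaded, it helps to stop at the first failure. The following is a minimal sketch, assuming a database named mapsdb (a hypothetical name) that already contains the maps table; load_maps_sql is also a hypothetical helper name. It relies only on the standard psql options -X, -v, -d and -f.

```shell
#!/bin/bash
# Sketch: load one or more generated SQL files into a database.
# load_maps_sql is a hypothetical helper name.
load_maps_sql() {
    local db=$1
    shift
    local f
    for f in "$@"; do
        # -X skips ~/.psqlrc; ON_ERROR_STOP makes psql exit with a
        # non-zero status on the first SQL error instead of continuing.
        psql -X -v ON_ERROR_STOP=1 -d "$db" -f "$f" || return 1
    done
}

# Example (mapsdb and maps.sql are assumed names):
#   load_maps_sql mapsdb maps.sql
```

Because the generated SQL for each maps file is wrapped in BEGIN and COMMIT, a failure in one file does not leave that file's rows partially loaded.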

Analyzing the data

Once the data is loaded into the table, we can analyze the memory usage of processes in several ways using various SQL statements.

The following example gets the heap and stack memory usage for each process running EPAS for a specific test label, ordered by the amount of allocated memory:

SELECT
    test_label,
    hostname,
    pid,
    lpad(to_char(sum(chunk_size), 'FM999,999,999,999'), 15) AS total_heap_and_stack
FROM
    maps
WHERE
    test_label = 'enterprise-db-RT95314-20230809_0145' AND
    pid IN
    (
        SELECT pid
        FROM maps
        WHERE
            test_label = 'enterprise-db-RT95314-20230809_0145' AND
            pathname LIKE '%post%'
    ) AND
    (pathname = '[stack]' OR pathname = '[heap]')
GROUP BY
    test_label,
    hostname,
    pid
ORDER BY
    sum(chunk_size) DESC,
    test_label,
    hostname,
    pid;

The result looks like:

             test_label              | hostname |   pid   | total_heap_and_stack 
-------------------------------------+----------+---------+----------------------
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1466247 |   7,298,633,728
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1466122 |   7,297,798,144
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1465744 |   7,297,544,192
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463925 |   2,017,427,456
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463923 |     880,758,784
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463932 |     792,735,744
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1465797 |      57,286,656
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1464544 |      57,196,544
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1465036 |      56,877,056
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1465908 |      55,054,336
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1465041 |      54,763,520
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1464496 |      54,697,984
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1465827 |      54,673,408
...(snip)...
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463948 |       1,691,648
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463947 |       1,650,688
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463949 |       1,650,688
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463917 |       1,462,272
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463920 |       1,413,120
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463921 |       1,413,120
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463919 |       1,114,112
 enterprise-db-RT95314-20230809_0145 |  myhost  |  225630 |         847,872
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463916 |         847,872
 enterprise-db-RT95314-20230809_0145 |  myhost  | 1463918 |         847,872
 enterprise-db-RT95314-20230809_0145 |  myhost  |  225633 |         712,704
(1521 rows)

Combining the result with pg_stat_activity

We can combine the above information with the actual backend processes using Lasso. The Lasso tarball comes with the postgresql/running_activity.out file. This file holds the contents of pg_stat_activity, so we can match the PIDs in the memory consumption list against running_activity.out.

The running_activity.out file contents can also be put into a database table which saves a lot of time for the analysis. For the ticket's case, it was sufficient to combine manually because the number of backends consuming a significant amount of memory was low.

Combining the result with TopMemoryContext

Most of the memory used in the database backends, except for memory allocated by accompanying libraries, is allocated with MemoryContext information as mentioned above. The contexts are chained from TopMemoryContext, and their allocation status can be dumped to the database log using the following script:

#!/bin/bash
#
# Prints PID and TopMemoryContext information to PostgreSQL log.
#
# Can be useful for PostgreSQL up to version 13,
# where pg_log_backend_memory_contexts(pid) is not available.
#
sudo gdb -p "$1" \
         -ex "handle SIGUSR1 SIGUSR2 nostop noprint" \
         -ex 'set unwindonsignal on' \
         -ex 'call fprintf(stderr, "**** My PID: %d ********\n", MyProcPid)' \
         -ex 'call MemoryContextStats(TopMemoryContext)' \
         -ex 'detach' \
         -ex 'quit'

Please note that the script prints the process ID of the backend it attaches to, because the MemoryContextStats() function does not print a process ID.

Then, we can combine this information as well with the above.
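To find the dump for one backend in a large log, the marker line printed by the script above can be used as a starting point. The sketch below is one possible approach; extract_context_dump is a hypothetical helper name, and it assumes that the dump ends with a line starting with "Grand total:" and that both lines appear in the log without a prefix.

```shell
#!/bin/bash
# Sketch: print the memory context dump of one PID from a database log.
# extract_context_dump is a hypothetical helper name.
# Assumption: the dump ends with a line that starts with "Grand total:".
extract_context_dump() {
    local pid=$1 logfile=$2
    awk -v pid="$pid" '
        # Marker written by the gdb script; the trailing space keeps
        # PID 111 from matching PID 1112.
        index($0, "**** My PID: " pid " ") { show = 1 }
        show { print }
        show && /^Grand total:/ { exit }
    ' "$logfile"
}

# Example (the log file path is an assumed name):
#   extract_context_dump 1466247 postgresql.log
```

If the same backend was dumped more than once, this prints only the first dump.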

Summary

We can analyze various aspects of database backend memory usage by using the /proc/<pid>/maps file. Because the amount of data for a typical production database is huge, as large as around 300k lines, it is convenient to store the contents of this file in a database table for analysis.

The result can be combined with Lasso output and TopMemoryContext information for further analysis.
