Lasso already provides the global memory status from the /proc/meminfo file, but in some cases we need to analyze the memory usage and allocation of specific processes such as database backends. For this purpose, we need other files under the /proc filesystem. This article explains which files to use and how to perform this per-process memory usage analysis.
The /proc filesystem contains information about processes. This information is stored under the /proc/<pid> directory, where <pid> is the ID of the process. Among other files under this directory, we can use the maps file, which is a text representation of the memory allocation of the process. Each line represents a contiguous memory chunk, for example:
56241c54a000-56241c614000 r--p 00000000 103:06 8133918 /home/edb/pg15/bin/postgres
56241c614000-56241cb0b000 r-xp 000ca000 103:06 8133918 /home/edb/pg15/bin/postgres
...(snip)...
56241d1c1000-56241d2b0000 rw-p 00000000 00:00 0 [heap]
...(snip)...
7f9ed4e29000-7f9ed4e2c000 rw-p 00000000 00:00 0
7ffc2d844000-7ffc2d865000 rw-p 00000000 00:00 0 [stack]
7ffc2d8d9000-7ffc2d8dd000 r--p 00000000 00:00 0 [vvar]
7ffc2d8dd000-7ffc2d8df000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
Each field in a line, from left to right, describes:
- Allocated address range
- Permissions
- Offset
- Device
- Inode
- Path
A line without a path is typically an anonymous mmap'ed memory chunk. We can use this file to analyze the memory status of each process.
In Linux, we typically use the glibc functions malloc() and free() to allocate and de-allocate (or free) memory. In the database engine, these functions are wrapped by palloc() and pfree(), which provide MemoryContext information. We can free all the memory chunks allocated in a MemoryContext when it is no longer needed, and this works very well to avoid memory leaks.
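As a side note, on PostgreSQL 14 and newer we can also inspect the MemoryContext hierarchy of the current backend directly from SQL through the pg_backend_memory_contexts view. A minimal sketch (the column selection is just one reasonable choice):
-- Top memory consumers among the current backend's memory contexts
SELECT name, parent, total_bytes, used_bytes
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC
LIMIT 10;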
On the other hand, glibc malloc() allocates memory in two ways (a quick way to observe which path is taken is sketched after this list):
- Extending the heap area. This is done when the requested size is less than a threshold. The threshold may differ depending on the Linux distribution; on Ubuntu 22.04, for example, the observed threshold is 128 KB.
- Allocating with mmap(). This is done when the requested size is equal to or larger than the threshold above.
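A rough way to observe which of the two paths is taken on a live process is to trace the underlying system calls. This is only a sketch, assuming strace is installed and <pid> is the PID of the backend we want to watch:
# Watch heap extensions (brk) and large allocations/releases (mmap/munmap)
# of an already running process; stop tracing with Ctrl-C.
sudo strace -f -e trace=brk,mmap,munmap -p <pid>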
The behavior of free() differs depending on how the memory was allocated:
- If the memory was allocated on the heap, free() does not actually return the memory to the operating system. The returned memory chunk is marked as a free or available area inside glibc and can be reused for future malloc() calls. This area remains in the heap until the process terminates.
- If the memory was allocated using mmap(), free() de-allocates the area and returns it to the operating system.
In this way, actual memory usage from the operating system's point of view can differ from the application's point of view.
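For a quick sanity check of a single process, the size of its [heap] segment can be computed directly from the maps file. A minimal sketch, assuming gawk is available and <pid> is the target PID:
# Print the size (in bytes) of the [heap] segment of a process
gawk '/\[heap\]/ { split($1, a, "-"); print strtonum("0x" a[2]) - strtonum("0x" a[1]) }' /proc/<pid>/maps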
When we need to look into each process's memory usage, we need to analyze the maps file. Because /proc and the files under it are virtual, we cannot simply collect them into a tarball; we need to copy the maps files to regular files before creating the tarball.
Here's an example script to collect maps files as regular files:
#!/bin/bash
#
# Copies the maps file of every running process into the directory given
# as the first argument, saving them as regular files named maps-<pid>.
# DEST is expected to be a path relative to the current directory.
#
DEST=$1
pwd=$(pwd)
mkdir -p "$DEST"
cd /proc
for i in [0-9]*
do
    cp "$i/maps" "$pwd/$DEST/maps-$i"
done
cd "$pwd"
Save the content above as file backup_maps.sh. Then we can archive the destination directory into a tarball.
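For example (the directory and archive names below are just an illustration):
bash backup_maps.sh maps_backup
tar czf maps_backup.tar.gz maps_backup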
The number of lines in the maps file for a single process can be on the order of 300,000 or more, and it is not practical to analyze that many lines manually. Below you can see how to load these lines into a database table for analysis.
The table structure can look like this:
CREATE TABLE maps
(
test_label text, /* Label for a test case */
hostname text, /* Hostname */
pid int, /* pid */
chunk_size bigint, /* Memory chunk size calculated from the address range */
start_addr text, /* Start address of the chunk */
end_addr text, /* End address of the chunk */
perms text, /* Permission */
offset_val text, /* Offset */
dev text, /* Device */
inode text, /* Inode */
pathname text /* Path to the module. No path when it is allocated by mmap(). */
/* Heap and stack are indicated by [heap] and [stack] */
);
The first three columns are not included in the maps file; we need to supply these values from outside.
For each maps file, the SQL statements to load the contents can be generated with the following awk script:
#!/usr/bin/gawk -f
# Call sequence:
#
# gawk -f maps2table.awk test_label=xxx host_name=yyy pid=ppp <infile>
#
# This script reads a /proc/<pid>/maps file and converts it into SQL statements
# that import the contents into the "maps" database table.
#
BEGIN {
print "BEGIN;"
}
{
chunk_region = $1;
perms = $2;
offset = $3;
dev = $4;
inode = $5;
if (NF == 6)
pathname = "'" $6 "'";
else
pathname = "NULL";
split(chunk_region, aa, "-");
start_addr = aa[1];
end_addr = aa[2];
chunk_size = strtonum("0x" end_addr) - strtonum("0x" start_addr);
printf "INSERT INTO maps VALUES ('%s', '%s', %d, %d, '%s', '%s', '%s', '%s', '%s', '%s', %s);\n",
test_label, host_name, pid, chunk_size, start_addr, end_addr, perms, offset, dev, inode, pathname;
}
END {
print "COMMIT;"
}
Save the content above as file maps2table.awk.
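For a single copied maps file, it can be invoked like this (the test label, host name, and PID are hypothetical):
gawk -f maps2table.awk test_label=mytest host_name=myhost pid=1466247 maps-1466247 > maps-1466247.sql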
Then we can use the following script to generate the SQL for all the copied maps files:
#!/bin/bash
#
# Generates SQL INSERT statements for all maps-<pid> files in the current
# directory. Arguments: <test_label> <host_name>.
# maps2table.awk is assumed to be executable and on the PATH.
#
id=$1
host=$2
for i in maps-*
do
    gawk -f "$(which maps2table.awk)" test_label="$id" host_name="$host" pid="${i##maps-}" "$i"
done
Save the content above as file maps2table_all.sh.
Here we assume that the maps files were copied using the backup_maps.sh script above, so each maps file is named maps-<pid>, where <pid> is the PID of the process to which the maps file belongs.
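For example, to generate one SQL file for all copied maps files (the test label, host name, and output file name are hypothetical, and maps2table.awk is assumed to be executable and on the PATH, since the script locates it with which):
cd maps_backup
bash maps2table_all.sh mytest myhost > all_maps.sql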
With the SQL commands generated by maps2table.awk and maps2table_all.sh, we can load the maps information into the database table like this:
psql -f <sqlfile>
Once the data is loaded into the table, we can analyze the memory usage of the processes in several ways using various SQL statements. The following example gets the heap and stack memory usage of each process running EPAS for a specific test label and host, ordered by the amount of allocated memory:
SELECT
test_label,
hostname,
pid,
lpad(to_char(sum(chunk_size), 'FM999,999,999,999'), 15) AS total_heap_and_stack
FROM
maps
WHERE
test_label = 'enterprise-db-RT95314-20230809_0145' AND
pid IN
(
SELECT pid
FROM maps
WHERE
test_label = 'enterprise-db-RT95314-20230809_0145' AND
pathname LIKE '%post%'
) AND
(pathname = '[stack]' OR pathname = '[heap]')
GROUP BY
test_label,
hostname,
pid
ORDER BY
sum(chunk_size) DESC,
test_label,
hostname,
pid;
The result looks like:
test_label | hostname | pid | total_heap_and_stack
-------------------------------------+----------+---------+----------------------
enterprise-db-RT95314-20230809_0145 | myhost | 1466247 | 7,298,633,728
enterprise-db-RT95314-20230809_0145 | myhost | 1466122 | 7,297,798,144
enterprise-db-RT95314-20230809_0145 | myhost | 1465744 | 7,297,544,192
enterprise-db-RT95314-20230809_0145 | myhost | 1463925 | 2,017,427,456
enterprise-db-RT95314-20230809_0145 | myhost | 1463923 | 880,758,784
enterprise-db-RT95314-20230809_0145 | myhost | 1463932 | 792,735,744
enterprise-db-RT95314-20230809_0145 | myhost | 1465797 | 57,286,656
enterprise-db-RT95314-20230809_0145 | myhost | 1464544 | 57,196,544
enterprise-db-RT95314-20230809_0145 | myhost | 1465036 | 56,877,056
enterprise-db-RT95314-20230809_0145 | myhost | 1465908 | 55,054,336
enterprise-db-RT95314-20230809_0145 | myhost | 1465041 | 54,763,520
enterprise-db-RT95314-20230809_0145 | myhost | 1464496 | 54,697,984
enterprise-db-RT95314-20230809_0145 | myhost | 1465827 | 54,673,408
...(snip)...
enterprise-db-RT95314-20230809_0145 | myhost | 1463948 | 1,691,648
enterprise-db-RT95314-20230809_0145 | myhost | 1463947 | 1,650,688
enterprise-db-RT95314-20230809_0145 | myhost | 1463949 | 1,650,688
enterprise-db-RT95314-20230809_0145 | myhost | 1463917 | 1,462,272
enterprise-db-RT95314-20230809_0145 | myhost | 1463920 | 1,413,120
enterprise-db-RT95314-20230809_0145 | myhost | 1463921 | 1,413,120
enterprise-db-RT95314-20230809_0145 | myhost | 1463919 | 1,114,112
enterprise-db-RT95314-20230809_0145 | myhost | 225630 | 847,872
enterprise-db-RT95314-20230809_0145 | myhost | 1463916 | 847,872
enterprise-db-RT95314-20230809_0145 | myhost | 1463918 | 847,872
enterprise-db-RT95314-20230809_0145 | myhost | 225633 | 712,704
(1521 rows)
We can combine the above information with the actual backend processes using Lasso. The Lasso tarball comes with the postgresql/running_activity.out file, which contains the pg_stat_activity contents, so we can join the memory consumption list with running_activity.out on the PID.
The running_activity.out file contents can also be loaded into a database table, which saves a lot of time during analysis. For this ticket's case, it was sufficient to combine them manually because the number of backends consuming a significant amount of memory was low.
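As a sketch, assuming a hypothetical table running_activity(pid int, usename text, query text) loaded from running_activity.out, the per-PID heap and stack totals could be joined with the backend activity like this:
-- Hypothetical join of per-PID heap/stack totals with backend activity
SELECT m.pid,
       sum(m.chunk_size) AS heap_and_stack,
       a.usename,
       a.query
FROM maps m
JOIN running_activity a ON a.pid = m.pid
WHERE m.pathname IN ('[heap]', '[stack]')
GROUP BY m.pid, a.usename, a.query
ORDER BY heap_and_stack DESC;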
Most of the memory used in the database backends, except for memory allocated by accompanying libraries, is allocated with MemoryContext information as mentioned above. The contexts are chained from TopMemoryContext, and the allocation status can be dumped to the database log using the following script:
#!/bin/bash
#
# Prints PID and TopMemoryContext information to PostgreSQL log.
#
# Can be useful for PostgreSQL up to version 13,
# where pg_log_backend_memory_contexts(pid) is not available.
#
sudo gdb -p $1 \
-ex "handle SIGUSR1 SIGUSR2 nostop noprint" \
-ex 'set unwindonsignal on' \
-ex 'call fprintf(stderr, "**** My PID: %d ********\n", MyProcPid)' \
-ex 'call MemoryContextStats(TopMemoryContext)' \
-ex 'detach' \
-ex 'quit'
Please note that the script prints the target backend's PID (MyProcPid) because the output of the MemoryContextStats() function does not include a process ID.
Then, we can combine this information with the above as well.
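On PostgreSQL 14 and newer, the same memory context dump can be requested without gdb by calling pg_log_backend_memory_contexts() from SQL, for example (the PID is taken from the result above and is only illustrative):
SELECT pg_log_backend_memory_contexts(1466247);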
We can analyze various aspects of database backend memory usage by using the /proc/<pid>/maps file. Because the amount of data for a typical production database is huge, as large as around 300k lines, it is convenient to store the contents of this file in a database table for various analyses. The result can be combined with Lasso output and TopMemoryContext information for further analysis.