The system becomes totally unresponsive for a period of time. This may be caused by the operating system attempting to defragment memory in order to allocate huge pages. During this time the system seems unresponsive and frozen, or its performance (response times etc.) becomes very erratic.
Transparent Huge Pages (THP) are enabled by default in Red Hat Enterprise Linux 6, 7, 8, 9 and CentOS 6 and 7 for all applications.
IMPORTANT: Transparent Huge Pages should not be confused with Huge Pages, which is a different feature.
The controls for THP are found in the sysfs (/sys) tree under /sys/kernel/mm/transparent_hugepage or /sys/kernel/mm/redhat_transparent_hugepage, depending on the distribution and version. In the following we describe the first of these.
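As a minimal sketch, the check for which of the two directories is present could look like the following. The function name thp_dir and its optional root argument are hypothetical, introduced only so the lookup can be tried against a scratch directory; on a real system the default of /sys applies.

```shell
#!/bin/sh
# Print the THP control directory that exists under the given sysfs root.
# "thp_dir" and its root argument are illustrative names, not part of any tool.
thp_dir() {
    root="${1:-/sys}"
    for d in transparent_hugepage redhat_transparent_hugepage; do
        if [ -d "$root/kernel/mm/$d" ]; then
            printf '%s\n' "$root/kernel/mm/$d"
            return 0
        fi
    done
    return 1   # neither directory exists (THP not built into this kernel)
}

# Example use on a live system:
#   dir=$(thp_dir) && cat "$dir/enabled"
```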
The values for /sys/kernel/mm/transparent_hugepage/enabled can be one of the following:
- always - use huge pages (defragmenting if necessary) every time memory is requested
- madvise - use huge pages (defragmenting if necessary) only for memory regions marked with madvise
- never - never use transparent huge pages
To disable:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Depending on linux kernel version, you may need '0' instead of 'no'
echo no > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
Then, to prevent the changed values from being reset on server reboot you'll need to update the bootloader, typically grub:
On Red Hat based systems:
grubby --update-kernel=ALL --args='transparent_hugepage=never'
On Debian based systems (e.g. Ubuntu), edit /etc/default/grub by appending transparent_hugepage=never to the string set for GRUB_CMDLINE_LINUX. After saving the file, run update-grub to update the grub configuration.
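The manual edit described above could be scripted roughly as follows. This is only a sketch: the function name add_thp_never is hypothetical, and it assumes GNU sed and a GRUB_CMDLINE_LINUX value that is double-quoted on a single line. Trying it on a copy of the file first is advisable.

```shell
#!/bin/sh
# Append transparent_hugepage=never to GRUB_CMDLINE_LINUX in the given file,
# unless it is already there. "add_thp_never" is an illustrative name.
# Assumes GNU sed (-i) and a double-quoted, single-line value.
add_thp_never() {
    grub_file="${1:-/etc/default/grub}"
    if grep -q '^GRUB_CMDLINE_LINUX=.*transparent_hugepage=never' "$grub_file"; then
        return 0   # already configured, nothing to do
    fi
    sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 transparent_hugepage=never"/' "$grub_file"
}

# Example use (as root), followed by the update-grub step from the text:
#   add_thp_never /etc/default/grub && update-grub
```

The pattern only matches the GRUB_CMDLINE_LINUX line itself, so GRUB_CMDLINE_LINUX_DEFAULT is left untouched, and running the function twice does not add the argument a second time.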
Other Linux distributions (e.g. SUSE 11) have similar issues. They will need other means of making the changes permanent.
You can detect settings for transparent huge pages on your system as follows:
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
In a simplified way, settings marked as 1, yes, on, or [always] ... never indicate that THP is enabled, while settings marked as 0, no, off, or always ... [never] indicate that THP is disabled. Note: the word inside [] is the effective setting.
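The rule about the word inside [] can be sketched as a pair of small shell helpers. The names thp_effective and thp_is_disabled are hypothetical. Note that this sketch treats only never, 0, no and off as disabled, so a setting such as [madvise] is reported as not disabled.

```shell
#!/bin/sh
# Print the effective value, i.e. the word inside [], from a setting string
# such as "always madvise [never]". Plain values like "0" are printed as-is.
thp_effective() {
    case "$1" in
        *\[*\]*) printf '%s\n' "$1" | sed 's/.*\[\(.*\)\].*/\1/' ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Return success (0) when the effective value means "disabled".
thp_is_disabled() {
    case "$(thp_effective "$1")" in
        never|0|no|off) return 0 ;;
        *)              return 1 ;;
    esac
}

# Example use on a live system:
#   thp_is_disabled "$(cat /sys/kernel/mm/transparent_hugepage/enabled)" \
#       && echo "THP disabled" || echo "THP not disabled"
```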
But other values are possible for some of these kernel settings:
- always - an application requesting a huge page will stall on allocation failure and directly reclaim pages and compact memory in an effort to allocate a huge page immediately
- defer - an application will wake kswapd in the background to reclaim pages and wake kcompactd to compact memory so that a huge page is available in the near future (the huge page is then substituted later by khugepaged)
- madvise - equivalent to always for regions that have used madvise(MADV_HUGEPAGE)
- defer+madvise - equivalent to always for regions that have used madvise(MADV_HUGEPAGE) and defer for all other regions
- never - self-explanatory
The primary benefit of Huge Pages is increased efficiency of the Translation Lookaside Buffer (TLB). With larger pages, the TLB needs fewer entries to map the same amount of virtual memory, so TLB misses are less frequent and memory accesses (such as finding a buffer) are faster. Huge Pages therefore have the potential to bring modest performance improvements on servers with large amounts of physical memory. Note: regular pages are commonly 4 kB in size, whereas Huge Pages are commonly 2 MB or 1 GB; but sizes may vary depending on kernel settings and hardware architecture.
However, when using Transparent Huge Pages, the kernel attempts to allocate memory using Huge Pages whenever possible, without the process having to request this in any way during the malloc / mmap call.
Furthermore, Transparent Huge Pages expends extra effort attempting to identify contiguous regular pages that it can then remap as Huge Pages for the process. This is the defrag activity of khugepaged, which counts such actions as pages_collapsed.
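To get a rough idea of how much of this activity is taking place, the kernel's counters can be inspected. The sketch below filters THP-related counters and compact_stall out of /proc/vmstat; the function name thp_counters and its optional file argument are hypothetical, and the exact set of counter names varies between kernel versions.

```shell
#!/bin/sh
# Print THP counters and the compaction stall counter from a vmstat-format file.
# "thp_counters" is an illustrative name; the file defaults to /proc/vmstat.
thp_counters() {
    awk '$1 ~ /^thp_/ || $1 == "compact_stall" { printf "%s=%s\n", $1, $2 }' \
        "${1:-/proc/vmstat}"
}

# Example use on a live system; the khugepaged counter mentioned in the text
# lives in sysfs rather than in /proc/vmstat:
#   thp_counters
#   cat /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
```

Sampling these values twice, some time apart, shows whether the counters are still growing under load.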
But as the process uses its memory without any knowledge of the page size, it might also free sections of a Huge Page instead of the whole page, which forces the kernel to split the Huge Page back into regular pages, wasting even more effort.
Not only that, but these Huge Pages are also swappable. To swap them out, most current Linux kernels have to split the Huge Pages into regular pages first (work has been done in the past few years to improve this situation).
The problems arise when free Huge Pages are not immediately available - typically due to a lack of physically contiguous memory. The system may either attempt to defragment allocated memory (move the allocated pages to free a contiguous area for a Huge Page) or fall back to the regular 4 kB pages.
That is, the system may attempt to perform the compaction (defragmentation) either immediately when serving the memory allocation (always) or in the background (defer). Both approaches may have a significant impact on performance. For always this is fairly obvious, as the allocation will wait for the compaction (which is rather expensive) to complete. The defer option moves the overhead to the background, which improves the situation, but the defragmentation and remapping of regular pages into Huge Pages may still result in system-wide stalls that negatively affect performance.
Finally, as a side note, the main kernel address space itself is mapped with huge pages, reducing the TLB pressure from the Linux kernel code. In userland, no modifications to the applications are necessary (hence transparent). But there are ways to optimize its use. For applications that want to use huge pages, use of posix_memalign() can also help ensure that large allocations are aligned to huge page (2 MB or 1 GB) boundaries.
In summary, PostgreSQL typically does not benefit greatly from Transparent Huge Pages, and the memory compaction and splitting often result in latency spikes or in inconsistent, unpredictable durations of simple queries. The negative performance impact outweighs the small benefits, and it is therefore strongly recommended to disable THP (including khugepaged) on PostgreSQL systems.
If desired, one can still use Huge Pages without THP by configuring the kernel to reserve memory as Huge Pages (which are never fragmented, defragmented, or swapped), and configuring PostgreSQL to request Huge Pages for shared memory allocations with huge_pages=try (or on).
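When reserving Huge Pages, the number of pages has to cover the whole shared memory area. The arithmetic can be sketched as below. The helper names huge_pages_needed and huge_page_size_kb are hypothetical, and the shared memory size is an input you have to determine for your own instance (it is somewhat larger than shared_buffers). The result would typically be applied through the vm.nr_hugepages kernel setting.

```shell
#!/bin/sh
# Number of Huge Pages needed to hold a shared memory area:
# ceiling of (shared memory in kB) / (Huge Page size in kB).
# Both helper names are illustrative, not part of PostgreSQL or the kernel.
huge_pages_needed() {
    shmem_kb="$1"
    page_kb="$2"
    echo $(( (shmem_kb + page_kb - 1) / page_kb ))
}

# Read the default Huge Page size in kB from a meminfo-format file
# (defaults to /proc/meminfo, which has a "Hugepagesize:" line).
huge_page_size_kb() {
    awk '$1 == "Hugepagesize:" { print $2 }' "${1:-/proc/meminfo}"
}

# Example use, assuming an 8 GB shared memory area:
#   huge_pages_needed 8388608 "$(huge_page_size_kb)"
```

Rounding up matters here: if the reservation is even one page short, PostgreSQL with huge_pages=try falls back to regular pages, and with huge_pages=on it fails to start.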