This article covers the basic PgPool settings for streaming replication. It does not include SSL configuration; other complementary settings also exist but will not be covered here.
The parameter below specifies the type of replication used in your PostgreSQL environment, allowing PgPool to understand which type of replication it is managing. You can configure other types of replication, such as logical replication; however, this article only covers streaming replication.
backend_clustering_mode = 'streaming_replication'
These are the basic settings for connecting to PgPool. You can define the port for connecting to PgPool, which will then redirect you to one of the PostgreSQL backends. Additionally, you can configure the directory where PgPool creates its socket file, and specify which IP address(es) it listens on for connections using listen_addresses. Setting it to '*' allows connections from any IP address to reach PgPool.
listen_addresses = '*'
port = 9999
unix_socket_directories = '/tmp'
In the example below, there are three PostgreSQL instances, each on a different machine. If your PostgreSQL instance shares the same machine as PgPool and you want to configure PgPool to connect via localhost, use '127.0.0.1' instead of 'localhost', as PgPool does not accept 'localhost' as a value. Each backend must have a unique number at the end of its configuration settings: all settings for the first backend end with 0, for the second backend with 1, and for the third backend with 2.
Configuration example:
backend_hostname0 = '192.168.1.100'
backend_port0 = 5444
backend_weight0 = 1
backend_data_directory0 = '/var/lib/edb/as16/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.1.101'
backend_port1 = 5444
backend_weight1 = 1
backend_data_directory1 = '/var/lib/edb/as16/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = '192.168.1.102'
backend_port2 = 5444
backend_weight2 = 1
backend_data_directory2 = '/var/lib/edb/as16/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
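The lb_weight fraction later reported by SHOW POOL_NODES; is effectively each backend_weight normalized against the sum of all weights. A minimal sketch of that calculation, using the weight values from the example above:

```shell
# Sketch: normalize one backend_weight against the sum of all weights,
# which is effectively how PgPool derives the lb_weight fraction.
w0=1; w1=1; w2=1
sum=$(( w0 + w1 + w2 ))
lb0=$(awk -v w="$w0" -v s="$sum" 'BEGIN { printf "%.6f", w / s }')
echo "$lb0"   # 0.333333 for three equal weights
```

This matches the 0.333333 values shown in the SHOW POOL_NODES; output later in this article.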
You can access PgPool using HBA authentication, which can be configured in a file called pool_hba.conf. This file must be located in the PgPool configuration directory (e.g., /etc/pgpool-II/ or /etc/sysconfig/edb/) and should be owned by the OS user, typically postgres or enterprisedb. The pool_hba.conf file works similarly to PostgreSQL's pg_hba.conf, and a sample file, pool_hba.conf.sample, can be found in the configuration directory. Alternatively, you can disable HBA authentication and connect directly with simple clear-text frontend authentication.
To enable HBA authentication, configure the following parameter:
enable_pool_hba = on
Examples of entries in pool_hba.conf:
host all all 0.0.0.0/0 scram-sha-256
host replication all 192.168.1.100/24 trust
For HBA authentication, you also need to provide a password file containing the usernames and passwords for the md5 and scram-sha-256 authentication methods. Specify the full path to the password file; if only a file name is given, PgPool looks for it in the configuration directory. This file must be owned by the postgres or enterprisedb user.
pool_passwd = 'pool_passwd'
In the pool_passwd file, you must include a line for each user and their password. The password can be in plain text, or it can be encrypted with MD5 or AES. Any changes to this file require a PgPool reload.
For plain text:
postgres:12345678
For MD5 encryption using EDB PgPool, you must use the pg_md5 binary provided by PgPool (the binary is also available in the community version of PgPool but is typically located in a different directory, commonly /usr/bin/). Invoke pg_md5 with the options -m, -f for the PgPool configuration file, -u for the user whose password you want to encrypt, and -p to provide the password. This automatically inserts the encrypted password line into the pool_passwd file:
/usr/edb/pgpool4.5/bin/pg_md5 -m -f /etc/sysconfig/edb/pgpool4.5/PgPool.conf -u postgres -p
The output in the file:
postgres:md5067ac55252970a27e3158d918448c59c
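The md5-prefixed value follows the usual PostgreSQL convention: the MD5 hash of the password concatenated with the username, prefixed with md5. The sketch below reproduces the same entry format with standard tools, assuming that convention holds for pg_md5:

```shell
# Sketch: build a PostgreSQL-style MD5 password entry by hand,
# assuming the md5(password || username) convention.
user="postgres"
password="12345678"
hash=$(printf '%s%s' "$password" "$user" | md5sum | awk '{print $1}')
entry="${user}:md5${hash}"
echo "$entry"
```

If the convention matches, this produces an entry in the same format as the pool_passwd line shown above.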
For AES encryption (used for scram-sha-256 authentication) with EDB PgPool, you must use the pg_enc binary provided by PgPool (also available in the community version, typically located in /usr/bin/). Invoke pg_enc with the options -m, -f for the PgPool configuration file, -u for the user whose password you want to encrypt, and -p to provide the password. Unlike pg_md5, it requires an additional option, -K, to provide the text that serves as the encryption key for generating the password. This automatically inserts the line into the pool_passwd file:
/usr/edb/pgpool4.5/bin/pg_enc -m -f /etc/sysconfig/edb/pgpool4.5/PgPool.conf -u postgres -p -K my_key
The output in the file:
postgres:AESIsc1LigF+Gr4FDQp2RFjuQ==
In addition, for scram-sha-256 authentication to work, a file named .pgpoolkey must exist in the home directory of the postgres or enterprisedb user on the PgPool host. This file should contain the encryption key (my_key in this example) that will be used to decrypt the password on every client connection attempt, and it must have 600 permissions. If the postgres or enterprisedb user does not have a home directory on the PgPool node after the PgPool software installation, you will need to create it manually.
Full example:
-- As root:
mkdir -p /var/lib/edb
chown enterprisedb:enterprisedb /var/lib/edb
su - enterprisedb
echo "my_key" > .pgpoolkey
chmod 600 .pgpoolkey
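You can confirm the key file ended up with the required permissions using stat. The sketch below uses a throwaway temporary directory in place of the real home directory:

```shell
# Sketch: create a key file and verify it has 600 permissions,
# using a temporary directory instead of the user's real home.
home=$(mktemp -d)
echo "my_key" > "$home/.pgpoolkey"
chmod 600 "$home/.pgpoolkey"
perms=$(stat -c '%a' "$home/.pgpoolkey")   # GNU stat (Linux)
echo "$perms"   # 600
rm -rf "$home"
```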
By configuring the parameters below, you can authenticate in PgPool without referencing pool_hba.conf and pool_passwd. This allows you to connect directly to PostgreSQL using plain-text authentication:
enable_pool_hba = off
pool_passwd = ''
allow_clear_text_frontend_auth = on
The following parameters enable PgPool to identify the primary node in streaming replication when executing the SHOW POOL_NODES; command. If they are not set, PgPool will assume that all backends are standby nodes, as it will be unable to distinguish the primary. Additionally, leaving the sr_check_password parameter empty causes PgPool to first attempt to retrieve the password for the sr_check_user from the pool_passwd file before attempting a connection without a password.
Example:
sr_check_period = 10
sr_check_user = 'enterprisedb'
sr_check_password = '12345678'
sr_check_database = 'postgres'
Every ten seconds, PgPool connects as the enterprisedb user with the password 12345678 to the postgres database to check the streaming replication status of each node.
SHOW POOL_NODES;
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+---------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 192.168.1.100 | 5444 | up | unknown | 0.333333 | standby | unknown | 0 | false | 0 | | | 2024-10-22 16:57:38
1 | 192.168.1.101 | 5444 | up | unknown | 0.333333 | primary | unknown | 1 | true | 0 | | | 2024-10-22 16:57:38
2 | 192.168.1.102 | 5444 | up | unknown | 0.333333 | standby | unknown | 0 | false | 0 | | | 2024-10-22 16:57:38
The following parameters verify that PostgreSQL is running and populate the pg_status and pg_role columns of the SHOW POOL_NODES; command. Note that the status column indicates whether the backend is attached (up) or detached (down) from PgPool, while pg_status confirms whether PostgreSQL is running. You can check the health check statistics with the command SHOW POOL_HEALTH_CHECK_STATS;. This configuration is similar to the streaming replication check, with the addition of health_check_timeout, which defines the timeout for a hanging health check; health_check_max_retries, which specifies the number of retries for a failed health check (shown as "round" in the log file); and health_check_retry_delay, which is the interval between retries. Additionally, leaving the health_check_password parameter empty causes PgPool to first attempt to retrieve the password for the health_check_user from the pool_passwd file before attempting a connection without a password:
health_check_period = 10
health_check_timeout = 20
health_check_user = 'enterprisedb'
health_check_password = '12345678'
health_check_database = 'postgres'
health_check_max_retries = 3
health_check_retry_delay = 5
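As a rough worked example, the worst-case time before a node is declared down with the values above is approximately the initial attempt plus each retry's delay and timeout. This is a back-of-the-envelope estimate, not an exact figure from PgPool's implementation:

```shell
# Approximate worst case (seconds), assuming the initial attempt and
# each of the 3 retries can hang for the full health_check_timeout:
timeout=20; retries=3; delay=5
worst=$(( timeout + retries * (delay + timeout) ))
echo "$worst"   # 95
```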
SHOW POOL_NODES;
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+---------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 192.168.1.100 | 5444 | up | up | 0.333333 | standby | standby | 0 | false | 0 | | | 2024-10-22 17:07:20
1 | 192.168.1.101 | 5444 | up | up | 0.333333 | primary | primary | 0 | false | 0 | | | 2024-10-22 17:07:20
2 | 192.168.1.102 | 5444 | up | up | 0.333333 | standby | standby | 0 | true | 0 | | | 2024-10-22 17:07:20
(3 rows)
IMPORTANT: If you are using integration between EFM and PgPool, you should not configure the health check parameters, as this may cause conflicts with EFM. Allow EFM to handle the health checks instead of PgPool.
The parameter below enables connection pooling. Enabling or disabling it requires a PgPool restart:
connection_cache = on
The following parameters control connection availability in the pool. The parameter num_init_children defines the number of child processes that PgPool spawns for client connections, while max_pool defines the pool size for each child. Thus, the total pool size is num_init_children * max_pool, which is 32 * 4 = 128 in the example below:
num_init_children = 32
max_pool = 4
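The total pool size above works out as follows:

```shell
# num_init_children caps concurrent client sessions; each child caches
# up to max_pool backend connections, so the total pool size is the product.
num_init_children=32
max_pool=4
total=$(( num_init_children * max_pool ))
echo "$total"   # 128
```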
When set to on, the parameter below distributes incoming SELECT statements among the configured backends at the session level:
load_balance_mode = on
By configuring the following parameter, load balancing is decided at the statement level instead of the session level:
statement_level_load_balance = on
Load balancing is also influenced by the backend_weight parameter. If you set it to zero for a backend, that backend will not receive any incoming SELECT statements. This is useful for keeping the primary node from receiving any SELECTs.
Consider the following example:
backend_weight0 = 1   # standby node
backend_weight1 = 0   # primary node
backend_weight2 = 1   # standby node
load_balance_mode = on
statement_level_load_balance = on
Run the command SELECT 1; ten times, then note the select_cnt column in the SHOW POOL_NODES; output:
SHOW POOL_NODES;
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+---------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 192.168.1.100 | 5444 | up | unknown | 0.500000 | standby | unknown | 6 | false | 0 | | | 2024-10-22 17:43:29
1 | 192.168.1.101 | 5444 | up | unknown | 0.000000 | primary | unknown | 0 | false | 0 | | | 2024-10-22 17:43:29
2 | 192.168.1.102 | 5444 | up | unknown | 0.500000 | standby | unknown | 4 | true | 0 | | | 2024-10-22 17:43:29
(3 rows)
In the example above, the primary node has an lb_weight of 0 (configured via backend_weight1), which prevents it from receiving SELECT statements, while both standby nodes have an lb_weight of 0.500000, meaning each has a 50% chance of receiving the next incoming SELECT. The balance is not perfect, however: node 0 received 6 SELECT statements and node 2 received 4. This discrepancy occurs because the load balancing is probabilistic, not deterministic, so the nodes won't always receive exactly equal shares.
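The probabilistic behavior can be illustrated with a toy simulation. This is not PgPool's actual selection algorithm, only a sketch of weighted random selection using the weights from the example:

```shell
# Toy weighted-random selection: 10 "SELECTs" over weights 0.5 / 0 / 0.5.
# Illustrative only; PgPool's real implementation differs.
counts=$(awk 'BEGIN {
  srand(42)                      # fixed seed for repeatability
  w0 = 0.5; w1 = 0.0; w2 = 0.5   # normalized lb_weight values
  c0 = c1 = c2 = 0
  for (i = 0; i < 10; i++) {
    r = rand()
    if (r < w0)           c0++
    else if (r < w0 + w1) c1++
    else                  c2++
  }
  print c0, c1, c2
}')
echo "$counts"   # the zero-weight node always gets 0; the split between
                 # the two standbys varies with the random sequence
```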