When planning for a multi-tenant Postgres instance, there are two possible strategies to consider. One being database per tenant (customer) and the other, one schema per tenant. We will refer to the former as "Database Tenancy" and the latter as "Schema Tenancy".
Let's consider some aspects about both approaches, considering the security, scalability, maintenance, disaster recovery, replication and major upgrades.
Security-wise, both approaches are similar, assuming each one of your customers will have one or more database users and those users won't be shared between customers.
If you use Database Tenancy (i.e., each customer has a different database), then you can create the user, the database and then revoke all privileges from all other users:
testdb=# CREATE USER customer1 WITH PASSWORD 'secret';
CREATE ROLE
testdb=# CREATE DATABASE customer1 OWNER customer1;
CREATE DATABASE
testdb=# REVOKE ALL PRIVILEGES ON DATABASE customer1 FROM PUBLIC;
REVOKE
As the customer1
owns the database, it can connect and write to the database:
$ psql -h localhost -p 5432 -d customer1 -U customer1
Password for user customer1:
psql (14.2, server 14.0.0-beta1.1 (Debian 14.0.0-beta1.1.+deb10))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.
customer1=> CREATE TABLE tbl (msg TEXT);
CREATE TABLE
But trying to connect with a different non-superuser user, will error out with:
$ psql -h localhost -p 5432 -d customer1 -U customer2
Password for user customer2:
psql: error: connection to server at "localhost" (::1), port 5432 failed: FATAL: permission denied for database "customer1"
DETAIL: User does not have CONNECT privilege.
Assuming in my example that the testdb
database will be the single database in the instance, and each customer will have a schema inside the same database, then similar behavior can be accomplished with Schema Tenancy, for example:
testdb=# CREATE SCHEMA customer1 AUTHORIZATION customer1;
CREATE SCHEMA
testdb=# REVOKE ALL PRIVILEGES ON SCHEMA customer1 FROM PUBLIC;
REVOKE
testdb=# GRANT ALL PRIVILEGES ON SCHEMA customer1 TO customer1;
GRANT
testdb=# ALTER DEFAULT PRIVILEGES IN SCHEMA customer1 REVOKE ALL PRIVILEGES ON TABLES FROM PUBLIC;
ALTER DEFAULT PRIVILEGES
testdb=# ALTER DEFAULT PRIVILEGES IN SCHEMA customer1 GRANT ALL PRIVILEGES ON TABLES TO customer1;
ALTER DEFAULT PRIVILEGES
Once connected, the user can set the search_path
to be able to refer to tables without needing to use the schema name, which might be helpful for applications. Any table created without specifying a schema name will be stored under the first schema in the search_path
:
testdb=> SET search_path = customer1, public;
SET
testdb=> CREATE TABLE test_table (id INTEGER);
CREATE TABLE
testdb=> \dt+ customer1.*
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
customer1 | test_table | table | customer1 | permanent | heap | 0 bytes |
(1 row)
testdb=> INSERT INTO test_table VALUES (100);
INSERT 0 1
testdb=> INSERT INTO test_table VALUES (200);
INSERT 0 1
testdb=> INSERT INTO test_table VALUES (300);
INSERT 0 1
testdb=> SELECT * FROM test_table;
id
100
200
300
(3 rows)
A second customer, despite being able to connect to the "all-in-one" database, won't be able to access any object inside the schema belonging to a different customer:
$ psql -h localhost -p 5432 -d testdb -U customer2 -c "SELECT * FROM customer1.test_table"
Password for user customer2:
ERROR: permission denied for schema customer1
LINE 1: SELECT * FROM customer1.test_table
^
However, even though customer2
can't access any data from tables in the customer1
schema, customer2
will still be able to list all schemas and list all objects within those schemas.
The decision on what's the best approach for you depends on:
1- How many tables do you expect per tenant?
2- How many tenants do you expect?
These questions are important because of the following points:
-
Storing too many tables (regardless of which schema) in a single Postgres database can cause the Postgres catalog to become too large and even bloated. From experience we have seen performance issues caused by catalog bloat when the database has more than 10k tables.
-
Each database (regardless of if there are connections on it or not) costs 1MB on the server RAM. So if you have 1024 databases, then Postgres will already be consuming 1GB of RAM only for the databases existence on the instance.
If the number of database tenants grow too much, one option would be to increase the server RAM or even create a secondary server to receive new tenants. Or you can have multiple servers from start, one per region or customer category, for example.
Some security policies might require you to provide the customer data to your customer any time they request, or when the contract between you and your customer reaches an end, etc. If you choose Database Tenancy, then generating a complete dump of the customer data would be as easy as:
pg_dump -d customer1 -f customer1.dmp
It might also be easier to have a "template" database with all tables pre-created, and when you need to add a new customer, you can just:
CREATE DATABASE customer1 OWNER customer1 TEMPLATE mytemplate;
Then Postgres will create the customer1
database as a copy of the mytemplate
database.
The same technique above can be used for "forking" an existing customer database, for example:
CREATE DATABASE customer1_dev OWNER devteam TEMPLATE customer1;
Which can be helpful, for example, for developers to implement something on the customer1_dev
database, which later you can deploy to the customer1
database. However, please bear in mind that the command above requires first that there is no connection to the customer1
database.
For High Availability purposes, attaching a physical standby to this Postgres instance would work regardless of the tenancy strategy you choose, because the physical standby holds an exact physical copy of the Postgres data.
However, logical replication works per database. So if for example you decide to create a logical subscriber for any reason (for example to implement a reporting server or even a Data Warehouse solution), then you need to bear in mind that the logical replication configuration should be done per each database you want to logically replicate, and each database will correspond to a replication stream over the network.
It's strongly recommended to have a physical standby to this instance, which will provide High Availability considering all the tenants.
It's also crucial that you setup a backup server using Barman, which will keep physical copies of the entire Postgres instance according to a retention policy you specify.
If for example one of your customers inadvertently deletes some data from inside their own database, then you need to restore the entire instance (all tenants) to a separate recovery server, then grab the lost data from the recovery server and re-add it into the production server.
Having Database Tenancy allows you to create separate database dumps using pg_dump
as explained above. Depending on how often you perform those per-database dumps, restoring one of them might be much faster because it contains only the tenant data. On the other hand, the database dump provides only a specific point in time view of the data, while the whole physical backup maintained by Barman allows for Point-In-Time Recovery if needed. The recommendation if you choose Database Tenancy is to keep both Barman and per-database dumps with some periodicity.
At some point you will need to perform a Postgres major upgrade on this server.
If you choose the Schema Tenancy strategy, then you can easily setup a logical replication between your "all-in-one" database and the logical subscriber that should be running the target Postgres major you would like to upgrade to.
With the Database Tenancy strategy, then logical replication is still possible, but might not be viable depending on how many databases you currently have on the instance.
While using logical replication allows for a more flexible and "near zero" downtime major upgrade, regardless of the tenancy strategy you choose, you can always perform a major upgrade in-place using the pg_upgrade
tool, which will upgrade all databases. But the upgrade using pg_upgrade
should be done offline and might require a longer downtime than a major upgrade using logical replication.
In summary, the choice of a tenancy strategy between Database Tenancy or Schema Tenancy depends on how many tenants you expect and how many user tables. Bear in mind that too many database weighs on RAM while too many tables on the same database can impact on performance. Both have similar security and maintenance aspects, though. Database Tenancy might impose a challenge if you plan to use logical replication on too many databases, but on the other hand it might be the easier to maintain and provides some isolation benefits depending on your business requirements.