Data partitioning through Declarative Partitioning

Caterina Magini
Caterina Magini

This article gives a short overview on how to implement data partitioning using declarative partitioning, a feature that is available from PostgreSQL 10.

PostgreSQL 10 introduces native support for table partitioning. Declarative table partitioning is similar to table inheritance and reuses much of the existing infrastructure, but there are some differences. The parent (or master) is called partitioned table and is always empty. Children are called partitions and contain all of the actual data. Each partition has an implicit partitioning constraint (sometimes called bound). Multiple inheritance is not allowed, and partitioning and inheritance cannot be mixed. Partitions cannot have extra columns and may not allow NULL values unless the parent does. Tuples inserted into the parent are automatically routed to the correct partition, so triggers or rules to route the data to appropriate partition are not needed.

The supported partitioning methods are by range and list (starting with PostgreSQL 10), and hash (starting with PostgreSQL 11):

  • RANGE - table is partitioned into “ranges” defined by a key column or a set of columns, with no overlap between the ranges of values assigned to different partitions.

  • LIST - table is partitioned by explicitly list which key values appear in each partition, and are limited to a single column.

  • HASH - the table is partitioned by specifying a modulus and a remainder for each partition. Hash type partitions distribute the rows based on the hash value of the partition key. The remainder of the hash value when divided by a specified modulus is used to calculate which partition the row goes into (or can be found in). This can be used to divide rows as equally as possible into multiple partitions. Ideally, data is distributed uniformly across all partitions.

How does it work?

Declarative table partitioning reduces the amount of administrative and maintenance work required to partition data in PostgreSQL. It achieves this result by providing an improved syntax to create partitioned tables and their child partitions. The following sections provide a short example to be used for testing declarative partitioning.

Partitioned table

Create a partitioned table using the PARTITION BY clause, which includes the partitioning method (RANGE in the following example) and the list of columns to use as the "partition key".

CREATE TABLE orders (
id INT NOT NULL,
address TEXT NOT NULL,
order_date TIMESTAMP NOT NULL
) PARTITION BY RANGE (order_date);

Partitions

Create partitions using the PARTITION OF clause for CREATE TABLE. Each partition's definition must specify the bounds that correspond to the partitioning method. For RANGE partitioning this bound is the lower and upper range of values which the partition is to allow. For LIST partitioning this is a list of values the partition is to store, and for HASH partitioning this is the modulus and remainder of the hashes of the partition key.

Note that if the boundaries of the new partitions overlap with that of any existing partitions, an error will occur.

CREATE TABLE orders_y2018m01 PARTITION OF orders
FOR VALUES FROM ('2018-01-01 00:00:00') TO ('2018-02-01 00:00:00');
CREATE TABLE orders_y2018m02 PARTITION OF orders
FOR VALUES FROM ('2018-02-01 00:00:00') TO ('2018-03-01 00:00:00');
[...]
CREATE TABLE orders_y2018m12 PARTITION OF orders
FOR VALUES FROM ('2018-12-01 00:00:00') TO ('2019-01-01 00:00:00');

Note: It's important to remember that the upper bound of a RANGE partition is non-inclusive and the lower bound is inclusive. In the orders_y2018m12 partition above only timestamps from December 2018 are allowed.

Indexes

Optionally, you may wish to create indexes on the key column(s), as well as any other indexes. Starting from PostgreSQL 11, CREATE INDEX can be executed only once, directly on the partitioned table: this operation automatically creates an index on each partition, and also manages the creation of the index on any new partitions being added to the partitioned table.

In PostgreSQL 10 this step has to be repeated for each partition:

-- PostgreSQL11
CREATE INDEX ON orders (order_date);
-- PostgreSQL10
CREATE INDEX ON orders_y2018m01 (order_date);
CREATE INDEX ON orders_y2018m02 (order_date);
[...]
CREATE INDEX ON orders_y2018m12 (order_date);

The same goes for UNIQUE,PRIMARY KEY,EXCLUDE or FOREIGN KEY constraints: before PostgreSQL 11 these constraints had to be defined on each individual partition.

Note: PostgreSQL 11 does support UNIQUE and PRIMARY KEY indexes on partitioned tables, however all partition key columns must be part of the index. There is no "global" index.

Partition pruning

Finally, PostgreSQL 11 introduces the enable_partition_pruning configuration parameter in postgresql.conf (enabled by default). This improves the performance of both the query planner and the executor. It can be enabled during a session:

SET enable_partition_pruning = on; -- the default
SELECT count(*) FROM orders WHERE order_date >= DATE '2018-03-01';

Without partition pruning, the above query would scan each of the partitions of the orders table. With partition pruning enabled, the planner will determine the minimum set of partitions which overlap with the query's WHERE clause and "prune" the remainder, which results in surplus partitions not existing in the final query plan.

PostgreSQL 10 had a similar feature named constraint_exclusion. This does a similar job to partition pruning, only it's much less efficient as it must look at each partition's bound in order to determine if the partition must be scanned or not. Partition pruning is more efficient as it looks at the partitioned table's metadata rather than the partition's bound and in a single operation determines which partitions match the query's WHERE clause. This single operation allows partitioning to scale better with a larger number of partitions. Although in PostgreSQL 11, other limitations still exist which cause partitioning not to scale well with more than several hundreds of partitions.

Partition pruning not only allows partitions to be pruned during query planning, but it also allows pruning during the query's execution. This is useful as pruning can be done using values which were unknown during query planning, for example, values from immutable functions or actual column values from other parts of the plan. This is not possible with constraint exclusion.

In addition to the new features already mentioned, PostgreSQL 11 provides many other solutions, such as the introduction of the DEFAULT partition or the ability to update the partitioning key, moving rows automatically to the correct partition table.

PostgreSQL version 12 will include even more performance improvements in the partitioning space, but the improvements released in PostgreSQL 11 already make declarative partitioning a very powerful feature in PostgreSQL.

NOTE: 2ndQPostgres 11 includes most of the performance improvement patches written for PostgreSQL 12 in the partitioning area.

Was this article helpful?

0 out of 0 found this helpful