Data partitioning through inheritance

Caterina Magini
Caterina Magini

There are two ways in which to create partitioned tables in PostgreSQL. This article describes one of these two methods, data partitioning via table inheritance, and is mainly intended for PostgreSQL 8.1 to 9.6 users.

Data partitioning is a method of splitting the large (based on number of records, not number of column) tables into many smaller ones. For databases with extremely large tables, partitioning is a wonderful trick to dramatically improve database performance and make maintenance much easier.

The table partitioning system in PostgreSQL was first added in PostgreSQL 8.1 and was based on relation inheritance.

This method used constraints to define the partitions and rules or triggers to route the data to appropriate partition. We have to wait until PostgreSQL 10 to see declarative partitioning, which is much easier to set up and requires almost no maintenance.

Partitioning operates by having a parent table that will ultimately become empty, and child tables that INHERIT the parent table. When the parent table is queried, the child tables are also searched for data due to the INHERIT applied to the child tables. However, since child tables only contain subsets of the parent’s data, we add a CONSTRAINT on the table that does a CHECK and verifies that the data matches what’s allowed in the table. The constraint does two things: first it refuses data that doesn’t belong, and second it tells the query planner that only data matching this CHECK CONSTRAINT is allowed in this table, so if searching for data that doesn’t match the table, don’t even bother searching it (constraint_exclusion parameter). Lastly, we apply a trigger to the parent table that executes a stored procedure that decides which child table to put the data.

Below, a basic example on how to implement partitioning using inheritance.

First, create an empty master table, from which all of the child tables will inherit. This table will contain no data.

CREATE TABLE orders (
id INT NOT NULL,
address TEXT NOT NULL,
order_date TIMESTAMP NOT NULL
);

Then, create several** child tables** that each inherit from the master table, using non-overlapping CHECK constraints:

CREATE TABLE orders_2019 (
CHECK (order_date >= DATE '2019-01-01'
AND order_date < DATE '2020-01-01')
) INHERITS (orders);

CREATE TABLE orders_2018 (
CHECK (order_date >= DATE '2018-01-01'
AND order_date < DATE '2019-01-01')
) INHERITS (orders);

The next step is to create indexes on the partitioning key columns for each child table, to improve performance:

CREATE INDEX orders_2019_idx ON orders_2019(order_date);
CREATE INDEX orders_2018_idx ON orders_2018(order_date);

While SELECT, UPDATE, DELETE work seamlessly for queries on the parent table, the same is not the case with INSERT. The insertion location is managed by a suitable trigger function: this will be called before data will be inserted to master table, and should, without raising errors, redirect (route) the insert to appropriate partition. For this reason, it is generally called a router trigger.

Here you can find a simple and didactic example of a trigger function:

CREATE OR REPLACE FUNCTION orders_insert() RETURNS TRIGGER AS
$$
BEGIN
IF (NEW.order_date >= DATE '2019-01-01' AND NEW.order_date < DATE '2020-01-01') THEN
INSERT INTO orders_2019 VALUES (NEW.*);
ELSIF (NEW.order_date >= DATE '2018-01-01' AND NEW.order_date < DATE '2019-01-01') THEN
INSERT INTO orders_2018 VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range: check orders_insert() function!';
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

And then create a BEFORE INSERTtrigger which calls the trigger function:

CREATE TRIGGER orders_insert_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE PROCEDURE orders_insert();

NOTE: If you need to UPDATE the field order_date (partition constraint) you would need to write an appropriate UPDATE trigger that deletes the record in the original partition and inserts the record in the target partition.

Make sure that the constraint_exclusion parameter is enabled in postgresql.conf for better performance. This option, when set to "on" or "partition", makes PostgreSQL skip partitions that cannot contain matching rows during planning.

In the above example, remember to redefine the trigger function when the set of child tables changes.

As a final note, table inheritance for a long time has given developers a viable way to partition unwieldy tables in PostgreSQL. It also provides useful flexibility: for example, child tables are permitted to have extra columns not present in the parent table and data can be divided in a manner of the user’s choosing, but it requires a non-trivial amount of manual work.

Was this article helpful?

0 out of 0 found this helpful