The goal is to migrate a table from non-partitioned to partitioned, without causing a disruption. This is possible to be performed across BDR nodes in a BDR cluster.
For our example below, consider a 3-node BDR cluster.
On node1, create a table with some data. Note how we have a PK on field id
:
CREATE TABLE tbl (
id INTEGER NOT NULL,
dt TIMESTAMP WITHOUT TIME ZONE NOT NULL,
msg TEXT,
PRIMARY KEY (id)
);
INSERT INTO tbl
SELECT x, '2022-11-30 12:00:00', 'Past month ' || x
FROM generate_series(1, 1000000) x;
INSERT INTO tbl
SELECT x, '2022-12-01 12:00:00', 'This month ' || x
FROM generate_series(1000001, 2000000) x;
SELECT bdr.wait_slot_confirm_lsn(NULL, NULL);
These changes will get replicated to all other nodes, and that's our initial state.
So this is the table structure on all nodes at the moment:
bdrdb=# \d tbl
Table "public.tbl"
Column | Type | Collation | Nullable | Default
id | integer | | not null |
dt | timestamp without time zone | | not null |
msg | text | | |
Indexes:
"tbl_pkey" PRIMARY KEY, btree (id)
Now we need to create an unique index on the table, which should determine the replica identity. Even if the PK is different among the nodes, the replica identify should be the same. Creating the index can be done individually on all nodes, concurrently to not require a strong lock:
SET bdr.ddl_replication = off;
CREATE UNIQUE INDEX CONCURRENTLY ON tbl (id, dt);
RESET bdr.ddl_replication;
At this point, the table structure is:
bdrdb=# \d tbl
Table "public.tbl"
Column | Type | Collation | Nullable | Default
id | integer | | not null |
dt | timestamp without time zone | | |
msg | text | | |
Indexes:
"tbl_pkey" PRIMARY KEY, btree (id)
"tbl_id_dt_idx" UNIQUE, btree (id, dt)
Then to use the new unique index as a replica identity:
BEGIN;
SET LOCAL bdr.ddl_replication = off;
ALTER TABLE tbl REPLICA IDENTITY USING INDEX tbl_id_dt_idx;
COMMIT;
While the ALTER TABLE
above should be fast, it still imposes a strong lock on the table. So this is why we are suggesting to do this individually on all nodes, one by one.
After that, the table structure on all nodes will be:
bdrdb=# \d tbl
Table "public.tbl"
Column | Type | Collation | Nullable | Default
id | integer | | not null |
dt | timestamp without time zone | | not null |
msg | text | | |
Indexes:
"tbl_pkey" PRIMARY KEY, btree (id)
"tbl_id_dt_idx" UNIQUE, btree (id, dt) REPLICA IDENTITY
Now, on node2, the idea is to perform, in a single transaction:
1- Disable DDL replication;
2- Drop the table;
3- Create the table as partitioned;
4- Create the partitions;
5- Create the unique index matching the PK;
6- Change the replica identity to use the unique index;
7- Enable DDL replication (it's required for using bdr.resynchronize_table_from_node(...)
);
8- Run bdr.resynchronize_table_from_node(...)
to copy the table data from one of the nodes.
The complete transaction would be as follows:
BEGIN;
SET LOCAL bdr.ddl_replication = off;
DROP TABLE tbl;
CREATE TABLE tbl (
id INTEGER,
dt TIMESTAMP WITHOUT TIME ZONE,
msg TEXT,
PRIMARY KEY (id, dt)
)
PARTITION BY RANGE (dt);
CREATE TABLE tbl_p202211
PARTITION OF tbl
FOR VALUES FROM ('2022-11-01') TO ('2022-12-01');
CREATE TABLE tbl_p202212
PARTITION OF tbl
FOR VALUES FROM ('2022-12-01') TO ('2023-01-01');
CREATE UNIQUE INDEX ON tbl (id, dt);
ALTER TABLE tbl REPLICA IDENTITY USING INDEX tbl_id_dt_idx;
SET LOCAL bdr.ddl_replication = on;
SELECT bdr.resynchronize_table_from_node(
node_name := 'rt87174n1',
relation := 'public.tbl'::regclass
);
COMMIT;
After that, this is the structure on node 2 only:
bdrdb=# \d+ tbl
Partitioned table "public.tbl"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
id | integer | | not null | | plain | |
dt | timestamp without time zone | | not null | | plain | |
msg | text | | | | extended | |
Partition key: RANGE (dt)
Indexes:
"tbl_pkey" PRIMARY KEY, btree (id, dt)
"tbl_id_dt_idx" UNIQUE, btree (id, dt) REPLICA IDENTITY
Partitions: tbl_p202211 FOR VALUES FROM ('2022-11-01 00:00:00') TO ('2022-12-01 00:00:00'),
tbl_p202212 FOR VALUES FROM ('2022-12-01 00:00:00') TO ('2023-01-01 00:00:00')
And the data is all there, partitioned:
bdrdb=# SELECT count(*) FROM tbl;
count
2000000
(1 row)
bdrdb=# SELECT count(*) FROM ONLY tbl_p202211;
count
1000000
(1 row)
bdrdb=# SELECT count(*) FROM ONLY tbl_p202212;
count
1000000
(1 row)
Now if, on node 1 (which is non-partitioned yet), I try to insert a row:
INSERT INTO tbl VALUES (2000001, now(), 'From node 1 non-partitioned');
The row gets replicated to both nodes:
bdrdb=# SELECT * FROM tbl WHERE id = 2000001;
id | dt | msg
2000001 | 2022-12-01 19:40:54.169418 | From node 1 non-partitioned
(1 row)
On node 2, the row gets routed to the proper partition:
bdrdb=# SELECT * FROM ONLY tbl_p202211 WHERE id = 2000001;
id | dt | msg
(0 rows)
bdrdb=# SELECT * FROM ONLY tbl_p202212 WHERE id = 2000001;
id | dt | msg
2000001 | 2022-12-01 19:40:54.169418 | From node 1 non-partitioned
(1 row)
In the opposite direction, if I insert another row in node 2 (which is partitioned):
INSERT INTO tbl VALUES (2000002, now(), 'From node 2 partitioned');
The row gets properly replicated to the other nodes:
bdrdb=# SELECT * FROM tbl WHERE id = 2000002;
id | dt | msg
2000002 | 2022-12-01 19:43:47.59713 | From node 2 partitioned
(1 row)
Let's also try an UPDATE from node 1:
UPDATE tbl SET msg = 'Changed' WHERE id = 2000002;
It replicates fine to both node 2 (partitioned) and node 3 (non-partitioned):
bdrdb=# SELECT * FROM tbl WHERE id = 2000002;
id | dt | msg
2000002 | 2022-12-01 19:43:47.59713 | Changed
(1 row)
Let's try the UPDATE again, this time from node 2 (partitioned):
UPDATE tbl SET msg = 'Changed again' WHERE id = 2000002;
It replicates fine to the other 2 nodes, where the table is non-partitioned yet:
bdrdb=# SELECT * FROM tbl WHERE id = 2000002;
id | dt | msg
2000002 | 2022-12-01 19:43:47.59713 | Changed again
(1 row)
So replication is still working, besides the fact that the table is partitioned on one node and not partitioned in the others. This is possible because:
1- BDR supports Asymmetric Partitioning;
2- Logical replication uses the replica identity, even if the PK is different;
3- BDR allows to resynchronize the table data, even if the partitioning is different between the source and target nodes.
You can repeat this non-partitioned to partitioned migration one node at a time, until you have the table partitioned in all nodes.