Removing duplicates from pg_depend

Luciano Botti

This article provides a way to fix duplicate rows in the pg_depend catalog in order to reestablish replication on a BDR cluster.

In some specific scenarios, when the BDR Autopartition worker cannot create a partition due to a lock and retries, the dependency row in pg_depend is stored twice.

When does this procedure need to be executed?

  • When using a BDR version earlier than 3.7.16 (the fix is included from that version onward)
  • And when the following error message appears in the database logs:
"ERROR: unexpected number of extension dependency records"

Important!!!

Before proceeding with the workaround of removing the duplicates from pg_depend, you will need to:

a)- Schedule an application downtime;

b)- As root, stop Postgres on the affected node. In our example, we are using node01 as the affected node;

c)- Remove pglogical,bdr from the shared_preload_libraries in Postgres configuration;

d)- Start Postgres again; this time it will not start the pglogical/BDR workers, which is a precondition for the procedure below.
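Steps b)-d) can be sketched as follows, assuming a systemd service named `postgres` and a configuration file at `/opt/postgres/data/postgresql.conf` (both hypothetical; adjust to your installation, and note that shared_preload_libraries may list other libraries you need to keep):

```shell
# Stop Postgres on the affected node (node01)
sudo systemctl stop postgres

# Empty shared_preload_libraries, keeping a .bak copy of the original file
# (if the setting lists other libraries, remove only pglogical,bdr instead)
sudo sed -i.bak \
  "s/^\(shared_preload_libraries[[:space:]]*=\).*/\1 ''/" \
  /opt/postgres/data/postgresql.conf

# Start Postgres again; the pglogical/BDR workers will not be launched
sudo systemctl start postgres
```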

Procedure to TEST the removal of the duplicate dependency

We first provide a TEST procedure that performs the removal on custom copies of the catalog tables, so you can validate every step before touching the real catalogs.

1- First, we need to check the OIDs of pg_class and pg_extension:

bdrdb=# SELECT oid, relname FROM pg_class WHERE relname IN ('pg_class', 'pg_extension');
 oid  |   relname
------+--------------
 1259 | pg_class
 3079 | pg_extension
(2 rows)

These values will be used in later steps.

2- Create custom tables loaded with the same data as pg_depend and pg_class, so the procedure can be tested before running it against the real catalog tables:

SELECT * INTO pg_depend_e_node01 FROM pg_depend WHERE deptype = 'e';
SELECT oid,* INTO pg_class_node01 FROM pg_class;

NOTE: BDR is disabled on the node, so these tables will not be replicated to other nodes.

3- After loading your pg_depend and pg_class data into the custom tables, use the following query to find the duplicated records (note the custom pg_depend and pg_class table names and the OID values from step 1):

SELECT c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype,
count(*)
FROM pg_depend_e_node01 d
JOIN pg_class_node01 c ON c.oid = d.objid
WHERE d.deptype = 'e'
AND d.classid = 1259
AND d.refclassid = 3079
AND c.relname LIKE 'conflict_history%'
GROUP BY c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype
HAVING count(*) > 1
ORDER BY 1;

That query returns all problematic tables; in our test example, just one:

                  relname                   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype | count
--------------------------------------------+---------+-------+----------+------------+----------+-------------+---------+-------
 conflict_history_part_1147479802_547832982 |    1259 | 18516 |        0 |       3079 |    17675 |           0 | e       |     2
(1 row)

You can confirm that situation by checking the table bdr.autopartition_local_work_queue_status on node node01; it should show that work still pending:

 workid |       relname        |                 partition                  | workkind |                                            sql                                             | status
--------+----------------------+--------------------------------------------+----------+--------------------------------------------------------------------------------------------+--------
     40 | bdr.conflict_history | conflict_history_part_1147479802_547832982 | d        | SELECT bdr.autopartition_drop_partition('bdr.conflict_history_part_1147479802_547832982'); | \N
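The work-queue output above can be retrieved with a query like the following (the column list is taken from the output shown; verify the view's columns against your BDR version):

```sql
SELECT workid, relname, partition, workkind, sql, status
FROM bdr.autopartition_local_work_queue_status
WHERE partition = 'conflict_history_part_1147479802_547832982';
```

A NULL status (shown as \N) indicates the work item is still pending.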

4- After identifying the duplicates, we rely on the hidden ctid system column to choose which of the duplicate row versions to delete. We can see both row versions with this query:

SELECT c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype,
count(*),
min(d.ctid),
max(d.ctid)
FROM pg_depend_e_node01 d
JOIN pg_class_node01 c ON c.oid = d.objid
WHERE d.deptype = 'e'
AND d.classid = 1259
AND d.refclassid = 3079
AND c.relname LIKE 'conflict_history%'
GROUP BY c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype
HAVING count(*) > 1
ORDER BY 1;

In our example case, the output will be (note the min and max columns):

                  relname                   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype | count |  min   |  max
--------------------------------------------+---------+-------+----------+------------+----------+-------------+---------+-------+--------+--------
 conflict_history_part_1147479802_547832982 |    1259 | 18516 |        0 |       3079 |    17675 |           0 | e       |     2 | (6,23) | (6,34)
(1 row)

5- Then we can try the DELETE statement (on the custom test tables). As the postgres user, on the bdrdb database on node node01, start a new transaction:

BEGIN;

6- Execute the DELETE:

WITH cte AS (
SELECT c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype,
count(*),
min(d.ctid),
max(d.ctid)
FROM pg_depend_e_node01 d
JOIN pg_class_node01 c ON c.oid = d.objid
WHERE d.deptype = 'e'
AND d.classid = 1259
AND d.refclassid = 3079
AND c.relname LIKE 'conflict_history%'
GROUP BY c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype
HAVING count(*) > 1
)
DELETE
FROM pg_depend_e_node01
USING cte
WHERE cte.deptype = 'e'
AND cte.classid = 1259
AND cte.refclassid = 3079
AND cte.min = ctid;
DELETE 1

IMPORTANT: If you don't see the expected number of rows being deleted, abort with ROLLBACK;

7- Before finishing the transaction, execute the check query again:

WITH cte AS (
SELECT c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype,
count(*),
min(d.ctid),
max(d.ctid)
FROM pg_depend_e_node01 d
JOIN pg_class_node01 c ON c.oid = d.objid
WHERE d.deptype = 'e'
AND d.classid = 1259
AND d.refclassid = 3079
AND c.relname LIKE 'conflict_history%'
GROUP BY c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype
HAVING count(*) > 1
)
SELECT d.*
FROM pg_depend_e_node01 d
JOIN cte c ON c.min = d.ctid;

It should return no duplicates now:

 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
(0 rows)

8- Commit the previous transaction:

COMMIT;

Translating the query to use the REAL catalog table names

1- Once you have TESTED the previous procedure, you need to perform the same steps with the REAL catalog tables:

SELECT c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype,
count(*),
min(d.ctid),
max(d.ctid)
FROM pg_depend d
JOIN pg_class c ON c.oid = d.objid
WHERE d.deptype = 'e'
AND d.classid = 1259
AND d.refclassid = 3079
AND c.relname LIKE 'conflict_history%'
GROUP BY c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype
HAVING count(*) > 1;

IMPORTANT: You can use the above query to determine whether the node is affected by the bug.

2- Translating the DELETE query to use the REAL catalog table names is easy: just paste the query above into the CTE:

WITH cte AS (
SELECT c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype,
count(*),
min(d.ctid),
max(d.ctid)
FROM pg_depend d
JOIN pg_class c ON c.oid = d.objid
WHERE d.deptype = 'e'
AND d.classid = 1259
AND d.refclassid = 3079
AND c.relname LIKE 'conflict_history%'
GROUP BY c.relname,
d.classid,
d.objid,
d.objsubid,
d.refclassid,
d.refobjid,
d.refobjsubid,
d.deptype
HAVING count(*) > 1
)
DELETE
FROM pg_depend
USING cte
WHERE cte.deptype = 'e'
AND cte.classid = 1259
AND cte.refclassid = 3079
AND cte.min = ctid;

3- After carefully deleting the duplicated rows from pg_depend, on node node01:

a- Stop Postgres;

b- Add pglogical,bdr back to shared_preload_libraries;

c- Start Postgres again.
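The re-enable steps a-c can be sketched as follows, assuming a systemd service named `postgres` and a configuration file at `/opt/postgres/data/postgresql.conf` (both hypothetical; the library list shown is the original one from your configuration, so adjust as needed):

```shell
# Stop Postgres on node01
sudo systemctl stop postgres

# Restore pglogical,bdr in shared_preload_libraries
sudo sed -i \
  "s/^\(shared_preload_libraries[[:space:]]*=\).*/\1 'pglogical,bdr'/" \
  /opt/postgres/data/postgresql.conf

# Start Postgres again with the BDR workers enabled
sudo systemctl start postgres
```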
