How to properly part and drop a BDR node

William Ivanski

The purpose of this article is to demonstrate a scenario where a node needs to be parted and dropped from a BDR cluster. We discuss the concepts, the available options, and best practices for parting and dropping BDR nodes.

Current cluster

The following query shows the nodes that belong to the cluster:

SELECT node_name,
       peer_state_name
  FROM bdr.node_summary
 ORDER BY 1;

Output would be, for example:

 node_name | peer_state_name
-----------+-----------------
 node1     | ACTIVE
 node2     | ACTIVE
 node2ls   | ACTIVE
 node3     | ACTIVE
 node4so   | ACTIVE
(5 rows)
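
To check only the node you are currently connected to, BDR also provides the bdr.local_node_summary view, which exposes the same columns as bdr.node_summary but restricted to the local node (assuming your BDR/PGD version includes it):

-- State of the local node only; same columns as bdr.node_summary.
SELECT node_name,
       peer_state_name
  FROM bdr.local_node_summary;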

Even though node2 is still in the catalog, and even though its state says ACTIVE (which can be misleading), the node is actually down. We can confirm whether a node is online by checking Raft consensus, with the following query:

SELECT node_name,
       state
  FROM bdr.group_raft_details
 ORDER BY 1;

Output would be:

 node_name |     state
-----------+---------------
 node1     | RAFT_LEADER
 node2     |
 node2ls   | RAFT_FOLLOWER
 node3     | RAFT_FOLLOWER
 node4so   | RAFT_FOLLOWER
(5 rows)

Note how node2 doesn't have a Raft state, which means it isn't communicating over Raft and is therefore offline.
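
Depending on your BDR/PGD version, a consolidated Raft health check is also available as bdr.monitor_group_raft(), which reports an overall status and a message. Its availability and exact output columns are version-dependent, so treat the following as a sketch:

-- Consolidated Raft health check; availability and output columns
-- depend on the BDR/PGD version.
SELECT *
  FROM bdr.monitor_group_raft();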

Part the node

Now let's assume, for the purpose of this article, that node2 crashed and is not recoverable. So we need to remove this node from the cluster.

The correct way to do this would be:

SELECT bdr.part_node(
       node_name := 'node2',
       wait_for_completion := true,
       force := false
);

On the parameters:

  • wait_for_completion: The part operation happens in the background and involves multiple rounds of Raft communication among the remaining nodes in the cluster. The part operation also triggers the PART_CATCHUP step, in which the remaining nodes exchange whatever data from the node being parted is still available in their own databases. So it's important to set wait_for_completion := true; otherwise we risk performing the next step, i.e., the drop, before the part has successfully finished, which can cause severe catalog issues and even data inconsistency. (If you do need to part asynchronously, see the polling sketch after this list.)

  • force: A regular part is an operation orchestrated by Raft, and one of its steps is PART_CATCHUP, as explained earlier. If we set force := true, the part operation skips PART_CATCHUP and also becomes a local operation that does not happen on the other nodes. This immediately causes catalog inconsistency (because the other nodes will still see the node as ACTIVE instead of PARTED), and since PART_CATCHUP is skipped, we risk data inconsistency too. So using force := true is never recommended, except in cases where it's not possible to achieve Raft majority, or there are issues with Raft or other catalog issues. Only Support and the PGD team are positioned to recommend force := true, depending on the case.
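
If you do part a node asynchronously (wait_for_completion := false), you can poll its state yourself until it reaches PARTED. Here is a minimal sketch using only the functions and views shown in this article; the intermediate state names (such as PARTING or PART_CATCHUP) are version-dependent and mentioned here as an assumption:

-- Start the part without blocking (sketch only; wait_for_completion := true
-- is the safer choice in most cases, as explained above).
SELECT bdr.part_node(
       node_name := 'node2',
       wait_for_completion := false,
       force := false
);

-- From any remaining node, poll until peer_state_name reaches PARTED.
-- Intermediate states such as PARTING or PART_CATCHUP depend on the
-- BDR/PGD version.
SELECT node_name,
       peer_state_name
  FROM bdr.node_summary
 WHERE node_name = 'node2';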

The output of the synchronous part execution shown earlier (with wait_for_completion := true) should be:

NOTICE:  node node2 has been removed from the bdr group
 part_node
-----------

(1 row)

You can use the following query to confirm that the node has been parted correctly on all other nodes in the cluster:

WITH node_statuses AS (
    SELECT jsonb_array_elements(
               bdr.run_on_all_nodes($$
                   SELECT node_name,
                          peer_state_name
                     FROM bdr.node_summary s
                    WHERE node_name = 'node2'
               $$)::jsonb
           ) AS j
)
SELECT s.node_name AS response_from,
       n.j->'response'->'command_tuples'->0->>'node_name' AS node_name,
       n.j->'response'->'command_tuples'->0->>'peer_state_name' AS peer_state_name
  FROM node_statuses n
 INNER JOIN bdr.node_summary s
    ON s.node_id = (n.j->>'node_id')::oid
 ORDER BY 1;

Output should be:

 response_from | node_name | peer_state_name
---------------+-----------+-----------------
 node1         | node2     | PARTED
 node2ls       | node2     | PARTED
 node3         | node2     | PARTED
 node4so       | node2     | PARTED
(4 rows)

Which means that the part was successful.
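
If the JSON path expressions in the verification query look opaque, you can first inspect the raw structure returned by bdr.run_on_all_nodes(). This uses only functions already shown in this article:

-- Pretty-print the raw JSON response to see the node_id / response /
-- command_tuples structure navigated by the verification query above.
SELECT jsonb_pretty(
       bdr.run_on_all_nodes($$
           SELECT node_name,
                  peer_state_name
             FROM bdr.node_summary
            WHERE node_name = 'node2'
       $$)::jsonb
);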

Drop the node

After we have parted the node, it's worth mentioning that if we want to rejoin a node using the same name, we can simply do so, even while a node with that name remains in the catalog as PARTED.
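
For reference, rejoining under the same name follows the usual logical join procedure on a freshly provisioned node. Here is a minimal sketch, where the DSNs and the group name 'bdrgroup' are placeholders rather than values taken from this cluster (a physical join via bdr_init_physical follows a different procedure):

-- On the freshly rebuilt node2. The DSNs and the group name 'bdrgroup'
-- are placeholders; adapt them to your environment.
SELECT bdr.create_node(
       node_name := 'node2',
       local_dsn := 'host=node2 dbname=bdrdb'
);

SELECT bdr.join_node_group(
       join_target_dsn := 'host=node1 dbname=bdrdb',
       node_group_name := 'bdrgroup'
);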

However, in some situations it might be desirable to drop the parted node, i.e., remove all information about it from the catalog.

After we have parted the node, as described above, the cluster now looks like this:

 node_name | peer_state_name
-----------+-----------------
 node1     | ACTIVE
 node2     | PARTED
 node2ls   | ACTIVE
 node3     | ACTIVE
 node4so   | ACTIVE
(5 rows)

The correct way to drop the node is as follows:

SELECT bdr.drop_node(
       node_name := 'node2',
       cascade := true,
       force := false
);

On the parameters:

  • cascade: On BDR 3.6 and 3.7, this parameter had to be set to true in order to also drop the underlying pglogical catalog entries for the node. On PGD 4, the parameter exists only for compatibility purposes. Always set it to true.

  • force: If the node is still being used by any of the workers, dropping it with force := false will not be possible. If that's the case, you need to contact Support to troubleshoot the problem; Support can instruct you to use force := true only after making sure that it's safe.

Note that bdr.drop_node() is a local operation, so you need to run the query above on every node in the cluster. Optionally, you can use bdr.run_on_all_nodes() to drop the node on all nodes at the same time. First confirm the node shows as PARTED on all nodes, using the same query as before:

WITH node_statuses AS (
    SELECT jsonb_array_elements(
               bdr.run_on_all_nodes($$
                   SELECT node_name,
                          peer_state_name
                     FROM bdr.node_summary s
                    WHERE node_name = 'node2'
               $$)::jsonb
           ) AS j
)
SELECT s.node_name AS response_from,
       n.j->'response'->'command_tuples'->0->>'node_name' AS node_name,
       n.j->'response'->'command_tuples'->0->>'peer_state_name' AS peer_state_name
  FROM node_statuses n
 INNER JOIN bdr.node_summary s
    ON s.node_id = (n.j->>'node_id')::oid
 ORDER BY 1;

Output should be as follows:

 response_from | node_name | peer_state_name
---------------+-----------+-----------------
 node1         | node2     | PARTED
 node2ls       | node2     | PARTED
 node3         | node2     | PARTED
 node4so       | node2     | PARTED
(4 rows)

Now execute the following on any master node still remaining in the cluster:

SELECT jsonb_pretty(
       bdr.run_on_all_nodes($$
           SELECT bdr.drop_node(
                  node_name := 'node2',
                  cascade := true,
                  force := false
           )
       $$)::jsonb
);

On any node, you can see the node is not in the catalog anymore:

 node_name | peer_state_name
-----------+-----------------
 node1     | ACTIVE
 node2ls   | ACTIVE
 node3     | ACTIVE
 node4so   | ACTIVE
(4 rows)

Now use the node statuses query above to confirm the dropped node no longer exists on any node in the cluster:

 response_from | node_name | peer_state_name
---------------+-----------+-----------------
 node1         |           |
 node2ls       |           |
 node3         |           |
 node4so       |           |
(4 rows)

Which means that the node was successfully dropped.
