Using galloc sequences in BDR3

Luciano Botti
Luciano Botti
  • Updated

In this article we will explain how to use Globally-allocated sequences (or simply galloc) through a simple example.

Introduction

Globally-allocated sequences (galloc) let you allocate a range of a sequence to a BDR node. As this ranges became reserved for future usage, it provides a transparent way to generate unique ids using sequences across the whole BDR group.

Galloc sequences support all sequence data types provided by PostgreSQL: smallint, integer and bigint. Based on the datatype of the sequence, each node is given a range or chunk to use:

  • smallint - seq_chunk_size = 1000
  • integer - seq_chunk_size = 1000000
  • bigint - seq_chunk_size = 1000000000

When is the allocation happening?

When a BDR node uses the sequence for the first time, the node will allocate 2 chunks of seq_chunk_size, one for the current use plus a reserved chunk for future usage. This allocation process is done through a consensus call that needs the majority of the nodes to be present in the cluster.

After a node consumes all sequence values from the chunk is currently using, it will request for the next chunk available within the BDR cluster. In other words, when the node runs out of sequence values, will automatically perform a regular BDR RAFT consensus call again in order to maintain the reserved chunk size.

Usage example on global tables

Let's start with a simple table: public.categories. This table is defined as follows:

bdrdb=# \d public.categories
Table "public.categories"
Column | Type | Collation | Nullable | Default
category | integer | | not null | nextval('categories_category_seq'::regclass)
categoryname | character varying(50) | | not null |
Indexes:
"categories_pkey" PRIMARY KEY, btree (category)

The table has a few rows for now:

bdrdb=# SELECT * FROM public.categories;
category | categoryname
1 | Action
2 | Animation
3 | Children
(3 rows)

In this example the table public.categories belongs to bdrglobal replication set and all rows are replicated across other nodes in the cluster. The sequence owned by this table is public.categories_category_seq.

Pre-requisites

For galloc sequences to work there are a few pre-requisites:

  • The sequence should have the same starting value in all nodes.
  • The sequence kind should be changed to galloc.

In our example, as we have a few records on our table, we will need to restart the sequence from the last used value. So first we will need to run a DDL to change the starting value of the sequence on all BDR nodes:

ALTER SEQUENCE public.categories_category_seq START 3 RESTART;

And then change the sequence kind to galloc :

SELECT bdr.alter_sequence_set_kind('public.categories_category_seq', 'galloc');

NOTE: Is important to run the sentences in all nodes of the cluster.

Using the galloc sequence in Node1

Let's start using the sequence by inserting 3 rows in table public.categories from a BDR node called Node1:

INSERT INTO public.categories (categoryname) VALUES ('Node1 Test 1');
INSERT INTO public.categories (categoryname) VALUES ('Node1 Test 2');
INSERT INTO public.categories (categoryname) VALUES ('Node1 Test 3');

bdrdb=# select * from public.categories;
category | categoryname 
1 | Action
2 | Animation
3 | Children
4 | Node1 Test 1
5 | Node1 Test 2
6 | Node1 Test 3
(6 rows)

When a BDR node uses the sequence implicitly or explicitly by calling nextval(), the node will allocate 2 chunks of seq_chunk_size: the current one, which is already being used, plus a reserve chunk.

Notice how the first row consumed the sequence value 4. As it was the first use of the sequence, Node1 asked BDR global RAFT consensus for 2 chunks for the sequence public.categories_category_seq. These 2 chunks of values were granted to node Node1 and together they determine a range of 2 million values, from 4 to 2000003.

The next consumed values were 5 and 6, but these values are already inside the 2 chunks the node Node1 acquired.

While the node is consuming sequences values inside the range, it does not need to ask RAFT consensus for new chunks.

More information about the current state of the sequence could be obtained by the view bdr.sequence_alloc. This view provides the global and latest state for the sequence, which will be used when allocating for the next chunk:

bdrdb=# select * from bdr.sequence_alloc WHERE seqid = 'public.categories_category_seq'::regclass;
seqid | seq_chunk_size | seq_allocated_up_to | seq_nallocs | seq_last_alloc | seq_relname 
categories_category_seq | 1000000 | 2000003 | 2 | 2020-05-21 19:52:52.79444+00 | categories_category_seq
(1 row)

Testing the sequence in other nodes

Let's now connect to any other BDR node like Node2 and try to add 3 new rows:

INSERT INTO public.categories (categoryname) VALUES ('Node2 Test 1');
INSERT INTO public.categories (categoryname) VALUES ('Node2 Test 2');
INSERT INTO public.categories (categoryname) VALUES ('Node2 Test 3');

bdrdb=# select * from public.categories;
category | categoryname 
1 | Action
2 | Animation
3 | Children
4 | Node1 Test 1
5 | Node1 Test 2
6 | Node1 Test 3
2000004 | Node2 Test 1
2000005 | Node2 Test 2
2000006 | Node2 Test 3
(9 rows)

Notice how the first value inserted by node Node2 is 2000004. So 2 things happened:

  • The 3 rows we have just inserted, should be replicated back to Node1 and to all other nodes in the cluster.
  • When node Node2 inserted the first row, it allocated 2 chunks of values for sequence public.categories_category_seq from BDR RAFT consensus. Those 2 chunks would vary from seq_allocated_up_to + 1 and represent 2 million values.

For instance, node Node2 just allocated a range of values from 2000004 to 4000003. This can be confirmed by the global table bdr.sequence_alloc:

bdrdb=# SELECT * FROM bdr.sequence_alloc WHERE seqid = 'public.categories_category_seq'::regclass;
seqid | seq_chunk_size | seq_allocated_up_to | seq_nallocs | seq_last_alloc
categories_category_seq | 1000000 | 4000003 | 4 | 2020-05-21 20:02:15.957835+00
(1 row)

So for sequence public.categories_category_seq, currently we have:

To see the ranges currently assigned to given sequence on each node, use these queries:

  • Node Node1 is using range from 4 to 2000003.
SELECT last_value AS range_start, log_cnt AS range_end FROM categories_category_seq WHERE ctid = '(0,2)'; -- first range
range_start | range_end
4 | 1000003
(1 row)

SELECT last_value AS range_start, log_cnt AS range_end FROM categories_category_seq WHERE ctid = '(0,3)'; -- second range
range_start | range_end
1000004 | 2000003
(1 row)
  • Node Node2 is using range from 2000004 to 4000003.
SELECT last_value AS range_start, log_cnt AS range_end FROM categories_category_seq WHERE ctid = '(0,2)'; -- first range
range_start | range_end
2000004 | 3000003
(1 row)

SELECT last_value AS range_start, log_cnt AS range_end FROM categories_category_seq WHERE ctid = '(0,3)'; -- second range
range_start | range_end
3000004 | 4000003

NOTE You can't combine it to single query (like WHERE ctid IN ('(0,2)', '(0,3)')) as that will still only show first range.

The next node that tries to consume a value from this sequence will need to ask BDR RAFT consensus, and will receive a range from 4000004 to 6000003. The next node will allocate a range from 6000004 to 8000003, and so on.

What if we end consuming a complete range?

When any node consumes all sequence values from the chunk it is currently using, it will perform a regular BDR RAFT consensus call again, and will get the next available chunk of 1 million values, restoring the reserved chunk size.

In our last example Node1 was using the range from 4 to 2000003 and Node2 was using range from 2000004 to 4000003.

Let's see the situation after consuming completely the first chunk on Node1.

INSERT INTO public.categories (categoryname) select 'Node1 Test '||s.t
FROM generate_series(4,1000004) as s(t);
bdrdb=# select * from public.categories WHERE category > 1000002 ;
category | categoryname 
1000003 | Node1 Test 1000000
1000004 | Node1 Test 1000001
1000005 | Node1 Test 1000002
1000006 | Node1 Test 1000003
1000007 | Node1 Test 1000004
2000004 | Node2 Test 1
2000005 | Node2 Test 2
2000006 | Node2 Test 3
(8 rows)

After consuming the last value of the chunk (1000003) the node performs a new BDR consensus call and get the next available chunk for the cluster, and continue using the reserved chunk size for that node from value 1000004 to 2000003. The new reserved chunk for Node1 was allocated from 4000004 to 5000003 as we can see by running the following sentence:

bdrdb=# SELECT * FROM bdr.sequence_alloc WHERE seqid = 'public.categories_category_seq'::regclass;
seqid | seq_chunk_size | seq_allocated_up_to | seq_nallocs | seq_last_alloc
categories_category_seq | 1000000 | 5000003 | 5 | 2020-05-21 20:10:05.328365+00
(1 row)

The new situation for the sequence public.categories_category_seq is:

  • Node1 is using range from 4 to 2000003;
  • Node2 is using range from 2000004 to 4000003;
  • Node1 is using range from 4000004 to 5000003;

Conclusion

Through these practical examples, we studied the behavior of galloc (Globally Allocated) sequences. Now is time to leverage this feature against pros and cons.

The first drawback about using galloc sequences is that, during BDR RAFT consensus calls, there is a slight delay in getting the next value for the sequence, because the node is communicating with other nodes to settle up the RAFT consensus decision. But for sequences of type integer this happens once every 1 million values, per node, per sequence. For sequences of type bigint, this happens once every 1 billion values. And even being a delay it should be very quick, so it shouldn't represent a major problem.

The second drawback is that initially all nodes should share the same sequence value.

Also, it is not hard to see how those sequence values won't be ordered along the time, as nodes are consuming values from different ranges at the same time. This is the third and most important drawback about using galloc sequences.

Despite those 3 drawbacks that might not represent any problem on some clusters, galloc sequences offer benefits that are hopefully worthwhile:

  • Support for smaller integer data types.

  • Multiple nodes can insert values into the same table using the same sequence at the same time, regardless of whether the table is globally replicated or not, because each node operates at a different range (or chunk);

  • Range or chunk allocation is completely automatic and managed by BDR RAFT. Once defined as being of type galloc, such sequences don't require maintenance, even in the event of application failover.

  • Because galloc sequences guarantees that unique values are used only by a single node in the cluster, then applications can connect to multiple nodes at the same time. This helps to balance the load between the nodes of the cluster.

Related to

Was this article helpful?

0 out of 0 found this helpful