In this article we will explain how to use Globally-allocated sequences (or simply galloc) through a simple example.
Globally-allocated sequences (galloc) let you allocate a range of a sequence to a BDR node. As this ranges became reserved for future usage, it provides a transparent way to generate unique ids using sequences across the whole BDR group.
Galloc sequences support all sequence data types provided by PostgreSQL: smallint, integer and bigint. Based on the datatype of the sequence, each node is given a range or chunk to use:
- smallint -
seq_chunk_size = 1000
- integer -
seq_chunk_size = 1000000
- bigint -
seq_chunk_size = 1000000000
When a BDR node uses the sequence for the first time, the node will allocate 2
chunks of seq_chunk_size
, one for the current use plus a reserved chunk for
future usage. This allocation process is done through a consensus call that needs
the majority of the nodes to be present in the cluster.
After a node consumes all sequence values from the chunk is currently using, it will request for the next chunk available within the BDR cluster. In other words, when the node runs out of sequence values, will automatically perform a regular BDR RAFT consensus call again in order to maintain the reserved chunk size.
Let's start with a simple table: public.categories
. This table is defined
as follows:
bdrdb=# \d public.categories
Table "public.categories"
Column | Type | Collation | Nullable | Default
category | integer | | not null | nextval('categories_category_seq'::regclass)
categoryname | character varying(50) | | not null |
Indexes:
"categories_pkey" PRIMARY KEY, btree (category)
The table has a few rows for now:
bdrdb=# SELECT * FROM public.categories;
category | categoryname
1 | Action
2 | Animation
3 | Children
(3 rows)
In this example the table public.categories
belongs to bdrglobal
replication
set and all rows are replicated across other nodes in the cluster. The sequence
owned by this table is public.categories_category_seq
.
For galloc
sequences to work there are a few pre-requisites:
- The sequence should have the same starting value in all nodes.
- The sequence kind should be changed to
galloc
.
In our example, as we have a few records on our table, we will need to restart the sequence from the last used value. So first we will need to run a DDL to change the starting value of the sequence on all BDR nodes:
ALTER SEQUENCE public.categories_category_seq START 3 RESTART;
And then change the sequence kind to galloc
:
SELECT bdr.alter_sequence_set_kind('public.categories_category_seq', 'galloc');
NOTE: Is important to run the sentences in all nodes of the cluster.
Let's start using the sequence by inserting 3 rows in table public.categories
from a BDR node called Node1
:
INSERT INTO public.categories (categoryname) VALUES ('Node1 Test 1');
INSERT INTO public.categories (categoryname) VALUES ('Node1 Test 2');
INSERT INTO public.categories (categoryname) VALUES ('Node1 Test 3');
bdrdb=# select * from public.categories;
category | categoryname
1 | Action
2 | Animation
3 | Children
4 | Node1 Test 1
5 | Node1 Test 2
6 | Node1 Test 3
(6 rows)
When a BDR node uses the sequence implicitly or explicitly by calling nextval()
,
the node will allocate 2 chunks of seq_chunk_size
: the current one, which is
already being used, plus a reserve chunk.
Notice how the first row consumed the sequence value 4
. As it was the first use
of the sequence, Node1
asked BDR global RAFT consensus for 2 chunks for the sequence
public.categories_category_seq
. These 2 chunks of values were granted to
node Node1
and together they determine a range of 2 million values,
from 4
to 2000003
.
The next consumed values were 5
and 6
, but these
values are already inside the 2 chunks the node Node1
acquired.
While the node is consuming sequences values inside the range, it does not need to ask RAFT consensus for new chunks.
More information about the current state of the sequence could be obtained by the view bdr.sequence_alloc
.
This view provides the global and latest state for the sequence, which will be used when allocating for the next chunk:
bdrdb=# select * from bdr.sequence_alloc WHERE seqid = 'public.categories_category_seq'::regclass;
seqid | seq_chunk_size | seq_allocated_up_to | seq_nallocs | seq_last_alloc | seq_relname
categories_category_seq | 1000000 | 2000003 | 2 | 2020-05-21 19:52:52.79444+00 | categories_category_seq
(1 row)
Let's now connect to any other BDR node like Node2
and try to add 3 new rows:
INSERT INTO public.categories (categoryname) VALUES ('Node2 Test 1');
INSERT INTO public.categories (categoryname) VALUES ('Node2 Test 2');
INSERT INTO public.categories (categoryname) VALUES ('Node2 Test 3');
bdrdb=# select * from public.categories;
category | categoryname
1 | Action
2 | Animation
3 | Children
4 | Node1 Test 1
5 | Node1 Test 2
6 | Node1 Test 3
2000004 | Node2 Test 1
2000005 | Node2 Test 2
2000006 | Node2 Test 3
(9 rows)
Notice how the first value inserted by node Node2
is 2000004
. So 2 things
happened:
- The 3 rows we have just inserted, should be replicated back to
Node1
and to all other nodes in the cluster. - When node
Node2
inserted the first row, it allocated 2 chunks of values for sequencepublic.categories_category_seq
from BDR RAFT consensus. Those 2 chunks would vary fromseq_allocated_up_to + 1
and represent 2 million values.
For instance, node Node2
just allocated a range of values from 2000004
to
4000003
. This can be confirmed by the global table bdr.sequence_alloc
:
bdrdb=# SELECT * FROM bdr.sequence_alloc WHERE seqid = 'public.categories_category_seq'::regclass;
seqid | seq_chunk_size | seq_allocated_up_to | seq_nallocs | seq_last_alloc
categories_category_seq | 1000000 | 4000003 | 4 | 2020-05-21 20:02:15.957835+00
(1 row)
So for sequence public.categories_category_seq
, currently we have:
To see the ranges currently assigned to given sequence on each node, use these queries:
- Node
Node1
is using range from4
to2000003
.
SELECT last_value AS range_start, log_cnt AS range_end FROM categories_category_seq WHERE ctid = '(0,2)'; -- first range
range_start | range_end
4 | 1000003
(1 row)
SELECT last_value AS range_start, log_cnt AS range_end FROM categories_category_seq WHERE ctid = '(0,3)'; -- second range
range_start | range_end
1000004 | 2000003
(1 row)
- Node
Node2
is using range from2000004
to4000003
.
SELECT last_value AS range_start, log_cnt AS range_end FROM categories_category_seq WHERE ctid = '(0,2)'; -- first range
range_start | range_end
2000004 | 3000003
(1 row)
SELECT last_value AS range_start, log_cnt AS range_end FROM categories_category_seq WHERE ctid = '(0,3)'; -- second range
range_start | range_end
3000004 | 4000003
NOTE You can't combine it to single query (like WHERE ctid IN ('(0,2)', '(0,3)')) as that will still only show first range.
The next node that tries to consume a value from this sequence will need to ask
BDR RAFT consensus, and will receive a range from 4000004
to 6000003
. The next
node will allocate a range from 6000004
to 8000003
, and so on.
When any node consumes all sequence values from the chunk it is currently using, it will perform a regular BDR RAFT consensus call again, and will get the next available chunk of 1 million values, restoring the reserved chunk size.
In our last example Node1
was using the range from 4
to 2000003
and Node2
was using range from 2000004
to 4000003
.
Let's see the situation after consuming completely the first chunk on Node1
.
INSERT INTO public.categories (categoryname) select 'Node1 Test '||s.t
FROM generate_series(4,1000004) as s(t);
bdrdb=# select * from public.categories WHERE category > 1000002 ;
category | categoryname
1000003 | Node1 Test 1000000
1000004 | Node1 Test 1000001
1000005 | Node1 Test 1000002
1000006 | Node1 Test 1000003
1000007 | Node1 Test 1000004
2000004 | Node2 Test 1
2000005 | Node2 Test 2
2000006 | Node2 Test 3
(8 rows)
After consuming the last value of the chunk (1000003
) the node performs a new
BDR consensus call and get the next available chunk for the cluster, and continue
using the reserved chunk size for that node from value 1000004
to 2000003
.
The new reserved chunk for Node1
was allocated from 4000004
to 5000003
as
we can see by running the following sentence:
bdrdb=# SELECT * FROM bdr.sequence_alloc WHERE seqid = 'public.categories_category_seq'::regclass;
seqid | seq_chunk_size | seq_allocated_up_to | seq_nallocs | seq_last_alloc
categories_category_seq | 1000000 | 5000003 | 5 | 2020-05-21 20:10:05.328365+00
(1 row)
The new situation for the sequence public.categories_category_seq
is:
-
Node1
is using range from4
to2000003
; -
Node2
is using range from2000004
to4000003
; -
Node1
is using range from4000004
to5000003
;
Through these practical examples, we studied the behavior of galloc
(Globally
Allocated) sequences. Now is time to leverage this feature against pros and cons.
The first drawback about using galloc
sequences is that, during BDR RAFT
consensus calls, there is a slight delay in getting the next value for the
sequence, because the node is communicating with other nodes to settle up the
RAFT consensus decision. But for sequences of type integer
this happens once
every 1 million values, per node, per sequence. For sequences of type bigint
,
this happens once every 1 billion values. And even being a delay it should be very
quick, so it shouldn't represent a major problem.
The second drawback is that initially all nodes should share the same sequence value.
Also, it is not hard to see how those sequence values won't be ordered along the
time, as nodes are consuming values from different ranges at the same time. This
is the third and most important drawback about using galloc
sequences.
Despite those 3 drawbacks that might not represent any problem on some clusters,
galloc
sequences offer benefits that are hopefully worthwhile:
-
Support for smaller integer data types.
-
Multiple nodes can insert values into the same table using the same sequence at the same time, regardless of whether the table is globally replicated or not, because each node operates at a different range (or chunk);
-
Range or chunk allocation is completely automatic and managed by BDR RAFT. Once defined as being of type
galloc
, such sequences don't require maintenance, even in the event of application failover. -
Because
galloc
sequences guarantees that unique values are used only by a single node in the cluster, then applications can connect to multiple nodes at the same time. This helps to balance the load between the nodes of the cluster.
Related to