Performing ALTER TABLE ALTER COLUMN with Minimal Downtime

Lohit Gupta

Problem

To modify the datatype of a column in a Postgres table, we normally use the command below:

ALTER TABLE tableName
ALTER COLUMN columnName TYPE newDataType [USING (expression)];

The USING clause is optional; it is needed when an explicit or custom typecasting expression is required to convert the column to the new datatype.
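
For example, converting a text column that stores numeric strings into an integer column needs an explicit cast supplied through USING (the table and column names below are purely illustrative):

ALTER TABLE orders
ALTER COLUMN order_code TYPE integer USING (order_code::integer);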

When you execute this command, the database internally rewrites the entire table with the new column type into new block space, which requires an ACCESS EXCLUSIVE lock on the whole table. If the table is very large, the exclusive lock will be held for a considerable time, and the application may need downtime while the change is performed.
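
One way to see this in practice is to run the ALTER TABLE in one session and, from a second session, inspect pg_locks while it runs. The query below is a small sketch for that check (the table name assets is just an example, the same one used later in this article):

SELECT l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'assets'::regclass;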

For highly critical workloads where such a long downtime is not an option, this activity needs a different approach.

Solution

The main cause of the long downtime with this command is that the database holds an ACCESS EXCLUSIVE lock on the table while performing the full table rewrite with the new column datatype.

So, instead of using the ALTER COLUMN statement, we can change our strategy and use ALTER TABLE ... ADD COLUMN to add a column with the new datatype. We then perform row-level updates so that all the data from the old column is transferred to the new one. This can be done on a live table, and, using a trigger, we can capture live changes and transfer them to the new column as well.

After all the live changes have been captured in the new column and verified, we then rename the columns and drop the old column inside a transaction, which requires minimal downtime.

Using this approach gives us the liberty of modifying the table structure on a live table with minimal downtime.

Below is an example of how this approach is used to update the column datatype on a live table.

First, we will create a live table and insert some data into it -

-- Creating a live table
CREATE TABLE assets (
id serial PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
location_id bigint,
location TEXT,
acquired_date DATE NOT NULL
);

-- Inserting data

INSERT INTO assets(name,location_id,location,acquired_date)
VALUES('Server','100010','Server room','2023-07-01'),
('UPS','100020','Server room','2023-07-01')
....
;

Now, let's update the datatype of the location_id column from bigint to numeric. First, we will add a new column with the numeric datatype to the table and create an index on the old column so that the row updates become faster.

ALTER TABLE assets ADD COLUMN location_id_new numeric(22,0);

CREATE INDEX CONCURRENTLY idx_assets_location_id on assets(location_id) where location_id is not null;

NOTE: it is important to create the new column as NULL-able and without a default value. Otherwise, PG will try to populate the new column with default values to satisfy the NOT NULL constraint, and we will face the initial problem of prolonged exclusive locking of the table and a full table rewrite. If a NOT NULL constraint is needed on the column, it can be added after the existing data is updated and the column switchover is complete.
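
If that constraint is required, one low-lock way to add it after the switchover (a sketch, assuming PostgreSQL 12 or newer and the final column name location_id; the constraint name is hypothetical) is:

-- brief lock only, no table scan
ALTER TABLE assets ADD CONSTRAINT assets_location_id_not_null
CHECK (location_id IS NOT NULL) NOT VALID;
-- scans the table, but does not block reads / writes
ALTER TABLE assets VALIDATE CONSTRAINT assets_location_id_not_null;
-- uses the validated check, so no additional full-table scan is needed
ALTER TABLE assets ALTER COLUMN location_id SET NOT NULL;
-- optional cleanup
ALTER TABLE assets DROP CONSTRAINT assets_location_id_not_null;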

We will now create a trigger and a trigger function, which will be used to capture live changes and to transfer data to the new column.

CREATE OR REPLACE FUNCTION copy_location_id()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
BEGIN
NEW.location_id_new = NEW.location_id;
RETURN NEW;
END;
$$
;

CREATE TRIGGER trg_location BEFORE INSERT OR UPDATE
ON assets
FOR EACH ROW
EXECUTE PROCEDURE copy_location_id()
;

After the objects are created, we will run an analyze on the table so that the query planner uses up-to-date statistics for it.

analyze assets;

Now, we are ready to capture live updates on the table. Any insert or update statement performed on the table will fire the copy_location_id() function, which copies data from the old column to the new column. Delete statements can be ignored, as the row is removed from the table anyway.
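
For example, while the trigger is in place, a new insert populates both columns (the values here are illustrative):

INSERT INTO assets(name,location_id,location,acquired_date)
VALUES('Switch','100030','Network room','2023-07-02');

-- both the old and the new column are populated for this row
SELECT location_id, location_id_new FROM assets WHERE name = 'Switch';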

This operation only requires a row-level lock on the row being updated, compared to the full table lock taken by the ALTER TABLE ... ALTER COLUMN command.

For very large tables, the row-level updates can be divided into batches and run in a loop, with a pause between batches, until all rows are updated and the new column is populated. This keeps each batch's row-level locks short, and the table remains available for other operations while we are adding data to the new column.

Below is a script that performs this operation for a batch of 1000 rows at a time with a 1 second gap between the updates, until all the rows are updated and the new column has all the data of the old column -

#!/bin/bash
# Script to update column in batches
batch=1
PSQL='/usr/bin/psql -Atq -d assetdb -U postgres'
count=`$PSQL -c 'select count(1) from assets where location_id is not null and location_id_new is null;'`
echo "Execution start time: `date +'%x %X'`"
while [ "$count" -ne 0 ]
do
echo "Remaining Rows = $count"
echo "Executing Batch $batch update..."

# the no-op update fires the trigger, which copies location_id into location_id_new
$PSQL -c "update assets set location_id=location_id where id in (select id from assets where location_id is not null and location_id_new is null limit 1000);"
echo "Batch $batch complete. Waiting..."
sleep 1
let batch=$batch+1
count=`$PSQL -c 'select count(1) from assets where location_id is not null and location_id_new is null;'`
done

echo "Execution end time: `date +'%x %X'`"
echo "All rows updated. Exiting..."

It is recommended to perform a vacuum analyze on the table after the data is added to the new column, which will take care of the bloat generated while running the UPDATE statements. The autovacuum process can also handle the bloat if it is configured correctly.
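
For this example table, that would simply be:

vacuum analyze assets;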

Once the execution of this script is complete, we can verify that all data from the old column has been added to the new column using the command below. It should return 0, which confirms that all the rows have been transferred to the new column.

select count(1) from assets where location_id is not null and location_id_new is null;

Now that we have transferred all the row data to the new column and the count is verified, we can proceed with the final step, which is to perform the column switchover. This includes dropping the trigger, renaming the old column, renaming the new column to the actual name, and dropping the old column.

If there are any foreign keys associated with the column, the foreign key constraints have to be dropped from the old column and added to the new column as well. This whole step has to be done in a single transaction, so that no new rows can be inserted or updated in the table while it is being executed.

Afterwards, we will drop the trigger function used for the migration and perform a final analyze on the table. If there is a foreign key associated with the column, the foreign key constraint has to be validated as well. The VALIDATE command acquires a SHARE UPDATE EXCLUSIVE lock on the table, so it does not block any read / write statements on the table.

begin work;
lock table assets in exclusive mode;
-- if any foreign keys are present, those need to be handled at this time as well
-- ALTER TABLE assets DROP CONSTRAINT fk_location;
drop trigger trg_location on assets;
alter table assets rename column location_id to location_id_old;
alter table assets rename column location_id_new to location_id;
alter table assets drop column location_id_old;
-- ALTER TABLE assets ADD CONSTRAINT fk_location FOREIGN KEY (location_id) REFERENCES location (id) NOT VALID;
-- even though it's NOT VALID, it will enforce the constraint for any newly added rows,
-- but it avoids the prolonged time needed to validate each existing row against the referenced table.
commit work;

drop function copy_location_id();
analyze assets;

-- ALTER TABLE assets VALIDATE constraint fk_location;
-- above statement will not require exclusive lock regardless of time taken to complete.
-- analyze location;

After the final step is complete, we have successfully updated the column datatype. The procedure is a bit longer than just running the ALTER TABLE ... ALTER COLUMN statement, but it requires only minimal locking and has been successfully tested in a live environment.
