CTE materialization default behavior changed in PostgreSQL 12

Afrah Razzak
Afrah Razzak

Introduction

With the release of PostgreSQL 12, a significant change was introduced in the handling of Common Table Expressions (CTEs). Prior to this version, PostgreSQL always materialized the result of a CTE, treating it as an optimization fence. This behavior ensured that the output of a CTE was computed and stored before any further operations were applied. However, this approach was not always optimal, especially for non-recursive and side-effect-free CTEs. PostgreSQL 12 brings a new approach to handling such CTEs, offering improved performance and flexibility.

Historical Behavior in PostgreSQL

In versions prior to PostgreSQL 12, CTEs were always materialized. This means that when a CTE was defined, PostgreSQL would compute the result set and store it temporarily. This behavior served as an optimization fence, preventing any further optimizations or modifications from being pushed down into the CTE query.

Example in PostgreSQL 11

Consider the following test table and query:

CREATE TABLE cte_test (
    id int,
    name varchar
);

INSERT INTO cte_test
    VALUES (generate_series(1, 500000), 'test' || generate_series(1, 500000));

WITH cte_results AS (
    SELECT
        *
    FROM
        cte_test
)
SELECT
    *
FROM
    cte_results
WHERE
    id = 100;

 id  |  name
-----+---------
 100 | test100
(1 row)

EXPLAIN ANALYZE WITH cte_results AS (
    SELECT
        *
    FROM
        cte_test
)
SELECT
    *
FROM
    cte_results
WHERE
    id = 100;

Following is the execution plan of the query:

                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 CTE Scan on cte_results  (cost=6135.81..13859.63 rows=1716 width=36) (actual time=0.030..100.523 rows=1 loops=1)
   Filter: (id = 100)
   Rows Removed by Filter: 499999
   CTE cte_results
     ->  Seq Scan on cte_test  (cost=0.00..6135.81 rows=343281 width=36) (actual time=0.009..31.349 rows=500000 loops=1)
 Planning Time: 0.049 ms
 Execution Time: 101.772 ms
(7 rows)

Here, PostgreSQL scans the entire table cte_test, stores the results in the temporary CTE cte_results, and then filters the rows based on the condition id = 100. This means a less efficient query execution, especially for large datasets.

New Behavior in PostgreSQL 12

In PostgreSQL 12, the handling of CTEs has been refined. For non-recursive and side-effect-free CTEs, PostgreSQL now defaults to inlining the CTE into the outer query if it is used only once. This means that the CTE is no longer treated as an optimization fence, allowing the query planner to push down restrictions and optimizations into the CTE query itself.

Example in PostgreSQL 12

The same query executed in PostgreSQL 12:

EXPLAIN ANALYZE WITH cte_results AS (
    SELECT
        *
    FROM
        cte_test
)
SELECT
    *
FROM
    cte_results
WHERE
    id = 100;

The execution plan now shows:

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..6398.73 rows=1716 width=36) (actual time=0.202..33.656 rows=1 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   ->  Parallel Seq Scan on cte_test  (cost=0.00..5227.12 rows=1009 width=36) (actual time=13.919..29.558 rows=0 loops=2)
         Filter: (id = 100)
         Rows Removed by Filter: 250000
 Planning Time: 0.053 ms
 Execution Time: 33.670 ms
(8 rows)

Here, PostgreSQL does not materialize the CTE cte_results. Instead, it performs a parallel sequential scan on the table cte_test, directly applying the filter id = 100. This change can significantly reduce execution time and resource usage.

Benefits of the New Approach

  • Improved Performance: By inlining non-recursive, side-effect-free CTEs, PostgreSQL can apply optimizations more effectively, such as pushing down filters or using indexes, which can lead to faster query execution.

  • Avoiding Redundant Computation: In cases where a CTE is referenced multiple times, materialization can avoid redundant computation. However, if a CTE is only used once, materializing it may not be necessary and can be avoided with the new behavior.

  • Flexibility for Complex Queries: Users now have more control over how CTEs are handled. They can enforce the old behavior using MATERIALIZED or force inlining with NOT MATERIALIZED.

Controlling the Behavior

While the new default behavior is generally beneficial, there are scenarios where materializing a CTE might still be desired, such as preventing certain optimizations or managing complex dependencies. PostgreSQL 12 introduces two keywords to control this behavior:

  • MATERIALIZED: Forces the CTE to be materialized.
  • NOT MATERIALIZED: Forces the CTE to be inlined, removing the optimization fence.

Example:

EXPLAIN ANALYZE WITH cte_results AS MATERIALIZED (
    SELECT
        *
    FROM
        cte_test
)
SELECT
    *
FROM
    cte_results
WHERE
    id = 100;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 CTE Scan on cte_results  (cost=7703.00..18953.00 rows=2500 width=36) (actual time=0.028..104.276 rows=1 loops=1)
   Filter: (id = 100)
   Rows Removed by Filter: 499999
   CTE cte_results
     ->  Seq Scan on cte_test  (cost=0.00..7703.00 rows=500000 width=14) (actual time=0.007..34.537 rows=500000 loops=1)
 Planning Time: 0.045 ms
 Execution Time: 105.531 ms
(7 rows)

Conclusion

The change in default behavior for CTEs in PostgreSQL 12 is a significant enhancement that improves performance and provides greater flexibility in query planning. By default, non-recursive, side-effect-free CTEs are now inlined, allowing for more efficient query execution. However, users still retain the ability to control this behavior using MATERIALIZED and NOT MATERIALIZED.

This change is part of PostgreSQL's ongoing efforts to optimize and enhance query performance, making it a more powerful tool for data management and analysis. For developers and database administrators, understanding these changes is crucial for writing efficient SQL queries and optimizing database performance.

Was this article helpful?

0 out of 0 found this helpful