This article examines two alternative functions in PostgreSQL, ARRAY_AGG() and STRING_AGG(), which can be used as substitutes for Oracle's COLLECT() function. We delve into the prerequisites for using these functions, explore their functionalities and usage examples, compare them to the COLLECT() function, and provide a conclusion regarding their suitability as alternatives.
Oracle's COLLECT() function is a powerful aggregation function that allows users to aggregate data into nested collections or arrays within a single query. It is commonly used to combine multiple rows of data into a single collection, making it a valuable tool for various data manipulation and reporting tasks.
Example:
In this example, we have created a table called 'emp' and inserted the below data:
CREATE TABLE emp (
empno NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY,
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4),
hiredate DATE,
sal NUMBER(7,2),
comm NUMBER(7,2),
deptno NUMBER(2)
);
The result after creating a table called 'emp' and inserting the few data:
SELECT * FROM EMP;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
7369 SMITH CLERK 7902 17-DEC-80 800 - 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 - 20
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 - 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 - 10
7788 SCOTT ANALYST 7566 09-JUL-02 3000 - 20
7839 KING PRESIDENT - 17-NOV-81 5000 - 10
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-02 1100 - 20
7900 JAMES CLERK 7698 03-DEC-81 950 - 30
7902 FORD ANALYST 7566 03-DEC-81 3000 - 20
7934 MILLER CLERK 7782 23-JAN-82 1300 - 10
Created a user-defined datatype and a function:
CREATE OR REPLACE TYPE t_varchar2_tab AS TABLE OF VARCHAR2(4000);
/
CREATE OR REPLACE FUNCTION tab_to_string (p_varchar2_tab IN t_varchar2_tab,
p_delimiter IN VARCHAR2 DEFAULT ',') RETURN VARCHAR2 IS
l_string VARCHAR2(32767);
BEGIN
FOR i IN p_varchar2_tab.FIRST .. p_varchar2_tab.LAST LOOP
IF i != p_varchar2_tab.FIRST THEN
l_string := l_string || p_delimiter;
END IF;
l_string := l_string || p_varchar2_tab(i);
END LOOP;
RETURN l_string;
END tab_to_string;
/
The query below shows the COLLECT function in action.
SELECT deptno,
tab_to_string(CAST(COLLECT(ename) AS t_varchar2_tab)) AS employees
FROM emp
GROUP BY deptno;
DEPTNO EMPLOYEES
10 CLARK,KING,MILLER
20 SMITH,JONES,SCOTT,ADAMS,FORD
30 ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
3 rows selected.
Here you can observe that all employees are listed accordingly to their department number separated by a comma.
When migrating from Oracle to PostgreSQL or when working with PostgreSQL as the primary DBMS, users may encounter scenarios where equivalent functions to Oracle's features are required. In the case of Oracle's COLLECT() function, which aggregates data into nested collections, users may seek alternative functions in PostgreSQL that provide similar functionality. We have array_agg() and string_agg() which provides us the above desired output.
ARRAY_AGG() is an aggregation function in PostgreSQL that allows you to aggregate values from multiple rows into a single array. It is primarily used for creating arrays as a result of grouping data or combining rows in a query. The function takes an input expression and returns an array containing all the values of that expression for each group.
Syntax and parameters of ARRAY_AGG():
ARRAY_AGG(expression) [ORDER BY [sort_expression {ASC | DESC}], [...])
The function accepts a single input parameter, which is the expression to be aggregated into an array. This expression can be a column name, a function, or any valid expression that produces a value. The result of ARRAY_AGG() is an array containing all the aggregated values.
Example of using ARRAY_AGG():
edb=# SELECT deptno, array_agg(ename) as employees from emp GROUP BY deptno;
deptno | employees
10 | {CLARK,KING,MILLER}
30 | {ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES}
20 | {SMITH,JONES,FORD,SCOTT,ADAMS}
(3 rows)
Note: Make sure to use datatypes supported by postgres when creating table which is equivalent to oracle table.
Here the above function returns an array. But the oracle collect() doesn't have the curly braces, to eliminate the curly braces we can combine array_agg() with array_to_string function, this function takes the input values and coverts them into a string.
edb=# SELECT deptno, array_to_string(array_agg(ename),',') as employees from emp GROUP BY deptno;
deptno | employees
10 | CLARK,KING,MILLER
30 | ALLEN,WARD,MARTIN,BLAKE,TURNER,JAMES
20 | SMITH,JONES,FORD,SCOTT,ADAMS
(3 rows)
But if this feel little complicated, postgresql offers another powerful functionality called 'string_agg()':
It is an aggregate function that concatenates a list of strings and places a separator between them. The function does not add the separator at the end of the string.
Syntax and parameters of the STRING_AGG():
STRING_AGG ( expression, separator [order_by_clause] )
The STRING_AGG() function accepts two arguments and an optional ORDER BY clause.
- expression is any valid expression that can resolve to a character string. If you use other types than character string type, you need to explicitly cast these values of that type to the character string type.
- separator is the separator for concatenated strings.
- The order_by_clause is an optional clause that specifies the order of concatenated results. It has the following form:
ORDER BY expression1 {ASC | DESC}, [...]
The STRING_AGG() is similar to the ARRAY_AGG() function except for the return type. The return type of the STRING_AGG() function is the string while the return type of the ARRAY_AGG() function is the array.
Example of using string_agg():
edb=# SELECT * FROM EMP;
empno | ename | job | mgr | hiredate | sal | comm | deptno
7369 | SMITH | CLERK | 7902 | 17-DEC-80 00:00:00 | 800.00 | | 20
7499 | ALLEN | SALESMAN | 7698 | 20-FEB-81 00:00:00 | 1600.00 | 300.00 | 30
7521 | WARD | SALESMAN | 7698 | 22-FEB-81 00:00:00 | 1250.00 | 500.00 | 30
7566 | JONES | MANAGER | 7839 | 02-APR-81 00:00:00 | 2975.00 | | 20
7654 | MARTIN | SALESMAN | 7698 | 28-SEP-81 00:00:00 | 1250.00 | 1400.00 | 30
7698 | BLAKE | MANAGER | 7839 | 01-MAY-81 00:00:00 | 2850.00 | | 30
7782 | CLARK | MANAGER | 7839 | 09-JUN-81 00:00:00 | 2450.00 | | 10
7839 | KING | PRESIDENT | | 17-NOV-81 00:00:00 | 5000.00 | | 10
7844 | TURNER | SALESMAN | 7698 | 08-SEP-81 00:00:00 | 1500.00 | 0.00 | 30
7900 | JAMES | CLERK | 7698 | 03-DEC-81 00:00:00 | 950.00 | | 30
7902 | FORD | ANALYST | 7566 | 03-DEC-81 00:00:00 | 3000.00 | | 20
7934 | MILLER | CLERK | 7782 | 23-JAN-82 00:00:00 | 1300.00 | | 10
7788 | SCOTT | ANALYST | 7566 | 20-MAR-87 00:00:00 | 3000.00 | | 20
7876 | ADAMS | CLERK | 7788 | 23-APR-87 00:00:00 | 1100.00 | | 20
(14 rows)
edb=# SELECT deptno, STRING_AGG(ename, ', ') AS employees
edb-# FROM emp
edb-# GROUP BY deptno;
deptno | employees
10 | CLARK, KING, MILLER
30 | ALLEN, WARD, MARTIN, BLAKE, TURNER, JAMES
20 | SMITH, JONES, FORD, SCOTT, ADAMS
(3 rows)
Here you can observe that we got the same result which we got when used collect() function with cast in oracle.
In summary, the need for alternative functions in PostgreSQL such as array_agg() and string_agg() arises from the differences in syntax, features, and the desire for compatibility and portability across different DBMS. Having PostgreSQL-specific alternatives to Oracle's functions allows users to achieve similar results, facilitates migration efforts, and takes advantage of the open-source ecosystem surrounding PostgreSQL.