Friday, April 27, 2018

Db2: CTE and CONNECT BY - two kinds of recursion

Writing recursive SQL (based on M.C.Escher by Henrik Loeser)
Writing recursive SQL
Recently, I gave a talk on SQL Recursion. One of the cool features in Db2 that I (re-)discovered for that presentation is that there are two different forms of syntax for recursive queries. One is the Common Table Expression (CTE), another the CONNECT BY syntax. Let me introduce you to the two and show you a sample query in both styles.

If you had to write recursive queries, e.g., to create a bill of material or to find dependencies in data, you would typically use a SELECT statement with a common table expression (CTE). CTEs are easily spotted by starting with the keyword WITH. Some form of temporary table is defined, then the initial data (seed) is specified followed by UNION ALL and the SELECT for the recursive part. A regular fullselect is then applied to the result of that computation.

The CONNECT BY provides another syntax for hiearachical and recursive queries. It was first introduced by Oracle and made its way into Db2 as part of the SQL compatibility features. It allows to write a, more or less, regular SELECT statement followed by a START WITH clause to define the initial data (see) and a CONNECT BY clause to specify the recursion. There are some additional keywords to access data from the prior step, the root ancestor or the (string) path from the root to a node. It is quite powerful and often shorter than a CTE. The only drawback is that you have to enable it in the Db2 compatibility vector.

The following two queries return the same result. It is the organizational structure of the departments in the SAMPLE database. The first query uses CONNECT BY and is directly taken out of the Db2 documentation.


SELECT LEVEL, CAST(SPACE((LEVEL - 1) * 4) || '/' || DEPTNAME
       AS VARCHAR(40)) AS DEPTNAME
FROM DEPARTMENT
     START WITH DEPTNO = 'A00'
     CONNECT BY NOCYCLE PRIOR DEPTNO = ADMRDEPT

The second query is my version of the above with a CTE. As you can see, it is longer. Readability depends on what you are used to.

WITH tdep(level, deptname, deptno) as (
    SELECT 1, CAST( DEPTNAME AS VARCHAR(40)) AS DEPTNAME, deptno
    FROM department 
    WHERE DEPTNO = 'A00'
    UNION ALL
    SELECT t.LEVEL+1, CAST(SPACE(t.LEVEL  * 4) || '/' || d.DEPTNAME
       AS VARCHAR(40)) AS DEPTNAME, d.deptno
    FROM DEPARTMENT d, tdep t
    WHERE d.admrdept=t.deptno and d.deptno<>'A00')
SELECT level, deptname
FROM tdep 


If you only knew one style of recursive query then it is time to try out the other. Both queries above work with the SAMPLE database. If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.