Friday, April 27, 2018

Db2: CTE and CONNECT BY - two kinds of recursion

Writing recursive SQL (based on M.C.Escher by Henrik Loeser)
Writing recursive SQL
Recently, I gave a talk on SQL Recursion. One of the cool features in Db2 that I (re-)discovered for that presentation is that there are two different forms of syntax for recursive queries. One is the Common Table Expression (CTE), another the CONNECT BY syntax. Let me introduce you to the two and show you a sample query in both styles.

If you had to write recursive queries, e.g., to create a bill of material or to find dependencies in data, you would typically use a SELECT statement with a common table expression (CTE). CTEs are easily spotted by starting with the keyword WITH. Some form of temporary table is defined, then the initial data (seed) is specified followed by UNION ALL and the SELECT for the recursive part. A regular fullselect is then applied to the result of that computation.

The CONNECT BY provides another syntax for hiearachical and recursive queries. It was first introduced by Oracle and made its way into Db2 as part of the SQL compatibility features. It allows to write a, more or less, regular SELECT statement followed by a START WITH clause to define the initial data (see) and a CONNECT BY clause to specify the recursion. There are some additional keywords to access data from the prior step, the root ancestor or the (string) path from the root to a node. It is quite powerful and often shorter than a CTE. The only drawback is that you have to enable it in the Db2 compatibility vector.

The following two queries return the same result. It is the organizational structure of the departments in the SAMPLE database. The first query uses CONNECT BY and is directly taken out of the Db2 documentation.


SELECT LEVEL, CAST(SPACE((LEVEL - 1) * 4) || '/' || DEPTNAME
       AS VARCHAR(40)) AS DEPTNAME
FROM DEPARTMENT
     START WITH DEPTNO = 'A00'
     CONNECT BY NOCYCLE PRIOR DEPTNO = ADMRDEPT

The second query is my version of the above with a CTE. As you can see, it is longer. Readability depends on what you are used to.

WITH tdep(level, deptname, deptno) as (
    SELECT 1, CAST( DEPTNAME AS VARCHAR(40)) AS DEPTNAME, deptno
    FROM department 
    WHERE DEPTNO = 'A00'
    UNION ALL
    SELECT t.LEVEL+1, CAST(SPACE(t.LEVEL  * 4) || '/' || d.DEPTNAME
       AS VARCHAR(40)) AS DEPTNAME, d.deptno
    FROM DEPARTMENT d, tdep t
    WHERE d.admrdept=t.deptno and d.deptno<>'A00')
SELECT level, deptname
FROM tdep 


If you only knew one style of recursive query then it is time to try out the other. Both queries above work with the SAMPLE database. If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.

Tuesday, April 24, 2018

Automated, regular database jobs with IBM Cloud Functions (and Db2)

IBM Cloud Functions and Db2
Yesterday, I blogged about the latest tutorial I wrote. The tutorial discusses how to combine serverless and Cloud Foundry for data retrieval and analytics. That scenario came up when I looked into regularly downloading GitHub traffic statistics for improved usage insights. What I needed was a mechanism to execute small Python script on a daily or weekly basis. After looking into some possible solutions, IBM Cloud Functions was the clear winner. In this blog, I am going to discuss how simple it is to implement some regular, automated activities, such as maintenance jobs for a cloud database.

Monday, April 23, 2018

Use Db2 and IBM Cloud to analyze GitHub traffic data (tutorial)

Architecture: GitHub Traffic Analytics
In a new solution tutorial, I show you how to automatically retrieve and store GitHub traffic data the serverless way with IBM Cloud Functions and Db2. The data can then be analyzed via a Web app deployed to Cloud Foundry on IBM Cloud. The app is secured with App ID using OpenID Connect. The new service Dynamic Dashboard Embedded provides visualization of the views and clones of GitHub repositories.

Friday, April 13, 2018

IBM Cloud and Watson security questionnaires

If you are reading my blog regularly, you are aware of the entries on cloud security. Today, I want to point you to some newer reading material at the Cloud Security Alliance. They now have up-to-date self-assessment security questionnaires for the IBM Cloud.

The three available security documents cover IBM Cloud Infrastructure (IaaS), IBM Cloud Platform (PaaS), and the IBM Watson Services with their cognitive / AI capabilities. Using the provided material, you can learn about application security, audits, compliance, how business continuity and operational resilience are handled, procedures for change control, data and data center security, IAM (identify and access management) and much more. The answers include many links to references, documentation and other valuable information. Just for that it is worth going over the security questionnaires for IBM Cloud.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.