Monday, August 5, 2013

DB2 BLU: A look at the catalog tables and the synopsis table

DB2 10.5 with BLU Acceleration allows to create column-organized tables. They are the foundation for "actionable compression" and smart in-memory processing. I plan to cover some details in upcoming posts and already have written some articles about DB2 BLU earlier. Today I show you what happens in the DB2 catalog, the metadata, when you create a table.

I started by "db2set DB2_WORKLOAD=ANALYTICS". This is the magic knob to simplicity and performance for analytic queries. After creating a database, I connected and then I was ready to create a table.

create table hl.bludata(id int, quota decimal(5,2),sold int, comment varchar(200))
DB20000I  The SQL command completed successfully.


The most important metadata for tables is accessible in the SYSCAT.TABLES catalog view. By sending the following query to DB2 a small subset of the columns is fetched for the table I just created:

select tabname,tabschema,property,compression,tableorg from syscat.tables where tabname like '%BLUDATA%'

TABNAME                        TABSCHEMA PROPERTY                         COMPRESSION TABLEORG
------------------------------ --------- -------------------------------- ----------- --------
BLUDATA                        HL                            Y                         C      
SYN130805063137432548_BLUDATA  SYSIBM                        YY                        C      

  2 record(s) selected.


But wait, there are two tables. The first has the name and schemaname as specified, the second is created in the schema SYSIBM and is created under the covers. It is the so-called synopsis table and its name starts with the prefix SYN followed by a timestamp and the name of the base table. Both tables are column-organized (TABLEORG=C). For both we also see a value for PROPERTY which is a character vector. The first "Y" is at position 20, indicating a column-organized table. The synopsis table also has a "Y" at position 21, officially marking it a synopsis table (see the descriptions for SYSCAT.TABLES).

The purpose of the synopsis table is to help with data organization of the base table and to aid DB2 with fast data skipping during query processing (I plan to talk about dyanmic list prefetching in a future post).

Because both tables are related to each other, another catalog table is involved, SYSCAT.TABDEP, where table or object dependencies are recorded.

select tabname,dtype,bname,bschema,btype from syscat.tabdep where tabname like '%BLUDATA%' or bname like '%BLUDATA%'

TABNAME                       DTYPE  BNAME   BSCHEMA BTYPE
----------------------------- ------ ------- ------- -----
SYN130805063137432548_BLUDATA 7      BLUDATA HL      T   

  1 record(s) selected.



Based on the metadata the table "SYN130805063137432548_BLUDATA" is a synopsis table (DTYPE=7) and depends on the table I created (BLUDATA). But how does the synopsis table look like? I am using the command "describe table" to see what columns have been defined:

describe table SYSIBM.SYN130805063137432548_BLUDATA

                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
IDMIN                           SYSIBM    INTEGER                      4     0 Yes  
IDMAX                           SYSIBM    INTEGER                      4     0 Yes  
QUOTAMIN                        SYSIBM    DECIMAL                      5     2 Yes  
QUOTAMAX                        SYSIBM    DECIMAL                      5     2 Yes  
SOLDMIN                         SYSIBM    INTEGER                      4     0 Yes  
SOLDMAX                         SYSIBM    INTEGER                      4     0 Yes  
TSNMIN                          SYSIBM    BIGINT                       8     0 No   
TSNMAX                          SYSIBM    BIGINT                       8     0 No   

  8 record(s) selected.


What can be seen is that for all non-character columns of my table we have a corresponding MIN and MAX column. In addition we have two BIGINT columns TSNMIN and TSNMAX. TSN stands for Tuple Sequence Number and helps to recompose a row based on individual column values. The synopsis table plays a pivotal role (pun intended) for faster query execution. But that is part of a future post...