DB2 10.5 with BLU Acceleration allows to create column-organized tables. They are the foundation for "actionable compression" and smart in-memory processing. I plan to cover some details in upcoming posts and already have written some articles about DB2 BLU earlier. Today I show you what happens in the DB2 catalog, the metadata, when you create a table.
I started by "db2set DB2_WORKLOAD=ANALYTICS". This is the magic knob to simplicity and performance for analytic queries. After creating a database, I connected and then I was ready to create a table.
create table hl.bludata(id int, quota decimal(5,2),sold int, comment varchar(200))
DB20000I The SQL command completed successfully.
The most important metadata for tables is accessible in the SYSCAT.TABLES catalog view. By sending the following query to DB2 a small subset of the columns is fetched for the table I just created:
select tabname,tabschema,property,compression,tableorg from syscat.tables where tabname like '%BLUDATA%'
TABNAME TABSCHEMA PROPERTY COMPRESSION TABLEORG
------------------------------ --------- -------------------------------- ----------- --------
BLUDATA HL Y C
SYN130805063137432548_BLUDATA SYSIBM YY C
2 record(s) selected.
But wait, there are two tables. The first has the name and schemaname as specified, the second is created in the schema SYSIBM and is created under the covers. It is the so-called synopsis table and its name starts with the prefix SYN followed by a timestamp and the name of the base table. Both tables are column-organized (TABLEORG=C). For both we also see a value for PROPERTY which is a character vector. The first "Y" is at position 20, indicating a column-organized table. The synopsis table also has a "Y" at position 21, officially marking it a synopsis table (see the descriptions for SYSCAT.TABLES).
The purpose of the synopsis table is to help with data organization of the base table and to aid DB2 with fast data skipping during query processing (I plan to talk about dyanmic list prefetching in a future post).
Because both tables are related to each other, another catalog table is involved, SYSCAT.TABDEP, where table or object dependencies are recorded.
select tabname,dtype,bname,bschema,btype from syscat.tabdep where tabname like '%BLUDATA%' or bname like '%BLUDATA%'
TABNAME DTYPE BNAME BSCHEMA BTYPE
----------------------------- ------ ------- ------- -----
SYN130805063137432548_BLUDATA 7 BLUDATA HL T
1 record(s) selected.
Based on the metadata the table "SYN130805063137432548_BLUDATA" is a synopsis table (DTYPE=7) and depends on the table I created (BLUDATA). But how does the synopsis table look like? I am using the command "describe table" to see what columns have been defined:
describe table SYSIBM.SYN130805063137432548_BLUDATA
Data type Column
Column name schema Data type name Length Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
IDMIN SYSIBM INTEGER 4 0 Yes
IDMAX SYSIBM INTEGER 4 0 Yes
QUOTAMIN SYSIBM DECIMAL 5 2 Yes
QUOTAMAX SYSIBM DECIMAL 5 2 Yes
SOLDMIN SYSIBM INTEGER 4 0 Yes
SOLDMAX SYSIBM INTEGER 4 0 Yes
TSNMIN SYSIBM BIGINT 8 0 No
TSNMAX SYSIBM BIGINT 8 0 No
8 record(s) selected.
What can be seen is that for all non-character columns of my table we have a corresponding MIN and MAX column. In addition we have two BIGINT columns TSNMIN and TSNMAX. TSN stands for Tuple Sequence Number and helps to recompose a row based on individual column values. The synopsis table plays a pivotal role (pun intended) for faster query execution. But that is part of a future post...