Friday, August 9, 2013

DB2 BLU: Some additional metadata, the synopsis table, and page map indexes

Today I continue my journey into the metadata of DB2 with BLU Acceleration which I recently started. I had created a table and looked at the catalog and found a synopsis table. The new column-organized tables for in-memory analytics are supposed to be simple to use, no indexes to create. No indexes? Let me check...

First on today's list is a look at SYCAT.INDEXES, the catalog view holding index information.

select indschema, indname, tabname, colnames, indextype from syscat.indexes where tabname like '%BLUDATA%'

INDSCHEMA     INDNAME               TABNAME                       COLNAMES                           INDEXTYPE
------------- --------------------- ----------------------------- ---------------------------------- ---------
SYSIBM        SQL130809032717072049 BLUDATA                       +SQLNOTAPPLICABLE+SQLNOTAPPLICABLE CPMA


  2 record(s) selected.
What is interesting to find, is that two indexes have been created. One is on the base table, one on the synopsis table. They are of a new index type CPMA which stands for Cde Page MAp (and CDE for Columnar Data Engine). I found this in the description of SYSCAT.INDEXES and by dumping the index data using db2dart (see below).

Next I wanted to see how the synopsis table changed by inserting data. The following small script generated my data file with 10000 rows.


for i in {1..10000}
   echo "$i,0.8,$i,''" | tee -a bludata.csv

Then I was ready for loading the data. Here are some snippets of the output.

load from bludata.csv of del replace into hl.bludata
SQL3500W  The utility is beginning the "ANALYZE" phase at time "08/09/2013
SQL3500W  The utility is beginning the "LOAD" phase at time "08/09/2013
SQL3110N  The utility has completed processing.  "10000" rows were read from
the input file.

SQL3500W  The utility is beginning the "BUILD" phase at time "08/09/2013

First the data is analyzed to optimize data layout and the column-specific compression. The comes the actual load phase followed by a build phase (for maintaining the internal index). With an empty table the synopsis table was empty, too. Now, with 10000 rows loaded, we can find some entries:
select * from sysibm.SYN130809032716942015_BLUDATA

IDMIN       IDMAX       QUOTAMIN QUOTAMAX SOLDMIN     SOLDMAX     TSNMIN               TSNMAX             
----------- ----------- -------- -------- ----------- ----------- -------------------- --------------------
          1        2328     0.80     0.80           1        2328                    0                 1023
        754        2810     0.80     0.80         754        2810                 1024                 2047
       1296        3563     0.80     0.80        1296        3563                 2048                 3071
       1567        4187     0.80     0.80        1567        4187                 3072                 4095
       3895        6243     0.80     0.80        3895        6243                 4096                 5119
       4790        6996     0.80     0.80        4790        6996                 5120                 6143
       5061        7387     0.80     0.80        5061        7387                 6144                 7167
       5694        8283     0.80     0.80        5694        8283                 7168                 8191
       7750       10000     0.80     0.80        7750       10000                 8192                 9215
       8984        9767     0.80     0.80        8984        9767                 9216                 9999

  10 record(s) selected.

For every 1024 rows there is an entry in the synopsis table. Not much surprising stuff can be found as my test data only has variations in the ID and SOLD columns.

My last action for today was to look at the index data as stored on disk. By invoking db2dart with the DI option it is possible to dump formatted index data. Both indexes only have 3 pages each and you can try using db2dart yourself to see the output. What is included in each report as part of the index metadata is the index type which is as follows:
Index Type = NonUnique Bi-directional Large RID CDE Page Map Split-High No exclusive lock optimzation RID-list Compression Variable Slot Directory Prefix Compression
Seeing it labeled as Page Map index verifies that I dumped the right index. That concludes my Friday "spare time", have a nice weekend. And if you have time to read, here is the link to my other DB2 BLU-related posts.