Friday, August 23, 2013

DB2 10.5 Fix Pack 1 has been released

The first fix pack for DB2 10.5 has been released; it is available on the DB2 fix pack page (FPs by version). The DB2 Information Center has been updated, and you can find the so-called Fix Pack Summary here. There are not many entries in the what's new list, but there are many improvements under the covers. If you have never tried DB2 10.5, you can download a DB2 evaluation copy - it is on the new code level 10.5.0.1 as well.
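
If you are unsure which code level an instance is running, the db2level command shows it; a quick check (a sketch - the exact informational tokens vary by platform and build):

db2level

After installing the fix pack, the informational tokens in the output should report "DB2 v10.5.0.1" and Fix Pack "1".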

Thursday, August 22, 2013

Bananas, pants, and DB2 with BLU Acceleration

Have you ever fed a banana to a young child? It would have been banana puree, i.e., mashed bananas (there are even instructions for that!). When you think about it, after feeding the banana puree you have a happy child - and a fork, a spoon, a plate and maybe more to wash. As a grownup or older child you would just peel and eat the banana, no dishes to do. Isn't that smart?
(Image source: Wikipedia / Ed Yourdan / Themightyquill)

Now imagine that the bananas taste good and you eat a lot. Babies grow, and you "grow", too. For babies and children you have to buy new clothes because eventually, even with tricks like adjustable waistbands, the pants and shirts don't fit anymore. As a grownup you can react to "growing in the middle": keep a healthy diet, do some workouts, and keep your pants. No money to spend. Isn't that smart?

In database systems like DB2, avoiding the dishes translates to avoiding access to data, i.e., avoiding I/O operations. DB2 with BLU Acceleration uses data skipping to only access those portions of the data where comparisons or further processing are necessary. No sweat, no dirty dishes. Even if the data needs to be processed, many operations can be done on the compressed, encoded version of it. Think of it as using a set of plates that remain in their original, shrink-wrapped packaging. That makes "clean-up" faster, a smart move. :)

What do you do when you have more data than fits in memory? A baby would need larger clothes; some buy bigger machines. DB2 BLU uses decades-old, but highly improved, technologies like a bufferpool and algorithms to keep the most needed data in memory (or is it in-memory?) and to bring it in from disk with advanced prefetching before it is needed. That would be like telling your extra pounds to stay at home so that your existing pants fit well. Maybe you could even wear a smaller size or keep your current pants longer. That would be smart and would save many bucks. DB2 BLU goes even further with tailor-made pants, i.e., laying out the data to make the best use of CPU registers and to improve performance even more. Tailor-made pants with expandable waistbands, "haute couture" for your data at no extra cost.

My older BLU-related posts can be found using the blu label.

Wednesday, August 21, 2013

Video: Interview with IBM Fellow Tim Vincent about DB2 with BLU Acceleration

This interesting video showing Paul Z. (you can ignore his real last name... :) interviewing Tim Vincent about BLU Acceleration in DB2 10.5 was posted yesterday. In the interview, Tim explains the key concepts of BLU Acceleration, the "7 big ideas", and how the technology works.



My posts about BLU Acceleration can be found here.

Wednesday, August 14, 2013

German DB2 event: DB2 Aktuell 2013 - Last days for discount

I will (again) be one of the speakers at the German DB2 event named "DB2 Aktuell". The location changes every year, and this time it will be in the Stuttgart area, in Ehningen. The two-day conference takes place in October, but tomorrow is the last day to receive a discount on the registration.

Tuesday, August 13, 2013

DB2 Quiz: Where does this output belong?

Here is a small, light DB2 summer quiz for you. The following is part of the output of a DB2 command.

V:10 R:5 M:0 F:0 I:0 SB:0

Where does it belong? What is it? Any takers?

Monday, August 12, 2013

Now available: DB2 BLU Acceleration Kit for Trial

A new, easy way to install and try out DB2 with BLU Acceleration is available: the DB2 BLU Acceleration Kit for Trial. The kit is a software image that automatically installs a Linux operating system, DB2, and some tools onto an x86_64 machine - the machine can be real hardware or a virtual machine. The only input needed is for passwords; the kit then installs everything, starting with the OS.

You can read more of my DB2 BLU-related posts by following this link.

Friday, August 9, 2013

DB2 BLU: Some additional metadata, the synopsis table, and page map indexes

Today I continue the journey into the metadata of DB2 with BLU Acceleration that I recently started: I had created a table, looked at the catalog, and found a synopsis table. The new column-organized tables for in-memory analytics are supposed to be simple to use - no indexes to create. No indexes? Let me check...


First on today's list is a look at SYSCAT.INDEXES, the catalog view holding index information.

select indschema, indname, tabname, colnames, indextype from syscat.indexes where tabname like '%BLUDATA%'

INDSCHEMA     INDNAME               TABNAME                       COLNAMES                           INDEXTYPE
------------- --------------------- ----------------------------- ---------------------------------- ---------
SYSIBM        SQL130809032717072049 BLUDATA                       +SQLNOTAPPLICABLE+SQLNOTAPPLICABLE CPMA
SYSIBM        SQL130809032717232599 SYN130809032716942015_BLUDATA +SQLNOTAPPLICABLE+SQLNOTAPPLICABLE CPMA

  2 record(s) selected.

What is interesting is that two indexes have been created: one on the base table, one on the synopsis table. They are of the new index type CPMA, which stands for Cde Page MAp (CDE in turn is short for Columnar Data Engine). I found this in the description of SYSCAT.INDEXES and by dumping the index data using db2dart (see below).

Next, I wanted to see how the synopsis table changes when data is added. The following small script generated my data file with 10000 rows.

#!/bin/bash

# Generate 10000 rows of test data: id, quota, sold, empty comment.
# Redirecting the loop's output recreates the file on every run.
for i in {1..10000}
do
   echo "$i,0.8,$i,''"
done > bludata.csv


Then I was ready to load the data. Here are some snippets of the output.

load from bludata.csv of del replace into hl.bludata
...
SQL3500W  The utility is beginning the "ANALYZE" phase at time "08/09/2013
05:06:47.313324".
...
SQL3500W  The utility is beginning the "LOAD" phase at time "08/09/2013
05:06:48.215545".
...
SQL3110N  The utility has completed processing.  "10000" rows were read from
the input file.
...

SQL3500W  The utility is beginning the "BUILD" phase at time "08/09/2013
05:06:48.802651".




First the data is analyzed to optimize the data layout and the column-specific compression. Then comes the actual load phase, followed by a build phase (for maintaining the internal index). With an empty table the synopsis table was empty, too. Now, with 10000 rows loaded, we can find some entries:
 
select * from sysibm.SYN130809032716942015_BLUDATA

IDMIN       IDMAX       QUOTAMIN QUOTAMAX SOLDMIN     SOLDMAX     TSNMIN               TSNMAX             
----------- ----------- -------- -------- ----------- ----------- -------------------- --------------------
          1        2328     0.80     0.80           1        2328                    0                 1023
        754        2810     0.80     0.80         754        2810                 1024                 2047
       1296        3563     0.80     0.80        1296        3563                 2048                 3071
       1567        4187     0.80     0.80        1567        4187                 3072                 4095
       3895        6243     0.80     0.80        3895        6243                 4096                 5119
       4790        6996     0.80     0.80        4790        6996                 5120                 6143
       5061        7387     0.80     0.80        5061        7387                 6144                 7167
       5694        8283     0.80     0.80        5694        8283                 7168                 8191
       7750       10000     0.80     0.80        7750       10000                 8192                 9215
       8984        9767     0.80     0.80        8984        9767                 9216                 9999

  10 record(s) selected.


For every 1024 rows there is an entry in the synopsis table. Not much surprising can be found, as my test data only has variations in the ID and SOLD columns.
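
This is where data skipping pays off. Take an illustrative query like the one below: its predicate overlaps only two of the ten ranges listed above (7750-10000 and 8984-9767), so DB2 can, in principle, skip the other blocks entirely - the optimizer makes the actual decision:

select sum(sold) from hl.bludata where id between 9000 and 9500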

My last action for today was to look at the index data as stored on disk. By invoking db2dart with the DI option it is possible to dump formatted index data. Both indexes have only 3 pages each, and you can try db2dart yourself to see the output (a sketch of an invocation is at the end of this post). What is included in each report, as part of the index metadata, is the index type:
Index Type = NonUnique Bi-directional Large RID CDE Page Map Split-High No exclusive lock optimzation RID-list Compression Variable Slot Directory Prefix Compression
Seeing it labeled as a Page Map index verifies that I dumped the right index. That concludes my Friday "spare time", have a nice weekend. And if you have time to read, here is the link to my other DB2 BLU-related posts.
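
P.S.: In case you want to try the dump yourself, a db2dart call might look like the sketch below. The database name, table name, and table space ID are placeholders from my environment; check the db2dart documentation for the options that match your objects:

db2dart BLUDB /DI /TN BLUDATA /TSI 2 /NP 3 /V Y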



Wednesday, August 7, 2013

IOQWT - A typical IBM acronym that works for DB2 BLU

IOQWT is a typical acronym, used to deal with lengthy product names like IBM InfoSphere Optim Query Workload Tuner for DB2 for Linux, UNIX, and Windows. It is part of the InfoSphere Optim suite of tools for data lifecycle management. IOQWT usually helps to tune single queries or entire workloads. Some licenses for the tool are included in the Advanced Editions and the Developer Edition of DB2 - that is how I got my hands on it. The reason I had to try IOQWT is that its so-called Workload Table Organization Advisor (WTOA...?) can predict whether it makes sense to convert tables from row organization to a column-organized layout, i.e., whether DB2 with BLU Acceleration would be a big benefit.

My journey began by starting up Data Studio with IOQWT integration. In the so-called Data Source Explorer I had to activate my database for tuning. After the license had been applied and additional internal-use tables had been created, I could start tuning, i.e., invoke the tuning advisors. The first step was to tell IOQWT which statements to analyze. It allows you to directly paste a single query as text; however, it only recommends table organizations for a workload, i.e., a set of queries. Thus I chose to import a file with three queries against my test database, which uses a TPC-H schema.

After saving the three queries as a workload, IOQWT was ready for action. As the next step I could select which types of advisors should run and which kind of tuning suggestions I was interested in. After marking Table Organization as the kind of advice I wanted, I proceeded to the next step: waiting for results. :)

As you can see below, IOQWT analyzed six tables from my small TPC-H database. If I converted the tables to column organization (BLU Acceleration), it predicted a performance improvement of 98.79%. The largest gain would be for the first query, which currently has the highest cost and would have the lowest cost after the conversion. So DB2 with BLU Acceleration seems to make sense for my workload, and by analyzing it in IOQWT I got that information without converting my database.

That's it for today, see older posts on DB2 with BLU Acceleration.
(Screenshot: Results from Optim Query Workload Tuner suggesting DB2 BLU)

Monday, August 5, 2013

DB2 BLU: A look at the catalog tables and the synopsis table

DB2 10.5 with BLU Acceleration allows you to create column-organized tables. They are the foundation for "actionable compression" and smart in-memory processing. I plan to cover some details in upcoming posts and have already written some articles about DB2 BLU earlier. Today I show you what happens in the DB2 catalog, the metadata, when you create a table.

I started by running "db2set DB2_WORKLOAD=ANALYTICS". This is the magic knob for simplicity and performance of analytic queries. After creating a database, I connected and then was ready to create a table.
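
A minimal sketch of that setup (the database name BLUDB is just my choice; registry variables are read at instance start, so restart the instance if it was already running):

db2set DB2_WORKLOAD=ANALYTICS
db2stop
db2start
db2 "create database bludb"
db2 "connect to bludb"

Setting DB2_WORKLOAD=ANALYTICS before creating the database is what activates the analytics defaults, for example a column-organized default table layout.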

create table hl.bludata(id int, quota decimal(5,2),sold int, comment varchar(200))
DB20000I  The SQL command completed successfully.


The most important metadata for tables is accessible through the SYSCAT.TABLES catalog view. The following query fetches a small subset of its columns for the table I just created:

select tabname,tabschema,property,compression,tableorg from syscat.tables where tabname like '%BLUDATA%'

TABNAME                        TABSCHEMA PROPERTY                         COMPRESSION TABLEORG
------------------------------ --------- -------------------------------- ----------- --------
BLUDATA                        HL                            Y                         C      
SYN130805063137432548_BLUDATA  SYSIBM                        YY                        C      

  2 record(s) selected.


But wait, there are two tables. The first has the name and schema name as specified; the second is created under the covers in the schema SYSIBM. It is the so-called synopsis table, and its name starts with the prefix SYN followed by a timestamp and the name of the base table. Both tables are column-organized (TABLEORG=C). For both we also see a value for PROPERTY, which is a character vector of flags. The first "Y" is at position 20, indicating a column-organized table. The synopsis table also has a "Y" at position 21, officially marking it as a synopsis table (see the description of SYSCAT.TABLES).
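
If you do not want to count blanks in the output, you can let DB2 extract the two flag positions directly; a small sketch:

select tabname, substr(property,20,1) as col_organized, substr(property,21,1) as synopsis from syscat.tables where tabname like '%BLUDATA%'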

The purpose of the synopsis table is to help with the data organization of the base table and to aid DB2 with fast data skipping during query processing (I plan to talk about dynamic list prefetching in a future post).

Because the two tables are related to each other, another catalog view is involved: SYSCAT.TABDEP, where table and object dependencies are recorded.

select tabname,dtype,bname,bschema,btype from syscat.tabdep where tabname like '%BLUDATA%' or bname like '%BLUDATA%'

TABNAME                       DTYPE  BNAME   BSCHEMA BTYPE
----------------------------- ------ ------- ------- -----
SYN130805063137432548_BLUDATA 7      BLUDATA HL      T   

  1 record(s) selected.



Based on the metadata, the table "SYN130805063137432548_BLUDATA" is a synopsis table (DTYPE=7) that depends on the table I created (BLUDATA). But what does the synopsis table look like? I am using the "describe table" command to see which columns have been defined:

describe table SYSIBM.SYN130805063137432548_BLUDATA

                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
IDMIN                           SYSIBM    INTEGER                      4     0 Yes  
IDMAX                           SYSIBM    INTEGER                      4     0 Yes  
QUOTAMIN                        SYSIBM    DECIMAL                      5     2 Yes  
QUOTAMAX                        SYSIBM    DECIMAL                      5     2 Yes  
SOLDMIN                         SYSIBM    INTEGER                      4     0 Yes  
SOLDMAX                         SYSIBM    INTEGER                      4     0 Yes  
TSNMIN                          SYSIBM    BIGINT                       8     0 No   
TSNMAX                          SYSIBM    BIGINT                       8     0 No   

  8 record(s) selected.


What can be seen is that every non-character column of my table has a corresponding pair of MIN and MAX columns. In addition there are two BIGINT columns, TSNMIN and TSNMAX. TSN stands for Tuple Sequence Number and helps to recompose a row from the individual column values. The synopsis table plays a pivotal role (pun intended) in faster query execution. But that is part of a future post...

Thursday, August 1, 2013

CTQ: Come on, Tune my Query (with DB2 BLU)

DB2 10.5 comes with BLU Acceleration. In a previous post I explained which technologies are used for BLU and other analytic accelerators. For BLU it is a mix of columnar storage organization with extreme compression, the exploitation of parallel computing and new CPU instruction sets, plus enhancements to bufferpool and I/O management for in-memory processing - quite a long list. But how can I find out whether BLU Acceleration is actually used?

The key is to look for the new CTQ operator in the query access plans (explain output). A usage example is shown here in the DB2 Information Center:
Access Plan:
-----------
 Total Cost:   570246
 Query Degree:  32

              Rows 
             RETURN
             (   1)
              Cost 
               I/O 
               |
               191 
             LTQ   
             (   2)
             570246 
             505856 
               |
               191 
             CTQ   
             (   3)
             570245 
             505856 
               |
               191 
             GRPBY 
             (   4)
             570245 
             505856 
               |
           2.87997e+08 
             ^HSJOIN
             (   5)
             377881 
             505856 
         /-----+------\
   2.87997e+08         2e+06 
     TBSCAN           TBSCAN
     (   6)           (   7)
     329484           4080.56 
     499008            6848 
       |                |
   2.87997e+08         2e+06 
 CO-TABLE: TPCDS  CO-TABLE: TPCDS   
   STORE_SALES       CUSTOMER
       Q1               Q2

But what is the operator CTQ and what does CTQ stand for? Is it "Critical To Quality", "Copy The Query", "Come on, Tune the Query", or "Critical To Quickness"? TQs are so-called Table Queue operators and are used to pass result sets from A to B. They are often seen in partitioned databases (InfoSphere Warehouse) as either Directed Table Queues (DTQs) or Broadcast Table Queues (BTQs) that move data across the nodes of the cluster. For BLU Acceleration, a CTQ is the "columnar to row" gatekeeper: it marks the point where the world of BLU is left and regular DB2 technology takes over.

Ideally, there is only a single CTQ in a query plan, and it sits high up, because that means (almost) everything below it is processed with BLU Acceleration. If you store only some tables in columnar format and others in classic row organization (remember: DB2 with BLU Acceleration is a hybrid) and you see many CTQs sitting near the bottom of an access plan, rethinking the physical database design might be a good idea.
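
To inspect your own statements, the regular explain tooling works; a quick sketch (database name and statement are placeholders, and the explain tables must exist - they can be created from sqllib/misc/EXPLAIN.DDL):

db2 "connect to bludb"
db2 "explain plan for select count(*) from tpcds.store_sales"
db2exfmt -d bludb -1 -o plan.txt

Then search the generated plan.txt for CTQ operators and check where they sit in the plan.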

Come on, Tune the Queries - of course with DB2 and BLU Acceleration...