Thursday, August 1, 2013

CTQ: Come on, Tune my Query (with DB2 BLU)

DB2 10.5 comes with BLU Acceleration. In a previous post I had explained what technologies are used for BLU and other analytic accelerators. For BLU it is a mix of columnar storage organization with extreme compression together with  exploitation of parallel computing and new CPU instruction sets plus enhancements to bufferpool and I/O management for in-memory processing - quite a long list. But how can I find out whether BLU Acceleration is used?

The key to performance is looking for the new CQT operator in the query access plans (explain output). A usage example is shown here in the DB2 Information Center:
Access Plan:
-----------
 Total Cost:   570246
 Query Degree:  32

              Rows 
             RETURN
             (   1)
              Cost 
               I/O 
               |
               191 
             LTQ   
             (   2)
             570246 
             505856 
               |
               191 
             CTQ   
             (   3)
             570245 
             505856 
               |
               191 
             GRPBY 
             (   4)
             570245 
             505856 
               |
           2.87997e+08 
             ^HSJOIN
             (   5)
             377881 
             505856 
         /-----+------\
   2.87997e+08         2e+06 
     TBSCAN           TBSCAN
     (   6)           (   7)
     329484           4080.56 
     499008            6848 
       |                |
   2.87997e+08         2e+06 
 CO-TABLE: TPCDS  CO-TABLE: TPCDS   
   STORE_SALES       CUSTOMER
       Q1               Q2

But what is the operator CTQ and what does CTQ stand for? Is it "Critical To Quality", "Copy The Query", "Come on, Tune the Query"or "Critical To Quickness"? TQs are so-called Table Queue operators and are used to pass a special result set from A to B. They are often seen in partitioned databases (InfoSphere Warehouse) as either Directed Table Queues (DTQs) or Broadcast Table Queue (BTQs) to move data across nodes in the cluster. For BLU Acceleration a CTQ is the "columnar to row" gatekeeper and means that the world of BLU has been left and regular DB2 technology takes over.

Ideally, there is only a single CTQ in a query plan and high up because this would mean (almost) everything is processed with BLU Acceleration. If you only store some tables in columnar format, other tables in classic row organization (remember: DB2 with BLU Acceleration is hybrid) and see many CTQs in an access plan and only near the bottom of the plan, rethinking the strategy for the physical database design might be a good idea.

Come on, Tune the Queries - of course with DB2 and BLU Acceleration...

4 comments:

Rob said...

A question: Will BLU join two columnar tables without a CQT? Will the join take place in-memory?

Henrik Loeser said...

Yes, the tables would be joined "under" the CTQ. This is actually seen in the access plan above. Depending on your system and involved data sizes this is typically done in memory. If the involved data is much bigger than the memory, disk is involved.

Henrik

Sundeep Yadav said...

Hi Henrik,

My question is assuming that db2 10.5 TSN is same as RIDs in mainframe.

We are trying to design our ODS database and we are coming
across many features of DB2 that wont run on a column based table.

One question that is outstanding in our minds is how does DB2 hold the
rids of a column based table. I would imagine you have a column value and
then a list of rids/TSN where that value is used, but how are they
stored and do we suffer from RID list processing when we delete large
amounts of data from non unique or low cardinality columns.

We are trying to design a 6 month table or set of tables so that we can
round robin delete on the 7th month but we cant partition on the month
value and truncate the oldest data as we would in a row based table.

So if we create one big table we need to physically delete the data from
the oldest month and we can imagine that this would be painful. We are also
afraid of contention on the Rid list for new inserts.

Can you throw some light to help us understand the storage and processing. Or what can be alternate way of implementing the design.

thanks
Sundeep

Henrik Loeser said...

Sundeep,

the TSN are not really RIDs as the RID have information about the row storage (tablespace, page, etc.). I would recommend reading the following paper as a starter. It explains details about the storage organization for columnar organisation.

Let me know your questions thereafter.

http://researcher.watson.ibm.com/researcher/files/us-ipandis/vldb13db2blu.pdf

Henrik

LinkWithin

Related Posts with Thumbnails