Friday, February 27, 2015

DB2 Battle: Optimization Profile vs. Statement Concentrator (Part 1)

DB2 explain output
Today I wanted to try using a DB2 Optimization Profile for a statement impacted by the DB2 Statement Concentrator. It turned out to be a longer exercise than thought, but also with more fun, covering a lot of the DB2 tuning and monitoring infrastructure. So set some minutes aside and follow my journey into the world of query optimization, workload monitoring, session actuals, and more. I split it into two parts because of length (Update on 02-Jul-2015: A third part is here).

The original question I tried to answer was related to REOPT and a query like this:
select id, s
from betw
where id between ? and ?



Is it possible to force an application to always reoptimize a statement, so that depending on the given range either an index scan or table scan is used. DB2 has the concept of multi-phase statement compilation, optimization, and execution. The individual steps are briefly explained in the introduction to this article on improving query performance by using REOPT bind option. Using that option it is possible to have (complex) statements re-optimized before every execution, e.g., to address different values in input variables.

The mentioned statement concentrator is a means for reducing query compilation and optimization effort by replacing literals in similar statements by parameter markers. It can have lots of benefits, but also could cause suboptimal access plans if values are not distributed evenly.

For my tests I used a JDBC-based SQL GUI to send statements like "explain plan for select id, s from betw where id between 20 and 30" to DB2, changing the upper bound to much higher values. Without the statement concentrator turned on this resulted in access plans with either an index scan or table scan. Here is such an index-based access plan:

Access Plan:
-----------
    Total Cost:         20.3338
    Query Degree:        1

            Rows
           RETURN
           (   1)
            Cost
             I/O
             |
           11.996
           FETCH
           (   2)
           20.3338
              3
         /---+----\
     11.996        98304
     IXSCAN   TABLE: HLOESER
     (   3)        BETW
     13.5606        Q1
        2
       |
      98304
 INDEX: HLOESER
     BETWIX
       Q1


With the statement concentrator turned on, all statements variations are transformed into a statement with variables. In the explain output the statements that have been rewritten by an active statement concentrator can be identified because of an extra section labeled "effective statement":

Effective Statement:
-------------------
select
  id,

  s
from
  betw
where
  id between :L0 and :L1


After getting this far I wanted to apply an optimization profile to cause the specific statement to be always reoptimized. Optimization profiles provide many ways for performance tuning on a, at least sometimes, low-level basis. They are less intrusive than database hints (see Oracle-style hints in DB2 :), can be turned on and off on different levels, and provide all kinds of options. The first step to use profiles is to prepare the DB2 tools catalog for them:

db2 "call sysinstallobjects('opt_profiles', 'c', '', '')" 

An optimization profile is created by writing an XML document according to the XML schema found here. I created the following profile:

 <?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE>
  <STMTMATCH EXACT='FALSE'/>
  <STMTPROFILE ID="REOPT betw">
     <STMTKEY>
        <![CDATA[select id,s from betw where id between :L0 and :L1 ]]>
     </STMTKEY>
     <OPTGUIDELINES>
       <REOPT VALUE="ALWAYS" />
     </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>

The optimization profile consists of a single so-called statement profile which is labeled "REOPT betw". It is for our SQL statement and I decided to use the version as seen in the effective statement. As a guideline to the optimizer the "REOPT ALWAYS" is passed. What I included in the profile is the switch to turn on inexact statement matching. It causes DB2 to consider more variations of the SQL statement for optimization by our rule. Once the optimization profile is complete, it can be inserted into the catalog table. For that I created a file "betw.csv" with a profile name and reference to the XML file:

"HLOESER","PROFILE_BETW","betw.xml"



That file is then imported into DB2:

db2 "import from betw.csv of del modified by lobsinfile insert_update into systools.opt_profile"

Whenever profiles are changed, it is a good idea to flush the cache:
db2 flush optimization profile cache

Usage of an optimization profile can be enabled in different ways, e.g., in an application by setting a registry variable like this:
set current optimization profile="HLOESER"."PROFILE_BETW"

Once this is done there is feedback in the DB2 explain output on whether profiles are in use and if yes, which one. This could be in the form of diagnostic information pointing to errors as shown below or in a section "profile information" that gives details about the name of the profile and the statement identifier:

A problem is reported
Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier:     1
Diagnostic Details:     EXP0005W  The optimization profile could not be
            processed or found in the OPT_PROFILE table in the
            SYSTOOLS schema.
Diagnostic Identifier:     2
Diagnostic Details:     EXP0001W  An unexpected error occurred processing
            or applying an optimization profile or embedded
            optimization guideline. Line number "0", character
            number "0".

Success, the profile is in use
Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
    HLOESER.PROFILE_BETW
STMTPROF: (Statement Profile Name)
    REOPT betw


So far, so good. An optimization profile has been created and depending on the which statement is executed by DB2, the explain output shows that my statement level guideline has been considered. To test whether the guideline is really applied, I executed several statements like "explain plan for select id, s from betw where id between 20 and x". X was in the range of 21 to 20000. With the statement concentrator on the explain output showed that all statements ended up in the following optimized statement.


SELECT
  Q3.ID AS "ID",
  Q3.S AS "S"
FROM
  (SELECT
     Q1.$C0
   FROM
     (VALUES
        0) AS Q1
   WHERE
     (:L0 <= :L1 SELECTIVITY 1.000000)
  ) AS Q2,
  HLOESER.BETW AS Q3
WHERE
  (Q3.ID <= :L1) AND
  (:L0 <= Q3.ID)


All statements had the same complex access plan:
Access Plan:
-----------
    Total Cost:         20.469
    Query Degree:        0

                Rows
             Rows Actual
               RETURN
               (   1)
                Cost
                 I/O
                 |
               167.205
                 19
               NLJOIN
               (   2)
               20.469
                 NA
          /------+-------\
         1               167.205
        NA                 19
      TBSCAN             FETCH
      (   3)             (   4)
    0.000186812          20.4688
        NA                 NA
        |              /---+----\
         0         167.205       98304
        NA           19           NA
 TABFNC: SYSIBM    IXSCAN   TABLE: HLOESER
      GENROW       (   5)        BETW
        Q1         13.6489        Q3
                     NA
                     |
                    98304
                     NA
               INDEX: HLOESER
                   BETWIX
                     Q3

Did my guideline work? I wasn't sure so this called for deeper look at the optimization. Read on here for the second part of the journey.