Data Henrik: DB2 Battle: Optimization Profile vs. Statement Concentrator (Part 2)

Today I wanted to try out using DB2 optimization profiles for a statement impacted by the statement concentrator. In part 1 I gave the background, showed how I created an optimization profile and that a simple explain statement didn't return what I expected. In this final part I am going to look at DB2 section actuals to hopefully proof that my optimization guideline works as I had hoped.

Because all my "explain plan for select ..." statements resulted in the same access plan, I decided to use session actuals to look at how statements are really executed within DB2. The actuals are kind of a live log of the real statement execution costs and the applied access plan. The first step towards session actuals is to have a workload and an activity event monitor and to switch the monitor on:

db2 "create workload betw applname('python') collect activity data with details,section"
db2 "create event monitor betwmon for activities write to table"
db2 "set event monitor betwmon state 1"

The above statements create a workload which collects section data. The application name (APPLNAME) is "python" because I use a Python script (see below) for parts of the testing.

Script:
import ibm_db
conn = ibm_db.connect("hltest","hloeser","secretpw")
ibm_db.exec_immediate(conn, 'set current optimization profile="HLOESER"."PROFILE_BETW"')
ibm_db.exec_immediate(conn, 'select id, s from betw where id between 2 and 20')

Once the script has been executed, the session actuals can be turned into explain information and then that information formatted. To identify which data entry to transform a query like the following one can be used. The table name consists of the generic prefix "ACTIVITYSTMT" and the monitor name:

select appl_id,uow_id,activity_id,stmt_text from ACTIVITYSTMT_BETWMON

The output then serves as input to the "explain_from_activity" procedure:
CALL EXPLAIN_FROM_ACTIVITY('*LOCAL.hloeser.150227075721',1,2, 'BETWMON',NULL,?,?,?,?,?)

The procedure takes the logged execution statistics for the section and turns it into regular explain data. The explain data can be stored in an user copy of the DB2 explain tables. The tool db2exfmt is then used to format the explain data to typical DB2 explain output. To identify what data to format the procedure above returns the needed information, something like this:

Value of output parameters
--------------------------
Parameter Name : EXPLAIN_SCHEMA
Parameter Value : HLEXP

Parameter Name : EXPLAIN_REQUESTER
Parameter Value : HLOESER

Parameter Name : EXPLAIN_TIME
Parameter Value : 2015-02-27-11.55.21.008920

Parameter Name : SOURCE_NAME
Parameter Value : SYSSH200

Parameter Name : SOURCE_SCHEMA
Parameter Value : NULLID

Parameter Name : SOURCE_VERSION
Parameter Value :

Return Status = 0

Calling db2exfmt with that information as input:
db2exfmt -d hltest -w 2015-02-27-11.55.21.008920 -n SYSSH200 -s NULLID -e HLEXP

Long story short, with all the setup I managed to get different access plans and costs for the query with the BETWEEN predicate, depending on the range of values. It was either an index or a table scan. db2exfmt also showed a "Explain from section", the use of my statement-level optimization guideline as shown earlier, and that the statement concentrator still was on (see "effective statement").

Here the two access plans as result from the REOPT ALWAYS in the optimization profile:

Access Plan:
-----------
    Total Cost:         590.416
    Query Degree:        0

      Rows
   Rows Actual
     RETURN
     (   1)
      Cost
       I/O
       |
      98302
      4850
     TBSCAN
     (   2)
     590.416
       NA
       |
      98304
       NA
TABLE: HLOESER
      BETW
       Q1

Access Plan:
-----------
    Total Cost:         20.3399
    Query Degree:        0

            Rows
         Rows Actual
           RETURN
           (   1)
            Cost
             I/O
             |
           18.9981
             19
           FETCH
           (   2)
           20.3399
             NA
         /---+----\
     18.9981       98304
       19           NA
     IXSCAN   TABLE: HLOESER
     (   3)        BETW
     13.5646        Q1
       NA
       |
      98304
       NA
INDEX: HLOESER
     BETWIX
       Q1

As a last test I removed the line from my script which sets the current optimization profile. With that all statements turn back into the generic and complex access plan:

Access Plan:
-----------
    Total Cost:         20.469
    Query Degree:        0

                Rows
             Rows Actual
               RETURN
               (   1)
                Cost
                 I/O
                 |
               167.205
                 19
               NLJOIN
               (   2)
               20.469
                 NA
          /------+-------\
         1               167.205
        NA                 19
      TBSCAN             FETCH
      (   3)             (   4)
    0.000186812          20.4688
        NA                 NA
        |              /---+----\
         0         167.205       98304
        NA           19           NA
TABFNC: SYSIBM    IXSCAN   TABLE: HLOESER
      GENROW       (   5)        BETW
        Q1         13.6489        Q3
                     NA
                     |
                    98304
                     NA
               INDEX: HLOESER
                   BETWIX
                     Q3

So what is the conclusion? Applying the statement concentrator turns similar statements into a generic one to cut down on query compilation and optimization costs. This is useful in many cases. If queries use predicates like the BETWEEN it can result in suboptimal plans. In such a case a DB2 optimization profile might be useful. But how to proof it? It turns out that session actuals are very useful to dig deeper into what is happening within the database engine. With few steps it is possible to look at how a statement is really executed. And sometimes a database engine does exactly as the user is hoping for... ;-)

Update on 02-Jul-2015: Some additional information can be found in part 3 of this journey.

Data Henrik

Pages

Friday, February 27, 2015

DB2 Battle: Optimization Profile vs. Statement Concentrator (Part 2)