Friday, February 27, 2015

DB2 Battle: Optimization Profile vs. Statement Concentrator (Part 2)

Today I wanted to try out using DB2 optimization profiles for a statement impacted by the statement concentrator. In part 1 I gave the background, showed how I created an optimization profile and that a simple explain statement didn't return what I expected. In this final part I am going to look at DB2 section actuals to hopefully proof that my optimization guideline works as I had hoped.

Because all my "explain plan for select ..." statements resulted in the same access plan, I decided to use session actuals to look at how statements are really executed within DB2. The actuals are kind of a live log of the real statement execution costs and the applied access plan. The first step towards session actuals is to have a workload and an activity event monitor and to switch the monitor on:


db2 "create workload betw applname('python') collect activity data with details,section"
db2 "create event monitor betwmon for activities write to table"
db2 "set event monitor betwmon state 1"

The above statements create a workload which collects section data. The application name (APPLNAME) is "python" because I use a Python script (see below) for parts of the testing.


Script:
import ibm_db
conn = ibm_db.connect("hltest","hloeser","secretpw")
ibm_db.exec_immediate(conn, 'set current optimization profile="HLOESER"."PROFILE_BETW"')
ibm_db.exec_immediate(conn, 'select id, s from betw where id between 2 and 20')



Once the script has been executed, the session actuals can be turned into explain information and then that information formatted. To identify which data entry to transform a query like the following one can be used. The table name consists of the generic prefix "ACTIVITYSTMT" and the monitor name:

select appl_id,uow_id,activity_id,stmt_text from ACTIVITYSTMT_BETWMON

The output then serves as input to the "explain_from_activity" procedure:
CALL EXPLAIN_FROM_ACTIVITY('*LOCAL.hloeser.150227075721',1,2, 'BETWMON',NULL,?,?,?,?,?)

The procedure takes the logged execution statistics for the section and turns it into regular explain data. The explain data can be stored in an user copy of the DB2 explain tables. The tool db2exfmt is then used to format the explain data to typical DB2 explain output. To identify what data to format the procedure above returns the needed information, something like this:

  Value of output parameters
  --------------------------
  Parameter Name  : EXPLAIN_SCHEMA
  Parameter Value : HLEXP  

  Parameter Name  : EXPLAIN_REQUESTER
  Parameter Value : HLOESER

  Parameter Name  : EXPLAIN_TIME
  Parameter Value : 2015-02-27-11.55.21.008920

  Parameter Name  : SOURCE_NAME
  Parameter Value : SYSSH200

  Parameter Name  : SOURCE_SCHEMA
  Parameter Value : NULLID 

  Parameter Name  : SOURCE_VERSION
  Parameter Value :

  Return Status = 0


Calling db2exfmt with that information as input:
db2exfmt -d hltest -w  2015-02-27-11.55.21.008920 -n SYSSH200 -s NULLID -e HLEXP

Long story short, with all the setup I managed to get different access plans and costs for the query with the BETWEEN predicate, depending on the range of values. It was either an index or a table scan. db2exfmt also showed a "Explain from section", the use of my statement-level optimization guideline as shown earlier, and that the statement concentrator still was on (see "effective statement").

Here the two access plans as result from the REOPT ALWAYS in the optimization profile:

Access Plan:
-----------
    Total Cost:         590.416
    Query Degree:        0

      Rows
   Rows Actual
     RETURN
     (   1)
      Cost
       I/O
       |
      98302
      4850
     TBSCAN
     (   2)
     590.416
       NA
       |
      98304
       NA
 TABLE: HLOESER
      BETW
       Q1

Access Plan:
-----------
    Total Cost:         20.3399
    Query Degree:        0

            Rows
         Rows Actual
           RETURN
           (   1)
            Cost
             I/O
             |
           18.9981
             19
           FETCH
           (   2)
           20.3399
             NA
         /---+----\
     18.9981       98304
       19           NA
     IXSCAN   TABLE: HLOESER
     (   3)        BETW
     13.5646        Q1
       NA
       |
      98304
       NA
 INDEX: HLOESER
     BETWIX
       Q1


As a last test I removed the line from my script which sets the current optimization profile. With that all statements turn back into the generic and complex access plan:

Access Plan:
-----------
    Total Cost:         20.469
    Query Degree:        0

                Rows
             Rows Actual
               RETURN
               (   1)
                Cost
                 I/O
                 |
               167.205
                 19
               NLJOIN
               (   2)
               20.469
                 NA
          /------+-------\
         1               167.205
        NA                 19
      TBSCAN             FETCH
      (   3)             (   4)
    0.000186812          20.4688
        NA                 NA
        |              /---+----\
         0         167.205       98304
        NA           19           NA
 TABFNC: SYSIBM    IXSCAN   TABLE: HLOESER
      GENROW       (   5)        BETW
        Q1         13.6489        Q3
                     NA
                     |
                    98304
                     NA
               INDEX: HLOESER
                   BETWIX
                     Q3

So what is the conclusion? Applying the statement concentrator turns similar statements into a generic one to cut down on query compilation and optimization costs. This is useful in many cases. If queries use predicates like the BETWEEN it can result in suboptimal plans. In such a case a DB2 optimization profile might be useful. But how to proof it? It turns out that session actuals are very useful to dig deeper into what is happening within the database engine. With few steps it is possible to look at how a statement is really executed. And sometimes a database engine does exactly as the user is hoping for... ;-)


DB2 Battle: Optimization Profile vs. Statement Concentrator (Part 1)

DB2 explain output
Today I wanted to try using a DB2 Optimization Profile for a statement impacted by the DB2 Statement Concentrator. It turned out to be a longer exercise than thought, but also with more fun, covering a lot of the DB2 tuning and monitoring infrastructure. So set some minutes aside and follow my journey into the world of query optimization, workload monitoring, session actuals, and more. I split it into two parts because of length.

The original question I tried to answer was related to REOPT and a query like this:
select id, s
from betw
where id between ? and ?


Is it possible to force an application to always reoptimize a statement, so that depending on the given range either an index scan or table scan is used. DB2 has the concept of multi-phase statement compilation, optimization, and execution. The individual steps are briefly explained in the introduction to this article on improving query performance by using REOPT bind option. Using that option it is possible to have (complex) statements re-optimized before every execution, e.g., to address different values in input variables.

The mentioned statement concentrator is a means for reducing query compilation and optimization effort by replacing literals in similar statements by parameter markers. It can have lots of benefits, but also could cause suboptimal access plans if values are not distributed evenly.

For my tests I used a JDBC-based SQL GUI to send statements like "explain plan for select id, s from betw where id between 20 and 30" to DB2, changing the upper bound to much higher values. Without the statement concentrator turned on this resulted in access plans with either an index scan or table scan. Here is such an index-based access plan:

Access Plan:
-----------
    Total Cost:         20.3338
    Query Degree:        1

            Rows
           RETURN
           (   1)
            Cost
             I/O
             |
           11.996
           FETCH
           (   2)
           20.3338
              3
         /---+----\
     11.996        98304
     IXSCAN   TABLE: HLOESER
     (   3)        BETW
     13.5606        Q1
        2
       |
      98304
 INDEX: HLOESER
     BETWIX
       Q1


With the statement concentrator turned on, all statements variations are transformed into a statement with variables. In the explain output the statements that have been rewritten by an active statement concentrator can be identified because of an extra section labeled "effective statement":

Effective Statement:
-------------------
select
  id,

  s
from
  betw
where
  id between :L0 and :L1


After getting this far I wanted to apply an optimization profile to cause the specific statement to be always reoptimized. Optimization profiles provide many ways for performance tuning on a, at least sometimes, low-level basis. They are less intrusive than database hints (see Oracle-style hints in DB2 :), can be turned on and off on different levels, and provide all kinds of options. The first step to use profiles is to prepare the DB2 tools catalog for them:

db2 "call sysinstallobjects('opt_profiles', 'c', '', '')" 

An optimization profile is created by writing an XML document according to the XML schema found here. I created the following profile:

 <?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE>
  <STMTMATCH EXACT='FALSE'/>
  <STMTPROFILE ID="REOPT betw">
     <STMTKEY>
        <![CDATA[select id,s from betw where id between :L0 and :L1 ]]>
     </STMTKEY>
     <OPTGUIDELINES>
       <REOPT VALUE="ALWAYS" />
     </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>

The optimization profile consists of a single so-called statement profile which is labeled "REOPT betw". It is for our SQL statement and I decided to use the version as seen in the effective statement. As a guideline to the optimizer the "REOPT ALWAYS" is passed. What I included in the profile is the switch to turn on inexact statement matching. It causes DB2 to consider more variations of the SQL statement for optimization by our rule. Once the optimization profile is complete, it can be inserted into the catalog table. For that I created a file "betw.csv" with a profile name and reference to the XML file:

"HLOESER","PROFILE_BETW","betw.xml"



That file is then imported into DB2:

db2 "import from betw.csv of del modified by lobsinfile insert_update into systools.opt_profile"

Whenever profiles are changed, it is a good idea to flush the cache:
db2 flush optimization profile cache

Usage of an optimization profile can be enabled in different ways, e.g., in an application by setting a registry variable like this:
set current optimization profile="HLOESER"."PROFILE_BETW"

Once this is done there is feedback in the DB2 explain output on whether profiles are in use and if yes, which one. This could be in the form of diagnostic information pointing to errors as shown below or in a section "profile information" that gives details about the name of the profile and the statement identifier:

A problem is reported
Extended Diagnostic Information:
--------------------------------

Diagnostic Identifier:     1
Diagnostic Details:     EXP0005W  The optimization profile could not be
            processed or found in the OPT_PROFILE table in the
            SYSTOOLS schema.
Diagnostic Identifier:     2
Diagnostic Details:     EXP0001W  An unexpected error occurred processing
            or applying an optimization profile or embedded
            optimization guideline. Line number "0", character
            number "0".

Success, the profile is in use
Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
    HLOESER.PROFILE_BETW
STMTPROF: (Statement Profile Name)
    REOPT betw


So far, so good. An optimization profile has been created and depending on the which statement is executed by DB2, the explain output shows that my statement level guideline has been considered. To test whether the guideline is really applied, I executed several statements like "explain plan for select id, s from betw where id between 20 and x". X was in the range of 21 to 20000. With the statement concentrator on the explain output showed that all statements ended up in the following optimized statement.


SELECT
  Q3.ID AS "ID",
  Q3.S AS "S"
FROM
  (SELECT
     Q1.$C0
   FROM
     (VALUES
        0) AS Q1
   WHERE
     (:L0 <= :L1 SELECTIVITY 1.000000)
  ) AS Q2,
  HLOESER.BETW AS Q3
WHERE
  (Q3.ID <= :L1) AND
  (:L0 <= Q3.ID)


All statements had the same complex access plan:
Access Plan:
-----------
    Total Cost:         20.469
    Query Degree:        0

                Rows
             Rows Actual
               RETURN
               (   1)
                Cost
                 I/O
                 |
               167.205
                 19
               NLJOIN
               (   2)
               20.469
                 NA
          /------+-------\
         1               167.205
        NA                 19
      TBSCAN             FETCH
      (   3)             (   4)
    0.000186812          20.4688
        NA                 NA
        |              /---+----\
         0         167.205       98304
        NA           19           NA
 TABFNC: SYSIBM    IXSCAN   TABLE: HLOESER
      GENROW       (   5)        BETW
        Q1         13.6489        Q3
                     NA
                     |
                    98304
                     NA
               INDEX: HLOESER
                   BETWIX
                     Q3

Did my guideline work? I wasn't sure so this called for deeper look at the optimization. Read on here for the second part of the journey.

Wednesday, February 11, 2015

DB2 License Management, new Offerings, and an hour of fun

DB2 license management

Last month several changes for DB2 10.5 were announced. This included the new built-in database encryption ("native  encryption") as well as ability to license high-end functionality as add-on to base editions of DB2. The page "Functionality in DB2 product editions and DB2 offerings" provides a good overview of what is available in each of the editions, including the new offerings. With all the good news some customers asked me whether they would need to reinstall DB2 to use new functionality or when upgrading, e.g., from DB2 Enterprise Server Edition (DB2ESE) to DB2 Advanced Enterprise Server Edition (DB2AESE). The short answer is: "Single image". The longer is about to follow and involves an hour of fun playing with DB2...


DB2 is really flexible in how it gets deployed. This includes platforms, resource usage, shared disk or database partitioning models, product names, and product features. A key to this ability is some intelligent code and a related license management. I have written about the tool db2licm as well as license management and enforcement in the past, but let's take another look at db2licm, the license management tool. The tool allows to list, add, and remove licenses, change the way licenses are enforced, and to generate reports.

To list the license or licenses for my DB2 instance, I use the option "-l". As can be seen I have the "DB2 Developer Edition" (DB2DE) installed. It includes all features and is suitable for development environments.

[hl@mymachine] db2licm -l
Product name:                     "IBM DB2 Developer Edition"
License type:                     "Developer"
Expiry date:                      "Permanent"
Product identifier:               "db2de"
Version information:              "10.5"

...

To demonstrate recent changes to DB2, I need to get to a different DB2 edition. "db2licm -r db2de" removes the license of the installed Developer Edition. Invocation with the "-a" option and a license file installs a license. An overview of license files is available in the Knowledge Center. I chose a Workgroup Server Edition (DB2WSE) for my tests:


[hl@mymachine] db2licm -l
Product name:                     "DB2 Workgroup Server Edition"
License type:                     "Authorized User Single Install"
Expiry date:                      "Permanent"
Product identifier:               "db2wse"
Version information:              "10.5"
Max amount of memory (GB):        "128"
Enforcement policy:               "Hard Stop"
Number of licensed authorized users: "25"
Features:
IBM DB2 BLU Acceleration In-Memory Offering:          "Not licensed"
IBM DB2 Performance Management Offering:              "Not licensed"
IBM DB2 Encryption Offering:                          "Not licensed"
IBM DB2 Business Application Continuity Offering:     "Not licensed"


As can be seen, the Workgroup Server Edition is available and uses up to 128 GB of main memory, even if more memory is available on the machine. Interesting to note is the "Hard Stop" enforcement policy. It indicates that DB2 would block my attempts to use any unlicensed features. How about trying it out? I restarted my instance and tried it:

[hl@mymachine] db2 create db enc2 encrypt
SQL8029N  A valid license key was not found for the requested functionality.
Reference numbers: "8".

Because the edition does not allow to use database encryption my attempt to create an encrypted database is blocked. Can I change that? Yes, switching to a soft enforcement policy would allow using a feature like encryption, but would log it as license violation (see later).

[hl@mymachine] db2licm -e db2wse soft

LIC1413W  A soft stop enforcement policy has been set. This enforcement
      policy specifies that unlicensed requests will be logged but not
      restricted.

Explanation:

You issued the db2licm command with the -e parameter, to update the
enforcement policy, and specified the value SOFT. (For example, db2licm
-e db2ese SOFT.) The value SOFT specifies that unlicensed requests will
be logged but not restricted.

User response:

If you want unlicensed requests to be stopped, you must change the
enforcement policy to HARD. For example, db2licm -e db2ese HARD.



LIC1411I  Enforcement policy type updated successfully.

I restarted my DB2 instance and then tried again to create an encrypted database:
[hl@mymachine] db2 create db enc2 encrypt
DB20000I  The CREATE DATABASE command completed successfully.


With the soft enforcement it is possible to use unlicensed features, thereby violating the license terms (which is the reason to have HARD STOP). Anyway, after some testing with encryption and compression, I checked my compliance:

[hl@mymachine] db2licm -g lic.out

LIC1440I  License compliance report generated successfully.
 

[hl@mymachine] cat lic.out

License Compliance Report

DB2 Workgroup Server Edition   Violation
        Data Compression
        Index Compression
        Columnar storage
        Encryption


IBM DB2 BLU Acceleration In-Memory Offering:          "Violation"
        Query Parallelism
        Workload Manager
        Columnar storage

IBM DB2 Performance Management Offering:              "Violation"
        Workload Manager

IBM DB2 Encryption Offering:                          "Violation"
        Encryption

IBM DB2 Business Application Continuity Offering:     "Not used"

Not bad, right? Lots of violations in few minutes and all the new DB2 feature offerings are listed. To be compliant again, I installed the license for DB2DE again and reran the report:

[hl@mymachine] cat lic.out
LIC1439I  DB2 server has detected that "IBM DB2 Developer Edition" is installed on this system.  Products and functions obtained via this offering may only be used for testing or development purposes as outlined in your License Agreement.  The License Agreement for this offering is located in the 'license' directory in the installation path for this product.


As this I am using an universal license, there isn't much to report. However, the license terms point out that I cannot use that edition for production systems. Hence the long informational message.

That's it in terms "do I need to reinstall DB2 to make use of new feature offerings?". No, just apply the correct license or license mode.

LinkWithin

Related Posts with Thumbnails