Friday, June 12, 2009

pureXML performance tip: A sequence of good indices...

When you have indexes you want your dbms of choice use them as efficiently as possible - even with XQuery involved. However, SQL and XQuery can be complicated and sometimes a helping hand is needed to aid the compiler/optimizer making the right decisions. Today I am going to show you how to make use of XQuery semantics to simplify a query and at the same time getting query plans with index ANDing to work. Special thanks to my colleague Matthias Nicola, author of the DB2 pureXML Cookbook, for the base idea.

For the query examples below I will use the DB2 sample database which includes XML data. The table CUSTOMER already has a couple XML indices defined, including one on the "Cid" attribute and the customer "name". You can look into the system catalog by trying something like:

select indname,pattern from syscat.indexxmlpatterns where indname in (select indname from syscat.indexes where tabname='CUSTOMER');

The following might not be an interesting query in itself, but it nicely demonstrates what I wanted to show you.

select * from customer
where
(xmlexists('declare default element namespace "http://posample.org";
$INFO/customerinfo[@Cid=1000]')

or
xmlexists('declare default element namespace "http://posample.org";
$INFO/customerinfo[@Cid=1002]'))

and
xmlexists('declare default element namespace "http://posample.org";
$INFO/customerinfo[name="Kathy Smith"]');


Basically, the query is written in the regular SQL style. We are ORing two predicates on the same path (or column), then ANDing a predicate on another path. Because we have two matching indexes, we would expect index ANDing in the plan. Using EXPLAIN we can see that this is not the case.

select * from customer
where xmlexists('declare default element namespace "http://posample.org"; $INFO/customerinfo[(@Cid=1000 or @Cid=1002) and name="Kathy Smith"]');


Even rewriting the query in a more elegant and shorter way as shown above doesn't result in index ANDing. What we can do now, is to make use of the fact that XQuery's General Comparisons utilize existential semantics, i.e., if one matching item is found, all is true. Instead of ORing comparisons on the "Cid" attribute, we compare that attribute against a constructed sequence of values. Semantically it is the same, but it is simpler for the optimizer.

select * from customer
where
xmlexists('declare default element namespace "http://posample.org";
$INFO/customerinfo[@Cid=(1000,1002)]')

and
xmlexists('declare default element namespace "http://posample.org";
$INFO/customerinfo[name="Kathy Smith"]')
;

select * from customer
where
xmlexists('declare default element namespace "http://posample.org";
$INFO/customerinfo[@Cid=(1000,1002) and name="Kathy Smith"]');


For each of two above queries indexes on "Cid" and "name" are picked and combined using index ANDing (IXAND operator). Simple idea, huge impact.