Friday, July 3, 2009

Lessons from the field: Simplicity wins - XMLGROUP, XMLROW, and XMLAGG

Simplicity and brevity win and that's why I have a longer post on keeping it short. :)

Programmers and engineers love sugar in all kinds of forms, shapes, and flavors - probably to keep the brain powered. Some of us especially like syntactic sugar as it helps to remove complexity and keeps us focused on the essential parts.

Some days ago while being with a customer I noticed a query like this (rewritten by me against the DB2 sample database):

with temp (columns) as
(select xmlelement(name "employee", xmlconcat(
xmlelement(name "id", empno),
xmlelement(name "first", firstnme),
xmlelement(name "last", lastname)))
from employee)
select xmlelement(name "all",xmlagg(columns)) from temp

Basically, all rows of a table (employee) should be returned within a single XML document (rooted with "all") like this:

<all>
<employee>
<id>some value</id>
<first>some value</first>
<last>some value</last>
</employee>
...
</all>


The above, initial query already uses XMLAGG to aggregate the fragments coming from the individual rows into the big document - which is good. However, it uses a common table expression (the WITH expression) to first build the row-based XML fragments, then later combines them into the result document. This unnecessarily complicates the query. It can be rewritten simply into:

select xmlelement(name "all",
xmlagg(xmlelement(name "employee",
xmlconcat(

xmlelement(name "id", empno),
xmlelement(name "first", firstnme),
xmlelement(name "last", lastname)))))

from employee


It is simpler now, but the XML construction still takes up some thoughts as it uses nested calls to XMLELEMENT and XMLCONCAT to build the row fragments. The latter can be accomplished shorter and more elegantly by using either XMLROW, or even better XMLGROUP:

select xmlelement(name "all",
xmlagg(
xmlrow(empno as "id",firstnme as "first",lastname as "last" option row "employee")))
from employee


select xmlgroup(empno as "id",firstnme as "first",lastname as "last" option row "employee" root "all")
from employee



XMLROW allows to very elegantly construct an XML fragment from row data, XMLGROUP to aggregate such or other fragments. XMLROW and XMLGROUP are simpler to use than the XMLELEMENT-based queries, they are easier on the eye, and what is best, should result in better performance than the other variants.

Let's take a look at the optimized query text for the initial query and for the XMLGROUP-based:
Original:

SELECT $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
$INTERNAL_QNAME$(NULL, NULL, all), $INTERNAL_XMLNAMESPACES$((''),
('')), Q4.$C0)
FROM
(SELECT XMLAGG(Q3.$C0)
FROM
(SELECT Q2.$C0
FROM
(SELECT $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
$INTERNAL_QNAME$(NULL, NULL, employee), $INTERNAL_XMLNAMESPACES$((''),
('')), XMLCONCAT(arity, $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
$INTERNAL_QNAME$(NULL, NULL, id), $INTERNAL_XMLNAMESPACES$((''),
('')), $INTERNAL_SQLTOXML$(Q1.EMPNO)), $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
$INTERNAL_QNAME$(NULL, NULL, first), $INTERNAL_XMLNAMESPACES$((''),
('')), $INTERNAL_SQLTOXML$(Q1.FIRSTNME)), $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
$INTERNAL_QNAME$(NULL, NULL, last), $INTERNAL_XMLNAMESPACES$((''),
('')), $INTERNAL_SQLTOXML$(Q1.LASTNAME))))
FROM HLOESER.EMPLOYEE AS Q1) AS Q2) AS Q3) AS Q4


The XMLGROUP-based query:

SELECT XMLDOCUMENT($INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
NULL, all), Q3.$C0))
FROM
(SELECT XMLAGG(Q2.$C0)
FROM
(SELECT $INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
NULL, employee), $INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
NULL, id), $INTERNAL_SQLTOXML$(Q1.EMPNO)), $INTERNAL_XMLELEMENT$(1,
$INTERNAL_QNAME$(NULL, NULL, first), $INTERNAL_SQLTOXML$(Q1.FIRSTNME)),
$INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
NULL, last), $INTERNAL_SQLTOXML$(Q1.LASTNAME)))
FROM HLOESER.EMPLOYEE AS Q1) AS Q2) AS Q3


The overall general access plan (not shown) is the same for both queries, however the optimized query text differs. What can be seen is fewer SELECT expressions and a simplified construction, resulting in less CPU cycles spent.

After all the words, let me recap: Simplicity and brevity win...