Monday, June 29, 2009

The DB2 Process Model - Some Links

Starting with version 9.5, DB2 switched from a process-based architecture to a multi-threaded architecture. One of the reasons was to better exploit current and emerging hardware architectures.

Now that customers are migrating to DB2 9.5 and DB2 9.7, some of them coming from systems with an uncertain future or high maintenance costs, frequently asked questions concern components like "db2sysc" or "db2fmp". The DB2 Information Center has a short overview of the DB2 Process Model that is a good starting point. An article on developerWorks, "How multithreaded architecture works in DB2 9.5", provides more detail.

Friday, June 26, 2009

DB2 Information Center - Revisited

When a new version of DB2 comes out, it is always interesting to visit the DB2 Information Center (see my old post here). One reason is to read the "What's New" documents. They give a good overview of all the changes, not just what has been in the marketing focus.

To give an example: By clicking from the "What's New overview" to the "New features and functionality" to "Performance Enhancements", I could go on and read about Access Plan Reuse for consistent performance.

If you have favorite topics, try searching for them (I tried "XMLTABLE"). What I noticed is that the search results have been improved: short summaries are now presented as part of the result list. This saves a lot of clicking and is a very welcome usability improvement.

Sunday, June 21, 2009

So many reasons to...

When it is the weekend and family time, you need really good reasons to get to your computer and download the new DB2 9.7. Thus, you mention things like:
  • The extended compression features (index compression, XML compression, compression of temp tables) will allow me to reduce the clutter on my desk.
  • With the improved autonomic feature I will have more family time (but first I need to download and install it).
  • pureXML support for database partitioning, range partitioning, and multi-dimensional clustering allows me to store even more recipes for you (in XML format, of course) and analyze them.
  • ...
At this time I have to leave (family time before a business trip), so please finish for me. What reasons would you mention? Let me know...

BTW: Here is the link to the DB2 9.7 download site.

Friday, June 19, 2009

Lessons from the field (again): Remember the trees!

In April I had already written about the nice surprises XML and XQuery have for developers. XQuery and XPath operate on the tree representation of a document and in most cases those trees have more than one branch. This is something to keep in mind, always!

This morning I was reminded about that article and that problem again when I was contacted with a question about two versions of seemingly the same query against the DB2 sample database:

xquery declare default element namespace "";
for $d in db2-fn:xmlcolumn("PURCHASEORDER.PORDER")
where $d/PurchaseOrder/item[quantity = 1]
return $d/PurchaseOrder/item/name


xquery declare default element namespace "";
db2-fn:xmlcolumn ("PURCHASEORDER.PORDER")/PurchaseOrder/item[quantity = 1]/name

In both cases we iterate over the PurchaseOrders and seem to select only items with a quantity of one. But the first query returns 4 records, the second query only 3. Why?
The element we are qualifying in the first query, PurchaseOrder, can have (and actually does have!) multiple item elements. A PurchaseOrder qualifies if at least one of its items has a quantity of 1 (existential semantics), and the return clause then emits the names of all its items, whatever their quantity. The second query filters directly at the item level. Hence the difference in the results.
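A minimal sketch of a document that triggers the difference (structure simplified from the sample database; values are illustrative): a purchase order holding one item with quantity 1 and one with quantity 2. The first query returns both item names because the whole order qualifies; the second returns only the name of the quantity-1 item.

```xml
<PurchaseOrder PoNum="5002">
  <item>
    <name>Snow Shovel</name>   <!-- returned by both queries -->
    <quantity>1</quantity>
  </item>
  <item>
    <name>Ice Scraper</name>   <!-- returned only by the first query -->
    <quantity>2</quantity>
  </item>
</PurchaseOrder>
```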

The first query can be rewritten to return the same result as the second query:

xquery declare default element namespace "";
for $d in db2-fn:xmlcolumn("PURCHASEORDER.PORDER")/PurchaseOrder/item
where $d[quantity = 1]
return $d/name

What do we learn? Regardless of SQL or XQuery: Think green! Remember the trees! And enjoy pureXML...

Thursday, June 18, 2009

Some basic presentation/Powerpoint rules...

A couple of years ago, when I was attending a year-long company-sponsored leadership curriculum, a presentation class (not just a PowerPoint class) was part of it. Since then - after lots of practicing - I have passed some of what I learned on to my teams. The most important part of a presentation is to have the right content for the audience, to structure it well, and to add the right amount of "gimmicks" to make the content sink in faster and deeper.

Unfortunately, after many presentations you leave wondering "What was the message?" and "Could even more 'data' have been packed onto a single slide?". And sometimes they even turn into something like this:

Tuesday, June 16, 2009

XML: Learn how to say NO

When you have young kids you know that once they have learned the power of the word "NO!", your life gets "more interesting". Learning to say "no" in a more polite way and to use its power more wisely is something that is good for private and business life. Now you may ask "but what about database systems?" Ok, I got you...

When dealing with relational data, refusing "stuff" is more or less simple. You cannot store a string in an integer or date column unless you cast it and it has the right format. You can define unique indexes to make sure certain values are unique. You can define constraints or use triggers to make sure the data you store is in sync with your idea of an idyllic world. For XML data the picture is slightly different and some options are listed here.

A straightforward way of verifying XML data is to validate it. You can do that directly in DB2 using the XMLVALIDATE() function, or in special appliances like WebSphere DataPower. Using triggers or constraints you can make sure all data gets validated.
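As a sketch (table, column, and schema names are made up), validation during an insert could look like this, assuming an XML schema was registered under the ID POSCHEMA:

```sql
-- validate the incoming document against a registered XML schema;
-- the INSERT fails if the document does not conform
INSERT INTO orders (id, doc)
  VALUES (1, XMLVALIDATE(XMLPARSE(DOCUMENT '<order>...</order>')
             ACCORDING TO XMLSCHEMA ID poschema));
```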

Validation might be more than you want; in that case, making use of XML indexes is something to consider. When you store XML without validation, which is typical for most customers, DB2 only checks that the document is well-formed. When you create an index and specify the REJECT INVALID VALUES clause, DB2 checks that the value of an XML element or attribute conforms to the type specified for the index.



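As an illustrative sketch (table, column, and index names are made up): an index of SQL type DATE over a "date" element, and a document whose date value cannot be cast to DATE:

```sql
-- index over the "date" element, typed as DATE;
-- REJECT INVALID VALUES makes DB2 enforce the type
CREATE INDEX idx_orderdate ON orders(doc)
  GENERATE KEY USING XMLPATTERN '/order/date'
  AS SQL DATE REJECT INVALID VALUES;

-- a document like this would be rejected, "sometime" is not a DATE:
-- <order><date>sometime</date></order>
```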
If you inserted a document like the one above, then during index creation DB2 would choke and point out that the value "sometime" is not valid for the DATE type. A value like "09/29/2000" would pass. Once the index has been created, INSERTs and UPDATEs would pass or fail depending on which values are used for the "date" element.

What do you do when you have a larger table and CREATE INDEX fails? Look into your diagnostic log. DB2 generates a nice query statement for you to retrieve the troublemaker.

Now say YES to pureXML and try it out...

Friday, June 12, 2009

pureXML performance tip: A sequence of good indices...

When you have indexes, you want your DBMS of choice to use them as efficiently as possible - even with XQuery involved. However, SQL and XQuery can be complicated, and sometimes the compiler/optimizer needs a helping hand to make the right decisions. Today I am going to show you how to make use of XQuery semantics to simplify a query and, at the same time, get query plans with index ANDing to work. Special thanks to my colleague Matthias Nicola, author of the DB2 pureXML Cookbook, for the base idea.

For the query examples below I will use the DB2 sample database, which includes XML data. The table CUSTOMER already has a couple of XML indexes defined, including one on the "Cid" attribute and one on the customer "name". You can look into the system catalog with something like:

select indname,pattern from syscat.indexxmlpatterns where indname in (select indname from syscat.indexes where tabname='CUSTOMER');

The following might not be an interesting query in itself, but it nicely demonstrates what I wanted to show you.

select * from customer
where (xmlexists('declare default element namespace "";
$INFO/customerinfo[@Cid=1000]')
or xmlexists('declare default element namespace "";
$INFO/customerinfo[@Cid=1002]'))
and xmlexists('declare default element namespace "";
$INFO/customerinfo[name="Kathy Smith"]');

Basically, the query is written in regular SQL style: we are ORing two predicates on the same path (or column), then ANDing a predicate on another path. Because we have two matching indexes, we would expect index ANDing in the plan. Using EXPLAIN, we can see that this is not the case.
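To check the plan yourself, one way (a sketch; the output file name is made up) is to capture the statement in explain mode and format the plan with db2exfmt, then look for an IXAND operator:

```sql
-- populate the explain tables for the statements that follow
SET CURRENT EXPLAIN MODE EXPLAIN;
-- run the query under test here, then switch back
SET CURRENT EXPLAIN MODE NO;
-- from the shell: db2exfmt -d sample -1 -o plan.txt
```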

select * from customer
where xmlexists('declare default element namespace ""; $INFO/customerinfo[(@Cid=1000 or @Cid=1002) and name="Kathy Smith"]');

Even rewriting the query in a more elegant and shorter way, as shown above, doesn't result in index ANDing. What we can do now is make use of the fact that XQuery's general comparisons have existential semantics: if one matching item is found, the comparison is true. Instead of ORing comparisons on the "Cid" attribute, we compare that attribute against a constructed sequence of values. Semantically it is the same, but it is simpler for the optimizer.

select * from customer
where xmlexists('declare default element namespace "";
$INFO/customerinfo[@Cid=(1000,1002)]')
and xmlexists('declare default element namespace "";
$INFO/customerinfo[name="Kathy Smith"]')

select * from customer
where xmlexists('declare default element namespace "";
$INFO/customerinfo[@Cid=(1000,1002) and name="Kathy Smith"]');

For each of the two queries above, the indexes on "Cid" and "name" are picked and combined using index ANDing (IXAND operator). Simple idea, huge impact.

Friday, June 5, 2009

How to insert XML data into DB2

When you start working with XML in the database, one of your first statements after creating a table with an XML column typically is to insert the "Hello World" test:

insert into foo values(1,'<test>Hello World</test>')

It's great for testing out XML in the database, and seeing is believing. However, later, when dealing with your real application, statements like the one above are not such a great idea for a couple of reasons:
  • Because you provide the inserted data as a constant, it's harder for DB2 to optimize the statement. You should use parameter markers or host variables to provide the data if you want to use the INSERT statement. The benefit is that statements are prepared only once, and there is also less statement parsing for the DB2 compiler. In other words, you save money and the planet...
  • Another reason to avoid (string) constants is that their size is limited. Even though an SQL statement can be up to 2 GB in length (that's a lot of typing, but easy for statement generators), string constants are limited to 32 KB, and you risk running into error SQL0102N.
If you are inserting bulk data, you should also consider import or load. For both of them, XML data can be kept in files.
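As a sketch of the parameter-marker approach (table and column names are illustrative):

```sql
-- prepared once by the application (e.g., via CLI or JDBC);
-- the XML document is bound at execute time, so its size is not
-- limited by the 32 KB string-constant limit
INSERT INTO foo (id, doc) VALUES (?, ?)
```

The application then binds the second parameter to a character or binary stream containing the document and re-executes the prepared statement for each new document.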

Wednesday, June 3, 2009

What makes a great XML schema?

When you start using XML within your company there is a good chance that at one point in time you want to define your own XML schema. Then, the next question is "What makes a great XML schema?"

Well, there are many design philosophies, and best practices and recommendations for XML schema design exist (utilize your favorite search engine for that). Another way of approaching the design of new XML schemas is to look into how existing XML schemas have been designed (similar to learning from existing code libraries). To get you started, here are some resources:
BTW: You can see some of the industry formats in action.

Tuesday, June 2, 2009

Buy groceries, produce XML, feed DB2, gain weight insight

By now some of you might have figured I was on vacation. It was very relaxing with almost no emails and no IBM - and no blog. Once during the vacation I was reminded by my kids about what I do. When we shopped for some groceries, the kids had to point out (loudly and in public of course) that the cash register/POS terminal was from IBM.

And then I had to briefly think about what we were doing. By buying the groceries we would create a nice transaction - actually several, because we paid with electronic cash. Later, the different sales slips would be transferred in XML format to the company's headquarters and eventually fed into the central enterprise warehouse. In many companies the POS transactions are already shipped as XML because of its flexibility and the simple way it allows looking into transmission issues. However, once the files are received, most companies today shred (or "decompose") the XML before the data even reaches a database. A lot of information can be lost during that phase; sometimes even data integrity or security is at risk. These are some of the reasons why more and more companies are looking into feeding the XML files with the POS slips directly into the database, e.g., the enterprise warehouse, operational data stores, or central staging areas for other backbone systems.

DB2 with its pureXML functionality can be of great help here because it allows storing the XML data in its native format, keeping all the information. Using SQL/XML and XQuery it is possible to look into the data and analyze it. Functions such as XMLTABLE make it possible to present XML data in relational table structures to support BI tools that are not (yet) XML-enabled.
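As a sketch of the XMLTABLE approach (table, column, and element names are made up): turning XML sales slips into rows that any relational BI tool can consume:

```sql
-- present the items of XML sales slips as a relational result set
SELECT t.item, t.price
FROM pos_slips p,
     XMLTABLE('$D/slip/item' PASSING p.doc AS "D"
              COLUMNS
                item  VARCHAR(40)  PATH 'name',
                price DECIMAL(8,2) PATH 'price') AS t;
```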

It's good to know that the upcoming DB2 9.7 (see the early access program and the announcement) improves the functionality for gaining "XML insight" in data warehouses even further (XML data in range partitioning, multi-dimensional clustering, and database partitioning). All the information contained in the POS data can be used, the flexibility that XML offers can be carried along the entire chain, IT processes can be simplified and costs saved, and the goods and prices in the stores adapted to market needs.

And the latter is what we - as shoppers - are looking for. But I did not tell all of the above to my kids when we left the supermarket as it would have certainly lowered their level of happiness with the new ice cream...