Tuesday, June 16, 2009

XML: Learn how to say NO

When you have young kids you know that once they have learned the power of the word "NO!", your life gets "more interesting". Learning to say "no" in a more polite way and to use its power more wisely is something that is good for private and business life. Now you may ask "but what about database systems?" Ok, I got you...

When dealing with relational data, refusing "stuff" is more or less simple. You cannot store a string in an integer or date column unless you cast it and it has the right format. You can define unique indexes to make sure certain values are unique. You can define constraints or use triggers to make sure the data you store is in sync with your idea of an idyllic world. For XML data the picture is slightly different and some options are listed here.

A straight-forward way of verifying XML data is to validate it. You can do that in DB2 directly using the XMLVALIDATE() function or in special appliances like WebSphere DataPower. Using triggers or constraints you can make sure all data gets validated.

Validation might be more than you want and then making use of XML indexes is something to consider. When you store XML without validation which is typical for most customers, DB2 only checks that the document is well-formed. Creating an index and specifying the REJECT INVALID VALUES clause, DB2 checks that the value of an XML element or attribute conforms to the specified type.

<order><date>sometime</date></order>


CREATE INDEX myIX on myTable(doc) GENERATE KEY USING XMLPATTERN '/order/date' AS SQL DATE REJECT INVALID VALUES;


If you inserted a document like above, then during the index creation DB2 would choke and point out that the value "sometime" is not valid for a DATE type. A value like "09/29/2000" would pass. Once an index has been created, INSERTs/UPDATEs would pass/fail depending on which values are used for the "date" element.

What do you do when you have a larger table and CREATE INDEX fails? Look into your diagnostic log. DB2 generates a nice query statement for you to retrieve the troublemaker.

Now say YES to pureXML and try it out...