Wednesday, July 28, 2010

Catalog view xmlstrings for simpler access to stringID information

The XML support in DB2 is tightly integrated into the database engine and provides fast and sophisticated processing of XML data. In the past I explained why, for compactness and speed, element names, attribute names, and namespace information are replaced with so-called stringIDs. The string-to-stringID mappings are stored in a cached dictionary which is persisted in a system catalog table.
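
To make the idea of a string dictionary more concrete, here is a minimal Python sketch. It is only an illustration of the general technique (replace repeated names with small integers and keep a lookup table), not a description of how DB2 implements it internally. The class name `StringDictionary` and the helper `encode_names` are made up for this example, and the IDs it assigns have nothing to do with real DB2 stringIDs.

```python
class StringDictionary:
    """Toy string-to-ID dictionary (illustration only, not DB2's internal design)."""

    def __init__(self):
        self._ids = {}      # string -> stringID
        self._strings = []  # stringID -> string (the list index is the ID)

    def intern(self, name):
        """Return the existing ID for name, or assign the next free one."""
        if name not in self._ids:
            self._ids[name] = len(self._strings)
            self._strings.append(name)
        return self._ids[name]

    def lookup(self, string_id):
        """Map an ID back to the original string."""
        return self._strings[string_id]


def encode_names(names, dictionary):
    """Replace each element or attribute name with its ID."""
    return [dictionary.intern(n) for n in names]
```

A name that occurs many times in a document, such as a frequently repeated element name, is stored once in the dictionary, and every occurrence is reduced to a small integer.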

That system catalog table, SYSIBM.SYSXMLSTRINGS, is an internal table that has undergone some changes over the past database versions. In DB2 9.1, pureXML support was restricted to databases using a Unicode codepage. Hence, the string information was stored in clear text in the database codepage, and users could easily access and display it. In DB2 9.5, the pureXML feature could also be used in non-Unicode databases. The VARCHAR-based string column was therefore changed into a VARCHAR FOR BIT DATA column to store the UTF-8 encoded strings properly. A new function, XMLBIT2CHAR, made it possible to turn the encoded information back into a readable string.
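
The following small Python sketch shows why storing the names as UTF-8 bytes is the safe choice once the database codepage may not be Unicode. The function names are hypothetical and chosen for this example only. In DB2 the conversion back to the database codepage is done by XMLBIT2CHAR; decoding to a Python string here merely stands in for that step.

```python
def to_bit_data(name):
    """Encode a name as raw UTF-8 bytes, similar in spirit to FOR BIT DATA storage."""
    return name.encode("utf-8")


def to_hex(bit_data):
    """Render the raw bytes in hex format."""
    return bit_data.hex().upper()


def to_readable(bit_data):
    """Turn the UTF-8 bytes back into a readable string."""
    return bit_data.decode("utf-8")


raw = to_bit_data("straße")
print(to_hex(raw))       # 73747261C39F65
print(to_readable(raw))  # straße
```

The non-ASCII character takes two bytes (C3 9F) in UTF-8, while the ASCII characters take one byte each. Stored as bit data, these bytes are kept exactly as they are, regardless of the codepage the database uses.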

Now, in the current version, DB2 9.7, life has become much simpler because a catalog view, SYSCAT.XMLSTRINGS, was introduced. It shows the stringID, the string in the database codepage (obtained by calling the XMLBIT2CHAR function mentioned above), and the string as bit data (hex format).