Wednesday, March 25, 2009

Puzzled about the XML Regions Index?

In February I wrote about the XDA object and explained that XML data is buffered, i.e., it uses the bufferpool for performance. Earlier I had posted the puzzle below.

The triangle usually depicts an index (balanced tree). In the picture the index is over NA, EMEA, AP - three geographic regions. The solution to the puzzle is the so-called "XML Regions Index" in DB2. You may be puzzled and ask: What is the XML Regions Index?

The DB2 Information Center states the following:
The XML regions index captures how an XML document is divided up internally into regions, which are sets of nodes within a page. When an XML document is represented as nodes, each node is a record in a page. Since regions are sets of nodes within a page, the number of regions index entries can be reduced, and performance may be improved, if a larger page size that can store more nodes within a page is used.
DB2 formats XML documents into data pages, which then fit into the bufferpool. To handle documents that are larger than a data page, documents are split up into so-called regions, each of which fits into a page. The XML regions index is used to quickly find a document for a given document ID, or the region for a given node inside such a document, and to recompose the XML document from its parts. The index holds an entry for every region. The larger the page size (DB2 offers 4 KB, 8 KB, 16 KB, and 32 KB), the fewer pages are needed to store a document. The fewer pages, the fewer regions and hence the fewer entries in the regions index. And the smaller an index, the faster the access.
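To make the relationship between page size and regions index entries concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model that is not DB2's actual storage format: a document needs at least one region per page it spans, and page overhead and node record headers are ignored. The function name `estimate_regions` and the 100 KB document size are made up for illustration.

```python
import math

# Page sizes mentioned in the post, in bytes.
PAGE_SIZES = {"4 KB": 4096, "8 KB": 8192, "16 KB": 16384, "32 KB": 32768}


def estimate_regions(doc_bytes: int, page_bytes: int) -> int:
    """Rough lower bound on regions for one document.

    Assumption: one region per page the document spans, with no
    per-page overhead. Real numbers in DB2 will differ.
    """
    if doc_bytes <= 0 or page_bytes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(doc_bytes / page_bytes)


if __name__ == "__main__":
    doc_bytes = 100 * 1024  # hypothetical 100 KB stored document
    for label, page_bytes in PAGE_SIZES.items():
        print(label, estimate_regions(doc_bytes, page_bytes))
```

Under this simplified model, the estimate for a single document shrinks by roughly half each time the page size doubles, which is the effect the Information Center text describes.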

That's why in the "15 best practices for pureXML performance in DB2 9" we recommend using large page sizes where possible:
As a rule of thumb, choose a page size for XML data which is not smaller than two times your average expected document size, subject to the maximum of 32 KB. If you use a single page size for relational and XML data, or for data and indexes, a 32 KB page size may be beneficial for XML data but somewhat detrimental for relational data and index access. In such cases, 16 KB or 8 KB pages may be a better choice that works well for both.
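The rule of thumb quoted above can be sketched as a small Python helper. This is only an illustration: the function name `pick_page_size_kb` is hypothetical, and returning the smallest qualifying page size is my assumption. The rule itself only gives a lower bound, so a larger page size also satisfies it.

```python
# Page sizes DB2 offers, in KB, smallest first.
DB2_PAGE_SIZES_KB = (4, 8, 16, 32)


def pick_page_size_kb(avg_doc_bytes: int) -> int:
    """Smallest page size that is at least twice the average document size.

    Falls back to 32 KB, the maximum, when even that is too small.
    """
    if avg_doc_bytes <= 0:
        raise ValueError("average document size must be positive")
    target_bytes = 2 * avg_doc_bytes
    for size_kb in DB2_PAGE_SIZES_KB:
        if size_kb * 1024 >= target_bytes:
            return size_kb
    return DB2_PAGE_SIZES_KB[-1]


if __name__ == "__main__":
    # Hypothetical average document sizes in bytes.
    for avg in (1500, 3000, 8192, 20000):
        print(avg, pick_page_size_kb(avg))
```

The helper looks at XML data only. As the quoted text notes, if relational data and indexes share the same page size, 16 KB or 8 KB may be the better compromise.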
Coming back to the initial puzzle, the rule of thumb holds regardless of the geographic region you live in...