Wednesday, June 13, 2012

DB2 pureXML and bufferpools - revisited

Some years ago I wrote that XML data is buffered for performance reason and looked at the XDA object. Now I found out that I didn't answer the "buffered or not" question entirely. The question which bufferpools are utilised depending on inlined or regular XML storage remains. So let me answer it.

CREATE TABLE myX (id INT, x1 XML INLINE LENGTH 500, x2 XML) IN myT1 LONG IN myT2;

Considering the above CREATE TABLE statement, data for table myX would be stored in two tablespaces, myT1 and myT2, and use the associated bufferpool(s). If the internal representation of the XML data for column x1 would be up to 500 bytes, the row data consisting of an integer value for id, the XML data for x1, and a descriptor pointing to the XML data of x2 would be stored in the DAT object and hence in tablespace myT1. When accessed, the data would go to the bufferpool associated with myT1. If the XML data for x1 would be larger than the 500 bytes, the row would hold an integer and two descriptors (one each for x1 and x2).
The descriptors would point to entries in data pages in the XDA object which is stored in tablespace myT2. All XML data for x2 and the data too long to be inlined for x1 would be stored in myT2. When accessed, the bufferpool associated with tablespace myT2 would hold the pages.

In short again: XML data is buffered and it goes through the bufferpool associated with the tablespace it is stored in. This is the same as with regular relational data or index pages.

2 comments:

Anonymous said...

I am sure you posting what you believe to be true, and what some in IBM have told you, but I don't believe that is accurate unless INLINE specification is made. DB2 cannot store XML data in bufferpools for the same reason they cannot store LOB data in bufferpools. It is rather revealing that neither LOB nor XML columns can be accessed via HADR standby read only database. Also, performance of pureXML that is not INLINE'd is not anywhere as good as an equivalent VARCHAR column with XML data would be (with the restriction that both columns were less than 32K and on a 32K page size).

Obviously, pureXML has advantages over VARCHAR and CLOB, in its ability to understand XML schema in the SQL, but in terms of performance, it works like a CLOB.

Henrik Loeser said...

Well, you believe, I know... :)

As non-believer and regular DB2 user you could look into how the database system works. One way is to look at the disk with the tool db2dart. You could see the on-disk structures and how they facilitate query performance. You could take a performance trace and see that the bufferpool is involved with processing the data. And there some more ways I could tell you if I had your name and email address.

Did you look up the term "FUD" in Wikipedia...?

Best regards,
Henrik

LinkWithin

Related Posts with Thumbnails