Monday, February 23, 2009

DB2 pureXML And Bufferpools - XDA Object

One of the questions I answer a lot to both customers and IBMers alike is whether XML data is buffered, i.e., whether XML data utilizes the bufferpools. Before I explain it in more detail, let me answer the question briefly. Yes, XML data goes through the DB2 bufferpool and the entire navigation within documents happens on XML data in the bufferpool. This is for performance, performance, and performance.

Let's take a look at a database that we create with system managed space (SMS). Note that this is NOT the recommended way of creating a database and you should just use the default. The reason we use SMS is that it makes the different database objects visible.

create db testx user tablespace managed by system using ('testx')

Once the database is created, we connect and then create five tables like this:
create table t1(id int, doc xml, text clob(10k));
create table t2(id int, doc xml);
create table t3(id int, text clob(10k));
create table t4(id int);
create table t5(id int unique not null);
After the tables are created, we try to locate the directory or folder where the objects are placed. Because I am using DB2 Express-C on a Windows laptop, I am seeing something like shown in the picture below.

Now we take a look at the DB2 Information Center to have the file types explained.

Each subdirectory or container has a file created in it called SQLTAG.NAM. This file marks the subdirectory as being in use so that subsequent table space creation does not attempt to use these subdirectories.

In addition, a file called SQL*.DAT stores information about each table that the subdirectory or container contains. The asterisk (*) is replaced by a unique set of digits that identifies each table. For each SQL*.DAT file there might be one or more of the following files, depending on the table type, the reorganization status of the table, or whether indexes, LOB, or LONG fields exist for the table:
  • SQL*.BKM (contains block allocation information if it is an MDC table)
  • SQL*.LF (contains LONG VARCHAR or LONG VARGRAPHIC data)
  • SQL*.LB (contains BLOB, CLOB, or DBCLOB data)
  • SQL*.XDA (contains XML data)
  • SQL*.LBA (contains allocation and free space information about SQL*.LB files)
  • SQL*.INX (contains index table data)
  • SQL*.IN1 (contains index table data)
  • SQL*.DTR (contains temporary data for a reorganization of an SQL*.DAT file)
  • SQL*.LFR (contains temporary data for a reorganization of an SQL*.LF file)
  • SQL*.RLB (contains temporary data for a reorganization of an SQL*.LB file)
  • SQL*.RBA (contains temporary data for a reorganization of an SQL*.LBA file)
When we look at the files, we see that the objects for table "t1" have the prefix "SQL00002" and for "t5" the prefix "SQL00006". Now why do we have the different object types?
  • t1/SQL00002 has a DAT object for the table data, LB and LBA objects for the CLOB column, and XDA for the XML data. Why do we have an INX index object? The DB2 Information Center has the answer again (look for the section on the XML data type).
    When a column of type XML is created, an XML path index is created on that column. A table-level XML region index is also created when the first column of type XML is created.
    The INX object is needed for the two system indexes that are created for the XML column.
  • t2/SQL00003 and t3/SQL00004 demonstrate what we just said. If we only have a CLOB column, no index/INX is needed, with XML only, we don't have LOB objects.
  • t4/SQL00005 only has an INTEGER column and the only object is of type DAT. No index is present and hence there isn't any INX file.
  • Finally, to make the reverse probe, we created t5/SQL00006 with a unique constraint which is implemented as a unique index. With an index present, we need an INX object.
Long story short, XML and LOB data are different. XML data is in its own storage object, even if you usually can't see it. DB2 caches XML data in the bufferpools for performance. I didn't explain today how XML data is organized and why XML data can be buffered even if documents are larger than a database page. The latter has to do with the puzzle I gave you some weeks ago. Stay tuned for the details...