Wednesday, November 25, 2009

Small is beautiful (and fast): LOB Inlining

In the past I had emphasized that LOBs and XML data are different in DB2, even though both are stored in the LONG tablespace if you wish so (LONG IN option during CREATE TABLE). XML data is buffered, i.e., pages with XML data are held in the bufferpool, LOBs require a direct fetch via their descriptor and are not buffered by DB2. XML columns do not have a length specification, LOB columns have the maximum length specified. The bigger the maximum possible LOB size is, the larger is the LOB descriptor that usually is stored as part of the row.

Now, starting with DB2 9.7, they have a neat feature in common: Both can be inlined. What is inlining and what are the benefits (for LOBs)?

Because LOBs and XML data can be quite large, they are stored outside the regular row data, i.e., outlined, and a LOB descriptor points to the large object. With inlining however, when the data fits into the regular data pages, it is stored as part of the row, i.e., inline.
DBAs can specify the maximum inline size for each LOB or XML column by using "INLINE LENGTH xxx" during CREATE/ALTER TABLE. The total row size, i.e., the sum of all column sizes and some overhead, needs to fit into the data pages. What this means is that if you use 8k pages, an inline length of 15000 won't work for sure. In contrast to XML columns, if no inline length is specified, for LOB columns an inline length equal to the descriptor size is set implicitly.

Why store 200 bytes outlined and have a 220 bytes LOB descriptor in the row (total 420 bytes), when all can be stored in 200 bytes in the row? And now we are already talking about the benefits of LOB inlining. If you have lots of small LOBs, inlining can improve performance and reduce storage significantly. Storage is reduced because the extra LOB descriptor is avoided. In addition, LOB data stored in the row, i.e., inlined, benefits from row compression when enabled. The performance improvement comes from the direct access (no LOB descriptor involved) and the fact that regular table data is cached in the bufferpool. The direct fetch operations to the unbuffered LOBs are avoided.

As shown, LOB inlining has many benefits and if your Large OBjects (LOBs) are really small objects you should exploit this feature. I found this article describing how LOB inlining in DB2 is used for SAP databases. Because many small LOBs exist to store properties, this can have significant performance benefits.