Wednesday, July 10, 2013

Extended row size support in DB2: Fitting more data into a single row than the page size would typically allow (and sometimes it makes sense, but that is another story - stay tuned for user experience...)

One of the new features of the new DB2 10.5 release is the support for so-called extended row size. As you might have guessed from the article's title, you can squeeze a lot of data into a single row now. The feature can be used for compatibility reasons with other vendors, to consolidate a table with "side tables" into a single one, or to possibly improve performance by moving tables to smaller page sizes. Let's take a look into the details of "extended row size".

Many of the DB2 size-related limits are documented in the overview of SQL and XML limits, however all the information on column sizes, row sizes, and what can be fitted into data pages of different size is part of the CREATE TABLE documentation. There, way down, we find that for 4k pages the row size is limited to 4005 bytes, for 8k to 8101 bytes, for 16k to 16293 bytes, and for 32k to 32677 bytes. So, how can we insert more data into a row?

A new database configuration parameter, extendend_row_sz, has been added. For new databases the default is ENABLE, for migrated databases DISABLE (for backward compatibility). I created a new database and the extended row size support is enabled. Let's give it a try:

db2 => connect to hl

   Database Connection Information

 Database server        = DB2/LINUX 10.5.0
 SQL authorization ID   = HLOESER
 Local database alias   = HL

db2 => create table ers(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB20000I  The SQL command completed successfully.
db2 => update db cfg using extended_row_sz disable
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.
db2 => create table ers2(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL0286N  A table space could not be found with a page size of at least "8192"
that authorization ID "HLOESER" is authorized to use.  SQLSTATE=42727
db2 => update db cfg using extended_row_sz enable
DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully.
db2 => create table ers2(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB20000I  The SQL command completed successfully.
db2 => insert into ers values(1,'Henrik','Loeser')
DB20000I  The SQL command completed successfully.


The database configuration parameter can be changed online as can be seen above. Creation of the table ERS2 is blocked with extended row sizes DISABLED, then it works after enabling it again. Inserting some simple test data into the table succeeds. To test this new feature further, I created a data file with 3 rows (ID is 2 to 4). The reason for this type of test is the size limitation of the DB2 CLP and trying to avoid a GUI. The first row has two long strings with about 3000 bytes each, the second row a short and a long string, the last row a long and a short string. Importing the data is no problem:

db2 => import from "data.txt" of del insert into ers
SQL3109N  The utility is beginning to load data from file "data.txt".

SQL3110N  The utility has completed processing.  "3" rows were read from the
input file.

SQL3221W  ...Begin COMMIT WORK. Input Record Count = "3".

SQL3222W  ...COMMIT of any database changes was successful.

SQL3149N  "3" rows were processed from the input file.  "3" rows were
successfully inserted into the table.  "0" rows were rejected.


Number of rows read         = 3
Number of rows skipped      = 0
Number of rows inserted     = 3
Number of rows updated      = 0
Number of rows rejected     = 0
Number of rows committed    = 3


So how can we find out more about how this feature works? The documentation says that some data might be stored outside the row as LOB if the data is too big for the row. I am going to use db2dart for that. First I obtain the TABLE ID and TABLESPACE ID of our test table ERS:

db2 => select tableid,tbspaceid from syscat.tables where TABNAME='ERS'

TABLEID TBSPACEID
------- ---------
      8         2

  1 record(s) selected.


The next step is to invoke db2dart from the shell for the database "hl":
hloeser@rotach:~$ db2dart hl /DD

Please enter
Table ID or name, tablespace ID, first page, num of pages, and y/n for verbose:
(may suffix page number with 'p' for pool relative if working with a pool-relative tablespace)
8 2 0 80 y

         The requested DB2DART processing has completed successfully!
                    Complete DB2DART report found in: HL.RPT


The "/DD" option direct db2dart to dump data pages. It prompts us to enter the IDs for the table and tablespace (which we know), the first page to dump (zero), the number of pages to dump (80), and whether verbose mode is preferred (yes). Based on our input a report file "HL.RPT" is generated. After some general information, free space control records, and a table description record, the first actual data records can be seen:
            Slot 4:

               Offset Location = 2850  (xB22)
               Record Length = 34  (x22)

               Record Type = Table Data Record (FIXEDVAR)

               Record Flags = 0

               Fixed part length value = 14

                  Column 1:
            Fixed offset: 0
                  Type is Long Integer
                  Value = 1

                  Column 2:
            Fixed offset: 4
                  Type is Variable Length Character String
                  Length = 6 Offset = 14
                      48656E72 696B                          Henrik         

                  Column 3:
            Fixed offset: 9
                  Type is Variable Length Character String
                  Length = 6 Offset = 20
                      4C6F6573 6572                          Loeser         


         Slots Summary:  Total=5,  In-use=5,  Deleted=0.

This is our first row (ID 1) that we inserted, everything is stored in a single record, as expected and as usual. Reading on in the file, the other rows appear. I won't print them here for size reasons, but here is how they are stored.
  • ID 2, two long strings: first string in the record, second string outside the record
  • ID 3,  short string, long string: both strings within a data page, single record because they fit
  • ID 4, long string and short string: both strings within a data page, single record because they fit
The column value for row with ID2 is stored outside the record as LOB. The db2dart output looks like this:

                  Column 3:
            Fixed offset: 9
                  Type is Variable Length Character String
                  Length = 24 Offset = 3020
    
                  Var Descriptor Length = 24
                     lfd_check = 89
                     lfd_version = 0
                     lfd_life_lsn = 0000000000064942
                     lfd_size = 3024
                     lfd_first = 2
                     lfd_dir = 0

         Slots Summary:  Total=1,  In-use=1,  Deleted=0.

What can also be seen is that only a single slot is in use on this data page because of the record size.

I hope you found these details interesting and it encourages you to use db2dart to look into storage structures. When I visit universities, I always tell students to dig deeper into the implementation of database systems...