Wednesday, August 28, 2024

A look at local external tables in Db2

Surprising Db2 results or not?
My recent post about Db2 backup to S3 or Object Storage triggered a question. And the answer to it is that I wrote about external tables first in 2017, about 7 years ago. Since then, some features were added or changed for external tables and remote storage support, most recently in Db2 11.5.7 (AFAIK). With my new Db2 playground in place, I thought it was nice to test external tables again.

External tables

Relational databases like Db2 use tables to store the actual user data. An external table is not stored in the database itself, but somewhere outside. The Db2 documentation for CREATE EXTERNAL TABLE states:

While tables typically reside in a database, an external table resides in a text-based, delimited file, or in a fixed-length-format file outside of a database.

The above means that Db2 restricts where and how such external table data can be stored. It is also noteworthy that an external table is not related to the concept of federation or federated data sources.

External tables are used to load or to unload data, that is either reading data from a file or writing to it. It means that an external table only supports SELECT and INSERT statements, but no DELETE, TRUNCATE, or UPDATE.

External tables in Db2 can reside either in the local file system or on remote storage. An external table can be created using a simple statement like the following (which I used for the quiz in the above screenshot):

create external table quiz(id int, n varchar(30)) using (file_name 'quiz_table.csv')

It defines a table name "quiz" with two columns and looks for the data in the file "quiz_table.csv". I could create the same table using this slightly different options:

create external table quiz(id int, n varchar(30)) using (dataobject 'quiz_table.csv')

The FILE_NAME parameter refers to a local file and is the same as using DATAOBJECT with the option REMOTESOURCE set to "LOCAL", the default value. There is even an alternative syntax for CREATE EXTERNAL TABLE if you only want to write to the file.

Because external tables neither support update nor delete operations, the following is by design, but surprising at first:

db2 => select * from quiz

ID          N                             
----------- ------------------------------
          1 Hello world                   

  1 record(s) selected.

db2 => insert into quiz values(2,'Surprise')
DB20000I  The SQL command completed successfully.
db2 => select * from quiz

ID          N                             
----------- ------------------------------
          2 Surprise                      

  1 record(s) selected.


The table has data (select), then you insert into that table and select again. Only the inserted data is shown. The reason is that Db2 unloaded data to the file by replacing its content. The documentation states that for "unload" (insert) the file will be overwritten if it exists. This happened in the scenario above.

Configuration and security

Typically, you have to specify the full path to the data file, e.g., '/data/files/quiz_table.csv'. However, depending on the combination (!) of the two database configuration parameters extbl_location and extbl_strict_io, it is possible to only use relative paths or just file names. The configuration extbl_strict_io determines how extbl_location can be used. 

If the default value of NO is set, the location configuration - if set - can be a list of multiple local paths. The absolute path specified in FILE_NAME / DATAOBJECT must then be found in the list. The default value for extbl_location is the Db2 instance install path.

If YES is set for extbl_strict_io, then the location configuration can only be a single path. That path is interpreted as base, then the authorization ID of the table definer is added, then the file name or relative path provided in the CREATE EXTERNAL TABLE statement. For my example above, I set extbl_strict_io to YES, set extbl_location to '/data/ext', created a subdirectory 'db2inst1', and created a CSV file with a row.

If something goes wrong or does not fit for your configuration, you and other users might see this error message SQL20569N:

SQL20569N  The external table operation failed due to a problem with the corresponding data file or diagnostic files. File name:
"/data/ext/db2inst1/quiz_table.csv". Reason code: "1".  SQLSTATE=428IB

Catalog metadata

In the Db2 catalog, you can notice external tables by looking at the value of the PROPERTY column in SYSCAT.TABLES:

select substr(property,27,1) from syscat.tables where tabname='QUIZ'

External tables have this flag set and TYPE is "T" (table). Further, some additional configuration options are stored in SYSCAT.EXTERNALTABLEOPTIONS.

Conclusions

That's it for today. If you have feedback, suggestions, or questions about this post, please reach out to me on Mastodon (@data_henrik@mastodon.social) or LinkedIn.