Tuesday, September 12, 2017

Db2 with External Tables: First Tests

External Tables - New in Db2
Db2 Warehouse recently added a new feature named external table. Because I was interested in that feature for a while and I have an instance of Db2 Warehouse on Cloud running in the IBM Cloud datacenter in Frankfurt, Germany, I performed some quick tests. Here is what it is and how I got it to work.


Overview: External Tables

According to the Db2 documentation for external tables, it is a table that resides in a text-based file outside the Db2 database. The external table can be used to
  • store data outside the database and still be able to query it,
  • to easily load data from outside and apply some additional features like casts or joins with other data or omit columns,
  • share and transfer data to applications.
A table can either reside in Cloud Object Storage with Swift or S3 API or in the local file system or in some other location determined by ODBC / JDBC URIs. So in short, the data is outside of Db2 but can be accessed, in some ways, through SQL.

External Table with Cloud Object Storage

For my first test I put a small delimited file testdata.csv up into my Cloud Object Storage with Swift API. Then, I created the external table exttabl1 pointing to that file:


CREATE EXTERNAL TABLE exttab1(a int,s varchar(50)) using
  (dataobject 'testdata.csv'
    DELIMITER ','
   swift('https://fra02.objectstorage.softlayer.net/auth/v1.0/',
    'IBMOS12345:hloeser',
    '5b1dmypassword983',
    'henrik'
   )
 );



The above is similar to a regular CREATE TABLE. It has the column definitions followed by USING with some options. Via DATAOBJECT I tell Db2 to use my test file. The option DELIMITER is needed because the default delimiter is '|' (the vertical bar). I ran into some errors with my truly comma-separated values.
The last option, SWIFT, has several parts to provide the access URI as well as username, password and the storage container / bucket. To utilize S3 storage, replace "swift" by "s3" and provide similar credentials. It is easy. Once the table is created, it can be queried:


select * from exttab1;

Business or SQL as usual. It is not possible to update, delete or truncate an external table. However, I was able to insert into it and thereby replacing the entire file content. Thus, use it with caution.

When the external table is accessed, Db2 provides diagnostic information in log files. The files can be placed via the LOGDIR or ERROR_LOG option. By default the log output is written into the same location as the source file for the external table. That additional I/O and data might be unexpected to many (I was surprised, too). It should be placed somewhere else and cleaned up on a regular basis.

Conclusions

External tables provide a simple, SQL-based mechanism to interface with external data, both to incorporate it into the Db2 database or to export it for other use. Storage options include "the cloud" with the popular Swift and S3 APIs. I am going to test external tables further and so should you...

If you want to learn more about that topic or discuss the blog with me, then meet me at the upcoming IDUG EMEA and Db2 Aktuell conferences. If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.