Wednesday, August 26, 2020

Db2 and S3-compatible Cloud Object Storage

A look at Db2 and S3 storage

Recently, I was contacted about an old blog post in which I discussed how to access Cloud Object Storage from Db2. Since writing that article, both Db2 and (IBM) Cloud Object Storage have evolved. Thus, it is time for an update on how to back up data to cloud storage, load from external storage, and even directly query data somewhere on the Internet...

S3 API

Originally, S3 was just the short name of Amazon (Web Services') Simple Storage Service. Because of its widespread use, the S3 API has been adopted by many other storage providers. This includes IBM Cloud Object Storage.

Earlier, IBM offered an OpenStack Swift-based cloud storage service that came out of its SoftLayer acquisition.
 

Db2 and S3 API-based storage

Db2 integrates with S3-based storage in different areas. One way to access S3 storage is the CATALOG STORAGE ACCESS command, which I discussed in my old blog post (see there for details). It allows commands like LOAD, INGEST, BACKUP, and RESTORE to access files hosted on cloud object storage solutions.
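With such a storage access alias in place, commands can reference the remote storage through a DB2REMOTE path. Here is a minimal sketch of a backup to COS, assuming an alias named "mys3" (cataloged without a container, as in the example further down) and a bucket "mybucket" — names and the object path are placeholders:

```
-- Back up database MYDB to cloud object storage via the
-- cataloged storage access alias "mys3"; the bucket and
-- object prefix follow in the DB2REMOTE path.
BACKUP DATABASE mydb TO DB2REMOTE://mys3//mybucket/backups
```

A RESTORE DATABASE command can point at the same DB2REMOTE path to bring the backup image back in.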

Another option is the CREATE EXTERNAL TABLE statement. It allows you to create a table whose data is not stored in the database, but in a (remote) file. This includes S3- and Swift-based cloud storage.
 
You can even dynamically reference such an external table without creating it first. See the external-table-reference definition in the FROM clause of subselects.
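Such a transient reference can be queried directly. A minimal sketch, with a placeholder file, column definition, endpoint, credentials, and bucket:

```
-- Query a file on COS directly, without creating the
-- external table first; the column list describes the file.
SELECT * FROM EXTERNAL 'mydatafile.dat' (a INT)
  USING (s3('s3.eu-de.cloud-object-storage.appdomain.cloud',
            'f7foobarxxxcfeb4bxxx',
            '22223333foobarfoobar22223333',
            'mybucket'))
```

This is handy for ad-hoc inspection of a file before loading it into a regular table.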

All that is needed to access Cloud Object Storage (COS) using the S3 API is information on its endpoint (the URL), the right credentials, and the bucket (folder) name. As a beginner, obtaining the right credentials might be tricky. The reason is that the usual API key or username/password do not work. Special HMAC credentials are needed. But you can create them the same way you generate regular storage credentials; only an additional parameter is needed.
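Using the IBM Cloud CLI, that additional parameter is a JSON flag on the service key creation. A sketch, with placeholder key and instance names:

```shell
# Create a service key for a COS instance with HMAC credentials;
# the {"HMAC": true} parameter is what adds the access key id
# and secret access key to the generated credentials.
ibmcloud resource service-key-create cos-hmac-key Writer \
  --instance-name "my-cos-instance" \
  --parameters '{"HMAC": true}'
```

The resulting credentials contain a cos_hmac_keys section whose access key ID and secret access key are what Db2 expects as user and password.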
 

Db2 and S3 examples

What does it look like to access IBM Cloud Object Storage (COS) from Db2 using the S3 API? Here are some examples to get you started:

Catalog a COS instance in Db2:

CATALOG STORAGE ACCESS ALIAS mys3 VENDOR S3 SERVER s3.eu-de.cloud-object-storage.appdomain.cloud USER 'f7foobarxxxcfeb4bxxx' PASSWORD '22223333foobarfoobar22223333'

Or use a specific file in a COS bucket to define an external table:

CREATE EXTERNAL TABLE exttab2(a INT) USING (DATAOBJECT 'mydatafile.dat' S3('s3.eu-de.cloud-object-storage.appdomain.cloud', 'f7foobarxxxcfeb4bxxx', '22223333foobarfoobar22223333', 'mybucket'))

Or reference a file hosted in a COS bucket to insert its data into a database table:

INSERT INTO mytable SELECT * FROM EXTERNAL 'mydata.txt' USING (CCSID 1208 S3('s3.eu-de.cloud-object-storage.appdomain.cloud', 'f7foobarxxxcfeb4bxxx', '22223333foobarfoobar22223333', 'myotherbucket'))

The above links and examples should get you over any bumps and let you make use of cloud storage, either as an additional backup location (disaster recovery) or to easily integrate external data.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.