Friday, July 1, 2016

Store and Query XML Data with dashDB on Bluemix

XML Column in dashDB
I recently got asked whether it is possible to process XML data with dashDB on IBM Bluemix. The answer to that is that it is possible. dashDB is based on DB2 with its industry-leading pureXML support which I wrote many blog entries about. In the following I give you a quick start into preparing dashDB to store XML data and how to query it.

If you are using the regular dashDB service plans which are tailored to analytics, then by default all tables use columnar storage. That format provides deep compression and high performance query processing capabilities for analytic environments, but it is not suited for natively storing XML data. That is the reason why tables need to be created by explicitly stating ORAGNIZE BY ROW in the "Run SQL" dialog (see screenshot above):

CREATE TABLE myTable(id INT, doc XML) ORGANIZE BY ROW

The above statement creates the table "myTable" with two columns, the second of type XML, and in the classic row-oriented table format.

SQL/XML Query with dashDB
Once the table is created, data can be inserted. This can be done by using INSERT statements in the "Run SQL" dialog or by connecting other tools to dashDB. The "Load Hub" is designed for analytic data sets and does not support XML-typed columns. An introduction to inserting XML data can be found in the pureXML tutorial in the DB2 documentation.
After the XML data is in, the "Run SQL" dialog can be used again to query the documents. Queries can be either in SQL (SQL/XML) or in XQuery, see the screenshots with examples.

I hope that gives you a rough idea how to utilize the pureXML feature in dashDB, even though its main focus is analytics.
XQuery with dashDB