Tuesday, September 22, 2009

XMLTABLE - The all-in-one function?! (Part 1, Syntax)

What can produce a relational table out of XML data or a sequence of XML fragments? What can be used to shred (or since the Enron scandal "decompose") data simply by using SQL when ingesting data into a warehouse? What can serve relational applications while managing XML data? Of course I am talking about the XMLTABLE function that is part of the SQL standard and the DB2 pureXML feature.

I plan to post a couple of entries about this very versatile function, hence the "Part 1" in the title. Today, I start with a focus on the syntax for typical usage scenarios.

At first sight the XMLTABLE syntax looks mostly straight-forward:
XMLTABLE "(" [namespace declaration ","] row-definition-XQuery [passing-clause] [COLUMNS column-definitions] ")"

Basically, you could first optionally declare some global namespaces (more later), then comes an XQuery expression similar to those in XMLQUERY and XMLEXISTS to define the row context, then the optional, but familiar PASSING clause and finally the COLUMN definitions similar to a CREATE TABLE statement.

There are usually different ways of writing a (X)query. For the XMLTABLE function, the XQuery clause needs some consideration because it defines the row context, i.e., what part of the XML document is available (and is iterated over) for the column values when each row of the resultset is produced. In some examples in future parts I will show the impact of the XQuery expressions.

The PASSING clause is optional because you could work with constants in your XQuery (not very likely) or use column names to reference the data (e.g., "$DOC" for the DOC column). In many cases you will want to use the PASSING clause to utilize parameter markers, e.g., when directly ingesting application data.

The (optional) column definition is similar to a simple CREATE TABLE statement. You specify the column name and its type (e.g., NAME VARCHAR(20)). After the type comes the most interesting part, the keyword "PATH" followed by a string literal that is interpreted as XQuery expression. Within that XQuery the context (".") refers to that set in the row context (see above). If you would iterate over employees in a department, you could then simply refer to the employees' first- and lastname like shown:

SELECT t.* FROM dept, XMLTABLE('$DEPT/dept/emp' COLUMNS first VARCHAR(20) PATH './first', last VARCHAR(20) PATH './last') as t

Note that for columns all types are supported which are supported by XMLCAST. The reason is that behind the covers XMLCAST is called to map the value identified by the column-related XQuery to the relational column value.

Earlier I mentioned that global namespaces could be declared. Imagine that the department documents all have a default namespace "foo" (e.g., "<dept xmlns="foo"><emp>..."). In order to properly navigate within the documents your query would need to look like shown:

select x.* from dep,xmltable('declare default element namespace "foo";$DOC/dept/emp' COLUMNS first VARCHAR(20) PATH 'declare default element namespace "foo";./first', last VARCHAR(20) PATH 'declare default element namespace "foo";./last') as x

All the different XQueries would need to declare the namespace "foo". To make our lifes simpler, the SQL standard allows to globally declare the namespace using the XMLNAMESPACES function (which usually is used for publishing purposes):

select x.* from dep,xmltable(XMLNAMESPACES(default 'foo'),'$DOC/dept/emp' COLUMNS first VARCHAR(20) PATH './first', last VARCHAR(20) PATH './last') as x

The namespace is declared only once, the statement looks much cleaner and is simpler to write.

That's it for today as an introduction. Please let me know if you have questions on XMLTABLE that you would like to have answered in a future post.