Monday, March 25, 2013

Passing string sequences to SQL/XML functions: Tokenize to the rescue

This morning I answered an interesting DB2 XQuery/SQL question about parameter passing. The core issue was related to the different data models and bridging them when data is passed from the SQL world to XQuery or back. After checking this blog's archive I found that I hadn't written about this common issue before. So let's start by showing you what worked and what not:

create table tok(id int, doc xml);

insert into tok values(1,'<a><b>IBM</b></a>');
insert into tok values(2,'<a><b>BLA</b></a>');


First I create table and insert two simple documents. Now we are ready to query the table:

select * from tok
where xmlexists('$DOC/a[b = ("IBM","whatever")]');
 

ID          DOC ----------- -------------------------------
          1  <a><b>IBM</b></a>


   1 record(s) selected.


select * from tok where xmlexists('$DOC/a[b = $para]' passing cast('IBM,whatever' as varchar(200)) as "para");

ID          DOC ----------- -------------------------------
 

   0 record(s) selected.

In the first query above we directly evalute a comparison against a list of string values (IBM, BLA). It matches the first XML document. The second query passes the same comma-separated value in, but none of the documents is found. What happened?

In XQuery the comparisons above are called "general comparison" (there is also value comparison and node comparison). The general comparison returns true above if some value in the first sequence is equal to some value in the second sequence. The important piece of information is sequence. In the first query, the '("IBM", "BLA")' constructs a sequence with two string values as items . The "(" and ")" are used as sequence constructor. In the second query, a single string value is passed in. There are probably different ways of composing the string, but it remains a string.



What can be done? For the example above XQuery and XPath have the function fn:tokenize(). That function splits up a string into a sequence of substrings. Thus, we would have a sequence again that could be used in the comparison. Let's see whether it works:



select * from tok where xmlexists('$DOC/a[b = fn:tokenize($para,",")]' passing cast('IBM,whatever' as varchar(200)) as "para");

ID          DOC ----------- -------------------------------
          1  <a><b>IBM</b></a>


   1 record(s) selected.

As it can be seen, the tokenize helps to split up the parameter into a sequence and the query result is as expected again.