Friday, August 27, 2010

Yes or no: What is better for your data?

In two posts in the past, I explained how to say no to invalid data and how to find out what data was wrong and causing errors. With the REJECT INVALID VALUES clause during CREATE INDEX it is possible to reject XML documents that have values in the specified XPath location not compliant with the specified data type. Now the question I got is: What is better, to reject invalid values or to ignore them?

Well, let me answer with THE answer: "It depends." I will try to be more specific in a minute. Using XML is often for flexibility reasons and so you may not know what the future will hold. However, in most cases the XML schemas, i.e., the structure of the documents and the data types to expect, are known. And then you want to be sure the data matches what you expect. Because the data at the path specified in the CREATE INDEX statement needs to be casted to the target data type anyway, regardless of whether the value ends up in the index (valid) or not (invalid), the performance is the same. Moreover, with the approach to reject you can be sure that IF the path is present, its value is in the index. That leaves room for optimizations at runtime.

So my recommendation is that if you don't need to be as flexible as possible, then use the REJECT INVALID VALUES clause. If you should receive an error because of a casting issue, see my post on how to find out why that I mentioned above.