Data Henrik: July 2009

Wednesday, July 29, 2009

Wikibooks, XQuery, and Sudoku

You probably have heard about all three, Wikibooks, XQuery, and Sudoku, but maybe not in the same context.

Wikibooks is a wiki-based collection of free educational text books. One of the books is targeted at providing XQuery examples. Even if the system used for most of the examples is not (yet) DB2, it is worth a look. How come Sudoku puzzles into the big picture? Well, one of the chapters in the XQuery book deals with "Puzzles, Maths and Games" and one example shown is how to solve Sudoku puzzles using XQuery.

As I said, the examples are not based on DB2, but it should be possible to port the functions and query to DB2 (ping me for ideas). If you are teaching, writing something similar (from scratch) could be a nice student project.

Tuesday, July 28, 2009

Did you get it? Setting up analytic systems is simpler now

Some other interesting announcements today were about IBM acquiring SPSS as well as the IBM Smart Analytics System. The latter is a combination of IBM software and hardware, consisting of a foundation and add-on modules for different target areas.

The IBM Smart Analytics System delivers a comprehensive and flexible set of capabilities, including business intelligence reporting, analysis, dashboards, and scorecards, data mining, cubing services, text analytics, data warehouse management, storage and server platform. It can also be used by clients or cloud service providers to deliver the power of analytics to private or public cloud services.

Some more information is on the IBM Smart Analytics System product page.

Is it real? A virtual conference again

Today I received an invitation for IBM's next "Data In Action" virtual conference on August 19th:

This conference will feature effective data management solutions for smarter business outcomes. Hear from Donald Feinberg, Vice President and Distinguished Analyst from Gartner, about the state of the DBMS software market and data management challenges facing businesses today. The topics that Mr. Feinberg will examine include; the problem of exploding data volumes, new computing paradigms for handling this explosion, data management cost reduction techniques and technologies, and the perspective of the DBA now and in the future.

Following Mr. Feinberg will be a moderated panel of customers and partners who are leveraging the unique and desirable features only available in DB2 9.7.

In the invitation I got it says that the first 97 attendees will receive a complimentary memory stick. Sounds like a real conference with goodies. Hey, wait - it says IBMers are not eligible for the memory stick. Dang - reality is back.

Monday, July 27, 2009

Travel: Using humor to cope with stress

Last Friday evening I was on the way home using a regional express train from Stuttgart to Friedrichshafen (Germany). I was happy to catch it and make the connection because another train I had been on had been running late. About 45 minutes into the ride our train had to stop in the nowhere and the following announcement came over the PA: "Because of persons on the track, continuation will be delayed for an unknown time." Often "person on the track" means suicide and in the worst case it can take hours to reopen the tracks.

Many people got upset about the situation, but I decided to take the delay with humor, seeing it as positive that at least mobile Internet was working where the train was halted and that I still had some food, water, and battery life left. I was reminded about a "humor and stress" class IBM had payed for me a couple years back. The facilitator was Tim Gard, known for his travel stories and chicken feet. Enjoy.

BTW: The train continued after about 60 min. I made it home safely with my emails done.

Thursday, July 23, 2009

Beware Be aware of the invisible!

Searching the Internet Movie Database, IMDb, for titles with the word "invisible, it seems that fighting the invisible has been something fascinating for decades. XML and XQuery are fascinating, too, but I am not aware of any movies featuring them in the title (maybe still too early?). Anyway, what I am up to today is to show you an interesting XML/XQuery technicality which many beginners are struggling with - and it's caused by something invisible!

Let's start by creating a simple table and inserting an XML document.


create table foo(id int,doc xml);

insert into foo values(1,'<test>hello</test>');
DB20000I  The SQL command completed successfully.

The insert statement succeeds as expected. The string is implicitly parsed as XML document because the target column is of type XML. We already know of SQL/XML publishing functions like XMLELEMENT and XMLATTRIBUTES, so let's try to construct the same document and insert it again.


insert into foo values(2,xmlelement(name "test",'hello'));
------------------------------------------------------------------------------
insert into foo values(2,xmlelement(name "test",'hello'))
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL20345N  The XML value is not a well-formed document with a single root
element.  SQLSTATE=2200L

Surprisingly (?) the insert statement fails. DB2 is complaining that it is not a well-formed document with a single root element. What happened? We only have one element, why the error message? The answer is that DB2 is expecting a document node as specified in the XQuery and XPath Data Model (XDM). The latter is the standard which XQuery and DB2 pureXML are built on.

When we inserted the XML document as a string, the XML parser automatically (implicitly) inserted a document node for us because that node is kind of invisible when looking at XML strings. However, when we build or construct an XML document from ground up using the publishing functions, we have to explicitly create an document node. The SQL/XML standard has a special functions for it: XMLDOCUMENT. Let's try it again.


insert into foo values(2,xmldocument(xmlelement(name "test",'hello')))
DB20000I  The SQL command completed successfully.

The XQuery language has several ways of constructing XML documents, too. The easiest is to directly provide a string with tags as shown next. Wrapping it into the XMLQUERY function, we try to insert the generated XML:


insert into foo values(3,xmlquery('<test>hello</test>'))
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL20345N  The XML value is not a well-formed document with a single root
element.  SQLSTATE=2200L

Not suprisingly (this time!) it fails again and there are two ways to fix it. One is to use the document node constructor provided by the XQuery standard, the other one to use XMLDOCUMENT() as before.


insert into foo values(3,xmlquery('document{<test>hello</test>}'))
DB20000I  The SQL command completed successfully.

insert into foo values(4,xmldocument(xmlquery('<test>hello</test>')))
DB20000I  The SQL command completed successfully.

Fighting the invisble is not as scary as it seems knowing how and a movie like "The invisible node" probably wouldn't draw a big crowd. Moreover, the document node is not entirely invisible if you know how to make it visible (any parallels to the movies?). Showing you how is a future topic... (creating opportunities for a sequel here :)

Monday, July 20, 2009

Relational vs. pureXML: Some indexing differences

You probably know how relational indexing works. When you create an index on a column, then for every row in the table there is an index entry. There is a 1:1 relationship between the rows and the index entries.

In the XML world things can be quite different. Let's imagine that we have XML documents where as part of other personal information the names of our siblings are listed ("CREATE TABLE persons(id INT, doc XML)"). The XML fragment could like the following for a single sibling (2 kids for the parents):

<siblings>
  <name>Sarah</name>
</siblings>

If we would like to index the sibling names, we could do it in DB2 the following way:

CREATE INDEX sibNamesIdx ON persons(doc)
GENERATE KEY USING XMLPATTERN
'/siblings/name' as SQL VARCHAR(40)

Because we have one sibling in the above example, there would be a single entry in the index - the same as in relational. But what happens if you are an only child (<siblings/>)? Or what if you have multiple brothers and sisters?


<siblings>
  <name>Sam</name>
   <name>Sarah</name>
    <name>Sean</name>
  <name>Siegfried</name>
  <name>Simon</name>
  <name>Susie</name>
  <name>Sydney</name>
</siblings>

For an only child there wouldn't be a sibling name and hence no index entry. For multiple siblings there would be an index entry each, resulting in multiple index entries for a single row. This is quite different from the relational world.

What may look strange or at least unusual to you has some benefits to you. In the relational world you would need to model such data with joined-in side tables or with sparse columns. In the latter case you would manage NULL values in your index. In the XML world you only have data that is really present represented in your indexes, thus keeping them small.

Thursday, July 16, 2009

Fun with databases: How cereals turn into silverware

This morning my wife came back from grocery shopping. Among other things, she brought back a new organic cereal mix that just had started to hit the shelves. My wife was happy because she got it for free - more or less.

While paying at the cash register, my wife checked the receipt because the amount seemed to be higher than expected. Then she noticed an item on the list that wasn't in her cart: A 24 piece silverware set. What was in her cart, but not on the receipt was the new cereal mix. After some phone calls by the store clerk to the supervisor they together found out that both the silverware and cereals were in the product database, but somehow the system mixed it up. Was the barcode on the product incorrect? Were both products listed with the same barcode? Was it a database inconsistency or a system glitch?

My wife got the cereals for free as token of appreciation for waiting and the trouble and left some puzzled store employees behind.

Did you run into similar situations? Any ideas how this could have happened?

Tuesday, July 14, 2009

pureXML for your well-being!

The new IBM Data Management magazine is out and one of the articles is about how DB2 pureXML is helping to improve the healthcare system. Next time you visit a doctor's office, ask them how they manage your patient records...

Job vs. Life: The importance of tools and ROI

When you have a new home or a new garden - or both - one of your spare time jobs is to install things, assemble furniture, remove weeds, plant and water, etc. Usually you have a choice of really manually performing a tasks, e.g., using a regular screwdriver, or trying to simplify and speed up what you want to accomplish, e.g., using a power screwdriver. For most tasks you have the choice of using mostly already present tools or investing into new power tools or similar stuff. The questions are always, how much time and effort can I save, how often will I use the power tool again (and sometimes even: if I buy the tool, will my significant other actually do it instead of me)?

Often, the decision is to buy a tool and to be happy about the simplifications and the saved time. The actual problem is to select the right brand and version of tool.

When you are running database applications the situation is similar. There are many tasks that can be performed by a skilled person without tools or just with whatever the core software (OS, DBMS, JDK, etc.) offers. However, often an additional tool can help to save time and improve the quality of what needed to be done. Similar questions as above for the house and garden need to be answered, only thereafter it can become more confusing when shopping for the right tool [Did you happen to select a power screwdriver that was renamed/rebranded a couple times?]. IBM offers a wealth of tools, sometimes for every problem even two, ranging from application development over data life cycle management and data encryption to performance monitoring.

Going back to house and garden, I usually buy tools from the same brand/company because they have the same "look and feel", once I dealt with one tool, it's quicker to learn how to handle the next. BTW: That's the same principle behind the Optim suite of tools for Integrated Data Management.

Monday, July 13, 2009

Energy Consumption in our Passive House / Jahresstromverbrauch in unserem Passivhaus

[Fuer die deutsche Fassung bitte weiter unten schauen]

Today a year ago we could start with our apples to apples energy metering in our passive house, a low energy house. This morning I went down to our basement to read out the meters, something I typically try to do once a week. So how much electrical energy did we consume over the last 365 days? A total of 5026 kWh. This is less than anticipated. Let me try to put this into perspective and to give you more details.

Our house has ~150 square meters (~1620 sq. ft.) in two levels plus an unheated basement of almost 100 square meters. We are 2 adults and two kids, I am mostly working from home and my wife is domestic engineer (a.k.a. homemaker or stay-at-home mom).

We have two power meters, one for the heating/ventilation system and hot water, another one for the regular household (fridge, freezer, washer, dryer, computers, phones, oven, dishwasher, lights, ...). What is surprising to us is how much we still spend on heating compared to the household consumption:

Household consumption: 2244 kWh

Heating/ventilation/water: 2782 kWh

However, after analyzing the fact deeper, we realized that we only have new A+ or A++-rated household appliances and mostly use energy saving bulbs. Moreover, the computers do not use much energy even though they are often running 12+ hours.

For the heating/water part this is within the expected range. From April to October a solar panel is heating up the water. During the winter we try to keep the inside temperature around 22 degrees Celsius (72 degrees Fahrenheit).

I tried to find comparison data and according to some German sources a similar household would spend around 4000-5000 kWh just on the household part. Heating ranges from 8000-25000 kWh alone.

Jahresstromverbrauch in unserem Passivhaus
Vor einem Jahr konnten wir mit unserem Stromverbrauchsmessungen in unserem Passivhaus beginnen, vorher standen noch Bautrockner im Keller. Ueber das letzte Jahr hinweg haben wir 5026 kWh an Strom verbraucht. Der Verbrauch teilt sich auf 2 Zaehler/Bereich auf:

Haushaltsstrom: 2244 kWh
Heizung/Lueftung/Warmwasser: 2782 kWh

Beim Haushaltsstrom zahlt sich aus, dass wir nach unserem Umzug aus den USA nach Deutschland die meisten Elektrogeraete neu kaufen mussten und somit fast ausschliesslich A+ oder A++-Geraete haben. Der Haushaltsstrom umfasst alles von den Computern (Heimbuero) ueber Telefone bis zu den ueblichen Verbrauchern wie Kuehlschrank, Gefrierschrank, Waschmaschine, Trockner, Backofen, Herd und Spuelmaschine.

Der Heizstrom ist fuer die permanente Lueftung, die Kleinstwaermepumpe und Zusatzheizer sowie die Solarpumpe. Unser Haus hat 150 Quadratmeter an Wohnflaeche, die im Winter auf ca. 22 Grad Celsius gehalten werden.

Setzt man den Gesamtverbrauch in Perspektive zu den Durchschnittswerten in Deutschland, so zeigen sich die Einsparungen durch ein Passivhaus und durch Energiespargeraete. Wir haben neu gebaut und mussten umzugsbedingt auch neue Geraete anschaffen. Ob sich zusaetzliche Daemmung bzw. die Abloesung von Energiefressern durch moderne Geraete lohnen, muss jeder einzelfallbezogen entscheiden. Unsere Werte zeigen jedoch das Potenzial.

Wednesday, July 8, 2009

Oracle + TPoX = no results?

Conor has a nice post about Oracle and their lack of published TPoX benchmark results. Oracle has published all kinds of benchmark results (like those from TPC), but not for the XML benchmark "TPoX".
A while ago I had already "commented" about Oracle and their XML claims and lack of results. There is also a post about TPoX to give you some more background information.

Tuesday, July 7, 2009

Step back: I am a DBA! (or DB2 developer or IM specialist)

Yesterday, I read a news article about an airline announcing an expected delay of 8 hours because of a mechanical problem (on an island in the Mediterranean). They needed to fly in a mechanic. Then a passenger stepped forward, identified himself as aircraft mechanic, and later fixed the problem.

There are situations, when all kinds of people/occupations are sought ("We need a medic!"), but when was the last time you heard: "Is someone here a DBA?" When would be such a situation?

Friday, July 3, 2009

Lessons from the field: Simplicity wins - XMLGROUP, XMLROW, and XMLAGG

Simplicity and brevity win and that's why I have a longer post on keeping it short. :)

Programmers and engineers love sugar in all kinds of forms, shapes, and flavors - probably to keep the brain powered. Some of us especially like syntactic sugar as it helps to remove complexity and keeps us focused on the essential parts.

Some days ago while being with a customer I noticed a query like this (rewritten by me against the DB2 sample database):


with temp (columns) as 
  (select xmlelement(name "employee", xmlconcat(
            xmlelement(name "id", empno),
            xmlelement(name "first", firstnme),
            xmlelement(name "last", lastname)))
  from employee)
select xmlelement(name "all",xmlagg(columns)) from temp

Basically, all rows of a table (employee) should be returned within a single XML document (rooted with "all") like this:


<all>
 <employee>
    <id>some value</id>
  <first>some value</first>
  <last>some value</last>
 </employee>
 ...
</all>

The above, initial query already uses XMLAGG to aggregate the fragments coming from the individual rows into the big document - which is good. However, it uses a common table expression (the WITH expression) to first build the row-based XML fragments, then later combines them into the result document. This unnecessarily complicates the query. It can be rewritten simply into:


select xmlelement(name "all",
        xmlagg(xmlelement(name "employee",
          xmlconcat(
             xmlelement(name "id", empno),
            xmlelement(name "first", firstnme),
            xmlelement(name "last", lastname)))))
from employee

It is simpler now, but the XML construction still takes up some thoughts as it uses nested calls to XMLELEMENT and XMLCONCAT to build the row fragments. The latter can be accomplished shorter and more elegantly by using either XMLROW, or even better XMLGROUP:


select xmlelement(name "all",
        xmlagg(
          xmlrow(empno as "id",firstnme as "first",lastname as "last" option row "employee")))
from employee

select xmlgroup(empno as "id",firstnme as "first",lastname as "last" option row "employee" root "all")
from employee

XMLROW allows to very elegantly construct an XML fragment from row data, XMLGROUP to aggregate such or other fragments. XMLROW and XMLGROUP are simpler to use than the XMLELEMENT-based queries, they are easier on the eye, and what is best, should result in better performance than the other variants.

Let's take a look at the optimized query text for the initial query and for the XMLGROUP-based:
Original:


SELECT $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, all), $INTERNAL_XMLNAMESPACES$((''),
  ('')), Q4.$C0)
FROM
(SELECT XMLAGG(Q3.$C0)
FROM
(SELECT Q2.$C0
FROM
(SELECT $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, employee), $INTERNAL_XMLNAMESPACES$((''),
  ('')), XMLCONCAT(arity, $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, id), $INTERNAL_XMLNAMESPACES$((''),
  ('')), $INTERNAL_SQLTOXML$(Q1.EMPNO)), $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, first), $INTERNAL_XMLNAMESPACES$((''),
  ('')), $INTERNAL_SQLTOXML$(Q1.FIRSTNME)), $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, last), $INTERNAL_XMLNAMESPACES$((''),
  ('')), $INTERNAL_SQLTOXML$(Q1.LASTNAME))))
FROM HLOESER.EMPLOYEE AS Q1) AS Q2) AS Q3) AS Q4

The XMLGROUP-based query:


SELECT XMLDOCUMENT($INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
  NULL, all), Q3.$C0))
FROM
(SELECT XMLAGG(Q2.$C0)
FROM
(SELECT $INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
  NULL, employee), $INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
  NULL, id), $INTERNAL_SQLTOXML$(Q1.EMPNO)), $INTERNAL_XMLELEMENT$(1,
  $INTERNAL_QNAME$(NULL, NULL, first), $INTERNAL_SQLTOXML$(Q1.FIRSTNME)),
  $INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
  NULL, last), $INTERNAL_SQLTOXML$(Q1.LASTNAME)))
FROM HLOESER.EMPLOYEE AS Q1) AS Q2) AS Q3

The overall general access plan (not shown) is the same for both queries, however the optimized query text differs. What can be seen is fewer SELECT expressions and a simplified construction, resulting in less CPU cycles spent.

After all the words, let me recap: Simplicity and brevity win...

Wednesday, July 1, 2009

Two essential links: IBM Data Studio and a free book

With a new version like DB2 9.7 there are other things you need or want, too. It's like with a house. You first remodel or upgrade something, then you need matching colors and hence buy new curtains, replace the carpet, try to get rid of some relatives (just kidding). Anyway, the point is that there are usually some added, often hidden costs or other issues.

The good news is that it is different and even free with DB2. Because you probably already got your free copy of DB2 Express-C 9.7, here are two essential links to make your database server remodeling complete (no relatives involved): There is a upgraded and free IBM Data Studio to take care of DB2 administration (and it has lots more). Finally, the free book "Getting Started with DB2 Express-C" has been updated to version 9.7 as well.

With all the bills coming in around the start of a new month, it is always good to see that there is so much good stuff freely available...

Pages