Data Henrik: 2009

Tuesday, December 22, 2009

Happy Holidays

With the last post this year, I wish you Happy Holidays, time for family and some rest. All the best for 2010, may all your transactions commit.

Henrik

Friday, December 18, 2009

Excuse me, what year do we have? - Some usability enhancements for XQuery in DB2

Imagine that in the XML world there is already 2010, but in your relational world it's still 2009. This or the reverse could happen if you directly use fn:current-date() in XQuery and the CURRENT DATE special register in SQL. The reason is that XQuery by definition operates in and returns UTC-based time and date values, e.g., whereas the SQL context in DB2 refers to the local timezone.

To work around possible issues, user could call fn:adjust-date-to-timezone() to convert a data (or analogously a time or dateTime value) to a specific, e.g., the local timezone. Well, the world got simpler just recently. DB2 9.7 FP1 introduced new (DB2-specific) functions to retrieve values in the local timezone directly, named current-local-date, current-local-time, current-local-dateTime, and local-timezone.

Having different years in the SQL and XQuery context might be a rare event, but the usability improvements are very handy - not just close to year end.

XML Processing on z/OS

A new Redbook has just been published that discusses XML Processing on z/OS. The book starts out with XML Essentials, so that even beginners to the XML world can make use of this book. After the introduction the different options of XML processing, ranging from COBOL, PL/I over CICS, IMS, and DB2 pureXML to Rational Developer and Java-based solutions are discussed in an overview section. The rest of the book then deals in detail with the different options and also includes a chapter about performance and cost considerations.

Thursday, December 17, 2009

Orwell, 1984, and Google

Forbes has an interesting article "When Google Runs Your Life". Basically, Google is on its way to impact (control? profile?) most of your private and corporate life. It has its reach into search, mapping and navigation, online and offline applications, into communication (both landlines and mobile). Through its browser and operating systems as well as DNS services it is also at the door to the Internet. To top that, by way of the AdSense and Analytics services there is a web of checkpoints plastering the Information Highway and more.

You talked with someone or left a message? You transcribed and analyzed it. You got emails? Analyzed and only the advertising got optimized. You surfed to some fancy places? Google already got the big picture of your personality.

In the article Eric Schmidt, Google's CEO, is cited as:

Privately, however, he has told friends to keep off a computer anything they want to keep private.

In several European countries there are already discussions about Google and compliance with privacy laws. It will be interesting to see how this moves forward.

Tuesday, December 15, 2009

Why? Motivation (based on fun)

Over the weekend I took a look at the Top Viral Video Ad Campaigns and noticed something by Volkswagen. It's not directly by them but by a project they sponsor which is called The Fun Theory. The idea is to introduce a fun factor to (at least for short term) change habits for the better. So far they have three videos and all are very interesting and funny.

By introducing something unusual into a common, daily setting they are creating curiosity and then fun. With that they are motivating people to change habits, maybe only once, maybe for longer.

Many people (I **not** included) come up with New Year's Resolution. Usually those resolutions are related to changing a habit for the good and people tend to fall back to their regular routine after a month or so. And in most cases they do so because one important ingredient is missing: Motivation (and fun).

Monday, December 14, 2009

Shock or Joy? Power Meters and the core of everything: Data

In many countries the end of year also is the time when power meters, or utility meters in general, are read out. When we lived in California the meters were manually read once a month and the monthly bill reflected the previous month's usage - with all its ups and downs. The feedback was more or less immediate which was good, but the amount to pay also significantly alternated between highs in Winter and Summer and lows in Spring and Fall.

In Germany, the meters are typically read once a year and the monthly amount is based on the averaged out estimate for the next year. You can plan ahead for the upcoming months in terms of payments, but for most people there is no direct feedback on their consumption. Our power meters will be read out over the next few days and I am happily expecting the annual statement sometime in January.

In some regions the Utilities have started deploying Smart Meters. In California, PG&E is changing to them, in Germany households are expected to move to them over time, too. I am still waiting for mine.

Why I am interested in it? First, direct feedback is valuable to improve or optimize consumption - monthly data is better than annual, daily or even immediate input better than monthly. Second, it is interesting to see what kind of data could be made available and in what format. My format of choice would be XML because of its flexibility and tooling support. Third, based on finer granularity, better pricing could be available (avoid the peak hours and save). Fourth, based on the data, you or a third-party company could analyze your consumption, compare with peer groups and look into finding ways to improve your monthly bill. Some companies already offer energy optimization services to enterprises and take a share of the savings.

Last, it would be fun to store XML-based energy data in DB2 pureXML and manage and analyze it myself.

What is at the core of all this is the data. Without its availability (in a usable format), there is no insight, no improvement (greener planet), no founding of new companies.

Friday, December 11, 2009

Friday Fun with Airlines

Yesterday, Adam's marketing thoughts were motivated by an old airline commercial and the feeling of coming home after a trip. That got me started last night because the feeling of "coming home" or "already feeling close to home" is what probably many travelers, including myself, share. I started to look for some funny videos that demonstrate my recent airline experiences. Here they are:

1) Flying has changed, it is safer today than it was, but the recent habit of charging for everything separately is annoying. Why not combine it?

2) The second video is about my experience as a tall person (198 cm / 6'6"). I have to crawl inside regional aircrafts. And in most others the overhead bins are a frequent hazard.

P.S.: Have a nice weekend to you all on DB2, Oracle, Informix, MySQL, MS SQL Server, PostgreSQL, Derby, and other database systems!

Thursday, December 10, 2009

Tell me what you are (or could be) - Admin functions for inlining

I recently wrote about LOB Inlining, a new feature in DB2 9.7. It allows to store LOBs together with the actual row data and thereby save space and in most cases improve performance. XML Inlining has been around since DB2 9.5 and proved very valuable for many customers. A common usability issue, and we can argue whether small or big, was that it was hard for XML data to estimate whether it got inlined or not. This was caused by the fact that when you insert XML data it is in text form and then is parsed and converted into the internal, native format. Thereby it can shrink or grow, depending on the data. No simple tooling was available to estimate whether it would be inlined or whether it actually got inlined.

Well, the XML parsing and the resulting size changes are still present in DB2 9.7 (we still have native XML support), but tooling that can be used for both XML and LOB data has been added in the form of two admin functions:

ADMIN_EST_INLINE_LENGTH works on both a XML or LOB column that is passed in as parameter. The return function returns either the estimated length of the column value for each of the column values or an indicator of whether the length would be too big or the estimate cannot be performed.
ADMIN_IS_INLINED basically returns either a 1 or 0 (yes/no) for each column value of whether it has been inlined or not.

Both functions are not meant to be run on your production system with millions of rows per table, but for either test environments or for [performance] problem diagnostics (think of a nicely written WHERE clause attached to filter down the number of rows). The functions help DBAs to determine a good inline length and also to verify whether values are then actually inlined as planned.

To make usage of the usability function even more usable (nice sentence, right?) a new sample program has been added that demonstrates how the two admin functions can be applied to XML and LOB columns.

Wednesday, December 9, 2009

Holiday Preparations - Your action required!

I recently wrote about how companies make use of their address data during the last weeks of the year. Today, I thought to join the **fun** of ongoing holiday activities and start my own giveaway/sweepstakes. I will be crowning my "Reader of the Year" later this month. To be eligible, you don't have to submit silly forms, post holiday videos to YouTube, solve puzzles like "December is the ____ month of the year" or get your kids to paint your family.

All I am asking is that you leave some basic information, similar to other sweepstakes. Please comment on this post, with your name, your full address, your birthday, your annual income (full dollars/Euros/etc. is ok), your occupation, your detailed family status, and your highest degree of IT certification (no school information required!).

P.S.: And remember the fun part.

Tuesday, December 8, 2009

Call with IBM experts on Using JDBC with pureXML (updated)

Tomorrow, Friday 4th, is another meeting of the group calling themselves pureXML Devotees. Details can be found here. The call will start Friday on 1pm US Eastern time and will cover DB2 on z/OS as well as DB2 for Linux, UNIX, and Windows.

As usual, the meeting will be recorded and the slides and replay information should be available on the same website a couple days after the call. Enjoy!

Update: The replay information and slides are now available at the pureXML Devotees website.

Let's phone, shop, and whatever.... - Testing new limits

For most of the activities these days data will be produced, ranging from your life and household to administration, manufacturing or services. A lot of the data will end up in data warehouses, either directly or in some condensed, aggregated, distilled way. And while people were talking about 1 TB or 10 TB warehouses only few years ago, scaling up to 100s of Terabyte or even Petabytes (PB) is discussed often now.

One of the enhancements in DB2 9.7 is for addressing this trend. Up to version 9.5 distribution maps were limited to 4096 entries (4 kB), now up to 32768 entries are possible. In a partitioned database the distribution key, i.e., the columns used to figure out on which database partition the entire row ends up, is important because it determines how evenly the data is split between the partitions. The more evenly balanced the distribution is, the better balanced typically the performance is.

To assign a database partition, the distribution key is hashed to an entry in the distribution map. The more entries in the maps, the smaller the skew. With the new increased distribution map in DB2 9.7, the skew remains small even for databases with a larger number of database partitions.

How do you test it? Increase your calling, shopping, driving, consuming. This will not only kick-start the economy, but also grow the enterprise warehouse and make sure new limits are tested (and introduced)...

Friday, December 4, 2009

The importance of correct and integrated data

You can deliver a product as fast and "good" as promised, but still loose a repeat customer. Here is a real life example.

Over the past days I ordered several books online. Usually, I use one specific company "A" (any guesses?) for two reasons. First, in Germany by law all German language books have to be sold for the list price, neither discounts nor coupons can be applied. "A" is offering free shipping on all books, regardless of their list price. Second, "A" is usually very quick in delivering the books and provides tracking information. I used this company as usual for some books.

Then, for an English book where every book seller can offer individual prices, I shopped around and found that I could save 2 EUR over at company "B". The product details showed that the book was in stock and would be sent the next day. I ordered the book in the evening and then started to wait for a confirmation email. And waited. The next morning I received 2 emails. The first was the acknowledgement of my order. The second was a status change email - now the book wasn't in stock anymore, but had a delivery time of 3 weeks. Ouch.
I checked the order status online and then the product page again. Both showed the 3 weeks for delivery. Why not cancel the order and shop somewhere else? I sent an email via contact form, tried to cancel online, and called the service hotline. On the phone they were friendly, confirmed the change to 3 weeks, wondered why it had changed over night and pointed out that it would be hard to cancel, but they would try anyway. The reason was they are working with a 3rd party fulfillment center. One day later, first the status said "trying to cancel", then it changed to "package sent", and in the afternoon the book arrived.

Company "B" delivered the product I ordered on time. However, their data isn't integrated and both customer and customer service are left to wonder about what is correct. Will I shop there again? Not this year... ;-)

Wednesday, December 2, 2009

Learn about MySQL and DB2

When you work with database systems for a longer time it is always a good idea to step back from time to time and look at "your world" from some levels up (isn't it the season for that anyway?). Now there is a book that helps you to get a good overview of not just DB2, but also MySQL. Any idea what book I am talking about?

After some years since the first edition, IBM just updated the MySQL to DB2 Conversion Guide (a so-called Redbook - no Oracle involved). What is good about the book is that they start with a high-level introduction into DB2 9.7 with its architecture and database objects and components, the available tools and utilities, and different ways of accessing the data (APIs, languages, etc.). Thereafter, they focus on MySQL and give a similar introduction. Only if you understand your source and the target platform a migration can be planned well. The rest and biggest part of the 450+ pages then deals with the necessary steps to successfully convert an existing MySQL-based application to one running on DB2.

It is something worth reading. And if MySQL is/was not your world, maybe you are interested in reading the Oracle to DB2 Conversion Guide.

Tuesday, December 1, 2009

Address databases (CRM), the holiday season, and load balancing

It's the time of the year when - it seems so - all companies of the world remember that they have stored "customer" address records somewhere or could acquire some cheaply. At least in my case, most of those companies didn't care for me (to me?) the rest of the year. Now they are sending friendly letters and emails, wishing the best for the season and yes, there are great deals I probably wasn't aware of.

Because from their point of view everybody seems to have plenty of time now during the Advent (nobody attending plenty of celebrations and ceremonies? nobody shopping for gifts? nobody preparing house and garden for the winter? nobody ...?), they suggest to switch all kinds of insurances, think about additional pension funds, or consider other "life-changing" events.

And then there are all those companies trying to sign me/us up for their now daily newsletters with special deals every day, with sweepstakes offering TVs and cash if I am willing to have my address data be distributed to another set of companies ("sponsors").

Finally, yesterday a letter arrived with a set of greeting cards attached which we should use for our holiday mails for friends and family. The front of the card was designed nicely ("how nice!"), the back was mostly advertising ("what a waste" -> trash).

If all those companies that feel the need to maintain my address data in a database all year and consider me a customer only now please would start treating me a customer throughout the year. I had time during the Summer break to look into insurances. I had money to buy the big ticket item after bonus payments in Spring. I was in the mood for pension/retirement planning when I saw the trees turn yellow/brown/red and then the leaves come down in Fall. So, pleeeeaaassseee, consider Load Balancing for us poor consumers, too, and start making use of your databases in different ways.

Monday, November 30, 2009

My topics as Wordle

Finally, I got some time to try my hands on Wordle, feeding in this blog. The result - after some adjustments - looks like this:

You can compare it to the thermal image of a house.

Friday, November 27, 2009

DB2 9.7 FP1 is available, supports read on standby (HADR)

Many companies are waiting one or two service packs/fixpacks before moving to new software versions, regardless of whether it is a database system, an operating system, or the software that holds together day-to-day operations (ERP). Well, let's start moving to DB2 9.7, FP1 is out now. No excuses anymore...

A list of enhancements is available in the Information Center, a general DB2 9.7 FP1 overview including the list of fixes is on the support page. The major enhancements include even more PL/SQL support to make it simpler to move off of Oracle and "read on standby" in HADR environments.

Do you webcam?

With the days getting shorter, darker, and colder (here in Germany) thoughts about nice, warm, and fun Summer days pop up more frequently. It's the time when some people start eating Marzipan, flip through their [online] photo albums, or do other things to keep the spirits high. Do you webcam?

When I am on business trips and back at the hotel, I usually try to check a webcam from "my place" before calling home ("I know how it looks like right now"). Do you webcam, too?

Before talking with co-workers in other locations, you can check webcams to make talking about the weather more interesting ("it is NOT snowing in your place"). Do you webcam?

Here are some webcams I use from time to time:

Friedrichshafen has a couple of webcams and I will only list three around the airport. The so-called slashcam can be controlled and has nice views. The same goes for the Zeppelin webcam. The aero club located at the airport maintains 4 webcams.
Boeblingen, where IBM's German Lab is located, has at least this webcam.
For the conditions in San Jose, California, my former home and place of several IBM locations, the SJSU offers a cam.
A very interesting "local" webcam is installed on one of the ships cruising the Lake of Constance, on the St. Gallen. In the archive you can follow the ship on its tours across the lake.

Now I want to hear from you, what are your webcams?

Thursday, November 26, 2009

The last one

Sometimes it is hard to say goodbye, sometimes not. Sometimes you get used to something and cannot imagine a life without it. Anyway, the day has come to move on in a different, hopefully better way. No more of it for now. The experience is nice to have, but I hopefully can live without it.

After putting up some shelves and drawers over the last few weeks, it was time to unpack the remaining moving boxes. Some of them contained stuff we took with us from Germany to California and never touched (textbooks from the time at university). One box contained framed awards from the past few years (where do we put them?), books with my first conference papers, a newspaper from the day of our wedding. All is unpacked. Goodbye moving boxes, we have moved in. The last one is done...

Wednesday, November 25, 2009

Small is beautiful (and fast): LOB Inlining

In the past I had emphasized that LOBs and XML data are different in DB2, even though both are stored in the LONG tablespace if you wish so (LONG IN option during CREATE TABLE). XML data is buffered, i.e., pages with XML data are held in the bufferpool, LOBs require a direct fetch via their descriptor and are not buffered by DB2. XML columns do not have a length specification, LOB columns have the maximum length specified. The bigger the maximum possible LOB size is, the larger is the LOB descriptor that usually is stored as part of the row.

Now, starting with DB2 9.7, they have a neat feature in common: Both can be inlined. What is inlining and what are the benefits (for LOBs)?

Because LOBs and XML data can be quite large, they are stored outside the regular row data, i.e., outlined, and a LOB descriptor points to the large object. With inlining however, when the data fits into the regular data pages, it is stored as part of the row, i.e., inline.
DBAs can specify the maximum inline size for each LOB or XML column by using "INLINE LENGTH xxx" during CREATE/ALTER TABLE. The total row size, i.e., the sum of all column sizes and some overhead, needs to fit into the data pages. What this means is that if you use 8k pages, an inline length of 15000 won't work for sure. In contrast to XML columns, if no inline length is specified, for LOB columns an inline length equal to the descriptor size is set implicitly.

Why store 200 bytes outlined and have a 220 bytes LOB descriptor in the row (total 420 bytes), when all can be stored in 200 bytes in the row? And now we are already talking about the benefits of LOB inlining. If you have lots of small LOBs, inlining can improve performance and reduce storage significantly. Storage is reduced because the extra LOB descriptor is avoided. In addition, LOB data stored in the row, i.e., inlined, benefits from row compression when enabled. The performance improvement comes from the direct access (no LOB descriptor involved) and the fact that regular table data is cached in the bufferpool. The direct fetch operations to the unbuffered LOBs are avoided.

As shown, LOB inlining has many benefits and if your Large OBjects (LOBs) are really small objects you should exploit this feature. I found this article describing how LOB inlining in DB2 is used for SAP databases. Because many small LOBs exist to store properties, this can have significant performance benefits.

Monday, November 23, 2009

Cooperate for greater performance! (Share a scan)

If you can benefit from someones work (and it is legal), why don't do it? If it is mutual, i.e., both sides benefit from each other, it's even better and a win-win situation. And when it is inside a database system and results in better query performance and more throughput while utilizing less system resources, it should cause big smiles from everyone. What I am talking about?

One of the many new features in DB2 9.7 is Scan Sharing. The optimizer now can group transactions that perform a table scan or block index scan on the same data into so-called share groups. Scanners within a share group coordinate among themselves, i.e., cooperate. The idea is that I/O and bufferpool activity is reduced by having multiple parties work on the same pages at the same time. This is great when your system is I/O bound because the costly physical I/O could be significantly reduced.

How would I explain such a feature to my kids? With the holiday season and family visits approaching I would tell them to coordinate the "bio breaks" on roadtrips. Instead of everyone independently asking for a pit-stop and daddy stopping every 30 minutes, the entire family could agree on visiting the restrooms at the same time and refilling the gas. Coordinated stops (comparable to physical I/O) would result in us significantly faster reaching the destination.

Tuesday, November 10, 2009

From small to big, and always good...

Do you want to keep a database on a mobile device? Or are you running a terabyte-sized warehouse? The requirements are then probably different in terms of hardware and software. The good news is there is a developerWorks article describing the different DB2 versions available, ranging from DB2 Everyplace to the full-fledged DB2 Enterprise Server Edition and the different feature packs.

Tell me something cool about this article
For each of the discussed versions there is a section "Tell me something coll running on...". It shows how an IBM customer is using that specific version of DB2.

Save Money, Get Promoted!!!

Many companies have programs for the continuous improvement of processes and operations. The idea is to save money, reach better quality, and move faster and smoother. Often, employees are given nice incentives to participate in such improvement programs, sometimes they get promoted for only a single brilliant idea.

Now, here is a (free!!!) advice. Don't try to win 10 million dollars in silly contests, but read the Redbook "Oracle to DB2 Migration: Compatibility Made Easy". You can even try out what you learned on the free DB2 Express-C. You will quickly understand how to save money, run database applications faster, and make DBA life simpler.

P.S.: Did I mention Award and Promotion...?

Monday, November 9, 2009

Bringing fun, a Zeppelin, the Bay Area, and Europe together

The following is a new commercial for a Danish retailer. It brings together all kinds of cliches (in a fun way), the Zeppelin from Airship Ventures, a look at the Golden Gate Bridge, and a European audience. My wife and I already watched it a couple of times and always found new funny details.

BTW: We are in Germany and use a different washing detergent, but have a Zeppelin overhead most days...

Call with DB2 experts on XMLTABLE (2009-11-10)

A call with DB2 experts from both z/OS and Linux, UNIX, and Windows will be tomorrow, Nov 10th, at 1pm US Eastern time. Details on this event as part of the "pureXML devotees" activities can be found on the website https://www.ibm.com/developerworks/wikis/display/db2xml/devotee#devotee-xmltable.

Designers, database developers , and DBAs are all invited to join. The call will be recorded, a live chat is available, and materials will be available online. Bring your own food and questions...

Thursday, November 5, 2009

How much does Google know (officially)?

Earlier this week I asked "How much data do I generate?", now I saw that Google is adding another service, the "Google account dashboard", that let's you see what Google is knowing about you. Supposedly.

It's interesting to see statistics about your account activities in one place. The interesting question is how do you interpret, how do you analyze it (from Google's perspective)? What does Google really know (or guess) about me? The question is not Google-specific, but applies to all those companies that collect data about you.

It is good to see such a feature in place because it creates more awareness of the data privacy/data collection issue. I would like to see similar services by other companies.

Wednesday, November 4, 2009

Additional information and help for DB2 pureXML

Recently, for a workshop, I put together a list of websites on where to find information when you want to know more about DB2 pureXML (both on z/OS or Linux, UNIX, and Windows). Why not share this list in the form of a blog post? You can comment if you have things that should be added.

Cross-platform:

pureXML Wiki: https://www.ibm.com/developerworks/wikis/display/db2xml/Home
pureXML forum: http://www.ibm.com/developerworks/forums/forum.jspa?forumID=1423
pureXML devotees: https://www.ibm.com/developerworks/wikis/display/db2xml/devotee
IBM Redbooks: http://www.redbooks.ibm.com/
DB2 pureXML cookbook: http://www.ibmpressbooks.com/cookbook
Blogs like this one (visit http://www.planetdb2.com for an overview)

DB2 for Linux, UNIX, and Windows:

DB2 Information Center: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp
my blog http://blog.4loeser.net or Conor's http://nativexmldatabase.com/

DB2 for z/OS:

DB2 for z/OS pureXML wiki: http://www.ibm.com/developerworks/wikis/display/db2xml/DB2+for+zOS+pureXML
DB2 pureXML team blog: http://www.ibm.com/developerworks/blogs/page/purexml
DB2 for z/OS Information Center: http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.db29.doc/db2prodhome.htm

That's it. If you have more, let me know...

Sunday, November 1, 2009

How much data do I generate each day?

During vacation, I somehow got wondering how much data I generate each day. This is for all (in-) activities in a 24 hour slot. To make it simpler, I decided to only consider "private" activities.

Here is a list of activities, starting from night to day

Over night a lot of mail and packages are moved, leaving tracking data in databases.
There are also all kind of batch jobs running to produce statements, dig into details of transactions from the day before, and much more.
Because several alarm clocks are running, devices being on standby, and of course fridge, freezer, heating system, etc. performing their duties, energy consumption could be (e-)metered, leaving entries at the utilities. During the day, all kinds of electrical devices (stove, computer, microwave, washer/dryer, lights, etc.) will add data.
By making phone calls, connection records (and who knows what more?) is left. If I call a hotline, sometimes phone calls are recorded, often records added to CRM databases, etc.
When doing (grocery) shopping, a list of my items is added to the transactions. Maybe automatically some items are re-ordered.
Using electronic payment, a credit card/check card debit action is performed.
For paying bills, an electronic fund transfer with associated records at several banks is done.
When surfing, my traces are left in web logs, cookies are produced, "recommendations" are processed, Google and other ad networks/analytics/tracking companies are adding to my profile, my IP connections are stored, my DSL provider probably keeps additional information.
When parking in a downtown garage, my license plate is recorded during entry/exit.
The local library keeps a list (and maybe a history) of books on my and the kids' account.
and much more

What bigger items do I miss? How muck kBytes or MBytes is it every day? Should I be concerned or even frightened? Your comments are welcome...

Thursday, October 22, 2009

Social Computing from Mobile Devices in a Cloud Computing Environment for Advanced Analytics,,,

to make IT greener in reshaped data centers based on flash memory with activity monitoring for higher security. Yes, that is the stuff I will do next in my spare time because I am dealing with "Technologies You Can't Afford to Ignore".

Monday, October 19, 2009

What is the difference between a laptop and a mainframe?

Last week I told my wife that I could explain the difference between a laptop and a mainframe. After I was done she said that computer guys have a strange sense of humor. Here is how I explained it.

It boils down to: Never do this with a mainframe!

Hey, what's wrong with you?!?

Sometimes, you have to deal with stuff that is just wrong. Sometimes, it's not easy to even find out what's going on. And sometimes, there are small things that make a big difference...

In an earlier post I had described how an index over XML data can be used to clean up data. Today, I will focus on a neat feature that was added to DB2 9.5 Fix Pack 3 as well as to DB2 9.7. It is a stored procedure (SP) named XSR_GET_PARSING_DIAGNOSTICS. That stored procedure helps you to find out details on XML parsing and validation errors.

The SP has five input and two output parameter. The first is a BLOB with the XML document/fragment in question. The next two make up the XML schema's fully qualified name (schema.name) against which you want to validate or they can be NULL. The 4th parameter can be NULL again or should be the schemaLocation of the primary XML schema document. The last input parameter specifies whether to use schemaLocation information from inside the provided XML document (1) or not (0). If zero was specified and no schema name has been specified (parameters 2 and 3), then no validation is performed. This is exactly done (no validation) in our example:

call xsr_get_parsing_diagnostics(blob('<a>Hello<b>Another<c><c></a>'),NULL,NULL,NULL,0,?,?)

The last two parameters are the output parameters. The first is a placeholder for an output document (as VARCHAR), the 2nd the number of errors found in the XML document. The above call would produce an output like this:

  Value of output parameters
  --------------------------
  Parameter Name  : ERRORDIALOG
  Parameter Value : <ErrorLog>
  <XML_FatalError parser="XML4C">
    <errCode>202</errCode>
    <errDomain>http://apache.org/xml/messages/XMLErrors</errDomain>
    <errText>Expected end of tag 'c'</errText>
    <lineNum>1</lineNum>
    <colNum>27</colNum>
    <location>/a/b/c/c</location>
    <schemaType></schemaType>
    <tokenCount>1</tokenCount>
    <token1>c</token1>
  </XML_FatalError>
  <DB2_Error>
    <sqlstate>2200M</sqlstate>
    <sqlcode>-16129</sqlcode>
    <errText>
[IBM][CLI Driver][DB2/NT] SQL16129N  XML document expected end of tag "c".  SQLSTATE=2200M
    </errText>
  </DB2_Error>
</ErrorLog>


  Parameter Name  : ERRORCOUNT
  Parameter Value : 1

  Return Status = 0

The function is not just useful to deal with parsing or validation errors coming from your application data, but also if you have XML documents that are causing trouble in general.

Friday, October 16, 2009

Learn more about pureXML for z/OS and LUW

Susan Malaika is running a group called "pureXML devotee". What is it? Something to be afraid of? Fanatics you should avoid? The opposite is true (more or less :). In emails and meetings information about pureXML on both the mainframe and the distributed platform is exchanged and skills are deepened. In recent meetings XML storage and XML indexing on DB2 for z/OS were explained, earlier community meetings included storage and indexing on DB2 for Linux, UNIX, and Windows, as well as performance tips.

The next meeting will cover the XMLTABLE function with IBM experts joining from both products. Check out the details of this lively group at https://www.ibm.com/developerworks/wikis/display/db2xml/devotee. And when asked mention that Henrik sent you...

Tuesday, October 13, 2009

Teeth Whitening Tips and DB2 Express-C (and not Oracle 11g XE)

Recently, I stumbled over a so-called gadget that allows to add daily teeth whitening tips to my blog. How nice. But why would I want to have it on my blog? And are there that many tips that it warrants daily updates?

Today I was reminded about the teeth whitening because DB2 users have a lot to smile because of recent news and announcements (and white teeth may be of advantage on those group photos at the upcoming IOD Conference). DB2 users also have a reason to smile when reading the latest (no-) news on Oracle 11g Xpress Edition (XE). According to the Infoworld article, a no-cost edition of Oracle 11g is still a year out. Compare that with how quickly after the DB2 9.7 release the free DB2 Express-C was available. DB2 users can utilize it for developing applications, deploying it at no charge or even distribute it with their applications. And it is available on several platforms and operating systems.

BTW: Did I add the gadget to my blog? Check it out...

IBM Presentation Program / IBM Vortragsprogramm

IBM Germany recently started a presentation program called "IBM Vortragsprogramm". IBM experts are offered as speakers to interested companies or organization for free (panel discussions, talks, expert rounds). You can also hire me, too. The details are on the webpage which is in German.

Monday, October 12, 2009

Mixing cookies, sales records, and DB2 pureScale

This is one of the many stories where personal life and IT life intermix - or is it even always the case? Today my post is about cookies (real ones, nothing about visitor tracking or search engine optimization, SEO), it is about the upcoming holiday season (or are we in it already?), and about database technology, namely DB2 pureScale. But let's get started with the cookies first...

In Germany and some other countries, we have some types of cookies which are only available in the Christmas/holiday season. One of them, and my favorite, are Speculaas (or in German Spekulatien). It's a spiced shortcrust biscuit and they taste very well on their own or when soaked in milk. As mentioned, they are only available throughout the "Season" which these days seems to be from September to December. My wife has been trying to keep the "Christmas is not in September" tradition. And so I have been arguing, pleading, begging for a couple weeks that Speculaas are just some regular cookies. I tried it by pointing out that there is nothing special about these cookies, they could have been sold out this year so far and that we are lucky to have them back. I tried it with arguing that if we don't buy them, they could be sold out by Christmas. Anyway, after much back and forth we now are close to opening the second box. I emptied most of the first box one morning last week in my home office when everybody else was out. My kids also seem to like Speculaas which makes it easier for me in the afternoons...

With Speculaas in the house and the shops full of holiday articles, it dawned on me that we are approaching the Christmas season (and in some countries, like the US, Thanksgiving or other big festivities coming up even earlier). This is the season where additional store clerks are needed and when additional processing power needs to be available for web shops, in the database systems, and all back infrastructure that enables the upcoming peak sales/revenue period. It's the time where sales records are reached and everybody is busy in doing their part to help the economy.

Last week, DB2 pureScale was announced. It helps to horizontally scale up database processing power and achieve high availability of the database system. The key is that new machines can be seamlessly added to a database cluster to increase the system throughput. While there may not be much performance needed in, e.g., Summer, peak performance is needed throughout the season. Using pureScale it is simple to added that needed additional capacity. While it may be possible to move to a bigger machine (vertical scaling), it is not practical in terms of effort or benefits. Having a DB2 cluster also helps with even higher system availability. With DB2 pureScale it is possible to quiesce one member (machine) and service it. Or if parts of your cluster fails, the others are still available and let your business continue. All this is transparent to the database application. It doesn't know whether it is running on a regular or clustered DB2.

Many new computers (desktops, laptops, nettops) are sold during the season, often replacing older, less powerful systems. If there would be something like DB2 pureScale, you would just add another module to your existing computer and add processing power. If one module is broken, your photos, videos, audio streaming, etc. would still be accessible and you could continue with parts of your processing power and repair the failing components. What a thought!

Now it is time for the morning coffee and some cookies (guess which!)...

Friday, October 9, 2009

First official word on DB2 pureScale

IBM now has a press release out on DB2 pureScale and related technologies. What I found impressive is the scale-up test to more than 100 Power systems with still a productivity rate of around 80%. More on DB2 pureScale and the technology over the next weeks.

The official website seems to be here: http://www-01.ibm.com/software/data/db2/9/editions-features-purescale.html (I got a broken link when I clicked into the press release)

Thursday, October 8, 2009

Diagnosis: Simple(r) Life

DB2 produces diagnostic information which is stored in "db2diag.log" files. Depending on the setting of diaglevel there is more or less information in those log files. Many DBAs have automated tasks to back up and compress and maybe eventually remove those files because they grow over time. On one hand, these files cause some (little) work, on the other hand, they are really handy if something goes wrong. The more information is logged, the faster they grow.
Now let's take a look at one of the many small, but useful new features in DB2 9.7: It's an instance configuration parameter named diagsize. It can be used to configure a max size for the diagnostic information and causes the diagnostic log to become a rotating log. If there is no need to keep old diagnostic information and with a limit set, DB2 will automatically remove the oldest log and recycle the space for the next/newest diagnostic log file.

With the improvement in place, my diagnosis is: Simpler life.

Monday, October 5, 2009

Virtual machines, but real books, real products and - yes - really useful

Last month a new IBM Redbook titled DB2 Virtualization has been published. In the book the authors first explain what can be virtualized (servers, storage, network), then talk about about technologies related to virtualization. What I found very interesting was a longer section about sub-capacity licensing, i.e., how DB2 is licensed for some of the virtual environment (how you can save money or not), as well as an overview about the different virtualization products and solutions (incl. Power HyperVisor, VMWare, Linux Kernel-based Virtual Machine). That chapter also includes information about many great DB2 features, such as everything "autonomic", "self-", and "auto-".
The remaining chapters of the DB2 Virtualization book provide in-depth coverage of DB2 in several virtualization environments. Even if you are not planning on running DB2 in a virtual machine, I still recommend reading that book because it gives a good introduction into that topic combined with an overview of current technologies and products.

[More links on virtual appliances are in an older post.]

Thursday, October 1, 2009

A passive house in Fall / Unser Passivhaus im Herbst

It's Fall again and especially the nights are cooler - outside. While in houses in Germany the heating systems has been wakened up after the long break, our house is still warm inside. And we expect it to be like this without heating for the next month or longer. The thicker walls and better insulated windows protect the inside against the cold (sometimes down to 5-8 C / 40's F) that now is dominating most of the nights. With the Indian Summer in the small forest next doors, it's a time to enjoy the house and the inside temperatures.

Der Herbst ist mittlerweile deutlich spuerbar, zumindest draussen. Die Naechte sind kuehler geworden, zum Teil gehen die Temperaturen schon runter bis 5-8 C. In unserem Haus ist es nachwievor warm, die staerker isolierten Waende und Fenster zeigen nun, was sie koennen. Waehrend in den Haeusern von verschiedenen Verwandten und Bekannten z.T. schon seit ein paar Wochen die Heizung zeitweise ihren Dienst tut, war es bei uns bislang nocht nicht notwendig. Und ich erwarte/hoffe, dass es zumindest den Oktober ueber so auch bleibt. Auch unser Brauchwasser wird noch komplett durch die Solaranlage auf dem Dach erwaermt, was den Geldbeutel freut. Da kann man den bunten Herbstwald direkt nebenan noch mehr geniessen...

Wednesday, September 30, 2009

New TPoX release and performance numbers

[Seems like it is benchmark day today] Version 2.0 of the TPoX benchmark (Transaction Processing over XML) has been released. In an earlier post I explained what TPoX is and why it exists. The new release of the benchmark specification has some changes in how the data is generated as well as in some update statements of the workload. The workload driver has also been modified (its properties are now XML-based) to adapt it easierly.

What is also out since last month are TPoX performance results based on version 2.0. A 1 TB workload was tested against DB2 V9.7 on AIX 6.1 on a IBM BladeCenter JS43. The numbers were also compared against DB2 V9.5FP4 run on the same setup in the paper showing the benchmark details.

Please note that due to the changes in the benchmark specification, benchmark results from version 1.x cannot/should not be compared to those from version 2.0.

TPC fines Oracle for recent benchmark claims

The Register has an article about Oracle being fined by the TPC because of recent ads related to Exadata2. Oracle has to pay $10,000 and was ordered to remove/withdraw ads, webpages, etc. which Oracle apparently already did.

Added: The issue is here at the TPC website.

Monday, September 28, 2009

XMLTABLE - Explained, the easy way (Part 2, References)

Last week I wrote about XMLTABLE as the "all-in-one" function because it is a very versatile function. Many DB2 customers are using XMLTABLE to allow existing relational applications co-exist with XML data either in the database or on the wire. The first is obvious, XML data is stored in the database and made available by a relational table (view) built on top of the XML data. If XML data is fed to (not into) the database, e.g., via queues, it doesn't necessarily mean it needs to be stored in the XML format. Some customers use the XML format to exchange data with other companies or agencies (think of product information, tax data, payment information, brokerage data, etc.), but process only data stored in purely relational format - no XML involved. What they do is to feed their incoming XML data into the XMLTABLE function and then store the table output in the database.

Today's title "Explained, the easy way" refers to reusing existing excellent information. Two of my colleagues wrote a 2-part article about XMLTABLE that I recommend reading. Part 1 which is titled "Retrieving XML data in relational format" gives an overview, part 2 has lots of examples and is labeled "Common scenarios for using XMLTABLE with DB2".

Wednesday, September 23, 2009

A look at free database systems (from the XML perspective)

On my laptop I have most (not all, because something is always missing) software I need for my day job and some add-on tasks, including a database system. Right now it is DB2 9.7 Express-C, a free database system where I very often use the included pureXML feature to test out XPath or XQuery expressions or quickly manipulate XML data. The latter can be done by importing the XML files, applying some XQuery magic, maybe even combine it with relational data and then being done. Other people use it to find the closest ATM - always good to have DB2 pureXML handy.

I also took a look at other free database systems. First, there was Oracle XE. While it seems to offer XML support, the software is based on an old release level (10g R2). Support is through a forum, but requires registration to even look inside to see how the support is. Nothing for me.

Next on my list was MySQL which I use in some LAMP environments and - being open source - has lots of community support. However, MySQL's XML support is limited in that sense that, e.g., XQuery expressions are not supported and that XPath evaluation is embedded into SQL in a non-standard way. So nothing for me again.

Last on my list during my evaluation was PostgreSQL. Here the picture is basically the same as for MySQL. PostgreSQL's XML support is limited again in terms of functionality and how, e.g., XPath expressions are embedded into SQL.

DB2 Express-C is free, is based on the latest DB2 release, has an open support forum (no tricks), and offers the broad XML support that the pureXML feature has. So it is DB2 Express-C on my laptop.

[Don't get me wrong when I talk about PostgreSQL and MySQL. I especially like that they added XML support over the years because it widens the common functionality available across products and leads to more XML-related skills among developers and DBAs.]

Tuesday, September 22, 2009

XMLTABLE - The all-in-one function?! (Part 1, Syntax)

What can produce a relational table out of XML data or a sequence of XML fragments? What can be used to shred (or since the Enron scandal "decompose") data simply by using SQL when ingesting data into a warehouse? What can serve relational applications while managing XML data? Of course I am talking about the XMLTABLE function that is part of the SQL standard and the DB2 pureXML feature.

I plan to post a couple of entries about this very versatile function, hence the "Part 1" in the title. Today, I start with a focus on the syntax for typical usage scenarios.

At first sight the XMLTABLE syntax looks mostly straight-forward:
XMLTABLE "(" [namespace declaration ","] row-definition-XQuery [passing-clause] [COLUMNS column-definitions] ")"

Basically, you could first optionally declare some global namespaces (more later), then comes an XQuery expression similar to those in XMLQUERY and XMLEXISTS to define the row context, then the optional, but familiar PASSING clause and finally the COLUMN definitions similar to a CREATE TABLE statement.

There are usually different ways of writing a (X)query. For the XMLTABLE function, the XQuery clause needs some consideration because it defines the row context, i.e., what part of the XML document is available (and is iterated over) for the column values when each row of the resultset is produced. In some examples in future parts I will show the impact of the XQuery expressions.

The PASSING clause is optional because you could work with constants in your XQuery (not very likely) or use column names to reference the data (e.g., "$DOC" for the DOC column). In many cases you will want to use the PASSING clause to utilize parameter markers, e.g., when directly ingesting application data.

The (optional) column definition is similar to a simple CREATE TABLE statement. You specify the column name and its type (e.g., NAME VARCHAR(20)). After the type comes the most interesting part, the keyword "PATH" followed by a string literal that is interpreted as XQuery expression. Within that XQuery the context (".") refers to that set in the row context (see above). If you would iterate over employees in a department, you could then simply refer to the employees' first- and lastname like shown:

SELECT t.* FROM dept, XMLTABLE('$DEPT/dept/emp' COLUMNS first VARCHAR(20) PATH './first', last VARCHAR(20) PATH './last') as t

Note that for columns all types are supported which are supported by XMLCAST. The reason is that behind the covers XMLCAST is called to map the value identified by the column-related XQuery to the relational column value.

Earlier I mentioned that global namespaces could be declared. Imagine that the department documents all have a default namespace "foo" (e.g., "<dept xmlns="foo"><emp>..."). In order to properly navigate within the documents your query would need to look like shown:

select x.* from dep,xmltable('declare default element namespace "foo";$DOC/dept/emp' COLUMNS first VARCHAR(20) PATH 'declare default element namespace "foo";./first', last VARCHAR(20) PATH 'declare default element namespace "foo";./last') as x

All the different XQueries would need to declare the namespace "foo". To make our lifes simpler, the SQL standard allows to globally declare the namespace using the XMLNAMESPACES function (which usually is used for publishing purposes):

select x.* from dep,xmltable(XMLNAMESPACES(default 'foo'),'$DOC/dept/emp' COLUMNS first VARCHAR(20) PATH './first', last VARCHAR(20) PATH './last') as x

The namespace is declared only once, the statement looks much cleaner and is simpler to write.

That's it for today as an introduction. Please let me know if you have questions on XMLTABLE that you would like to have answered in a future post.

Friday, September 18, 2009

Selling pre-owned cars (and Oracle Exadata)

The video of the Exadata v2 launch webcast is now up. I wrote earlier this week about this event. Of that video I especially liked the first few minutes because they reminded me of the Interboot which starts this weekend here where I live. The entire video also reminded me of how used or "pre-owned" cars are sold.

To impress prospective buyers a salesman would point out that the car does not have just 4 wheels like every other car around, but even 5 or even 6. It would not just have one sun roof, but three, not 4 doors, but 7. It is the best, the fastest, the best, and to top it of, the best. It wouldn't matter whether it makes sense in terms of the big picture, only whether some parts are impressive. And don't even try asking such a sales guy about the price. You wouldn't get a straight answer, at least not that easily.

Switching back to the video, I highly recommend watching it - especially on a Friday. Regarding cost of the Exadata package, here is a closer look. For the big picture, you should ask yourself about what is happening to those customers who bought into the (now dead) Oracle/HP cooperation? Why not just use SSD disks instead of a cache? Why not SPARC instead of Intel? Does RAC really scale to Petabytes? And why sailing and an ocean of tears as introduction...?

Wednesday, September 16, 2009

Wow, Oracle combines database system with disks (again!)

Yesterday was the much overhyped event of a company announcing some dbms coupled with newer disk subsystems and an increased cache. The result is that as long as everything fits into memory and cache, performance will benefit from it. If not, it still sucks.

A nice summary of the event is over at The Register:

If Oracle is trying to convince Sun customers that it is committed to the Sparc platform, perhaps it is not trying hard enough.

and

Today's Exadata V2 launch event started 15 minutes late, if you missed the Webcast launch (you can see the replay here when it becomes available), and started with an "extreme performance" theme that showed Captain Larry Ellison and Oracle's World Cup catamaran striking impressive poses, and then cut to a live Ellison in the studio, who doesn't do his own clicking during presentations and started out by admonishing some worker with "Next slide, please, I already know who I am."

BTW: While you are at it, please read here how companies have moved from Oracle to DB2.

Monday, September 14, 2009

From kitchen to datacenter - consolidate and generalize

Consolidation is a hot topic for those operating a data center. The idea is to save money by saving on space, energy, people, etc. I was reminded of this while reading the "consumer information" - a.k.a. weekend advertising, from one of the supermarket chains. Some of the upcoming specials include a sandwich maker, a pancake maker, a popcorn maker, a muffin maker, and so on.

Who needs all of them? How much space do you have in your kitchen? Do you read all the manuals and operating instructions? Why don't you use a general purpose solution that is already available in a regular kitchen (stove, oven, pan, pot, baking sheet, ...)?

Something similar has happened over the years again and again with software, including database systems. Companies have invested in special-purpose systems and later realized that they need additional capacities (kitchen storage, counter space, money for the purchase, ...) or the general purpose system runs overall as well (no special instructions to read, no special cleaning afterwards, different portion sizes possible). Remember the days of object-oriented dbms or an influx of "native XML" dbms? In most cases it is back to heating up a pan for some really good pancakes...

BTW: What kitchen gadget do you have that falls into the pancake maker category?

Monday, September 7, 2009

New in 9.7: XML index creation/reorg with concurrent writes

The feature I am describing today is one of the smaller items (marketing-wise) on the "new in DB2 9.7" list. However, it is important for day-to-day operations and it is trickier to implement (why else did it take until 9.7?). The enhancement is that now concurrent transactions with insert, update, and delete operations are allowed parallel to a XML index creation or REORGanization.

In earlier release with pureXML functionality only read access was possible which required some kind of workload planning to creating new indexes. With the improvement the index building process will catch up with ongoing IUD activities and only will require exclusive access, i.e., blocking other activities, during that time. This is the same behavior as for non-XML indexes (see CREATE INDEX for details).

Thursday, September 3, 2009

Moving and transforming (XML) data

Storing data, regardless of whether it is relational or XML, is one thing. Moving it across databases and transforming data on-the-fly, is another. A two-part series about using (IBM InfoSphere) DataStage and DB2 pureXML together provides insight into how to extract, transform, and move XML data, e.g., to integrate XML data into a data warehouse. Part 1 is titled "Integrate XML operational data into a data warehouse", part 2 named "Building an XML-enabled data warehouse".

Thursday, August 27, 2009

XML and Big Data - And why DB2 is hybrid

I could have titled this post as "lesson from the field". In this blog post, "How XML Threatens Big Data" is a report about experience made with using XML for some data projects, like biotech, and then continues to give "Three Reasons Why XML Fails for Big Data" and then "Three Rules for XML Rebels".

I was pulled to some IT projects where the focus was on doing everything in XML. People got overboard in adopting new technology. However, DB2 didn't scrape its relational capabilities for a reason and supports both relational and XML data side-by-side. This is because there a projects where relational format (a.k.a. tables with columns and rows) is the right storage format and there are projects where it is XML. In many cases both are needed and it is good that DB2 is a hybrid system, supporting relational data and XML data very well.

Tuesday, August 18, 2009

Sent from my... (Information Overload)

Yesterday evening I stumbled over a funny (non-IBM!) video about information overload. IBM also has a podcast on Information Overload. Too much is too much. But how much information is ok? And what information do you need? Do you get the right information? At the right time? At the right place? Are you able to make use of that information? Do you pay too much to get the right information?

IBM calls the process around answering these questions to establish an Information Agenda. Not as funny as the video, but certainly effective.

BTW: This message was not sent from a Blackberry.

Thursday, August 13, 2009

Clicks, Hefeweizen, and Transactions

Maybe you noticed that I was on vacation. And even with almost no laptop/emails and not too many job-related thoughts, a database or IT guy is (almost) always seeing the infrastructure behind.

In the evenings we were enjoying (pre-ordered) 4 course dinners in the hotel restaurant. For the drinks, the waiters used a wireless handheld terminal. With few clicks on the touchscreen an order for my Hefeweizen was created, the order printed at the bar, the beer added to the sales records, and it also got charged to our room. All this is interesting, but who really cares about transactions when a fresh, cold Hefeweizen is arriving after a long day of hiking?

Wednesday, July 29, 2009

Wikibooks, XQuery, and Sudoku

You probably have heard about all three, Wikibooks, XQuery, and Sudoku, but maybe not in the same context.

Wikibooks is a wiki-based collection of free educational text books. One of the books is targeted at providing XQuery examples. Even if the system used for most of the examples is not (yet) DB2, it is worth a look. How come Sudoku puzzles into the big picture? Well, one of the chapters in the XQuery book deals with "Puzzles, Maths and Games" and one example shown is how to solve Sudoku puzzles using XQuery.

As I said, the examples are not based on DB2, but it should be possible to port the functions and query to DB2 (ping me for ideas). If you are teaching, writing something similar (from scratch) could be a nice student project.

Tuesday, July 28, 2009

Did you get it? Setting up analytic systems is simpler now

Some other interesting announcements today were about IBM acquiring SPSS as well as the IBM Smart Analytics System. The latter is a combination of IBM software and hardware, consisting of a foundation and add-on modules for different target areas.

The IBM Smart Analytics System delivers a comprehensive and flexible set of capabilities, including business intelligence reporting, analysis, dashboards, and scorecards, data mining, cubing services, text analytics, data warehouse management, storage and server platform. It can also be used by clients or cloud service providers to deliver the power of analytics to private or public cloud services.

Some more information is on the IBM Smart Analytics System product page.

Is it real? A virtual conference again

Today I received an invitation for IBM's next "Data In Action" virtual conference on August 19th:

This conference will feature effective data management solutions for smarter business outcomes. Hear from Donald Feinberg, Vice President and Distinguished Analyst from Gartner, about the state of the DBMS software market and data management challenges facing businesses today. The topics that Mr. Feinberg will examine include; the problem of exploding data volumes, new computing paradigms for handling this explosion, data management cost reduction techniques and technologies, and the perspective of the DBA now and in the future.

Following Mr. Feinberg will be a moderated panel of customers and partners who are leveraging the unique and desirable features only available in DB2 9.7.

In the invitation I got it says that the first 97 attendees will receive a complimentary memory stick. Sounds like a real conference with goodies. Hey, wait - it says IBMers are not eligible for the memory stick. Dang - reality is back.

Monday, July 27, 2009

Travel: Using humor to cope with stress

Last Friday evening I was on the way home using a regional express train from Stuttgart to Friedrichshafen (Germany). I was happy to catch it and make the connection because another train I had been on had been running late. About 45 minutes into the ride our train had to stop in the nowhere and the following announcement came over the PA: "Because of persons on the track, continuation will be delayed for an unknown time." Often "person on the track" means suicide and in the worst case it can take hours to reopen the tracks.

Many people got upset about the situation, but I decided to take the delay with humor, seeing it as positive that at least mobile Internet was working where the train was halted and that I still had some food, water, and battery life left. I was reminded about a "humor and stress" class IBM had payed for me a couple years back. The facilitator was Tim Gard, known for his travel stories and chicken feet. Enjoy.

BTW: The train continued after about 60 min. I made it home safely with my emails done.

Thursday, July 23, 2009

Beware Be aware of the invisible!

Searching the Internet Movie Database, IMDb, for titles with the word "invisible, it seems that fighting the invisible has been something fascinating for decades. XML and XQuery are fascinating, too, but I am not aware of any movies featuring them in the title (maybe still too early?). Anyway, what I am up to today is to show you an interesting XML/XQuery technicality which many beginners are struggling with - and it's caused by something invisible!

Let's start by creating a simple table and inserting an XML document.


create table foo(id int,doc xml);

insert into foo values(1,'<test>hello</test>');
DB20000I  The SQL command completed successfully.

The insert statement succeeds as expected. The string is implicitly parsed as XML document because the target column is of type XML. We already know of SQL/XML publishing functions like XMLELEMENT and XMLATTRIBUTES, so let's try to construct the same document and insert it again.


insert into foo values(2,xmlelement(name "test",'hello'));
------------------------------------------------------------------------------
insert into foo values(2,xmlelement(name "test",'hello'))
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL20345N  The XML value is not a well-formed document with a single root
element.  SQLSTATE=2200L

Surprisingly (?) the insert statement fails. DB2 is complaining that it is not a well-formed document with a single root element. What happened? We only have one element, why the error message? The answer is that DB2 is expecting a document node as specified in the XQuery and XPath Data Model (XDM). The latter is the standard which XQuery and DB2 pureXML are built on.

When we inserted the XML document as a string, the XML parser automatically (implicitly) inserted a document node for us because that node is kind of invisible when looking at XML strings. However, when we build or construct an XML document from ground up using the publishing functions, we have to explicitly create an document node. The SQL/XML standard has a special functions for it: XMLDOCUMENT. Let's try it again.


insert into foo values(2,xmldocument(xmlelement(name "test",'hello')))
DB20000I  The SQL command completed successfully.

The XQuery language has several ways of constructing XML documents, too. The easiest is to directly provide a string with tags as shown next. Wrapping it into the XMLQUERY function, we try to insert the generated XML:


insert into foo values(3,xmlquery('<test>hello</test>'))
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL20345N  The XML value is not a well-formed document with a single root
element.  SQLSTATE=2200L

Not suprisingly (this time!) it fails again and there are two ways to fix it. One is to use the document node constructor provided by the XQuery standard, the other one to use XMLDOCUMENT() as before.


insert into foo values(3,xmlquery('document{<test>hello</test>}'))
DB20000I  The SQL command completed successfully.

insert into foo values(4,xmldocument(xmlquery('<test>hello</test>')))
DB20000I  The SQL command completed successfully.

Fighting the invisble is not as scary as it seems knowing how and a movie like "The invisible node" probably wouldn't draw a big crowd. Moreover, the document node is not entirely invisible if you know how to make it visible (any parallels to the movies?). Showing you how is a future topic... (creating opportunities for a sequel here :)

Monday, July 20, 2009

Relational vs. pureXML: Some indexing differences

You probably know how relational indexing works. When you create an index on a column, then for every row in the table there is an index entry. There is a 1:1 relationship between the rows and the index entries.

In the XML world things can be quite different. Let's imagine that we have XML documents where as part of other personal information the names of our siblings are listed ("CREATE TABLE persons(id INT, doc XML)"). The XML fragment could like the following for a single sibling (2 kids for the parents):

<siblings>
  <name>Sarah</name>
</siblings>

If we would like to index the sibling names, we could do it in DB2 the following way:

CREATE INDEX sibNamesIdx ON persons(doc)
GENERATE KEY USING XMLPATTERN
'/siblings/name' as SQL VARCHAR(40)

Because we have one sibling in the above example, there would be a single entry in the index - the same as in relational. But what happens if you are an only child (<siblings/>)? Or what if you have multiple brothers and sisters?


<siblings>
  <name>Sam</name>
   <name>Sarah</name>
    <name>Sean</name>
  <name>Siegfried</name>
  <name>Simon</name>
  <name>Susie</name>
  <name>Sydney</name>
</siblings>

For an only child there wouldn't be a sibling name and hence no index entry. For multiple siblings there would be an index entry each, resulting in multiple index entries for a single row. This is quite different from the relational world.

What may look strange or at least unusual to you has some benefits to you. In the relational world you would need to model such data with joined-in side tables or with sparse columns. In the latter case you would manage NULL values in your index. In the XML world you only have data that is really present represented in your indexes, thus keeping them small.

Thursday, July 16, 2009

Fun with databases: How cereals turn into silverware

This morning my wife came back from grocery shopping. Among other things, she brought back a new organic cereal mix that just had started to hit the shelves. My wife was happy because she got it for free - more or less.

While paying at the cash register, my wife checked the receipt because the amount seemed to be higher than expected. Then she noticed an item on the list that wasn't in her cart: A 24 piece silverware set. What was in her cart, but not on the receipt was the new cereal mix. After some phone calls by the store clerk to the supervisor they together found out that both the silverware and cereals were in the product database, but somehow the system mixed it up. Was the barcode on the product incorrect? Were both products listed with the same barcode? Was it a database inconsistency or a system glitch?

My wife got the cereals for free as token of appreciation for waiting and the trouble and left some puzzled store employees behind.

Did you run into similar situations? Any ideas how this could have happened?

Tuesday, July 14, 2009

pureXML for your well-being!

The new IBM Data Management magazine is out and one of the articles is about how DB2 pureXML is helping to improve the healthcare system. Next time you visit a doctor's office, ask them how they manage your patient records...

Job vs. Life: The importance of tools and ROI

When you have a new home or a new garden - or both - one of your spare time jobs is to install things, assemble furniture, remove weeds, plant and water, etc. Usually you have a choice of really manually performing a tasks, e.g., using a regular screwdriver, or trying to simplify and speed up what you want to accomplish, e.g., using a power screwdriver. For most tasks you have the choice of using mostly already present tools or investing into new power tools or similar stuff. The questions are always, how much time and effort can I save, how often will I use the power tool again (and sometimes even: if I buy the tool, will my significant other actually do it instead of me)?

Often, the decision is to buy a tool and to be happy about the simplifications and the saved time. The actual problem is to select the right brand and version of tool.

When you are running database applications the situation is similar. There are many tasks that can be performed by a skilled person without tools or just with whatever the core software (OS, DBMS, JDK, etc.) offers. However, often an additional tool can help to save time and improve the quality of what needed to be done. Similar questions as above for the house and garden need to be answered, only thereafter it can become more confusing when shopping for the right tool [Did you happen to select a power screwdriver that was renamed/rebranded a couple times?]. IBM offers a wealth of tools, sometimes for every problem even two, ranging from application development over data life cycle management and data encryption to performance monitoring.

Going back to house and garden, I usually buy tools from the same brand/company because they have the same "look and feel", once I dealt with one tool, it's quicker to learn how to handle the next. BTW: That's the same principle behind the Optim suite of tools for Integrated Data Management.

Monday, July 13, 2009

Energy Consumption in our Passive House / Jahresstromverbrauch in unserem Passivhaus

[Fuer die deutsche Fassung bitte weiter unten schauen]

Today a year ago we could start with our apples to apples energy metering in our passive house, a low energy house. This morning I went down to our basement to read out the meters, something I typically try to do once a week. So how much electrical energy did we consume over the last 365 days? A total of 5026 kWh. This is less than anticipated. Let me try to put this into perspective and to give you more details.

Our house has ~150 square meters (~1620 sq. ft.) in two levels plus an unheated basement of almost 100 square meters. We are 2 adults and two kids, I am mostly working from home and my wife is domestic engineer (a.k.a. homemaker or stay-at-home mom).

We have two power meters, one for the heating/ventilation system and hot water, another one for the regular household (fridge, freezer, washer, dryer, computers, phones, oven, dishwasher, lights, ...). What is surprising to us is how much we still spend on heating compared to the household consumption:

Household consumption: 2244 kWh

Heating/ventilation/water: 2782 kWh

However, after analyzing the fact deeper, we realized that we only have new A+ or A++-rated household appliances and mostly use energy saving bulbs. Moreover, the computers do not use much energy even though they are often running 12+ hours.

For the heating/water part this is within the expected range. From April to October a solar panel is heating up the water. During the winter we try to keep the inside temperature around 22 degrees Celsius (72 degrees Fahrenheit).

I tried to find comparison data and according to some German sources a similar household would spend around 4000-5000 kWh just on the household part. Heating ranges from 8000-25000 kWh alone.

Jahresstromverbrauch in unserem Passivhaus
Vor einem Jahr konnten wir mit unserem Stromverbrauchsmessungen in unserem Passivhaus beginnen, vorher standen noch Bautrockner im Keller. Ueber das letzte Jahr hinweg haben wir 5026 kWh an Strom verbraucht. Der Verbrauch teilt sich auf 2 Zaehler/Bereich auf:

Haushaltsstrom: 2244 kWh
Heizung/Lueftung/Warmwasser: 2782 kWh

Beim Haushaltsstrom zahlt sich aus, dass wir nach unserem Umzug aus den USA nach Deutschland die meisten Elektrogeraete neu kaufen mussten und somit fast ausschliesslich A+ oder A++-Geraete haben. Der Haushaltsstrom umfasst alles von den Computern (Heimbuero) ueber Telefone bis zu den ueblichen Verbrauchern wie Kuehlschrank, Gefrierschrank, Waschmaschine, Trockner, Backofen, Herd und Spuelmaschine.

Der Heizstrom ist fuer die permanente Lueftung, die Kleinstwaermepumpe und Zusatzheizer sowie die Solarpumpe. Unser Haus hat 150 Quadratmeter an Wohnflaeche, die im Winter auf ca. 22 Grad Celsius gehalten werden.

Setzt man den Gesamtverbrauch in Perspektive zu den Durchschnittswerten in Deutschland, so zeigen sich die Einsparungen durch ein Passivhaus und durch Energiespargeraete. Wir haben neu gebaut und mussten umzugsbedingt auch neue Geraete anschaffen. Ob sich zusaetzliche Daemmung bzw. die Abloesung von Energiefressern durch moderne Geraete lohnen, muss jeder einzelfallbezogen entscheiden. Unsere Werte zeigen jedoch das Potenzial.

Wednesday, July 8, 2009

Oracle + TPoX = no results?

Conor has a nice post about Oracle and their lack of published TPoX benchmark results. Oracle has published all kinds of benchmark results (like those from TPC), but not for the XML benchmark "TPoX".
A while ago I had already "commented" about Oracle and their XML claims and lack of results. There is also a post about TPoX to give you some more background information.

Tuesday, July 7, 2009

Step back: I am a DBA! (or DB2 developer or IM specialist)

Yesterday, I read a news article about an airline announcing an expected delay of 8 hours because of a mechanical problem (on an island in the Mediterranean). They needed to fly in a mechanic. Then a passenger stepped forward, identified himself as aircraft mechanic, and later fixed the problem.

There are situations, when all kinds of people/occupations are sought ("We need a medic!"), but when was the last time you heard: "Is someone here a DBA?" When would be such a situation?

Friday, July 3, 2009

Lessons from the field: Simplicity wins - XMLGROUP, XMLROW, and XMLAGG

Simplicity and brevity win and that's why I have a longer post on keeping it short. :)

Programmers and engineers love sugar in all kinds of forms, shapes, and flavors - probably to keep the brain powered. Some of us especially like syntactic sugar as it helps to remove complexity and keeps us focused on the essential parts.

Some days ago while being with a customer I noticed a query like this (rewritten by me against the DB2 sample database):


with temp (columns) as 
  (select xmlelement(name "employee", xmlconcat(
            xmlelement(name "id", empno),
            xmlelement(name "first", firstnme),
            xmlelement(name "last", lastname)))
  from employee)
select xmlelement(name "all",xmlagg(columns)) from temp

Basically, all rows of a table (employee) should be returned within a single XML document (rooted with "all") like this:


<all>
 <employee>
    <id>some value</id>
  <first>some value</first>
  <last>some value</last>
 </employee>
 ...
</all>

The above, initial query already uses XMLAGG to aggregate the fragments coming from the individual rows into the big document - which is good. However, it uses a common table expression (the WITH expression) to first build the row-based XML fragments, then later combines them into the result document. This unnecessarily complicates the query. It can be rewritten simply into:


select xmlelement(name "all",
        xmlagg(xmlelement(name "employee",
          xmlconcat(
             xmlelement(name "id", empno),
            xmlelement(name "first", firstnme),
            xmlelement(name "last", lastname)))))
from employee

It is simpler now, but the XML construction still takes up some thoughts as it uses nested calls to XMLELEMENT and XMLCONCAT to build the row fragments. The latter can be accomplished shorter and more elegantly by using either XMLROW, or even better XMLGROUP:


select xmlelement(name "all",
        xmlagg(
          xmlrow(empno as "id",firstnme as "first",lastname as "last" option row "employee")))
from employee

select xmlgroup(empno as "id",firstnme as "first",lastname as "last" option row "employee" root "all")
from employee

XMLROW allows to very elegantly construct an XML fragment from row data, XMLGROUP to aggregate such or other fragments. XMLROW and XMLGROUP are simpler to use than the XMLELEMENT-based queries, they are easier on the eye, and what is best, should result in better performance than the other variants.

Let's take a look at the optimized query text for the initial query and for the XMLGROUP-based:
Original:


SELECT $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, all), $INTERNAL_XMLNAMESPACES$((''),
  ('')), Q4.$C0)
FROM
(SELECT XMLAGG(Q3.$C0)
FROM
(SELECT Q2.$C0
FROM
(SELECT $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, employee), $INTERNAL_XMLNAMESPACES$((''),
  ('')), XMLCONCAT(arity, $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, id), $INTERNAL_XMLNAMESPACES$((''),
  ('')), $INTERNAL_SQLTOXML$(Q1.EMPNO)), $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, first), $INTERNAL_XMLNAMESPACES$((''),
  ('')), $INTERNAL_SQLTOXML$(Q1.FIRSTNME)), $INTERNAL_XMLELEMENT$(elemNullHandlingOption,
  $INTERNAL_QNAME$(NULL, NULL, last), $INTERNAL_XMLNAMESPACES$((''),
  ('')), $INTERNAL_SQLTOXML$(Q1.LASTNAME))))
FROM HLOESER.EMPLOYEE AS Q1) AS Q2) AS Q3) AS Q4

The XMLGROUP-based query:


SELECT XMLDOCUMENT($INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
  NULL, all), Q3.$C0))
FROM
(SELECT XMLAGG(Q2.$C0)
FROM
(SELECT $INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
  NULL, employee), $INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
  NULL, id), $INTERNAL_SQLTOXML$(Q1.EMPNO)), $INTERNAL_XMLELEMENT$(1,
  $INTERNAL_QNAME$(NULL, NULL, first), $INTERNAL_SQLTOXML$(Q1.FIRSTNME)),
  $INTERNAL_XMLELEMENT$(1, $INTERNAL_QNAME$(NULL,
  NULL, last), $INTERNAL_SQLTOXML$(Q1.LASTNAME)))
FROM HLOESER.EMPLOYEE AS Q1) AS Q2) AS Q3

The overall general access plan (not shown) is the same for both queries, however the optimized query text differs. What can be seen is fewer SELECT expressions and a simplified construction, resulting in less CPU cycles spent.

After all the words, let me recap: Simplicity and brevity win...

Pages