Data Henrik: September 2009

Wednesday, September 30, 2009

New TPoX release and performance numbers

[Seems like it is benchmark day today] Version 2.0 of the TPoX benchmark (Transaction Processing over XML) has been released. In an earlier post I explained what TPoX is and why it exists. The new release of the benchmark specification has some changes in how the data is generated as well as in some update statements of the workload. The workload driver has also been modified (its properties are now XML-based) to adapt it easierly.

What is also out since last month are TPoX performance results based on version 2.0. A 1 TB workload was tested against DB2 V9.7 on AIX 6.1 on a IBM BladeCenter JS43. The numbers were also compared against DB2 V9.5FP4 run on the same setup in the paper showing the benchmark details.

Please note that due to the changes in the benchmark specification, benchmark results from version 1.x cannot/should not be compared to those from version 2.0.

TPC fines Oracle for recent benchmark claims

The Register has an article about Oracle being fined by the TPC because of recent ads related to Exadata2. Oracle has to pay $10,000 and was ordered to remove/withdraw ads, webpages, etc. which Oracle apparently already did.

Added: The issue is here at the TPC website.

Monday, September 28, 2009

XMLTABLE - Explained, the easy way (Part 2, References)

Last week I wrote about XMLTABLE as the "all-in-one" function because it is a very versatile function. Many DB2 customers are using XMLTABLE to allow existing relational applications co-exist with XML data either in the database or on the wire. The first is obvious, XML data is stored in the database and made available by a relational table (view) built on top of the XML data. If XML data is fed to (not into) the database, e.g., via queues, it doesn't necessarily mean it needs to be stored in the XML format. Some customers use the XML format to exchange data with other companies or agencies (think of product information, tax data, payment information, brokerage data, etc.), but process only data stored in purely relational format - no XML involved. What they do is to feed their incoming XML data into the XMLTABLE function and then store the table output in the database.

Today's title "Explained, the easy way" refers to reusing existing excellent information. Two of my colleagues wrote a 2-part article about XMLTABLE that I recommend reading. Part 1 which is titled "Retrieving XML data in relational format" gives an overview, part 2 has lots of examples and is labeled "Common scenarios for using XMLTABLE with DB2".

Wednesday, September 23, 2009

A look at free database systems (from the XML perspective)

On my laptop I have most (not all, because something is always missing) software I need for my day job and some add-on tasks, including a database system. Right now it is DB2 9.7 Express-C, a free database system where I very often use the included pureXML feature to test out XPath or XQuery expressions or quickly manipulate XML data. The latter can be done by importing the XML files, applying some XQuery magic, maybe even combine it with relational data and then being done. Other people use it to find the closest ATM - always good to have DB2 pureXML handy.

I also took a look at other free database systems. First, there was Oracle XE. While it seems to offer XML support, the software is based on an old release level (10g R2). Support is through a forum, but requires registration to even look inside to see how the support is. Nothing for me.

Next on my list was MySQL which I use in some LAMP environments and - being open source - has lots of community support. However, MySQL's XML support is limited in that sense that, e.g., XQuery expressions are not supported and that XPath evaluation is embedded into SQL in a non-standard way. So nothing for me again.

Last on my list during my evaluation was PostgreSQL. Here the picture is basically the same as for MySQL. PostgreSQL's XML support is limited again in terms of functionality and how, e.g., XPath expressions are embedded into SQL.

DB2 Express-C is free, is based on the latest DB2 release, has an open support forum (no tricks), and offers the broad XML support that the pureXML feature has. So it is DB2 Express-C on my laptop.

[Don't get me wrong when I talk about PostgreSQL and MySQL. I especially like that they added XML support over the years because it widens the common functionality available across products and leads to more XML-related skills among developers and DBAs.]

Tuesday, September 22, 2009

XMLTABLE - The all-in-one function?! (Part 1, Syntax)

What can produce a relational table out of XML data or a sequence of XML fragments? What can be used to shred (or since the Enron scandal "decompose") data simply by using SQL when ingesting data into a warehouse? What can serve relational applications while managing XML data? Of course I am talking about the XMLTABLE function that is part of the SQL standard and the DB2 pureXML feature.

I plan to post a couple of entries about this very versatile function, hence the "Part 1" in the title. Today, I start with a focus on the syntax for typical usage scenarios.

At first sight the XMLTABLE syntax looks mostly straight-forward:
XMLTABLE "(" [namespace declaration ","] row-definition-XQuery [passing-clause] [COLUMNS column-definitions] ")"

Basically, you could first optionally declare some global namespaces (more later), then comes an XQuery expression similar to those in XMLQUERY and XMLEXISTS to define the row context, then the optional, but familiar PASSING clause and finally the COLUMN definitions similar to a CREATE TABLE statement.

There are usually different ways of writing a (X)query. For the XMLTABLE function, the XQuery clause needs some consideration because it defines the row context, i.e., what part of the XML document is available (and is iterated over) for the column values when each row of the resultset is produced. In some examples in future parts I will show the impact of the XQuery expressions.

The PASSING clause is optional because you could work with constants in your XQuery (not very likely) or use column names to reference the data (e.g., "$DOC" for the DOC column). In many cases you will want to use the PASSING clause to utilize parameter markers, e.g., when directly ingesting application data.

The (optional) column definition is similar to a simple CREATE TABLE statement. You specify the column name and its type (e.g., NAME VARCHAR(20)). After the type comes the most interesting part, the keyword "PATH" followed by a string literal that is interpreted as XQuery expression. Within that XQuery the context (".") refers to that set in the row context (see above). If you would iterate over employees in a department, you could then simply refer to the employees' first- and lastname like shown:

SELECT t.* FROM dept, XMLTABLE('$DEPT/dept/emp' COLUMNS first VARCHAR(20) PATH './first', last VARCHAR(20) PATH './last') as t

Note that for columns all types are supported which are supported by XMLCAST. The reason is that behind the covers XMLCAST is called to map the value identified by the column-related XQuery to the relational column value.

Earlier I mentioned that global namespaces could be declared. Imagine that the department documents all have a default namespace "foo" (e.g., "<dept xmlns="foo"><emp>..."). In order to properly navigate within the documents your query would need to look like shown:

select x.* from dep,xmltable('declare default element namespace "foo";$DOC/dept/emp' COLUMNS first VARCHAR(20) PATH 'declare default element namespace "foo";./first', last VARCHAR(20) PATH 'declare default element namespace "foo";./last') as x

All the different XQueries would need to declare the namespace "foo". To make our lifes simpler, the SQL standard allows to globally declare the namespace using the XMLNAMESPACES function (which usually is used for publishing purposes):

select x.* from dep,xmltable(XMLNAMESPACES(default 'foo'),'$DOC/dept/emp' COLUMNS first VARCHAR(20) PATH './first', last VARCHAR(20) PATH './last') as x

The namespace is declared only once, the statement looks much cleaner and is simpler to write.

That's it for today as an introduction. Please let me know if you have questions on XMLTABLE that you would like to have answered in a future post.

Friday, September 18, 2009

Selling pre-owned cars (and Oracle Exadata)

The video of the Exadata v2 launch webcast is now up. I wrote earlier this week about this event. Of that video I especially liked the first few minutes because they reminded me of the Interboot which starts this weekend here where I live. The entire video also reminded me of how used or "pre-owned" cars are sold.

To impress prospective buyers a salesman would point out that the car does not have just 4 wheels like every other car around, but even 5 or even 6. It would not just have one sun roof, but three, not 4 doors, but 7. It is the best, the fastest, the best, and to top it of, the best. It wouldn't matter whether it makes sense in terms of the big picture, only whether some parts are impressive. And don't even try asking such a sales guy about the price. You wouldn't get a straight answer, at least not that easily.

Switching back to the video, I highly recommend watching it - especially on a Friday. Regarding cost of the Exadata package, here is a closer look. For the big picture, you should ask yourself about what is happening to those customers who bought into the (now dead) Oracle/HP cooperation? Why not just use SSD disks instead of a cache? Why not SPARC instead of Intel? Does RAC really scale to Petabytes? And why sailing and an ocean of tears as introduction...?

Wednesday, September 16, 2009

Wow, Oracle combines database system with disks (again!)

Yesterday was the much overhyped event of a company announcing some dbms coupled with newer disk subsystems and an increased cache. The result is that as long as everything fits into memory and cache, performance will benefit from it. If not, it still sucks.

A nice summary of the event is over at The Register:

If Oracle is trying to convince Sun customers that it is committed to the Sparc platform, perhaps it is not trying hard enough.

and

Today's Exadata V2 launch event started 15 minutes late, if you missed the Webcast launch (you can see the replay here when it becomes available), and started with an "extreme performance" theme that showed Captain Larry Ellison and Oracle's World Cup catamaran striking impressive poses, and then cut to a live Ellison in the studio, who doesn't do his own clicking during presentations and started out by admonishing some worker with "Next slide, please, I already know who I am."

BTW: While you are at it, please read here how companies have moved from Oracle to DB2.

Monday, September 14, 2009

From kitchen to datacenter - consolidate and generalize

Consolidation is a hot topic for those operating a data center. The idea is to save money by saving on space, energy, people, etc. I was reminded of this while reading the "consumer information" - a.k.a. weekend advertising, from one of the supermarket chains. Some of the upcoming specials include a sandwich maker, a pancake maker, a popcorn maker, a muffin maker, and so on.

Who needs all of them? How much space do you have in your kitchen? Do you read all the manuals and operating instructions? Why don't you use a general purpose solution that is already available in a regular kitchen (stove, oven, pan, pot, baking sheet, ...)?

Something similar has happened over the years again and again with software, including database systems. Companies have invested in special-purpose systems and later realized that they need additional capacities (kitchen storage, counter space, money for the purchase, ...) or the general purpose system runs overall as well (no special instructions to read, no special cleaning afterwards, different portion sizes possible). Remember the days of object-oriented dbms or an influx of "native XML" dbms? In most cases it is back to heating up a pan for some really good pancakes...

BTW: What kitchen gadget do you have that falls into the pancake maker category?

Monday, September 7, 2009

New in 9.7: XML index creation/reorg with concurrent writes

The feature I am describing today is one of the smaller items (marketing-wise) on the "new in DB2 9.7" list. However, it is important for day-to-day operations and it is trickier to implement (why else did it take until 9.7?). The enhancement is that now concurrent transactions with insert, update, and delete operations are allowed parallel to a XML index creation or REORGanization.

In earlier release with pureXML functionality only read access was possible which required some kind of workload planning to creating new indexes. With the improvement the index building process will catch up with ongoing IUD activities and only will require exclusive access, i.e., blocking other activities, during that time. This is the same behavior as for non-XML indexes (see CREATE INDEX for details).

Thursday, September 3, 2009

Moving and transforming (XML) data

Storing data, regardless of whether it is relational or XML, is one thing. Moving it across databases and transforming data on-the-fly, is another. A two-part series about using (IBM InfoSphere) DataStage and DB2 pureXML together provides insight into how to extract, transform, and move XML data, e.g., to integrate XML data into a data warehouse. Part 1 is titled "Integrate XML operational data into a data warehouse", part 2 named "Building an XML-enabled data warehouse".

Pages