Data Henrik: 2010

Tuesday, December 7, 2010

Gaining access to DB2 data through indirect means

There are many ways of accessing data directly, e.g., through simply selecting from a table or view. But when we talk about data security, how many of you know about indirect data access? Well, if you want to know more about it, the DB2 Information Center has a page "Gaining access to data through indirect means".

Some of my favorite tools like db2cat and db2dart are listed, but also diagnostic (dump) files, monitoring data, explain output and of course the catalog views.

How long does it take to build workload optimized systems?

And I don't mean how long it takes to assemble one...

Monday, December 6, 2010

More on full text indexing: XML documents and XPath

Last week I explained how to build a full text index on PDF files stored in DB2. Today, I will look into how to search in XML documents using a full text index and DB2 Text Search. One of the benefits of XML documents is that their content is (semi-) structured. With full text search you cannot find a needle in a haystack, but you can find which of many haystacks a needle is in. With XML documents and information about their structure, search becomes more directed and results are more precise. Let's look into details.

db2 "create table xfiles(id int not null primary key, doc xml)"

db2 "insert into xfiles values (1,'<r><a>Friedrichshafen</a><b>Paris</b><c>San Jose</c></r>')"

 db2 "insert into xfiles values (2,'<r><a>Paris</a><b>San Jose</b><c>Friedrichshafen</c></r>')"

db2 "insert into xfiles values (3,'<r><a>San Jose</a><b>Paris</b><c>Friedrichshafen</c></r>')"

First, we create a table name xfiles which holds XML documents. Then we insert three different rows with similar XML documents. Next in our journey to insight is creating a full text index on column doc and then to update it:

db2ts "CREATE INDEX idxDocs FOR TEXT ON xfiles(doc)"

db2ts "UPDATE INDEX idxDocs FOR TEXT"

Note that when the column is of type XML, then the index is also of format XML. Now that we are done with the setup, we can actually perform the first search:

db2 "select id from xfiles where contains(doc,'Friedrichshafen')>0"

ID         

-----------

          1

          2

          3

  3 record(s) selected.

Well, we get what we asked for. All documents contain the word "Friedrichshafen" and we know as much as before. But then we remember that XML documents are (semi-) structured and that I mentioned XPath before. The good news is that we can use XPath expressions as part of full text search to cut down on the number of returned documents.

db2 "select id from xfiles where contains(doc,'@xpath:''/r[a contains(\"Friedrichshafen\")]''')>0"

ID         

-----------

          1

  1 record(s) selected.

Note that in the above query we are using the Linux command shell and need some escaping for the quote signs. Without them the statement would look like the following:

select id from xfiles where contains(doc,'@xpath:''/r[a contains( "Friedrichshafen")]''')>0

As part of the parameters to the CONTAINS function, we specify that an XPath expression should be used and then the actual expression. In our example we only want documents where the word "Friedrichshafen" is contained in the "a" elements. Hence, only a single document is returned. You can combine predicates inside the CONTAINS using and/or and also make use of of some advanced expressions. For the full syntax see here.

db2 "select id from xfiles where contains(doc,'@xpath:''/r[a contains(\"Friedrichshafen\") or b contains(\"Paris\")]''')>0"

ID         

-----------

          1

          3

  2 record(s) selected.

select id from xfiles where contains(doc,'@xpath:''/r[a contains("Friedrichshafen") or b contains("Paris")]''')>0

As we have seen above, a full text index allows both content and structural search on XML documents. If you don't want to use SQL as primary query language but XQuery, then the DB2 built-in function for XQuery, xmlcolumn-contains, can be used.

And remember, all of this already works in the free DB2 Express-C.

Tuesday, November 30, 2010

Indexing PDF Files with DB2 Text Search

This week I was asked how to build a full text index on PDF documents in DB2. Because my hard drive is full of them, I created a new DB2 database and set up DB2 Text Search with so-called rich text document filters. After creating a table, loading data and indexing the documents, I eventually used the CONTAINS and SCORE functions to search in my PDF files. But let me show you the steps in some more details...

Because DB2 Text Search support can be installed as part of the DB2 installation, my system already had the base support set up.

Thus my endeavors started with downloading the DB2 Accessory Suite. It contains additional document filters that help to extract full text information out of different file types such as PDF, word processor, or spreadsheet documents. The DB2 Information Center has instructions for downloading and installing the accessory suite. I checked that all necessary files from the regular DB2 installation were already in place and then succeeded with setting up rich text support.

Actually, setup requires two steps. Install the software on the server, then setting it up for an instance. Some confusion, but I succeeded. Next was enabling rich text support, instructions are on Information Center again. Now on to creating the actual test database (on my Linux shell):

db2 CREATE DB FT

DB2 Text Search uses an environment variable DB2DBDFT. This should be set to the database you work with to simplify administration. Hence:

export DB2DBDFT=FT

Finally, we are able to set up our new database for full text search:

db2ts "ENABLE DATABASE FOR TEXT"

The db2ts command is the DB2 Text Search command processor. Now we can create a simple table in the database to hold the documents after connecting to the db:

db2 connect to ft

db2 "create table hlDocs(id int not null primary key, description varchar(200), doc BLOB(10M))"

The second statement above creates a table named "hlDocs" with three columns. The first column is named "id" and of type integer. Note that DB2 Text Search requires a primary key column in the indexed table. The 2nd column will hold a short string describing the document. The last column is a BLOB that we use to store the PDF files. To easily insert PDF documents into the table, I created a small delimited file for import (named "blob.del"):

1,'train schedule Paris',ParisSchedule.pdf

2,'Schedule recycling paper collection',Papier_Abfallplan2010.pdf

3,'New resume',Current_resume.pdf

The above data can now be imported with the PDF files assumed in the same as our current directory:

db2 import from blob.del of del lobs from . insert into hlDocs

It is important to check that all rows were inserted:

Number of rows read         = 3

Number of rows skipped      = 0

Number of rows inserted     = 3

Number of rows updated      = 0

Number of rows rejected     = 0

Number of rows committed    = 3

Now the test data is in, but we don't have an index yet. That can be done using the db2ts command again. We first create the index, then update it:

db2ts "CREATE INDEX idx1 FOR TEXT ON hldocs(doc) FORMAT inso"

db2ts "UPDATE INDEX idx1 FOR TEXT"

CAUTION: Note that the option "FORMAT inso" is not clearly explained in the documentation (yet - hence this post). That option tells DB2 to use the additional document filters that we installed as part of the DB2 Accessory Suite.

Finally, we can test the index using the CONTAINS and the SCORE functions.
db2 "select id,description from hldocs where contains(doc,'Paris')=1"

ID          DESCRIPTION     --------------------------------------------------
          1 'train schedule Paris'

1 record(s) selected.

db2 "select id,description,cast(score(doc,'Friedri*') as decimal(5,3)) as score from hldocs order by 3 desc"

ID DESCRIPTION                                   SCORE
----------- ------------------------------------------
          2 'Schedule for paper collection'      0.046
          3 'Resume'                             0.022
          1 'Train schedule Paris'               0.000

3 record(s) selected.

Most of the steps are dedicated to proper setup of DB2 Text Search with the additional filters for PDF documents. However, once it is set up, inserting data and searching the documents is simple.

Friday, November 26, 2010

Black Friday Special: DB2 for free (no MIR required)!!!

We have THE most important shopping day in the US today: Black Friday. Shops are trying to lure in shoppers with rebates and specials, some require complicated mail-in rebates (MIRs). Let me point you to a very valuable product and it is entirely free and doesn't require a mail-in rebate: DB2 Express-C.

The free product includes free autonomics and free pureXML and much more. Get DB2 Express-C today and if you want some great free books and tools, let me know.

Friday, November 19, 2010

pureXMLness - Explained!

Earlier this week I had asked "What pureXMLness do you have?". Let me explain the query that computes the pureXMLness today.

xquery
let $i:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.sysindexxmlpatterns"),
$j:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.syscolumns where coltype='XML'"),
$k:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.sysxmlstrings"),
$l:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.sysxmlpaths"),
$m:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.sysxsrobjects"),
$n:=db2-fn:sqlquery("select xmlcast(cast (avg(xda_hit_ratio_percent) as float) as xml) from sysibmadm.bp_hitratio")
return <pureXMLness>{avg((($i+$j+$m) div 20,($k+$l) div 300,$n))}</pureXMLness>;

First, let me start with the first keyword, "xquery". DB2 is a hybrid database engine and it natively supports both SQL statements and XQuery statements. Prefixing the statement with "xquery" means that a native XQuery follows.

What then follows is a so-called let statement ("let $i [...]"). It is used to define the variables $i to $n. For each of the variables a value is assigned. All of them are computed using the sqlquery function which executes a SQL SELECT statement. One requirement is that the function sqlquery returns an XML sequence. The SQL statements that are executed are simple SELECT statements using the count() or avg() function. To convert the results of count() and avg() to an XML sequence, we use the XMLCAST function (xmlcast(count(*) as xml)). The SQL value is converted into an XML value (which is a XML sequence).

As input to the pureXMLness we take the number of entries in the indexxmlpatterns catalog table into account, i.e., the number of indexes over XML columns. Another variable is the number of XML-typed columns obtained from the columns catalog table. Three other variables are based on the number of stringIDs (sysxmlstrings), pathIDs (sysxmlpaths), and the registered XML schemas (xsrobjects). The last input variable is the average bufferpool hit ratio for XDA objects.

After all the variables have been computed, they can be used in the return clause to compose the pureXMLness which is packed into an XML element.

Wednesday, November 17, 2010

One editor, many tools: The SQL and XQuery editor has been enhanced

Earlier this month a new article "Creating scripts more efficiently in the SQL and XQuery editor" has been published on developerWorks. Some of us are switching from the deprecated DB2 Control Center to the Eclipse-based tooling. Good to know that the same SQL editor is part of Data Studio, InfoSphere Data Architect, and several Optim tools.
The articles describes how man new editor features can be used to efficiently develop SQL scripts or XQuery scripts. It is also a good introduction into how to use the statement editor. What is often done for editing XML-related queries, namely changing the statement terminator, is described as well.

Tuesday, November 16, 2010

What "pureXMLness" do you have...?

The longer and the more intense the work with XML in a database system of your choice (this is DB2, right?), the more often comes the question: What pureXMLness do I have?

To what degree am I using pureXML? How much XML does my system have? Is our system mostly relational or XML (remember DB2 is hybrid)? How do we measure how XML is used? Give me some numbers for our CIO, it's Q4.

To give a scientifically-proven, valuable answer, something that stands the test of time and holds up to all questions from the business side, I came up with THE pureXMLness factor. How is it computed? The following is the simple query that needs to be run against the DB2 database in question. It's an XQuery of course.

xquery
let $i:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.sysindexxmlpatterns"),
$j:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.syscolumns where coltype='XML'"),
$k:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.sysxmlstrings"),
$l:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.sysxmlpaths"),
$m:=db2-fn:sqlquery("select xmlcast(count(*) as xml) from sysibm.sysxsrobjects"),
$n:=db2-fn:sqlquery("select xmlcast(cast (avg(xda_hit_ratio_percent) as float) as xml) from sysibmadm.bp_hitratio")
return <pureXMLness>{avg((($i+$j+$m) div 20,($k+$l) div 300,$n))}</pureXMLness>;

Let me know what pureXMLness you have or if you think you have a better formula...

SQL and XML Limits in DB2

How many database partitions can a DPF system have (it is 999 partitions)? How many levels deep can an XML document stored in DB2 be (125 levels)? What is the maximum size of an SQL statement I can code (2097152 bytes)?

If you have questions like that around DB2, then the page SQL and XML limits in the DB2 Information Center is the right one for you. And if you have lots of users, rest assured, the maximum number of concurrent users for a system is 64000.

Monday, November 15, 2010

Oracle to DB2 terminology mapping

Sometimes, things you are searching for, are already there and wait to be found. Today, I have such an example. The DB2 Information Center features a page that lists both the Oracle concepts and DB2 concepts, the "DB2-Oracle terminology mapping".
If you need to explain the DB2 architecture to someone with Oracle background (or vice versa), this comes handy.

Wednesday, November 10, 2010

How long is long? Maximum size of an XML text node in DB2

I was recently asked about the maximum size of a single text node when stored in DB2 pureXML. Basically, when you create a table

CREATE TABLE test(id int, doc xml)

and insert an XML document with several elements, how long can the text between the open and closing tag be, e.g., the value of myText?

INSERT INTO test VALUES(1,'<root><myText>all the text that you want</myText></root>')

The background to this question is that XML values are formatted to data pages. If the XML document is larger than the page size, the document is broken up into subtrees that fit into the page. However, a text node is already a single node. Can a text value be larger than the page size?

My answer: It can be really, really large. Try it out and let me know the biggest value that you could insert (or produce in an update statement).

Tuesday, November 9, 2010

Some flights, appliances, and "one size fits all"

As I wrote before, I was traveling within Europe. Similar to the Skymall brochure in North America, some airplanes feature the Worldshop brochure. I used some spare time and read about the new G3Ferrari Pizzamaker. This brought up memories of the Popcorn Maker, the Ice Cream Maker, the Donut Maker, the bread machine, and all the other advertised appliances. How many of them do you own?

One reason I don't own (all of) them is the storage space required when they are not used. Another reason is that they usually can only do what they have been designed for (sometimes not even that!) and it often requires reading additional instructions, getting familiar with the do's and don'ts. Sometimes, some of those "specialty appliances" are something en vogue for few months, then disappear again. I trust my (hightech) oven and my induction stove on helping me out when I need a pizza or some popcorn and they work fine for base regular food. They are my "one size fits all" solution to my culinary requirements.

The same goes for my database system. I trust DB2 to handle all kinds of data: Strings, numbers, dates, timestamps, and - yes - XML documents. DB2 can manage my data for OLTP applications and in BI scenarios. Sometimes, the crust on the pizza is not as crispy as that from a true Pizzamaker, but I can prepare a pizza side by side with a nice plum tart in my convection oven. Hmm, a taste of Fall...

Thursday, November 4, 2010

DB2 9.7 and changes to DBADM authorities

I was traveling the last 2 weeks and met with business partners and customers. One of the topics that was discussed were the security-related changes that happened in DB2 9.7. Some new roles like DATAACCESS and ACCESSCTRL were introduced and authorities for, e.g., DBADM were reduced.

Everybody agreed that a separation of duties means better access control and higher security. However, less authorities for the DBADM can make the life harder or more complex when there are no changes to development and test processes.

What are the changes that you had to make? What are the best practices that were introduced in your company? And last: What don't you like about the security changes?

Monday, October 18, 2010

Some more pureXML bootcamps this year

Both in the USA:
Dec 1-2 in Minneapolis, MN
Dec 15-16 in Hardford, CT

Go to this link to sign up: https://www.ibm.com/developerworks/wikis/display/im/DB2+9.7+pureXML+Bootcamp

Friday, October 15, 2010

Performance: Statement concentrator, small knob with an impact

First off, let me start by pointing out that it is best practice to use parameter markers for repeating, similar dynamic SQL queries. Instead of issuing

select fname, lname from person where id=15

and

select fname, lname from person where id=266 

one could write

select fname, lname from person where id=?

and provide the 15 or 266 as parameter to the query. The reason is that every SQL statement needs to be compiled before it can be executed. Query compilation takes some time and if you need to repeat it gazillion times, it will add up. If a parameter marker is used, the statement is compiled once and DB2 remembers how to execute the statement (the code produced is called package) in its package cache. Whenever the statement is executed again, the compilation phase can be skipped, the package is taken from the cache, the parameter is replaced with its actual value, and you have the results back.

Now to the statement concentrator, a new feature introduced in DB2 9.7. Sometimes, it is not possible to use parameter markers or the application is third-party code and it cannot be changed. That's when the statement concentrator comes in handy. Once it is enabled for literals, the new logic detects repeating similar statements that don't use parameter markers. It then tries to reuse an already existing package from the package cache to skip compilation and to save time.

The statement concentrator is OFF by default and it can be configured on the server (STMT_CONC in the database configuration) and/or for clients (preferred) using, e.g., CLI/ODBC or JDBC/SQLJ. Remember: If you want to benefit from the statement concentrator, you need to be active and enable it. And yes, parameter markers and the statement concentrator work for XML-related queries, too...

Wednesday, October 13, 2010

Boring news? Yet another benchmark record for DB2 on POWER

Yesterday, IBM announced another world record, this time for the Two-Tier SAP Sales and Distribution (SD) Standard Application Benchmark. Again, this was based on DB2 for Linux, UNIX, and Windows running on the IBM POWER platform.

Now combine this very competitive speed and throughput with very competitive pricing and you should have a winner. If you are on Oracle right now then, yes, DB2 understands PL/SQL as well.

Friday, October 8, 2010

Where do you want to have a DB2 pureXML bootcamp?

A DB2 pureXML bootcamp is a 2 or 3 day long event where IBM experts teach business partners (BPs) and sometimes customers all about using XML together with DB2. A list of upcoming pureXML bootcamps is at the bootcamp and education overview wiki at developerWorks. The next scheduled pureXML bootcamp are in Zagreb (Croatia) at the end of October and in Atlanta (Georgia, US) in early November.

Now my question to you: Where do you want to see a bootcamp? Don't be shy...

DB2 Advanced Enterprise Server Edition: All you need in one package

On October 5th IBM announced a new edition of DB2 for Linux, UNIX, and Windows: The DB2 Advanced Enterprise Server Edition (AESE). It is based on the Enterprise Server Edition (ESE) and bundles several of the most valuable (in terms of usefulness) features into a single package.

Some of the Optim tools are already included in the AESE and on the engine side data compression (tables, indexes, XML data, temporary data), HADR and Q-Replication for realizing your high availability and disaster recovery solution of choice, label-based access control (LBAC) for greater data security, and data federation for DB2 and Oracle databases are some of the features that are part of the bundle.

And if you wonder: pureXML is always included free of charge, even in the free Express-C edition of DB2.

Monday, October 4, 2010

Using DB2 pureXML and ODF Spreadsheets

If you are using an office suite that supports the Open Document Format (ODF), such as OpenOffice.org or Lotus Symphony, and are a database/XML fan(atic) like I, then the artice "Using DB2 pureXML and ODF Spreadsheets" should be something for you. The most recent e-bulletin of the IDUG Solution Journal shows how you can process ODF-based documents by using XQuery inside DB2. Manipulate spreadsheets or text documents from the "tip of a query editor"...

New fixpack (FP3) for DB2 9.7

DB2 9.7.3 is now available at this link. An overview of the changes can be found in in the Information Center and a more detailed list should be available later on here.

Update on 9/29/2010: Apparently there are some issues with the fixpack image for the server and it has been removed. Some information is in this document.

Update on 10/04/2010: The images are now available.

Friday, October 1, 2010

Zeppelin: There are no whales in Lake of Constance

The Zeppelin NT is built in Friedrichshafen where I live. I can see the Zeppelin almost daily on its way from and to the airport (FDH). However, riding the zeppelin here in Germany is different from that one with home base in the San Francisco Bay Area: There are no whales.

The video below is a short report about the recent trip of Airship Ventures' zeppelin north to the Seattle, WA, area, supporting a whale watching mission. After watching it, I feel ready for the weekend...

Wednesday, September 29, 2010

Automatic compression in DB2? Good question and some answers

Image via Wikipedia

During the DB2 bootcamp I was teaching I was asked, why DB2 does not offer automatic compression. There are a lot of automatic and autonomous features inside DB2. Why could DB2 not automatically determine that compression makes sense and compress the data?

This is an excellent question. DB2 can indeed take control of several performance-related configuration settings once DB2 has been authorized (enabled) for that. Regarding compression it is interesting to see that DB2 compresses temporary tables automatically once a compression license has been applied and DB2 determines that it makes sense performance-wise. This same capability applied to regular data and indexes is what the question is about. Why not autonomous decisions for that?

The answer relates to several aspects: The scope of a decision and the business strategy. Let's start with the scope. For a temporary table DB2 has an idea of what to expect. It knows the query and has (rough) statistics about the data and expected intermediate result sets. Hence it estimate the impact, good or bad, more or less precisely and decide on whether it makes sense to apply compression. For regular data and indexes, the decision and the background knowledge are different. DB2 does not know which queries to expect, how the data might grow over time, what other concurrent activities might be planned, what resources would be available to compress and reorganize the data. The impact of the decision is by far of a much bigger scope and wide reaching.

The business side of the question whether to compress or not are also nothing DB2 knows about. Service level agreements, storage costs, costs for CPU cycles, response times, throughput goals, business priorities, and many more are something DB2 does not know about. Smart humans are needed to decide on what is best, not just for a single table or index, but for the overall design and how it fits into the business requirements and (cost) strategy.

And that is where skilled DBAs come into the picture. Excellent question, some answers, and overall good news for DBAs...

Monday, September 27, 2010

Just in time - autonomics in action

Last week I was teaching a DB2 Bootcamp. The class room had big window fronts to the South-West and to the North-West and shortly after lunch the sun started to shine inside. First, it was only very bright, then even I in the front could notice how the temperature inside the room started to increase. After a few minutes the outside sun shades started to come down, finally stopping in a position that still left the nice daylight inside, but blocking the sun.

About half an hour later, it was time for me to start the presentation about "Practical Autonomics in DB2". They let you keep focused on your actual high-value job by taking care of mostly routine tasks. Autonomic Building Maintenance or Building Automation is exactly the same. I didn't need to worry about regulating the temperature or light. Sensors and some "smart algorithms" took care of it, so that I could concentrate on a more valuable job, delivering a presentation and transferring skills.

Do autonomics always give the best, the optimal results? No, but they get close to it without any human intervention. For the class room, it would have required keeping an eye on the temperature and pressing buttons to control the sun shades. Most of all, it would have required to take focus off the teaching and worrying about things a computer, in most of the cases, could do better. The same for autonomics in DB2.

Friday, September 3, 2010

Obtaining information about the installed DB2 version (level)

Somehow, I had a mental blackout earlier today when I tried to obtain information about the installed DB2 version and couldn't remember the command. Of course, you can always look into the db2diag.log file. Because DB2 writes an entry with version information and data about the system it is running on to that diagnostic file whenever it starts up.

However, the command I couldn't come up with is db2level. This command prints out the same information. And then, for those who need to obtain that information using plain SQL, they can utilize a special administrative view, ENV_INST_INFO.

How does the output look like?

From my db2diag.log:

DATA #1 : Build Level, 128 bytes
Instance "hloeser" uses "32" bits and DB2 code release "SQL09071"
with level identifier "08020107".
Informational tokens are "DB2 v9.7.0.1", "s091114", "IP23033", Fix Pack "1".
Output from db2level:
DB21085I Instance "hloeser" uses "32" bits and DB2 code release "SQL09071"
with level identifier "08020107".
Informational tokens are "DB2 v9.7.0.1", "s091114", "IP23033", and Fix Pack
"1".
Product is installed at "/opt/ibm/db2/V9.7".
I spare you the output from "select * from sysibmadm.env_inst_info". And why did I want to look at it? I was checking whether I already had applied the fixpack 2.

Tuesday, August 31, 2010

Food: Where business and life meet

Working at home has advantages. There is the "lunch is ready in 5 minutes" call to remind me to get ready to make the few meters from my desk to the table on time. Sometimes, those instant messaging windows pop up (US East Coast wakes up or urgent European business) and it becomes a struggle to be at the table when food is served. And I usually don't help to prepare lunch during the work week. But all that changed today.

I can point to the following video, actually a recent IBM commercial, and claim: "Darling, see how I helped with the food?".

Monday, August 30, 2010

Beer or wine, elements or attributes? Designing XML schemas

Image via Wikipedia

One of the questions asked repeatedly during DB2 pureXML workshops is about the use of elements and attributes when designing XML documents/schemas. What is better, using elements or attributes?

The answer given is that it depends on the data, the requirements, and other factors. Instead of going deeper into the reasoning for elements and attributes, I am referring to the summary at Cover Pages. Michael Kay has already in 2000 put the entire discussion into the following wise words found here:

Beginners always ask this question.
Those with a little experience express their opinions passionately.
Experts tell you there is no right answer.

Friday, August 27, 2010

Yes or no: What is better for your data?

In two posts in the past, I explained how to say no to invalid data and how to find out what data was wrong and causing errors. With the REJECT INVALID VALUES clause during CREATE INDEX it is possible to reject XML documents that have values in the specified XPath location not compliant with the specified data type. Now the question I got is: What is better, to reject invalid values or to ignore them?

Well, let me answer with THE answer: "It depends." I will try to be more specific in a minute. Using XML is often for flexibility reasons and so you may not know what the future will hold. However, in most cases the XML schemas, i.e., the structure of the documents and the data types to expect, are known. And then you want to be sure the data matches what you expect. Because the data at the path specified in the CREATE INDEX statement needs to be casted to the target data type anyway, regardless of whether the value ends up in the index (valid) or not (invalid), the performance is the same. Moreover, with the approach to reject you can be sure that IF the path is present, its value is in the index. That leaves room for optimizations at runtime.

So my recommendation is that if you don't need to be as flexible as possible, then use the REJECT INVALID VALUES clause. If you should receive an error because of a casting issue, see my post on how to find out why that I mentioned above.

Wednesday, August 25, 2010

To APPEND or not to APPEND: Insert and Import Performance

When you create a table, by default, DB2 is resource-conscious in terms of space. So-called FSCRs (free space control records) are used to keep track of free space in the pages. When you delete a row or a row has to move because of an update statement expanding the row size, the new gap in the data page is added as free space information to a FSCR. During insert or import operations the FSCRs are searched to fill up the pages.

The DB2 registry variable DB2MAXFSCRSEARCH controls how many of those space management records are searched when a row is added to a table. The default is 5 FSCRs, a value of "-1" means that in the worst case all records are searched for the best fit. However, searching for free space takes some time, even with efficient algorithms, and it is faster to not search for free space when adding data. This behavior can be turned on by executing ALTER TABLE with the APPEND ON clause. If large quantities of data are inserted (or imported), a performance gain can be realized by omitting the free space search and directly append the data to the table, i.e., to add it to the end of the table.

The drawback of the approach of using APPEND ON is that even when data gets deleted the free space is not reused, unless the table is REORGanized. A REORG is also required if the table is altered again to switch to APPEND OFF to update the free space information.

Of course, if bulk data insert needs to be performed, using LOAD is a good (or most often better) option. All the above applies to relational as well as to XML data.

Tuesday, August 24, 2010

"BY REF" in XML processing

Several XML-related functions that DB2 offers allow to specify the "BY REF" keyword. Is it worth using it? And what does it mean anyway? Let's take a look at it.

DB2 functions like XMLQUERY, XMLEXISTS, and XMLTABLE have a PASSING clause to provide input values, some of the functions also return either a sequence or column values. For all of those values it is possible to specify the option BY REF (see the linked documentation for the functions above). What the BY REF option tells DB2 is to not create copies of the input or output values, but to only move a reference to those values around. This reduces temporary data, uses less of the valuable memory on the database servers, and it is much faster.

SELECT doc FROM myDocs WHERE XMLEXISTS('$d/name' PASSING BY REF doc AS "d" BY REF)

But what happens when the option BY REF is not specified? DB2 takes the default which in all places is BY REF. So sit back, relax, and enjoy a resource-conscious, highly efficient DB2 - by design.

SELECT doc FROM myDocs WHERE XMLEXISTS('$d/name' PASSING doc AS "d")

SELECT doc FROM myDocs WHERE XMLEXISTS('$DOC/name')

BTW: The latter, omitting the PASSING CLAUSE, only works in DB2 for Linux, UNIX, and Windows.

Real life: May all your transactions commit!

Image via Wikipedia

About two weeks ago I had to buy train tickets for a trip to Paris. Because I had a special promotion code that could only be applied by a sales agent I went to the train station. There, a loooong journey into the inner workings of the sales and reservation system started. One hour later, I left with tickets, a long line of waiting customers, and a frustrated sales agent.

Recently I received a promotion letter (20 EUR off) by Deutsche Bahn, the state-owned German train operator. In the fineprint it stated that it could only applied when booking through a sales office (instead of online), was good till September, had minimum requirements on the ticket price and the type of train, and most importantly, once applied was non-refundable. Because I wouldn't use it for private travel, I thought about saving some bucks for IBM and planned to use it for a train-based business travel for which I needed to apply the IBM corporate client tariff. And I had so-called BahnCard, a frequent traveler card for 50% discount. Looking back, combining all this, it already called for disaster.

Once I was next in line, I mentioned the corporate client tariff, showed my identification for the corporate client tariff, the BahnCard, and the promotion letter. Few minutes later, I had my ticket, the promotion was applied and the credit card was swiped. Unfortunately, when I held the ticket in my hands, I couldn't see the corporate client tariff. No problem, the sales agent said, we'll look into it. After consulting with another sales agent, the "application support" was called in on how to apply the corporate client tariff. Next, the old ticket was canceled and a new one generated, all looking good. The idea was to apply the refund for the old to the new ticket, putting the difference back to the corporate credit card. However, because of the non-refundable promotion, the system went into a loop. It could not fully cancel the old ticket because parts of it were paid for by a non-refundable payment. The sales agent went into different menu options for payment, trying to charge a negative amount to the credit card, trying to refund the difference as cash, etc. All actions produced error messages because of incompatible offer conditions.

After several long minutes and an increasingly long line behind my (now sweaty back), the sales agent decided to call in help from the application support center again. The support engineer was able to remotely log into the transaction and see details, but was neither able to cancel the transaction nor to complete it. Together, they discussed options on how to convince the system to "get the job done". Finally, with my OK my credit card was charged again - paying the ticket a second time. Then a cash refund outside the system was initiated after performing and documenting a full cash inventory. Eventually, I left the sales office after about one hour, smiling to the "next in line" call.

When you design promotions, make sure the system is not only able to handle them, but also all the error cases. Those are best practices of engineering.

Wednesday, August 18, 2010

Smaller, but way faster and cheaper: IBM sets new TPC-C record

I am traveling right now, but wanted to point you to a new TPC-C benchmark result. IBM set a new record running DB2 9.7 on a POWER7-based system. The IBM system is more than 35% faster than the so-far top Oracle result, providing 41% better price/performance and 35% better energy efficiency per transaction. This should set an end to the misleading and silly Oracle advertisements (I hope, but am not sure).

Tuesday, August 10, 2010

pureXML Indexing - Behind the scenes

I was asked whether I could explain a little bit how the different XML indexes are used for performing an index scan. Well, this probably requires several blog entries, but today I at least try to cover the basics.

When you create a table with an XML column, DB2 automatically creates two different XML indexes. One is the XML Path index, the other the XML Region Index. The INDEXTYPE in SYSCAT.INDEXES lists them as XPTH and XRGN. If you create a table with more than one XML column, you will notice that there is a XPTH for each XML column, but only one XRGN for the entire table. The path index maps an encoded path (e.g., /root/department/employee) to a unique pathID, an integer value. The pathID is unique within the entire database. The region index is used to manage parts (regions) of an XML document, so that documents can be larger than a single data page.

Similar to indexes on non-XML columns, users can create indexes on XML columns utilizing some features of the CREATE INDEX statement. Thereafter, the system catalog might look like shown in this article in the Information Center. But how are those indexes used?

One of the XML-related operators is XISCAN. When you perform an index scan using a non-XML index, an IXSCAN operator shows up in the access plan. For XML data, because some more effort is needed, the operator is named XISCAN, XML Index Scan. Once the optimizer decides that an index scan should be used, some auxiliary information to set up the scan is needed, including of course the start or stop keys for the values in the predicate (similar to regular b-tree access). What is needed in addition is the pathID of the path to search for. In the XML world we are not comparing "id=4711", but the value at "/root/department/employee/id" needs to be 4711.

If the entire absolute path is known when the query is compiled, the path can be turned into a pathID. However, this is not the case when wildcards (e.g., "//employee/id") are used. That's when the XML path index (XPTH) comes in handy. Because it stores the encoded paths in reverse order (our example would be "id/employee/department/root" in clear text) it is possible to find all paths ending (now starting) with "id" or "employee/id". To retrieve the pathIDs, at runtime an index scan on the path index is performed and the range is X-locked. With the locking DB2 can prevent any new path to be created during the actual index scan for the query predicate.

Once the pathIDs are known, either from the compile phase or the index look-up, the actual scan on the XML index (XVIP type in the catalog) can be performed and docIDs, nodeIDs (to pinpoint the node having that desired value) and RIDs (record identifiers) are obtained from the b-tree. The docIDs identify the qualifying XML documents, the RIDs point to the records (rows) in the table which the XML documents are part of. Now, based on the docID and nodeID, the part of the document can be located using the XML region index. Then, regular XML processing kicks in, e.g., the XSCAN operator or others could be applied.

Note that the above description covers the standard case and that optimizations for some cases are possible. There is also the XANDOR operator which speeds up index evaluation when two or more indexes for the same column can be used. But that's another blog post. Stay tuned...

Friday, August 6, 2010

Performance scale-out and continous availability: DB2 pureScale now available on System x

One of the most heard questions after the DB2 pureScale announcement last year and the initial "Wow!!!" was "Will it be available on System x?". Now that question can be answered officially: Yes, it is. Yesterday, IBM announced DB2 pureScale on System x. Now you have a choice of building your system on either POWER hardware or on select IBM System x servers.

Now I hear you say "Wow!" and ask: "When will there be an Express-C edition?"....

Tuesday, August 3, 2010

Fat-free and now in a light edition: DB2 Express-C download image

I am one of the gazillion users of DB2 Express-C. And that's why I am really happy that (from now on I hope) there is a "light edition" of the DB2 Express-C download image. You can still download the full edition which features text search, the Control Center, languages other than English, and some other things I consider minor. For the most of us it means less to download with the next (or this) release, thus an even faster upgrade cycle.

And when your spouse asks what you are doing, remember to point out: It's the light edition now (and fat free).

I wish they had...

Image via Wikipedia

The past few days we drove from the South of Germany to Westphalia and back. In most cases this is more or less some relaxed driving on a long stretch of Autobahn (our navigation system: "follow the road for 480 km"). This time however, we had to deal with strong downpours and related accidents as well as many construction zones. That's when I had to think about DB2 - naturally....

Some minutes on the Autobahn, a heavy downpour from almost black clouds started. Still, some drivers continued without headlights and with Bleifuss. The result was a complete stop of traffic because of an accident. One driver in a sports car had lost control of his Ultimate Driving Machine and slammed it into the divider and, with some rotations, back onto the highway which then was blocked for traffic. After one lane was mostly cleared in heavy rain, we continued for not even 5 minutes until we reached another similar accident scene. Later, we were passed again by some cars without headlight and in (too) high speed.

How does that relate to DB2 other than that DB2 can be a sports car and fun to use? I was thinking about the automatic and autonomic features. They allow you to drive your (database) engine on autopilot. Sure, some cars already switch on headlights automatically, adjust the speed of wipers, and have enough electronic to control the brakes and keep the car stabilized. But they don't seem to have a health monitor or utility throttling. Simple rules to be implemented could be "if too much rain and no sight, slow down" or "lift right foot from accelerator when no sight". Even a display to show alerts about, e.g., no grip because of 10 cm of water on the road is missing. So my hope is that with more heavy rain, people will finally learn and adjust. Which brings me to learning optimizers...

BTW: Just based on the information from the navigation system and the map of the Autobahn system, which route did we take?

Wednesday, July 28, 2010

Catalog view xmlstrings for simpler access to stringID information

The XML support in DB2 is tightly integrated into the database engine and provides fast and sophisticated processing of XML data. In the past I had explained why - for compactness and speed - element and attribute names, namespace information is replaced with so-called stringIDs. The string to stringID mappings are stored in a cached dictionary which is persisted in a system catalog.

That system catalog, SYSIBM.SYSXMLSTRINGS, an internal table, has undergone some changes over the past database versions. In DB2 9.1, pureXML support was restricted to databases using a Unicode codepage. Hence, the string information was stored in clear text in the database codepage. Users could easily access AND display the system information. In DB2 9.5, the pureXML feature could also be used in non-Unicode databases. The VARCHAR-based string column was then changed into a VARCHAR FOR BIT DATA column to store the UTF-8 codes properly. Via a new function XMLBIT2CHAR it was possible to turn the encoded information back into a readable string.

Now, in the current version DB2 9.7, life got much simpler because a catalog view SYSCAT.XMLSTRINGS was introduced. It shows the stringID, the string in the database codepage (by calling the mentioned XMLBIT2CHAR function), and the string as bit data (hex format).

Thursday, July 22, 2010

New generation of mainframes and an integrated accelerator for warehouse workloads

IBM today announced a new generation of its mainframe computers, labeled zEnterprise. Based on a hybrid architecture, zEnterprise BladeCenter Extensions (zBX), it allows to integrate and manage POWER7 blades or System x blades from within the mainframe infrastructure. (One of) the first to exploit the zBX feature is the IBM Smart Analytics Optimizer. The latter allows to significantly speed up analytic queries running on DB2 for z/OS. Queries that qualify a transparently routed to the accelerator and return in a fraction of the time.

A small step for mankind, but a good day for mainframe customer, especially those using DB2.

Wednesday, July 21, 2010

Two days in France (in August for pureXML)

If you live in France or are in Paris in the middle of August or want to be in Paris on August 17/18, here is what I **would** do: Attend a DB2 pureXML bootcamp.

Susan Malaika and I will be running an intense, packed bootcamp which covers everything pureXML from basics over XML storage and indexing to performance best practices. The bootcamp will be held at the IBM Forum Paris. And yes, it's free to attend.

Tuesday, July 13, 2010

Another year gone by: Energy consumption in our passive house

We had a long and cold Winter in Germany and also several upgrades to our electrical equipment in our passive house. So I was curious how our energy consumption would be for the last 12 months compared to the year before. This morning was the time to read out our two energy meters (still old fashioned and not smart):

Household consumption: 2473 kWh (previous year 2244 kWh)
Heating/ventilation/water: 2858 kWh (previous year 2782 kWh)

As mentioned, the Winter was long and cold and thus the small increase for heating makes sense. It would have been even bigger if we would not have "cheated" when we bought (and used) a table-top fireplace. That small fireplace has a nice flame, burns Ethanol and, of course, produces some heat. It's stronger than some candles and only needs about 0.15 l of Ethanol an hour.

For the increased household consumption one big factor is my work: I traveled less over the past 12 months and worked more and longer at home, started using a 2nd screen, and am now using an automatic coffee/espresso maker to keep the spirits high. Now add in some kids asking for more TV and computer time, some more washing cycles, and extended cooking, then the increase for the household is surprisingly small.

Friday, July 2, 2010

Passive Houses in the US

One of the many questions I got regarding our passive house (see the Wikipedia entry for many links) is whether there is a similar standard in the USA or whether I know of resources to learn more about what is available. I promised that in a blog post I would try to list some resources I found and learned about the last few years. Guess what, you just started reading that promised post.

Image via Wikipedia

When my wife and I started to look into building a passive house we - by chance - also heard an interesting radio story on NPR about such a house built out of straw in Southern California and also saw a documentary in the German children TV show "Die Sendung mit der Maus". I found the documentary on youtube:

Although it is in German, you will find several videos in your language, e.g., English, by searching for "passive house".

In Germany, the Passivhaus-Institut is trying to set and coordinate standards, provide help and research technologies. It also offers information in English. In the US there is now a Passive House Institute US. In addition to information material it also has contact data for Certified Passive House Consultants all over the US.

A passive house combines several technologies to reduce energy consumption. So it is a good idea to learn about those components individually:

solar panels are used to either heat water or to produce power
ventilation systems with heat recovery are in place to allow constant ventilation while preserving energy in the Winter and helping to keep the inside cooler in the Summer
in many houses a heat pump is used for heating and warm water
windows have to be of good quality and should be constructed such that they let sunshine in during the Winter and are covered by shade in the Summer

That's probably already more than I wanted to write in this first installment. Let me know if you have specific questions.

Forrester Analyst: Customers report success with DB2's out-of-the-box compatibility to Oracle's SQL

Forrester Research analyst Noel Yuhanna posted a blog entry titled "Database Migrations are Finally Becoming Simpler". Based on customer feedback he got, one can see 90% or more compatibility and hence the migration effort for moving from Oracle to DB2 requires only days or weeks instead of months or years. Noel Yuhanna says "This is huge".

Database customer want to save millions of dollars, but earlier a migration was costly and painful and required a larger investment of their own. With the DB2 compatibility layer the migration becomes simpler.

Tuesday, June 29, 2010

Experience: Switching the laptop from Windows to Linux

Recently I started to use a Linux-based laptop computer for work, after many years on Windows XP. Over the past years most of my private activities (on another machine) have already been on Linux, so the changes were not that big. However, there are different requirements for a private environment (emails, pictures, some games, some videos, some word processing) than for a business environment. My new system is now running on Ubuntu 9.10, my private machine is still on an older Mandriva.

The actual move was quick and included copying over all my two data directories with all kinds of documents and the Lotus Notes databases. I also needed some selected configuration files, but overall I was up and running again after about an hour.

The first trouble I had was to (persistently!) configure a two-screen environment with my laptop on the right and the monitor on the left. The default is the monitor on the right side. Depending on the hardware in the machine there are different tools to solve that. Just using the Display Preferences works fine now (most of the time).

On Windows I was a big user of hibernation and tried to avoid rebooting the machine as long as possible (2 months or longer). Hibernation is supported on Linux/Ubuntu, too, but it takes significantly longer to revive the machine than on Windows. Overall, the felt one or two minutes more is ok since it is once a day. After logging in I most of the time run into the issue that both screens display the same. Using Control+Alt+F1 and then Control+Alt+F7 switches to the correct settings. It took me a while to figure out this workaround.

As a heavy user of MS Powerpoint and MS Word I feared the switch to Lotus Symphony and OpenOffice.org the most based on experiences with my private computer. For most documents the import filters are "ok", but sometimes macros or special formatting does not work.
The biggest shock was to try to give a presentation without a presenter mode which Powerpoint has. However, after some research I found out that there is a suitable presenter console extension which seems to be even more powerful than the one in Powerpoint.

After few weeks with the new machine and OS environment, I am mostly fine as the problems above are so far the only ones I ran into. For most software there are Linux versions or suitable replacements. My USB UMTS stick for mobile Internet works (mostly? some speed issues) fine out of the box, DB2 is available on Linux (big surprise here, right?), and Firefox and Flash (here that, A!) work as well. And for one Windows-based program I had to make use of wine.

Is it worth switching? So far I would say it is a definitive yes.

Monday, June 28, 2010

Housing Transactional and Data Warehouse Workloads on System z

While screening new Redbook titles and noticed an interesting new book currently in draft status. It is labeled Housing Transaction and Data Warehouse Workloads on System z. Remember that the IBM Smart Analytics System (ISAS) is no longer an offering only available to DB2 LUW customers, but with ISAS 9600 also an offering on for the Information Management community on System z.

Thursday, June 24, 2010

Moving? Pack your tablespaces - Transportable tablespaces and schemas (sets)

Starting with DB2 9.7 FP2 it is possible to transport tablespaces and schemas. What does that mean? You take a backup image of an existing database and RESTORE the database schema(s) and associated tablespaces into another existing database.

The associated schemas and tablespaces are named a transportable set, neither of them can be partial. That is, you need all tablespaces that hold data of database objects for a given schema or given schemas as well as all schemas that have objects in the transported tablespace need to be restored. This is something to consider when planning new database layouts.

Sometimes mini-databases are realized as objects in dedicated schemas in a single database. With transportable sets, they can be moved across databases and servers.

Note that another method of moving data and schema information is db2move. db2move is based on using export and import or load and is utilizing IXF files for holding the data. In contrast, transportable sets (tablespaces and schemas) are enhancements to backup/restore.

Tuesday, June 22, 2010

SSacSA - A typical IBM acronym, but not the typical product (Move from Sybase to DB2)

When I first heard about SSacSA (it's part of the new features in DB2 9.7 FP 2), I thought that this is another typical IBM acronym I had to learn. But when I looked closer at it, and it stands for DB2 SQL Skin Feature 1.0 for applications compatible with Sybase ASE (what a name for itself!), I wasn't scared of the long name and its acronym anymore. The biggest reason is that this feature is very useful and can save customers lots of money.

With this SQL skin (it's on the very outside of the DB2 engine, in the JDBC layer) it is possible to take applications written against Sybase ASE and let them run against DB2 without too much migration effort. The trick is that SQL statements written in the Sybase dialect are caught in the JDBC driver and rewritten to DB2's version of the SQL language. The statements can then be executed in DB2 and the application doesn't notice it is now running on top of DB2.

A chat with the lab (cwtl?) is scheduled for tomorrow, Wednesday 23rd, to discuss this new offering. You can register here.

Friday, June 18, 2010

Obfuscate your DDL statements (aHJgFSG127_henRIK)

It's always interesting to take a look at the new functionality in a new (fixpack) release. DB2 9.7 FP2 adds functionality to obfuscate DDL statements. You might state that with all the parameters and options or page-long spaghetti code inside stored procedures DDL statements already are confusing. Well, now you can add another layer of obfuscation with the help of DB2.

The key to obfuscation is to know about the new WRAP function and CREATE_WRAPPED procedure in the DBMS_DDL module and the WRAPPED keyword. You can pass a string with a regular DDL statement to either routines. In the case of WRAP a string (CLOB) with the encoded, i.e., obfuscated statement is returned. When CREATE_WRAPPED is used, the statement is directly deployed into the database, i.e., it is executed and thereby the obfuscated statement text stored in the system catalog.

Obfuscation works for the following types of objects:

procedure
functions
triggers
views
PL/SQL packages and package bodies
modules to which functions or procedure are added or which are published

An obfuscated statement could look like the following:

CREATE PROCEDURE henrik.otest(col1 varchar(20)) WRAPPED SQL09072 aHJgFSG123_henRIKobFU3cAtEDtHIsSTRinGANdmaDEITun4eaDABle

The SQL09072 is a version identifier (DB2 9.7 FP2), so that DB2 remembers what version of the encoding function was used.

So try to not use any spaghetti code anymore, but start WRAPing your procedures...

Wednesday, June 16, 2010

Catching up - 5 weeks in review

Well, I am currently catching up. A very busy time, some offline vacation, a surgery with time off, and some busy days getting back "into" work have left me not posting anything for the past 5 weeks. And what has happened!

DB2 9.7 FP2 (9.7.2) is out now with some fixes and new features - more on that in a separate post. For those on DB2 Express-C, the good news is that IBM has updated the free to develop, free to deploy, free to distribute version of DB2, too.

Speaking of free products, there is a free eBook Getting started with IBM Data Studio for DB2. Actually, this is old news, but what is new is that there is now a free German version of that book, named Einstieg in IBM Data Studio für DB2.

While I am at German and Germans, I was surprised to read that SAP is buying Sybase. Apparently, this deal is to move SAP into the mobile space, not so much about Sybase's dying database business. And there was another interesting acquisition in the database area, too.
Teradata has bought xkoto and apparently during that process stopped selling new GRIDSCALE licenses. There is discussion, not just at IBM business partners like Triton Consulting, about the increased importance of the DB2 pureScale technology as it suddenly becomes the only available solution to address scalability and cluster high availability. Speaking of DB2 pureScale, the related Information Center is now open to the public.

There was so much more in the past 5 weeks, I will update you once Germany has won the soccer world championship... ;-)

Friday, May 7, 2010

Automate DB2 database maintenance

On developerWorks is a new article "Automate DB2 9.7 database maintenance in an embedded database environment". In an environment where DB2 is an embedded database system it is important to hide the existence of the DBS by making sure it just works. The article discusses how to set up the various automatic and autonomic features that DB2 provides. It's not only interesting for those dealing with embedded environments, but for any "lazy DB2" in general. There is so much other work to do that you don't want to deal with maintenance that can be automated... (and it is Friday and the weekend is up next...).

Wednesday, May 5, 2010

New uses for the DATE function?

I just read about a new dating site that will cater only to fans of Apple products. The site's name? Cupidtino. If you know the San Francisco Bay Area and Silicon Valley (or you are an Apple fan) you know that Apple's headquarter is in Cupertino, CA. What a nice name and wordplay.

When I told my wife about the dating site I also mentioned that in the database world everything is much simpler. Let's just invoke DATE() and wait for your lucky match...

Disclaimer: I and my family don't own a single Apple product.

Tuesday, May 4, 2010

Passive houses, windmills, clouds, ice saints, DB2, and some nice prizes

Well, what a headline! Full of energy and still kind of energy-conscious, full of nature and still kind of technology-centric, serious stuff and still fun. Let me explain...

As most of you probably know, my familiy is living in a passive house (very low energy house). The Winter in Germany has been long and cold and everybody spent a good chunk of money on heating. Everybody? It turns out that even though our entire house was heated throughout the Winter, we only consumed a little bit more than the year before. Yesterday evening I checked the energy meters (still not smart meters) and we are on track on being only less than 5% higher consumption for heating than last year. We will see the result around mid of July.

After some really warm weeks it is cloudy, rainy, windy, and chilly outside. Seems like the so-called ice saints have an early arrival (this is a weather pattern caused by warm land masses and still cold ocean in Northern Europe). Speaking of wind and clouds: Google just invested in windmills and IBM is buying Cast Iron for cloud integration.

What is left? DB2 and the nice prizes. IDUG started a contest "Do you DB2?" where - if you are based in North America - you can win a Wifi-enabled HDTV or an iPad. To win you need to tell your personal DB2 story. Running a DB2 server in a passive house is probably not a good story ("with product X the house keeps warmer than running DB2 because more CPU cyles are burned and more disks are needed for X and hence more heating is produced") or is it? What is your story?

Monday, April 26, 2010

New "Best Practices" for DB2 on AIX 6.1 on Power

A new IBM Redbook has been published titled "Best Practices for DB2 on AIX 6.1 on POWER Systems". The book covers both DB2 9.5 and DB2 9.7. A good chunk of the 400+ pages is spent on AIX configuration and DB2 (performance) monitoring.

Thursday, April 22, 2010

Overview: DB2 tablespaces and bufferpools

An older article on developerWorks, "DB2 Basics: Table spaces and buffer pools" has been updated for DB2 9.7. If you want a quick introduction to SMS, DMS, large and regular tablespaces, row size and column count limits, extent and prefetch sizes, etc., this is a good starter.

Monday, April 19, 2010

Happy eating: Separation of duties and DB2 security

One of the security enhancements that were delivered with DB2 9.7 is the full support for separation of duties. First of, what exactly is separation of duties? In short you are splitting up an important task or process in smaller steps and then require several people/user to complete the task. The objective is to prevent fraudulent or accidental activities and to reduce the risk and possible damage.

Combine that with auditing to keep track of who is/was doing what, encryption of data on disk plus encrypted DB2 connections. On top of that add regular roles and authorization for different tasks. Sometimes I wish I would have all that for that kitchen cabinet where my/our candy is stored.

With separation of duties my wife would be in charge of the cabinet, providing the space to store the candy (and sometimes clean up?). I would grant insert privilege to everyone because they could then add their candy to the cabinet and remove almost all other privileges. Secured connections would allow me to access (eat) without the kids eavesdropping on my chocolate consumption. Encrypted storage would make sure that even if someone would have access to the cabinet could not look inside and determine how much and what is left (my secret). With auditing I could keep track of who would try to gain access.

That would be happy eating for me. Henrik, dream on and share the candy till it becomes true...