
Friday, April 25, 2025

Have issues? Communicate! A brief look at Db2 SQLCA

Db2 command line output with printed SQLCA
With many conflicts and problems, big and small, going on, communication is important. So, why not take a brief look at the Db2 SQL Communications Area (SQLCA)? According to the Db2 documentation, an SQLCA is a collection of variables that is updated at the end of the execution of every SQL statement. It provides the structure to pass important information from the database product to the user or application.
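
To make that a bit more tangible, here is a minimal sketch (using the EMPLOYEE table of the Db2 SAMPLE database) of how to see the SQLCA yourself: when the command line processor is invoked with the -a option, it prints the SQLCA after the statement, and the sqlcode and sqlstate fields are what applications usually inspect.

  -- Run via the CLP with the -a option to have the SQLCA printed, e.g.:
  --   db2 -a "SELECT COUNT(*) FROM employee"
  -- A successful statement shows sqlcode 0; warnings and errors show up
  -- as positive or negative sqlcode values plus an sqlstate.
  SELECT COUNT(*) FROM employee;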

Wednesday, March 12, 2025

Db2 for your data lakehouse

Db2 as engine and data source for the data lakehouse
A while ago, I was working with IBM watsonx.data to prepare a presentation on data lakehouse solutions. When talking about the (query) engines for a data lakehouse, the discussion typically centers on Presto and Spark. Did you know that Db2 can be used both as data source AND as query engine in watsonx.data (see screenshot)? Let's take a look...

Monday, July 1, 2024

Lakehouse: Bronze, silver, and gold levels of data

Is this a Data Lakehouse?
While working with customers and IBMers on data processing projects (to keep it as broad as possible), I often hear talk about bronze, silver, and gold standards. These standards sometimes refer to the systems the data is stored in, in terms of reliability, availability, performance, bandwidth, and more. The IBM mainframe in a geographically dispersed parallel sysplex configuration may be considered such a gold standard. Lately, bronze/silver/gold standards are more frequently heard in the context of Data Lakehouse architectures and data sources or data zones. So, what are bronze, silver, and gold when discussing data and data lakehouses?

Monday, January 13, 2014

New Redbook: Leveraging DB2 10 for High Performance of Your Data Warehouse

A new (well, almost new) and very interesting IBM Redbook was published last week: Leveraging DB2 10 for High Performance of Your Data Warehouse. Though the title says "DB2 10", parts of the book deal with DB2 10.5 and the BLU Acceleration technology. And the reason I said "almost new" is that the book is partially based on the 2010 Redbook "InfoSphere Warehouse: A Robust Infrastructure for Business Intelligence". As you can see, DB2 has a proven track record of supporting Data Warehouses and Data Marts.

The new book shows you how to get started with the column-oriented storage that is key to benefiting from BLU Acceleration. It covers how to create data marts, load them, understand query plans (including the CTQ operator), and monitor the system.
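
To give an idea of the starting point, here is a minimal sketch with a hypothetical data mart table (not an example from the book): a table becomes column-organized by adding ORGANIZE BY COLUMN, or by setting the DB2_WORKLOAD=ANALYTICS registry variable so that column organization becomes the default.

  -- Hypothetical fact table for DB2 10.5 with BLU Acceleration:
  -- column-organized storage is requested explicitly.
  CREATE TABLE sales_fact (
    sale_date  DATE          NOT NULL,
    store_id   INTEGER       NOT NULL,
    product_id INTEGER       NOT NULL,
    amount     DECIMAL(12,2)
  ) ORGANIZE BY COLUMN;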

Similar to many other IBM Redbooks, this one is available for download as PDF or EPUB, or for viewing in HTML.

BTW: You can find all my DB2 BLU-related articles using this link.

Monday, October 17, 2011

WOW for MDC on MQT - How to speed up the load process for a warehouse

When I was teaching a performance bootcamp earlier this month, one of the topics was combining multi-dimensional clustering (MDC) with range-partitioned tables. Then a question came up about using MDC with materialized query tables (MQTs) and whether it was possible to combine the two techniques. The customer hadn't succeeded before. As I didn't see a reason why the two could not be combined, I looked for documentation and examples: the DB2 Information Center describes how to combine MDC with MQTs.

With MQTs, some of the heavy queries can be sped up by precomputing the answers to common, complex subqueries. Using MDC for the MQTs can improve performance even further, depending on what can be clustered and how many dimensions are present.
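
As a rough sketch of what the combination can look like (hypothetical table and column names, not the customer's schema): a deferred-refresh MQT simply gets its clustering through an ORGANIZE BY DIMENSIONS clause.

  -- MQT over a hypothetical base table "sales", precomputing a monthly
  -- aggregate and clustered (MDC) on two dimensions.
  CREATE TABLE sales_by_month AS
    (SELECT store_id, MONTH(sale_date) AS sale_month, SUM(amount) AS total
     FROM sales
     GROUP BY store_id, MONTH(sale_date))
    DATA INITIALLY DEFERRED REFRESH DEFERRED
    ORGANIZE BY DIMENSIONS (store_id, sale_month);

  -- Populate the MQT so the optimizer can start routing queries to it.
  REFRESH TABLE sales_by_month;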

I also briefly tested db2look to see how it reproduces the DDL, and the combination is supported by the tool, too.

For the customer it was a "Wow!" for MDC on MQT; for the users it will mean faster queries...

Thursday, July 22, 2010

New generation of mainframes and an integrated accelerator for warehouse workloads

IBM today announced a new generation of its mainframe computers, labeled zEnterprise. Based on a hybrid architecture, the zEnterprise BladeCenter Extensions (zBX) allow POWER7 blades or System x blades to be integrated and managed from within the mainframe infrastructure. (One of) the first products to exploit the zBX feature is the IBM Smart Analytics Optimizer, which significantly speeds up analytic queries running on DB2 for z/OS. Queries that qualify are transparently routed to the accelerator and return in a fraction of the time.

A small step for mankind, but a good day for mainframe customers, especially those using DB2.

Friday, December 4, 2009

The importance of correct and integrated data

You can deliver a product as fast and as "good" as promised and still lose a repeat customer. Here is a real-life example.

Over the past days, I ordered several books online. Usually, I use one specific company "A" (any guesses?) for two reasons. First, in Germany all German-language books have to be sold at the list price by law; neither discounts nor coupons can be applied. "A" offers free shipping on all books, regardless of their list price. Second, "A" is usually very quick in delivering the books and provides tracking information. I used this company as usual for some books.

Then, for an English book, where every bookseller can set its own price, I shopped around and found that I could save 2 EUR at company "B". The product details showed that the book was in stock and would be sent the next day. I ordered the book in the evening and then started to wait for a confirmation email. And waited. The next morning I received two emails. The first was the acknowledgement of my order. The second was a status change email: now the book wasn't in stock anymore but had a delivery time of 3 weeks. Ouch.
I checked the order status online and then the product page again. Both showed the 3 weeks for delivery. Why not cancel the order and shop somewhere else? I sent an email via the contact form, tried to cancel online, and called the service hotline. On the phone they were friendly, confirmed the change to 3 weeks, wondered why it had changed overnight, and pointed out that it would be hard to cancel, but they would try anyway. The reason was that they work with a third-party fulfillment center. One day later, the status first said "trying to cancel", then it changed to "package sent", and in the afternoon the book arrived.

Company "B" delivered the product I ordered on time. However, their data isn't integrated and both customer and customer service are left to wonder about what is correct. Will I shop there again? Not this year... ;-)

Sunday, November 1, 2009

How much data do I generate each day?

During vacation, I somehow got to wondering how much data I generate each day. This is for all (in)activities in a 24-hour slot. To make it simpler, I decided to only consider "private" activities.

Here is a list of activities, from night to day:
  • Overnight, a lot of mail and packages are moved, leaving tracking data in databases.
  • There are also all kinds of batch jobs running to produce statements, dig into details of transactions from the day before, and much more.
  • With several alarm clocks running, devices on standby, and of course the fridge, freezer, heating system, etc. performing their duties, energy consumption could be (e-)metered, leaving entries at the utilities. During the day, all kinds of electrical devices (stove, computer, microwave, washer/dryer, lights, etc.) add data.
  • By making phone calls, connection records (and who knows what more?) are left. If I call a hotline, sometimes the calls are recorded, and often records are added to CRM databases, etc.
  • When doing (grocery) shopping, a list of my items is added to the transactions. Maybe some items are automatically re-ordered.
  • Using electronic payment, a credit card/check card debit action is performed.
  • For paying bills, an electronic fund transfer with associated records at several banks is done.
  • When surfing, my traces are left in web logs, cookies are produced, "recommendations" are processed, Google and other ad networks/analytics/tracking companies are adding to my profile, my IP connections are stored, my DSL provider probably keeps additional information.
  • When parking in a downtown garage, my license plate is recorded during entry/exit.
  • The local library keeps a list (and maybe a history) of books on my and the kids' account.
  • and much more
What bigger items am I missing? How many kBytes or MBytes is it every day? Should I be concerned or even frightened? Your comments are welcome...

Friday, September 18, 2009

Selling pre-owned cars (and Oracle Exadata)

The video of the Exadata v2 launch webcast is now up; I wrote about the event earlier this week. I especially liked the first few minutes of the video because they reminded me of the Interboot, which starts this weekend here where I live. The entire video also reminded me of how used or "pre-owned" cars are sold.

To impress prospective buyers, a salesman would point out that the car does not have just 4 wheels like every other car around, but 5 or even 6. It would not have just one sunroof, but three; not 4 doors, but 7. It is the best, the fastest, the best, and to top it off, the best. It wouldn't matter whether it makes sense in terms of the big picture, only whether some parts are impressive. And don't even try asking such a sales guy about the price. You wouldn't get a straight answer, at least not that easily.

Switching back to the video, I highly recommend watching it - especially on a Friday. Regarding the cost of the Exadata package, here is a closer look. For the big picture, you should ask yourself what is happening to those customers who bought into the (now dead) Oracle/HP cooperation. Why not just use SSD disks instead of a cache? Why not SPARC instead of Intel? Does RAC really scale to petabytes? And why sailing and an ocean of tears as the introduction...?

Thursday, September 3, 2009

Moving and transforming (XML) data

Storing data, regardless of whether it is relational or XML, is one thing. Moving it across databases and transforming it on the fly is another. A two-part series about using (IBM InfoSphere) DataStage and DB2 pureXML together provides insight into how to extract, transform, and move XML data, e.g., to integrate XML data into a data warehouse. Part 1 is titled "Integrate XML operational data into a data warehouse", and part 2 is named "Building an XML-enabled data warehouse".

Tuesday, May 12, 2009

DB2 9.7 (Cobra) for extended vacations?!

One of the things you have to adjust to when moving from Germany to the US or back is the number of vacation days. I had a hard time explaining to friends and family why I - with a Ph.D., work experience, a contract with 28 vacation days and plenty of public holidays - would accept a position in the US with only 10 vacation days and few holidays. After arriving in the new place, it took a while to figure out how to live with fewer vacation days (and the secret behind it).

Now, after moving back to Germany and suddenly HAVING to take 30 vacation days a year, some mental adjustments are required again. You have to plan ahead and decide when to take vacation because you don't want to end up in November with two months off in a row (which does not work anyway). Having vacation days can be hard.

In the past, I have heard discussions about how efficient different societies or workplaces are, about different speeds of work, about why some countries have more or less vacation, and about different strategies for dealing with it. That's not what I wanted to discuss today.

I was reminded of those vacation discussions over the last week when we compared DB2 9.5 and DB2 Cobra performance for a customer's warehousing workload. Many queries completed on Cobra in only 80% of the time they took on DB2 9.5; some needed only 30% of the previous time. The improvements can be attributed to compression of temporary tables and indexes, local index support for range-partitioned tables, and general runtime enhancements. I was and still am impressed with the results. And it got me thinking about what that customer and its employees could do as a consequence of the outstanding results. They could either do more in less time or enjoy some additional days off. What would you do...?
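
For illustration only (hypothetical index and table names, nothing from the customer workload), this is roughly how two of those DB2 9.7 features are switched on in DDL; temp table compression needs no DDL at all.

  -- A compressed, local (partitioned) index on a range-partitioned table;
  -- PARTITIONED requires the table itself to be range partitioned.
  CREATE INDEX ix_sales_date ON sales (sale_date) PARTITIONED COMPRESS YES;

  -- Existing indexes can be altered to use compression (a reorg of the
  -- indexes is needed before existing index data is actually compressed).
  ALTER INDEX ix_sales_store COMPRESS YES;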

Friday, April 24, 2009

XML arrives in the warehouse

More and more data is generated, sent, processed, and (even!!!) stored as XML. Application developers, IT architects, and DBAs are getting used to XML as part of their lives. Let's get ready for the next step: XML in the enterprise warehouse. Why? To bring higher flexibility and faster time-to-availability (shorter development/deployment times) to the core information management systems, the enterprise warehouses, which are the basis for critical business decisions. Greater flexibility and shorter reaction times are especially important during economic phases like the current one.

DB2 9.7 ("Cobra") adds support for XML in the warehouse. What is even better is that with XML compression and index compression cost savings can be realized. Right now not many of the analytic and business intelligence tools support XML data. Fortunately, both the SQL standard and DB2 feature a function named XMLTABLE that allows to map data from XML documents to a regular table format, thereby enabling BI tools to work with XML data.

My colleagues Cindy Saracco and Matthias Nicola have produced a nice summary and introduction to the warehousing-related pureXML enhancements in the upcoming DB2 9.7 release. The developerWorks article is titled "Enhance business insight and scalability of XML data with new DB2 V9.7 pureXML features". Among other things, the article explains how XML can now be used with range partitioning, hash (or database) partitioning, and multi-dimensional clustering. A tiny sketch of that combination follows below. Enjoy!
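
For a flavor of the combination (hypothetical names and ranges, not taken from the article): in DB2 9.7 an XML column can simply live in a range-partitioned, multi-dimensionally clustered table.

  -- Hypothetical orders table: XML column in a range-partitioned,
  -- MDC-clustered table.
  CREATE TABLE orders (
    order_id   INTEGER NOT NULL,
    order_date DATE    NOT NULL,
    region     INTEGER NOT NULL,
    doc        XML
  )
  PARTITION BY RANGE (order_date)
    (STARTING '2009-01-01' ENDING '2009-06-30',
     STARTING '2009-07-01' ENDING '2009-12-31')
  ORGANIZE BY DIMENSIONS (region);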