Showing posts with label enterprise warehouse. Show all posts

Wednesday, March 12, 2025

Db2 for your data lakehouse

Db2 as engine and data source
for the data lakehouse
A while ago, I was working with IBM watsonx.data to prepare a presentation on data lakehouse solutions. When talking about the (query) engines for a data lakehouse, the conversation is usually about Presto and Spark. But did you know that Db2 can be used both as a data source AND as a query engine in watsonx.data (see screenshot)? Let's take a look...

Wednesday, March 16, 2016

CeBIT: Goldsmith in the Hybrid Cloud - How to Create Value from Enterprise Data

Gold Nuggets - Data as Gold
Data, data, data. There is a lot of data: data already stored or archived, data about to be produced, generated, or measured, or even missing data. But there is not always value in the accessible data, even though data is considered the new gold. Just as real gold needs a goldsmith to create jewels, data first needs to be worked on, refined, transformed, and made interesting to consumers: turned into information or insight.

Monday, January 13, 2014

New Redbook: Leveraging DB2 10 for High Performance of Your Data Warehouse

An (almost) new and very interesting IBM Redbook was published last week: Leveraging DB2 10 for High Performance of Your Data Warehouse. Though the title says "DB2 10", parts of the book deal with DB2 10.5 and the BLU Acceleration technology. The reason I said "almost new" is that the book is partially based on the 2010 Redbook "InfoSphere Warehouse: A Robust Infrastructure for Business Intelligence". As you can see, DB2 has a proven track record of supporting data warehouses and data marts.

The new book shows you how to get started with column-organized storage, which is key to benefiting from BLU Acceleration. It covers how to create data marts, how to load them, how to understand query plans (including the CTQ operator), and how to monitor the system.
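To give you an idea of what getting started looks like: with DB2 10.5 and BLU Acceleration, a table becomes column-organized via the ORGANIZE BY COLUMN clause (the table and column names below are made up for illustration). A minimal sketch:

```sql
-- Hypothetical fact table for a data mart; ORGANIZE BY COLUMN
-- stores the data column-wise for BLU Acceleration.
-- (Setting the registry variable DB2_WORKLOAD=ANALYTICS before
-- creating the database makes column organization the default.)
CREATE TABLE sales_fact (
  sale_date   DATE          NOT NULL,
  store_id    INTEGER       NOT NULL,
  product_id  INTEGER       NOT NULL,
  quantity    INTEGER,
  revenue     DECIMAL(12,2)
) ORGANIZE BY COLUMN;
```

Queries against such a table can then show the CTQ (column-to-row transition) operator in their access plans, which the book discusses.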

Like many other IBM Redbooks, this one is available for download as PDF or EPUB, or to view in HTML.

BTW: You can find all my DB2 BLU-related articles using this link.

Monday, October 17, 2011

WOW for MDC on MQT - How to speed up the load process for a warehouse

When I was teaching a performance bootcamp earlier this month, one of the topics was combining multi-dimensional clustering (MDC) with range-partitioned tables. A question then came up about using MDC with materialized query tables (MQTs) and whether the two techniques could be combined. The customer hadn't succeeded before. As I didn't see a reason why they couldn't be combined, I looked for documentation and examples: the DB2 Information Center describes how to combine MDC with MQTs.

With MQTs, some of the heavy queries can be sped up by precomputing the answers to common, complex subqueries. Using MDC for the MQTs can improve performance even further, depending on what can be clustered and how many dimensions are present.
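In DDL terms, the combination simply means adding an ORGANIZE BY DIMENSIONS clause to the MQT definition. A sketch with made-up table and column names (a deferred-refresh MQT clustered by its two grouping columns):

```sql
-- Hypothetical MQT precomputing daily revenue per store,
-- clustered multi-dimensionally on the two dimension columns.
CREATE TABLE mqt_daily_revenue AS
  (SELECT sale_date, store_id, SUM(revenue) AS total_revenue
     FROM sales_fact
    GROUP BY sale_date, store_id)
  DATA INITIALLY DEFERRED REFRESH DEFERRED
  ORGANIZE BY DIMENSIONS (sale_date, store_id);

-- Populate the MQT and take it out of set-integrity pending state:
REFRESH TABLE mqt_daily_revenue;
```

The optimizer can then route matching queries to the MQT, and the MDC block indexes help when queries restrict on the clustering dimensions.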

I also briefly tested db2look and how it reproduces the DDL for such a table; the combination is supported by the tool, too.

For the customer it was a "Wow!" for the MDC on MQT, for the users it will mean faster queries...

Wednesday, December 9, 2009

Holiday Preparations - Your action required!

I recently wrote about how companies make use of their address data during the last weeks of the year. Today, I thought I would join the fun of ongoing holiday activities and start my own giveaway/sweepstakes. I will be crowning my "Reader of the Year" later this month. To be eligible, you don't have to submit silly forms, post holiday videos to YouTube, solve puzzles like "December is the ____ month of the year", or get your kids to paint your family.

All I am asking is that you leave some basic information, similar to other sweepstakes. Please comment on this post, with your name, your full address, your birthday, your annual income (full dollars/Euros/etc. is ok), your occupation, your detailed family status, and your highest degree of IT certification (no school information required!).

P.S.: And remember the fun part.

Tuesday, December 8, 2009

Let's phone, shop, and whatever.... - Testing new limits

For most activities these days, data is produced, ranging from your life and household to administration, manufacturing, or services. A lot of that data ends up in data warehouses, either directly or in some condensed, aggregated, distilled way. And while people were talking about 1 TB or 10 TB warehouses only a few years ago, scaling up to hundreds of Terabytes or even Petabytes (PB) is often discussed now.

One of the enhancements in DB2 9.7 addresses this trend. Up to version 9.5, distribution maps were limited to 4096 entries (4 kB); now up to 32768 entries are possible. In a partitioned database the distribution key, i.e., the columns used to determine on which database partition a row ends up, is important because it determines how evenly the data is split between the partitions. The more evenly balanced the distribution, the better balanced the performance typically is.

To assign a database partition, the distribution key is hashed to an entry in the distribution map. The more entries in the map, the smaller the skew. With the increased distribution map in DB2 9.7, the skew remains small even for databases with a large number of database partitions.
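For illustration, this is how a distribution key is declared in the DDL (the table and column names are hypothetical; everything else happens automatically via the distribution map):

```sql
-- Hypothetical warehouse table in a partitioned (DPF) database.
-- Each row's cust_id is hashed to one of the (in DB2 9.7 up to
-- 32768) entries in the distribution map, and that entry
-- determines the database partition the row is stored on.
CREATE TABLE call_detail (
  cust_id    BIGINT       NOT NULL,
  call_ts    TIMESTAMP    NOT NULL,
  duration_s INTEGER,
  charge     DECIMAL(8,2)
) DISTRIBUTE BY HASH (cust_id);
```

Choosing a high-cardinality, evenly distributed column (or column combination) as the distribution key is what keeps the partitions balanced.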

How do you test it? Increase your calling, shopping, driving, consuming. This will not only kick-start the economy, but also grow the enterprise warehouse and make sure new limits are tested (and introduced)...

Monday, October 12, 2009

Mixing cookies, sales records, and DB2 pureScale

This is one of the many stories where personal life and IT life intermix - or isn't that always the case? Today my post is about cookies (real ones, nothing about visitor tracking or search engine optimization, SEO), about the upcoming holiday season (or are we in it already?), and about database technology, namely DB2 pureScale. But let's get started with the cookies first...

In Germany and some other countries, we have some types of cookies which are only available in the Christmas/holiday season. One of them, and my favorite, are Speculaas (or in German Spekulatius). It's a spiced shortcrust biscuit, and they taste great on their own or when soaked in milk. As mentioned, they are only available throughout the "Season", which these days seems to be from September to December. My wife has been trying to keep the "Christmas is not in September" tradition. And so I have been arguing, pleading, begging for a couple of weeks that Speculaas are just some regular cookies. I tried pointing out that there is nothing special about these cookies, that they could have been sold out this year so far, and that we are lucky to have them back. I tried arguing that if we don't buy them, they could be sold out by Christmas. Anyway, after much back and forth we are now close to opening the second box. I emptied most of the first box one morning last week in my home office when everybody else was out. My kids also seem to like Speculaas, which makes it easier for me in the afternoons...

With Speculaas in the house and the shops full of holiday articles, it dawned on me that we are approaching the Christmas season (and in some countries, like the US, Thanksgiving or other big festivities come up even earlier). This is the season when additional store clerks are needed and when additional processing power needs to be available for web shops, for the database systems, and for all the backend infrastructure that enables the upcoming peak sales/revenue period. It's the time when sales records are reached and everybody is busy doing their part to help the economy.

Last week, DB2 pureScale was announced. It helps to horizontally scale database processing power and achieve high availability of the database system. The key is that new machines can be seamlessly added to a database cluster to increase the system throughput. While there may not be much performance needed in, e.g., summer, peak performance is needed throughout the season. Using pureScale it is simple to add that needed additional capacity. While it may be possible to move to a bigger machine (vertical scaling), it is not practical in terms of effort or benefits. Having a DB2 cluster also helps with even higher system availability. With DB2 pureScale it is possible to quiesce one member (machine) and service it. Or if part of your cluster fails, the others are still available and let your business continue. All this is transparent to the database application; it doesn't know whether it is running on a regular or a clustered DB2.

Many new computers (desktops, laptops, nettops) are sold during the season, often replacing older, less powerful systems. If there were something like DB2 pureScale for home computers, you could just add another module to your existing computer to gain processing power. If one module broke, your photos, videos, audio streaming, etc. would still be accessible, and you could continue with part of your processing power while repairing the failing components. What a thought!

Now it is time for the morning coffee and some cookies (guess which!)...

Tuesday, June 2, 2009

Buy groceries, produce XML, feed DB2, gain weight insight

By now some of you might have figured I was on vacation. It was very relaxing with almost no emails and no IBM - and no blog. Once during the vacation I was reminded by my kids about what I do. When we shopped for some groceries, the kids had to point out (loudly and in public of course) that the cash register/POS terminal was from IBM.

And then I had to briefly think about what we were doing. By buying the groceries we created a nice transaction - actually several, because we paid with electronic cash. Later, the different sales slips would be transferred in XML format to the company's headquarters and eventually fed into the central enterprise warehouse. In many companies the POS transactions are already shipped as XML because of its flexibility and the simple way of looking into transmission issues. However, once received, most companies today shred (or "decompose") the XML files before the data even reaches a database. A lot of information can be lost during that phase; sometimes even data integrity or security is at risk. These are some of the reasons why more and more companies are looking into feeding the XML files with the POS slips directly into the database, e.g., the enterprise warehouse, operational data stores, or central staging areas for other backbone systems.

DB2 with its pureXML functionality can be of great help because it allows storing the XML data in its native format, keeping all the information. Using SQL/XML and XQuery it is possible to look into the data and analyze it. Functions such as XMLTABLE allow presenting XML data in relational table structures to support BI tools that are not (yet) XML-enabled.
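To make that concrete, here is a sketch of what storing POS slips natively and exposing them relationally could look like. The table name, the XPath expressions, and the slip structure are made up for illustration; XMLTABLE itself is a standard SQL/XML function supported by DB2:

```sql
-- Hypothetical table holding raw POS slips in a native XML column:
CREATE TABLE pos_slips (
  slip_id INTEGER NOT NULL,
  slip    XML
);

-- XMLTABLE presents the line items of each slip as relational
-- rows, so XML-unaware BI tools can query them like any table:
SELECT s.slip_id, t.item_id, t.quantity, t.price
FROM pos_slips s,
     XMLTABLE('$S/slip/item' PASSING s.slip AS "S"
       COLUMNS
         item_id  INTEGER      PATH '@id',
         quantity INTEGER      PATH 'qty',
         price    DECIMAL(8,2) PATH 'price') AS t;
```

Such a query could also be wrapped in a view, so the BI tooling never sees the XML at all.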

It's good to know that the upcoming DB2 9.7 (see the early access program and the announcement) improves the functionality for gaining "XML insight" in data warehouses even further (XML data in range partitioning, multi-dimensional clustering, database partitioning). Really, all information contained in the POS data can be used, the flexibility that XML offers can be carried along the entire chain, IT processes can be simplified and costs saved, and the goods and prices in the stores adapted to market needs.

And the latter is what we - as shoppers - are looking for. But I did not tell all of the above to my kids when we left the supermarket as it would have certainly lowered their level of happiness with the new ice cream...

Friday, April 24, 2009

XML arrives in the warehouse

More and more data is generated, sent, processed, and (even!!!) stored as XML. Application developers, IT architects, and DBAs are getting used to XML as part of their life. Let's get ready for the next step: XML in the enterprise warehouse. Why? To bring higher flexibility and faster time-to-availability (shorter development/deployment time) to the core information management systems, the base for critical business decisions: the enterprise warehouses. Greater flexibility and shorter reaction times are especially important during economic phases like the current one.

DB2 9.7 ("Cobra") adds support for XML in the warehouse. What is even better: with XML compression and index compression, cost savings can be realized. Right now, not many of the analytics and business intelligence tools support XML data. Fortunately, both the SQL standard and DB2 feature a function named XMLTABLE that allows mapping data from XML documents to a regular table format, thereby enabling BI tools to work with XML data.

My colleagues Cindy Saracco and Matthias Nicola have produced a nice summary and introduction to the warehousing-related enhancements of pureXML in the upcoming DB2 9.7 release. The developerWorks article is titled "Enhance business insight and scalability of XML data with new DB2 V9.7 pureXML features". Among other things, the paper explains how XML can now be used with range partitioning, hash (or database) partitioning, and multi-dimensional clustering. Enjoy!
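As a taste of what those enhancements enable, here is a sketch of an XML column in a range-partitioned table, something DB2 did not allow before 9.7 (the table name, columns, and date ranges are hypothetical):

```sql
-- Hypothetical DB2 9.7 warehouse table: a native XML column inside
-- a table that is range-partitioned by month, so old POS data can
-- be rolled out partition by partition.
CREATE TABLE pos_archive (
  slip_date DATE    NOT NULL,
  store_id  INTEGER NOT NULL,
  slip      XML
)
PARTITION BY RANGE (slip_date)
  (STARTING FROM ('2009-01-01') ENDING ('2009-12-31') EVERY (1) MONTH);
```

Combined with database partitioning and MDC, this lets XML data participate in the same lifecycle and performance techniques as relational warehouse data.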