Henrik's thoughts on life in IT, data and information management, cloud computing, cognitive computing, covering IBM Db2, IBM Cloud, Watson, Amazon Web Services, Microsoft Azure and more.
Friday, July 27, 2012
Some years back IBM was advertising a very special box. In the commercial they showed what it could be good for. One thing is definitely missing: ice cream. Today is a hot day here in the South of Germany, and I am looking forward to some ice cream after work and then some fun at the Kulturufer ("shore of culture" - a well-known tent festival on the shores of Lake Constance in Friedrichshafen with music, theater, comedy, magic, and much more).
But now back to the magic box:
Tuesday, July 17, 2012
Another year passed by: Energy consumption in a passive house
In the first year of living in the new house I read the energy meters more or less weekly; then it turned into a bi-weekly to monthly activity. This past year I looked at the meters mostly during the winter months and almost forgot to read them last weekend when another year had passed. When you look at the energy consumption, there are some surprising numbers:
| | 2012 | 2011 | 2010 | 2009 |
|---|---|---|---|---|
| Household (kWh) | 2597 | 2605 | 2473 | 2244 |
| Heating/Ventilation/Water (kWh) | 2713 | 2713 | 2858 | 2782 |
Even though the past year was a leap year (and had a leap second!), the heating/ventilation/water consumption stayed exactly the same. I had hoped for a smaller number, but we had some very chilly weeks in February. Household consumption has also stayed roughly constant, with most consumers now in place. Most of it should come from cooking, laundry, and my home office with computer, monitor, phones, etc.
Our house does not have a smart meter that lets you track consumption online and in real time. However, I talked with some "owners", and the bottom line is that after the first few weeks they barely make use of it. It is similar to us reading our old-fashioned meters more frequently in the beginning to learn and to look for exceptions, then leaning back once everything seems to run smoothly. Any similarities to a well-tuned database system and regular workloads...?
Labels: energy saving, Life, passive house, Passivhaus, power consumption, smart metering
Friday, July 13, 2012
Big Data, Big Problems, Right Questions?
The term "Big Data" has been around for a while. Most of my time I have dealt with problems around traditional data (relational data), new data structures, non-traditional data, semi-structured data, in-memory data, on-disk data, hard-to-find data, lost data, object-oriented data, object-relational data, old data, new data, archive data, wrong data, corrupted data, THE right data, and all kinds of other data.
Almost daily I receive emails or letters where my name is misspelled, the address is incorrect, or the email was not even intended for me (long story). A few days ago I listened to a longer discussion about problems with various kinds of applications at several companies, all non-IBM-related. One key problem was that, due to failing components in an overly complex environment, data became outdated, went missing, or was incorrect. The consequences hit the "regular guys", ranging from missing or incorrect parts in an assembly line, to timesheets that were not updated, to employees not being able to "badge in". When I talk with my students at the Cooperative State University (who all have a job in industry), many of them have seen or had to deal with incorrect or outdated data and the "information" based on it.
The issues remind me of one of the first lessons in physics at school: "Garbage in, garbage out". For Big Data the amount of data is in a different dimension, but the same principles apply as with other data. It is important to know that there are, or could be, issues with the input data. How is that dealt with? Does it impact the algorithms and the result? Is the derived "information" really information? How is the data interpreted? Is it OK to base decisions on it, or is it taken as just one of many indicators? Do those dealing with information derived from Big Data know the source, and is everything fully disclosed? There are many more questions, many of them non-technical. One important question is which questions to ask against the data. Is it to "prove" some speculation and guesswork, or to really find out something new that may be validated by further work? And then we are back to a problem we have been facing for years with existing, "small" data.
Now on to a small problem and some statistics:
Over the weekend we will have big fireworks locally here over Lake Constance. My kids want to watch them for sure. I could point out that only 20% of the persons I surveyed will attend. My kids could respond that 70% will attend. They chose people aged 8-80 years, I those aged 0-8 years...
Context counts, and so does asking the right questions, regardless of whether the data is small or big.
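For illustration, here is a small SQL sketch (the survey table and its columns are made up for this example): the same data and the same question yield very different percentages, depending entirely on which sample is chosen.

-- Hypothetical survey data: who plans to attend the fireworks?
CREATE TABLE survey(age INT NOT NULL, will_attend CHAR(1) NOT NULL);

-- The reported "attendance rate" depends on the sample, not just the data
SELECT CASE WHEN age <= 8 THEN 'aged 0-8' ELSE 'aged 8-80' END AS sample,
       SUM(CASE will_attend WHEN 'Y' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS attend_pct
FROM survey
GROUP BY CASE WHEN age <= 8 THEN 'aged 0-8' ELSE 'aged 8-80' END;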
Tuesday, July 10, 2012
Index something non-existing - functional indexes in DB2
After discussing hidden (non-visible) columns last week, indexing something non-existing fits the same theme. Let's shed some light on this seemingly mysterious topic...
XML documents and XML-based data modeling have been around for quite some years now. DB2 started its XML support with the XML Extender in the late 1990s (version 7); DB2 pureXML shipped initially with version 9.1. Advantages of using XML include flexibility in what is included in a document (or set of documents) and how sparse data can be represented. In a relational table, NULL values or other "empty" values need to be stored even if data for a property (column) does not exist; in an XML document that particular property can simply be left out, i.e., nothing is stored. However, this "nothing is stored", the non-existing data, introduces a problem for searching efficiently in some use cases. How do you find all customers that do not have a certain property, e.g., have not received the marketing letter yet or do not own a car?
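For illustration, here is what such a document could look like, using the table customers with the XML column cinfo from the index examples below (the document shape and values are assumptions based on the paths used there). A second customer document might simply omit the sw section entirely, so nothing at all is stored for it:

-- Hypothetical sample document matching the paths used below
INSERT INTO customers(cinfo) VALUES(XMLPARSE(DOCUMENT
  '<customer>
     <contact><email>henrik.loeser AT gmail.com</email></contact>
     <sw><db2version>9.7</db2version></sw>
   </customer>'));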
To address this kind of problem, DB2 10.1 now allows certain functional indexes over XML data (when does it come for relational data - any guesses...?). One of the functions allowed in such an index definition is fn:upper-case(). That way, an index can support case-insensitive searches over strings:
CREATE INDEX customers_email_idx ON customers(cinfo)
GENERATE KEYS USING XMLPATTERN '/customer/contact/email/fn:upper-case(.)'
AS SQL VARCHAR(80);
SELECT * FROM customers WHERE
XMLEXISTS('$CINFO/customer/contact/email[fn:upper-case(.)="HENRIK.LOESER AT GMAIL.COM"]');
The other supported function is fn:exists(), which allows indexing the existence or non-existence of an element or attribute. Thus, an index can be utilized to search even for something that is not (directly) stored in the database, i.e., implied information.
CREATE INDEX customers_db2vers_idx ON customers(cinfo)
GENERATE KEYS USING XMLPATTERN '/customer/sw/fn:exists(db2version)'
AS SQL VARCHAR(1);
SELECT * FROM customers WHERE
XMLEXISTS('$CINFO/customer/sw[not(fn:exists(db2version))]');
The above query would return all customers who don't have any DB2 version of software installed. Maybe they are reading this article and will then install DB2 soon...?!
Monday, July 9, 2012
Another quiz: How to index something non-existing?
To start off this week with the right attitude, here is another quiz:
Imagine that I would like to store address data and information about my contacts ("friends"?). I want to keep track of when I met them. In my database I want to efficiently query which of my contacts I met on a specific date. From database theory I know that indexing the data would help. But I would also like to search for persons I have never met. How could I support that with an index in DB2? How could the data be stored?
SQL statements will be in the solution... [Update: The solution has been posted.]
Wednesday, July 4, 2012
Implicitly hidden columns in DB2
On Monday I posted a small quiz about an SQL error reported by DB2. The comments included guesses about the solution, such as generated columns, triggers, or the table actually being a view. However, the reason for the processing problem is a different one, and it was also mentioned in the comments: the presence of an implicitly hidden column.
The concept of implicitly hidden columns was first introduced in DB2 9.5 for the purpose of easier and faster optimistic locking. A column defined as IMPLICITLY HIDDEN is not returned as part of a SELECT statement unless explicitly referenced. In DB2 9.5 and DB2 9.7 this feature was limited to a ROW CHANGE TIMESTAMP column to implement the optimistic locking. With DB2 10.1 this has changed, as you can see in the CREATE TABLE statement and on a special page with an introduction to the concept of hidden columns. Now you can hide all kinds of columns.
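As a quick reminder of that original use case, here is a minimal sketch (the table and column names are made up for this example): a hidden ROW CHANGE TIMESTAMP column lets an application detect whether a row was modified between reading and updating it.

-- Hypothetical table with a hidden row change timestamp (DB2 9.5 style)
CREATE TABLE stock(item INT NOT NULL,
                   quantity INT,
                   rct TIMESTAMP NOT NULL
                       GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
                       IMPLICITLY HIDDEN);

-- Read the row and remember its change timestamp
SELECT item, quantity, ROW CHANGE TIMESTAMP FOR stock FROM stock WHERE item = 1;

-- Optimistic update: only succeeds if nobody modified the row in between
UPDATE stock SET quantity = quantity - 1
 WHERE item = 1
   AND ROW CHANGE TIMESTAMP FOR stock = ?;  -- the timestamp value read above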
Here is how I defined the table that I used in the quiz:
db2 => create table ih(id int unique not null, s varchar(30), s2 varchar(30) not null implicitly hidden)
DB20000I The SQL command completed successfully.
As you can see, the third column is hidden. Then I performed an insert which resulted in the reported error:
db2 => insert into ih values(1,'a')
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0407N Assignment of a NULL value to a NOT NULL column "TBSPACEID=2,
TABLEID=257, COLNO=2" is not allowed. SQLSTATE=23502
What is interesting to note is the following:
- It is a best practice, and strongly advised, to define a default value for hidden NOT NULL columns to avoid running into such errors (see the sketch after this list).
- In computer science many enumerations start with zero, not one. The column numbers, table identifiers, and tablespace identifiers in DB2 are no exception. Hence, the column with COLNO=2 is the THIRD column, which could mislead you. The description of error SQL0407N actually provides an SQL query that helps resolve the mystery.
- Similar issues are possible with the so-called data movement utilities like IMPORT and LOAD, as you might or might not have data for the hidden column.
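Here is a minimal sketch of the first point (the table name ih2 and the default value are made up): with a default defined for the hidden column, the INSERT from the quiz succeeds because DB2 fills in the default.

db2 => create table ih2(id int unique not null, s varchar(30), s2 varchar(30) not null with default 'n/a' implicitly hidden)
DB20000I The SQL command completed successfully.
db2 => insert into ih2 values(1,'a')
DB20000I The SQL command completed successfully.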
Note that a DESCRIBE of "SELECT * FROM ih" only reports the two visible columns:
db2 => describe select * from ih
Column Information
Number of columns: 2
SQL type Type length Column name Name length
-------------------- ----------- ------------------------------ -----------
496 INTEGER 4 ID 2
449 VARCHAR 30 S 1
Performing a DESCRIBE TABLE, however, shows all three columns:
db2 => describe table ih
Data type Column
Column name schema Data type name Length Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
ID SYSIBM INTEGER 4 0 No
S SYSIBM VARCHAR 30 0 Yes
S2 SYSIBM VARCHAR 30 0 No
3 record(s) selected.
Finally, if you are sick of all the hidden stuff, you can act as a magician and let the column reappear:
db2 => alter table ih alter s2 set not hidden
DB20000I The SQL command completed successfully.
db2 => describe select * from ih
Column Information
Number of columns: 3
SQL type Type length Column name Name length
-------------------- ----------- ------------------------------ -----------
496 INTEGER 4 ID 2
449 VARCHAR 30 S 1
448 VARCHAR 30 S2 2
Now I will disappear by altering my state, coffee is waiting...
Monday, July 2, 2012
A small DB2 SQL quiz - why the SQL error?
I have been playing with some new DB2 10.1 features and thought that a small SQL quiz at the beginning of the week would be appropriate. Consider the following SQL statements and the related output:
db2 => insert into ih values(1,'a')
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0407N Assignment of a NULL value to a NOT NULL column "TBSPACEID=2,
TABLEID=257, COLNO=2" is not allowed. SQLSTATE=23502
db2 => select * from ih
ID S
----------- ------------------------------
0 record(s) selected.
db2 => describe select * from ih
Column Information
Number of columns: 2
SQL type Type length Column name Name length
-------------------- ----------- ------------------------------ -----------
496 INTEGER 4 ID 2
449 VARCHAR 30 S 1
How was the table defined, or why is it giving an error? Any guesses? Post your answer and explanation as a comment. I will post the solution in a couple of days. [Edit: done and linked]