Wednesday, March 16, 2016

CeBIT: Goldsmith in the Hybrid Cloud - How to Create Value from Enterprise Data

Gold Nuggets - Data as Gold
Data, data, data. There is a lot of data, data already stored or archived, and data about to be produced, generated, measured, or even missing data. But there is not always value in the accessible data, even though data is considered the new gold. Similar to the real gold and a goldsmith creating jewels, data first needs to be worked on, refined, transformed and made interesting to consumers, turned into information or insight.

Friday, March 11, 2016

Coincidence? CeBIT visitors and weather featuring Jupyter Notebooks, Spark and dashDB

Jupyter Notebook via Bluemix
Next week I am going to talk at the CeBIT fair in Hanover. As usual I am interested in how the weather will be. And with every conference or fair a common question is about attendance. Why not combine the two, analyse past CeBIT weather and visitor count for some Friday fun? Today I am going to look into Jupyter Notebooks on Apache Spark with some Open Data stored in dashDB, all available via IBM Bluemix.
(Note that I am in a hurry and don't have time for detailed steps today, but that I share the sources and will add steps later on.)

The screenshot on the right is the result of what I am going to produce today. The source file for the notebook, the exported HTML file, input data, etc. can be found in this GitHub repository. If you came here for DB2 or dashDB you might wonder what Jupyter Notebooks are. Notebooks are interactive web-pages where you have sections ("cells") that contain text or code. The text can be in different input formats including Markdown. The code cells support various programming languages, can be edited inline and are executed on demand. Basically a notebook is an interactive, on-demand business/database report. And as you can see in the screenshot, the code is able to produce graphs.

The IBM Analytics for Apache Spark service on Bluemix provides those analytic notebooks and it is the service I provisioned for my tests. Once you launch the service you can start off with sample notebooks or create them from scratch. I started with samples to get up to speed and the composed my own one (see my notebook source on GitHub). It has several cells written in Python to set up a connection to dashDB/DB2, execute queries, fetch data and process that data within the notebook. The data is used to plot out a couple graphs.

For my example I am using a dashDB (a DB2-based service) that I provisioned on Bluemix as a data store. I used the LOAD wizard to create and fill one table holding historic CeBIT dates and visitor counts and another table with historic weather data for Hanover, Germany (obtained from Deutscher Wetterdienst). Within the notebook those tables are queried and the data fetched into so-called data frames. The data frames are used to transform and shape the data as needed and as source for the generated graphs. Within the notebook it is possible to combine data frames, execute queries on them and more - something I didn't do today.

To get to my dashDB-based graphs in a Jupyter Notebook on IBM Analytics for Apache Spark I needed to get around some issues I ran into, including data type casts, naming of result columns, labeling of graphs, sourcing columns as input for a graph and more. For time reason I refer to the comments in the source code for my notebook.

After all that introduction, here is the resulting graph. It shows that during a sunny and warm week with close to no rain there were fewer CeBIT attendees. A little rain, some sun and average temperature yielded a high visitor count. So could it be that the weather to attendee relationship is bogus for computer fairs and may only hold for museums? Anyway, it was fun learing Jupyter Notebooks on Bluemix. Now I need to plot my weekend plans...
Historic CeBIT Weather and Attendance

Tuesday, March 1, 2016

Mom, I joined the cloud! (or: Use old stuff with new stuff - DB2 federation)

Everybody is talking about Hybrid Clouds, combining on-premises resources like database systems and ERMs with services in the public or dedicated cloud. Today I am showing you exactly that, how I combined my on-prem DB2 with a cloud-based DB2 that I provisioned via Bluemix. The interesting thing is that really old technology can be used for that purpose: database federation. So relax, sit back, and follow my journey in joining the cloud...
Database Services in the Bluemix Catalog

For my small adventure I used a local DB2 10.5 and a Bluemix-based SQLDB service. The steps I followed are an extended version of what I wrote in 2013 about using three-part names in DB2 to easily access Oracle or DB2 databases. Smilar to the entry I started by enabling my DB2 instance for Federation (FEDERATED is the configuration parameter).
[hloeser@mymachine] db2 update dbm cfg using federated yes