Data Henrik: 2016

Friday, December 16, 2016

New Mod Pack and Fix Pack for DB2 V11 available now

New DB2 Fix Pack and Mod Pack available

Over the past years I have blogged about new fix packs and features in DB2. This is the first time I have to mention "Mod Packs" because a new Mod Pack and Fix Pack was released for DB2 V11. DB2 now distinguishes between fixing existing features and adding new features in interim releases. The terminology of version, release, modification and fix pack is explained in this support document.

If you are one of the administrators or developers impacted by system and code freezes over the last weeks of the year, then the good news is that you can use the time to explore some great enhancements to DB2. Check out the summary page in the DB2 Knowledge Center for an overview. Here are my favorites:

DB2 supports PKCS #11 keystores now, i.e., Hardware Security Modules (HSMs) can now be used with DB2 extending the choice to local keystores, centrally managed keystores and hardware modules.
Lots of improvements to DB2 BLU. If you are on Linux OS on IBM z Systems ("z/Linux") then you will be happy about DB2 BLU performance improvements on z13. There are improvements for other processors, too.
The Workload Manager (WLM) and related monitoring have been enhanced, giving deeper insight and more control of long running complex queries. There are also new features related to CPU management.

Sounds interesting? The updated version of DB2 can be downloaded from the usual support page offering all the fix packs (and now Mod Packs) for the recent versions of DB2.

Tuesday, November 29, 2016

SQL Magic in Notebooks in the IBM Data Science Experience

New Notebook in IBM Data Science Experience

At the recent IDUG DB2 Tech Conference in Brussels I gave a talk on using Jupyter Notebooks with IBM DB2 or dashDB. For the presentation I used a local installation of the notebooks and DB2 (never trust Internet connectivity). Part of the talk was about using SQL Magic in a notebook as a simple interface to the database, e.g., for testing and prototyping.
After the conference I received a question about whether it is possible to use the SQL Magic with Jupyter Notebooks in the IBM Data Science Experience. The answer is yes and here is how.

Stuff - The Day of the BLOB and Object Storage

Regardless of whether it is turkey, cranberry sauce, stuffing, gravy, sweet potato pie, mashed potatoes or more that you eat, independent of whether it is a new iPhone, tablet, big screen, Bluetooth soundbar, household robot or other gadget on sale, it is good to know that you can stuff almost anything into a DB2 BLOB or into the Bluemix Object Storage or Block Storage service.

In that sense "Happy Thanksgiving"! I am currently looking into the Content Delivery Network service to get my stuff faster to my folks. Talking about "stuff", enjoy this classic on "stuff" and "storage":

Tuesday, November 22, 2016

DB2/dashDB Security: Implicit Privileges Through Group Membership

DB2 Data Security

I recently saw an interesting DB2 question on Stack Overflow. Someone asked how it is possible to find out the privileges of a user when the privileges were granted to a group the user is a member of. DB2 does not manage group membership within the database; that is done in the operating system. But DB2 offers functions and views to retrieve that information and to simplify analysis of the security-related metadata. And remember that this applies to IBM dashDB as well.

To look up which groups a specific user belongs to, DB2 offers the table function AUTH_LIST_GROUPS_FOR_AUTHID. The returned groups are not necessarily used within the database and can be any operating system group. The following query returned several of those typical Linux groups:

SELECT * FROM TABLE (SYSPROC.AUTH_LIST_GROUPS_FOR_AUTHID('HLOESER')) as T

An administration view that comes in handy is SYSIBMADM.AUTHORIZATIONIDS. It lists all authorization IDs along with their respective type, i.e., groups, users and roles. When combined with another view, SYSIBMADM.PRIVILEGES, which lists all explicit privileges for all authorization IDs (that is for users, groups and roles), it allows filtering, e.g., for specific group or role privileges. Joining in the groups for a specific user and not forgetting to factor in PUBLIC privileges, I came up with the following query. It should list the implicit privileges I have, here restricted to CREATEIN by the predicate on p.privilege.

SELECT distinct p.AUTHID, p.PRIVILEGE, p.OBJECTNAME, p.OBJECTSCHEMA, p.OBJECTTYPE
FROM SYSIBMADM.PRIVILEGES P, SYSIBMADM.AUTHORIZATIONIDS A,
TABLE (SYSPROC.AUTH_LIST_GROUPS_FOR_AUTHID('HLOESER')) as U
WHERE p.privilege='CREATEIN' AND a.authidtype='G'
AND a.authid=p.authid
AND (u.group=a.authid or a.authid='PUBLIC')

If you want to know all your privileges, just UNION the result above with a query on SYSIBMADM.PRIVILEGES for your authid:

SELECT distinct p.AUTHID, p.PRIVILEGE, p.OBJECTNAME, p.OBJECTSCHEMA, p.OBJECTTYPE
FROM SYSIBMADM.PRIVILEGES P, SYSIBMADM.AUTHORIZATIONIDS A, TABLE (SYSPROC.AUTH_LIST_GROUPS_FOR_AUTHID('HLOESER')) as U
WHERE p.privilege='CREATEIN' and a.authidtype='G' and a.authid=p.authid
AND (u.group=a.authid or a.authid='PUBLIC')
UNION
SELECT distinct p.AUTHID, p.PRIVILEGE, p.OBJECTNAME, p.OBJECTSCHEMA, p.OBJECTTYPE
FROM SYSIBMADM.PRIVILEGES P
WHERE p.authid='HLOESER'
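
To see how the join and the UNION fit together without a DB2 system at hand, here is a minimal sketch that replays the logic in SQLite. Everything in it is an assumption made for illustration: the three tables are simplified stand-ins for SYSIBMADM.PRIVILEGES, SYSIBMADM.AUTHORIZATIONIDS and the result of AUTH_LIST_GROUPS_FOR_AUTHID, the group names DEVS and OPS are invented, and so are all rows.

```python
import sqlite3

# Toy stand-ins for the DB2 catalog objects; all names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE privileges (authid TEXT, privilege TEXT, objectname TEXT);
CREATE TABLE authorizationids (authid TEXT, authidtype TEXT);
CREATE TABLE user_groups (grp TEXT);  -- stands in for AUTH_LIST_GROUPS_FOR_AUTHID

INSERT INTO authorizationids VALUES
  ('HLOESER', 'U'), ('DEVS', 'G'), ('OPS', 'G'), ('PUBLIC', 'G');
INSERT INTO privileges VALUES
  ('DEVS',    'CREATEIN', 'APP'),
  ('OPS',     'CREATEIN', 'OPSDATA'),
  ('PUBLIC',  'CREATEIN', 'NULLID'),
  ('HLOESER', 'CONTROL',  'T1');
INSERT INTO user_groups VALUES ('DEVS');  -- the user is a member of DEVS only
""")

# Same shape as the query in the post: group and PUBLIC privileges first,
# then the privileges granted directly to the user.
QUERY = """
SELECT DISTINCT p.authid, p.privilege, p.objectname
FROM privileges p, authorizationids a, user_groups u
WHERE p.privilege = 'CREATEIN' AND a.authidtype = 'G'
  AND a.authid = p.authid
  AND (u.grp = a.authid OR a.authid = 'PUBLIC')
UNION
SELECT DISTINCT p.authid, p.privilege, p.objectname
FROM privileges p
WHERE p.authid = ?
ORDER BY 1
"""

rows = conn.execute(QUERY, ("HLOESER",)).fetchall()
for row in rows:
    print(row)
```

The OPS row does not show up because the user is not a member of that group, while the PUBLIC row is picked up by the second half of the OR predicate.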

Wednesday, November 9, 2016

IDUG EMEA: Henrik's Session Recommendations

Screenshot of Weather Graph

Next week will see the IDUG EMEA Tech Conference in Brussels, Belgium. I hope to meet many DB2 users again. Part of preparing for a conference is to read over the agenda and get a rough idea of which sessions to attend. Here are the sessions where you will likely find me in the audience (or on stage). My focus is DB2 LUW.

Monday and Tuesday start off with keynotes in the morning. The DB2 conference closes with a keynote on Thursday. All are a must. For the rest of the time there are up to 6 tracks to choose from and here are some of the sessions I already picked and recommend:

Monday, 10:40 to 11:40, C01 - DB2 LUW 11.1 : A Technical Overview with Matt Huras is kind of a must for the DB2 LUW crowd to get all the insights into the current release.
Monday, 12 to 13, D02 - To Cloud, or not To Cloud? Reasons to go "off premise" with DB2 with Steve Rees will probably see great discussions. This is a topic I am discussing with many customers.
Tuesday, 14:30 to 15:30, F06 - R You Ready to be a Data Scientist? with Paul Turpin hopefully gives me more background in R and data analysis. That session also nicely plays with:
Tuesday, 17 to 18, E08 - Interactive Reports and Presentations - Powered by DB2 and Jupyter Notebooks with Henrik Loeser. In that session I am going to show notebooks as interactive computational environments and using different APIs for DB2.
Wednesday, 8:30 to 9:30, C09 - 11 Great DB2 Questions from Stack Overflow (Answers Included) with Henrik Loeser. The day starts early with this introduction into how to use the Q&A site Stack Overflow to solve your DB2 problems and to utilize their "open data" knowledge base for deeper analysis.
Wednesday, 9:40 to 10:40, C10 - OMG! Experience is a Hard Teacher – Lessons Learned #1 with Melanie Stopfer. I plan to stay in the same room for an entertaining and experience-rich session with Melanie.
Wednesday, 13:40-14:40, is the DB2 LUW Expert Panel, another must.
Wednesday, 16:20 to 17:20, D13 - db2audit in a nutshell – advantages and challenges with Markus Fraune covers a topic many companies are looking into due to increased compliance requirements and rules & regulations.
Thursday, 8:30 to 9:30, C15 - Exploiting Cloud Storage with DB2 for LUW with Phil Nelson is another early, but interesting session digging into new territory.

See you in Brussels next week!

Friday, October 28, 2016

Bluemix: How to Register Your Own Service Broker

Dashboard from Sample Service

In Cloud Foundry and hence in IBM Bluemix so-called service brokers manage the provisioning and removal of service instances. They provide the necessary metadata about the managed service to the catalog, so that users can find and request that service. Bluemix offers 100+ services in its catalog, but what if you want to add your own service? The answer is to register your own private broker and there are even two different kinds. Want to know how to do it? Then read on.

Extend the Bluemix CLI Through Plugins

Put a smile in your cloud

As much as I like nice graphical user interfaces (UIs) and working with the browser, for most tasks I prefer using the command line. I have done so and still do for working with DB2 and dashDB, and I also use the Cloud Foundry (CF) Command Line Interface (CLI) to manage my Bluemix environments. The "cf" tool is quite handy and lets me perform what I need to accomplish. A great feature is its support for plugins that extend its functionality. There are IBM-provided plugins and several coming from the Cloud Foundry ecosystem. Here are some brief instructions on how to get started.

Once you have installed a recent version of the "cf" tool, just invoking it without any parameters should display some help text on available commands and features. In the lower part of that text should be two sections related to plugins:

ADD/REMOVE PLUGIN REPOSITORY:
   add-plugin-repo                        Add a new plugin repository
   remove-plugin-repo                     Remove a plugin repository
   list-plugin-repos                      List all the added plugin repositories
   repo-plugins                           List all available plugins in specified repository or in all added repositories

ADD/REMOVE PLUGIN:
   plugins                                List all available plugin commands
   install-plugin                         Install CLI plugin
   uninstall-plugin                       Uninstall the plugin defined in command argument

The command "add-plugin-repo" adds new repositories from which plugins can be installed, and "list-plugin-repos" lists those repositories already available in your environment. Try the following two commands to add the plugin repositories from Bluemix Public and from the Cloud Foundry community collection:

cf add-plugin-repo BluemixPublic http://plugins.ng.bluemix.net/

cf add-plugin-repo CF-Community http://plugins.cloudfoundry.org/

Once done, try the following command to list the repositories. It should return something like this:
henrik>> cf list-plugin-repos
OK

Repo Name      Url
BluemixPublic http://plugins.ng.bluemix.net
CF-Community   http://plugins.cloudfoundry.org/

With the command "repo-plugins" it is possible to list all the available plugins ready to be installed. You can read more about them at the repository descriptions linked above. For switching quickly between my different Bluemix environments I have installed the "targets" plugin from the Cloud Foundry community repository:

henrik>> cf install-plugin targets -r CF-Community

**Attention: Plugins are binaries written by potentially untrusted authors. Install and use plugins at your own risk.**

Do you want to install the plugin targets? (y or n)> y
Looking up 'targets' from repository 'CF-Community'
8230816 bytes downloaded...
Installing plugin /tmp/cf-targets-plugin...
OK
Plugin cf-targets v1.1.0 successfully installed.

Once the installation is done and I call just "cf" again, there is a new section available in the help text:
INSTALLED PLUGIN COMMANDS:
   targets                                List available targets
   set-target                             Set current target
   save-target                            Save current target
   delete-target                          Delete a saved target

If you have more plugins installed then there will be more commands displayed. Using the "targets" plugin I can save the current environment, i.e., the Bluemix or Cloud Foundry instance I am logged into, and can quickly switch between the Bluemix Dallas (NG) and Bluemix London (EU-GB) platforms using the "set-target" command. "cf set-target eu" would replace several "cf api" and "cf login" commands and a lot of typing, which is great for scripting and more efficient work.

Thursday, October 6, 2016

Easy to identify: Does the table have a primary key?

Primary Key

Next week I am going to teach students database basics again. One of the topics will be primary keys and how they help enforce uniqueness and identify each of the stored objects. Recently I stumbled over the question of how to easily tell whether a DB2 or dashDB table has a primary key. The answer, as often, is in the catalog, the database metadata.

For performance reasons almost all database systems use a unique index to implement a primary key. So the key (pun intended) is to look for such an index. Both DB2 for Linux, UNIX, and Windows (LUW) and DB2 for z/OS store information about indexes in a system table SYSIBM.SYSINDEXES. On DB2 for z/OS that table is exposed to the user and documented here. DB2 LUW has catalog views on top and the view to use is named SYSCAT.INDEXES, however querying the table still works:

SELECT COLNAMES
FROM SYSIBM.SYSINDEXES
WHERE TBNAME = 'MYTABLE'
AND UNIQUERULE = 'P'

The query returns the columns on which the primary key is defined for the table MYTABLE. As can be seen in the documentation, the UNIQUERULE column indicates whether the index allows duplicates, is a unique index, or is used to implement a primary key (value P). On DB2 LUW we could write the query utilizing the catalog view SYSCAT.INDEXES. The following query returns the table name and schema as well as the column names for all tables which have a primary key defined:

SELECT TABNAME, TABSCHEMA,COLNAMES
FROM SYSCAT.INDEXES
WHERE UNIQUERULE='P'

So the key to quickly working with primary keys is indexes and their metadata...
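
The COLNAMES value returned by these queries is a single string. Assuming the format I remember from the catalog documentation, where each column name is preceded by + (ascending) or - (descending), a small helper can split it into individual columns. This is only a sketch: the function name is made up, and it would break for delimited column names that themselves contain + or -.

```python
import re


def split_colnames(colnames):
    """Split a COLNAMES string such as '+ID-CREATED' into (column, order) pairs.

    Assumption: every column name is prefixed with '+' (ascending) or
    '-' (descending) and does not itself contain either character.
    """
    pairs = []
    for sign, name in re.findall(r"([+-])([^+-]+)", colnames):
        pairs.append((name, "ASC" if sign == "+" else "DESC"))
    return pairs


# Hypothetical COLNAMES value for a two-column primary key.
print(split_colnames("+ID+REGION"))
```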

Monday, August 22, 2016

Notes on Notebooks, Data, DB2, and Bluemix

Weather Graph in Jupyter Notebook

Some time ago I shared with you how I used open data like weather data and CeBIT statistics to have some fun with Jupyter Notebooks. Notebooks are an old, but now - in the age of Cloud Computing - trendy way of collaborative data exploration (see this article for some background reading). Notebooks consist of so-called cells, places to put code, instructions, text and more. Cells can hold Markdown-formatted text or code written in Python, Scala and other languages. It is possible to fetch data from DB2, dashDB and other database systems and process it in the notebook, creating stunning graphics. And with extensions such as RISE (reveal.js IPython Slideshow Extension) those notebooks replace PowerPoint & Co as a source for great data-driven presentations. How to use notebooks and DB2 is what I plan to present at the IDUG EMEA 2016 Conference in Brussels later this year.

If you can't wait until then I recommend taking a look at these recent blog posts on how to get started with Notebooks and data on Bluemix:

The first blog gives you the basics of processing data in so-called data frames and of generating tables and graphs. The data is Open Data and is loaded off the Analytics Exchange on Bluemix. The notebook utilizes the computing power of an Apache Spark cluster.
The second blog covers how to analyze Twitter data for market trends. The example uses dashDB/DB2 to hold the data. The scripts are written in Python and plug into Sentiment Analysis and Natural Language Processing to understand tweets.
The last in my list for today is a blog on how to use Apache Spark GraphFrames in notebooks. Airports and flight routes between them are used as base for some computations. It is something every (business) traveler understands.

In case you have trouble programming your notebooks, head over to Stack Overflow and search for "ipython-notebook" or "jupyter-notebook".

That's it for today with my notes on notebooks.

Wednesday, July 13, 2016

Data, Dialogs, and Databases

Watson Dialog Specification

I recently wrote about managing IBM Watson dialogs from the command line and that I wanted to bring database records into a dialog, combining different IBM Bluemix services. I succeeded and here is the follow-up on how I perform dialog-driven lookups in dashDB / DB2 and update the conversation for Watson to return user-specific answers. The application is written in Python and available on GitHub again and builds on the experience and code from the watson-dialog-client I published earlier.

The user interacts with the Python app which basically is a loop, waiting for the user to end the dialog. The script uses the dialog API to send the user input and obtain a response from Watson. It also checks the dialog profile variables to determine whether there are changes or any variable of interest is set. Depending on state information obtained from the profile variables dashDB is queried. Database results are either directly returned to the user or are used to update the dialog profile.

Data and Dialog Combined

To demonstrate (read: experiment with) data-driven dialogs I created a simple table "dialogdata" in dashDB and populated it with a few records. The dialog prompts the user for the name. When entered it is used to look up the corresponding database record which then is fed into the dialog variables. As an example the dialog allows the user to ask for the birthday or age. The answer is composed of a template with the user-related information filled in.
Another example for combining dialog and database is to enter a service name, i.e., to request an action. This is detected by the Python script again by checking the profile variables. Then, dashDB is queried and the result is directly returned to the user. This way it would be possible to code up a dialog-driven database client ("Ok DB2" :).
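
The control flow described above can be sketched in a few lines of Python. This is not the code from the GitHub repository: the function run_dialog, its callable parameters, the exit word "bye" and the sample record are all hypothetical, and the Watson dialog API and dashDB are replaced by in-memory stand-ins so that the sketch runs on its own.

```python
def run_dialog(user_inputs, converse, get_profile, update_profile, lookup_record):
    """Process user inputs until 'bye' (hypothetical exit word); return replies."""
    replies = []
    looked_up = None
    for text in user_inputs:
        if text.strip().lower() == "bye":
            break
        # Send the input to the dialog service and keep its answer.
        replies.append(converse(text))
        # Check the profile variables for a newly set name.
        name = get_profile().get("name")
        if name and name != looked_up:
            # Query the database and feed the record back into the profile.
            record = lookup_record(name)
            if record is not None:
                update_profile(record)
            looked_up = name
    return replies


# In-memory stand-ins for the dialog profile and the "dialogdata" table.
profile = {}
table = {"Alice": {"birthday": "2000-02-29"}}  # invented sample record


def fake_converse(text):
    """Very small imitation of a dialog that fills in profile variables."""
    if text.startswith("I am "):
        profile["name"] = text[5:]
        return "Hello " + profile["name"]
    if "birthday" in text:
        return "Your birthday is " + profile.get("birthday", "unknown")
    return "Tell me more."


replies = run_dialog(
    ["I am Alice", "When is my birthday?", "bye", "ignored"],
    fake_converse, lambda: profile, profile.update, table.get)
print(replies)
```

Passing the service calls in as functions keeps the loop itself independent of any SDK, which makes it easy to try out the flow before wiring in the real services.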

As mentioned above, the code for the combination of dialog and database is available on GitHub along with some instructions. The README contains the output from a sample dialog including lots of debugging information which should help you understand the data-driven dialog.

Thursday, July 7, 2016

Bluemix: Where Python and Watson are in a Dialog

Right now I am working on a side project to hook up the Watson Dialog Service on Bluemix with dashDB and DB2. The idea is to dynamically feed data from DB2 into a conversation bot. To register and manage dialogs with the Watson Dialog Service, there is a web-based dialog tool available. But there is also a dialog API and a Python SDK for the Watson services available. So why not manage the dialogs from the command line...?

Converse with Watson from the Command Line

Here is a small overview of my Python client that helps to register, update, delete and list dialogs and that can even drive a dialog (converse with Watson) from the shell window on your machine. The code and, for now, some short documentation are available on GitHub as watson-dialog-client.

In order to use the tool, you need to have the Watson Dialog Service provisioned on IBM Bluemix. The service credentials need to be stored in a file config.json in the same directory as the tool "henriksDialog". The credentials look like shown here:

{
    "credentials": {
        "url": "https://gateway.watsonplatform.net/dialog/api",
        "password": "yourServicePassword",
        "username": "yourUserIDwhichIsALongString"
    }
}
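
Reading such a file takes only a few lines of Python. The following is a minimal sketch and not taken from the tool itself; the function name load_credentials and the check for missing keys are my own assumptions, only the file layout follows the example above.

```python
import json


def load_credentials(path="config.json"):
    """Return (url, username, password) from a file laid out as shown above."""
    with open(path, encoding="utf-8") as config_file:
        credentials = json.load(config_file)["credentials"]
    # Fail early with a readable message if an entry is missing.
    missing = [key for key in ("url", "username", "password")
               if key not in credentials]
    if missing:
        raise KeyError("missing credential entries: " + ", ".join(missing))
    return credentials["url"], credentials["username"], credentials["password"]
```

Calling load_credentials() without an argument looks for config.json in the current working directory.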

The credentials are read by the tool to "chat" with the dialog service. The following commands are available:

register a dialog by providing a new dialog name and the XML definition file
"henriksDialog -r -dn dialogName -f definitionFile"
update a dialog by identifying it by its ID and providing a definition file
"henriksDialog -u -id dialogID -f definitionFile"
delete a dialog identified by its ID
"henriksDialog -d -id dialogID"
list all registered dialogs
"henriksDialog -l"
converse, i.e., test out a registered dialog which is identified by its ID
"henriksDialog -c -id dialogID"
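
For readers who want to build a similar command-line interface, the options listed above could be declared with Python's argparse module roughly as follows. This is a sketch under assumptions and not the actual source of henriksDialog: the function name build_parser and the dest names are invented, only the option strings are taken from the list above.

```python
import argparse


def build_parser():
    """Declare the options from the command list above (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="henriksDialog")
    # Exactly one action has to be chosen per invocation.
    action = parser.add_mutually_exclusive_group(required=True)
    action.add_argument("-r", dest="register", action="store_true",
                        help="register a dialog (needs -dn and -f)")
    action.add_argument("-u", dest="update", action="store_true",
                        help="update a dialog (needs -id and -f)")
    action.add_argument("-d", dest="delete", action="store_true",
                        help="delete a dialog (needs -id)")
    action.add_argument("-l", dest="list_dialogs", action="store_true",
                        help="list all registered dialogs")
    action.add_argument("-c", dest="converse", action="store_true",
                        help="converse with a dialog (needs -id)")
    # Parameters used by the actions above.
    parser.add_argument("-dn", dest="dialog_name", help="name of a new dialog")
    parser.add_argument("-id", dest="dialog_id", help="ID of an existing dialog")
    parser.add_argument("-f", dest="definition_file", help="XML definition file")
    return parser


# Example: the "register" invocation from the list above.
args = build_parser().parse_args(["-r", "-dn", "dialogName", "-f", "definitionFile"])
print(args.register, args.dialog_name, args.definition_file)
```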

Sample invocations and their output are available in the GitHub repository for this dialog tool. Let me know if something is missing or you had success chatting with Watson from the command line.

Friday, July 1, 2016

Store and Query XML Data with dashDB on Bluemix

XML Column in dashDB

I recently got asked whether it is possible to process XML data with dashDB on IBM Bluemix. The answer to that is that it is possible. dashDB is based on DB2 with its industry-leading pureXML support which I wrote many blog entries about. In the following I give you a quick start into preparing dashDB to store XML data and how to query it.

If you are using the regular dashDB service plans which are tailored to analytics, then by default all tables use columnar storage. That format provides deep compression and high performance query processing capabilities for analytic environments, but it is not suited for natively storing XML data. That is the reason why tables need to be created by explicitly stating ORGANIZE BY ROW in the "Run SQL" dialog (see screenshot above):

CREATE TABLE myTable(id INT, doc XML) ORGANIZE BY ROW

The above statement creates the table "myTable" with two columns, the second of type XML, and in the classic row-oriented table format.

SQL/XML Query with dashDB

Once the table is created, data can be inserted. This can be done by using INSERT statements in the "Run SQL" dialog or by connecting other tools to dashDB. The "Load Hub" is designed for analytic data sets and does not support XML-typed columns. An introduction to inserting XML data can be found in the pureXML tutorial in the DB2 documentation.
After the XML data is in, the "Run SQL" dialog can be used again to query the documents. Queries can be either in SQL (SQL/XML) or in XQuery, see the screenshots with examples.
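
When inserting from an application instead of the "Run SQL" dialog, it can help to check that a document is well-formed before sending it to the database. Here is a minimal Python sketch. It only prepares the statement and its parameters; the function name prepare_xml_insert is invented, and it assumes that the database driver in use accepts a string bound to the XML column through a parameter marker, which should be verified for your driver.

```python
import xml.etree.ElementTree as ET

# Parameterized statement for the table created above.
INSERT_SQL = "INSERT INTO myTable(id, doc) VALUES (?, ?)"


def prepare_xml_insert(doc_id, xml_text):
    """Return (sql, parameters) after checking that xml_text is well-formed."""
    ET.fromstring(xml_text)  # raises ET.ParseError for malformed input
    return INSERT_SQL, (doc_id, xml_text)


# Hypothetical sample document.
sql, params = prepare_xml_insert(1, "<customer><name>Alice</name></customer>")
print(sql, params)
```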

I hope that gives you a rough idea how to utilize the pureXML feature in dashDB, even though its main focus is analytics.

XQuery with dashDB

Thursday, June 16, 2016

Now available: DB2 Version 11 for Linux, UNIX, and Windows

The new version 11.1 of DB2 for Linux, UNIX, and Windows (DB2 LUW) is now available. Enjoy many product improvements for analytic and OLTP scenarios. Here is how to get started:

This DB2 support site lists all available DB2 versions and fixpacks from DB2 9.1 fixpack 1 to the new DB2 11.1.
As usual, I recommend taking a look at the "What's new", the "What's changed" and the "Highlights of DB2 Version 11.1" pages in the DB2 documentation for an overview and links to more detailed information.
Another good introduction is provided by recent editions of the DB2Night Show which had several shows dedicated to new features of DB2 11.1.
As part of the security enhancements centralized key managers for the native encryption are now supported.
If you are into Oracle or Netezza you will find this overview page for "SQL Compatibility Enhancements" useful.
Last but not least, I always take a look at "functionality in DB2 product editions and DB2 offerings" to understand what editions/offerings are available and what is included.

With that, let's get started and have a safe and successful journey with the newest DB2 version, DB2 11.1.

P.S.: Don't forget to try out IBM dashDB, the DB2-based cloud offering.

Friday, June 10, 2016

Learn DB2 and dashDB with Stack Overflow

Top-voted DB2 questions on Stack Overflow

When I want to learn more about the ins and outs of DB2 or dashDB or when I have some spare time and want some fun sharing my knowledge I visit Stack Overflow. Stack Overflow (SO) is at the core of a network of question & answer websites. Here is a quick introduction with hopefully some deeper discussion in a future blog post.

If you are not signed up with Stack Overflow, you can use these links to see DB2 and dashDB questions:

newest dashDB questions on SO
newest DB2 questions on SO
highest voted DB2 questions on SO
newest DB2 questions on Stack Exchange for Database Administrators
highest voted DB2 questions on Stack Exchange for Database Administrators

Once you are logged into Stack Overflow it is possible to combine several tags (such as "db2" or "dashDB") in a single search. It also is possible to search for a specific keyword related to a tag. Putting "[db2] XML" into the search box results in all questions and answers that were labeled (tagged) "db2" and contain the word "XML". Want to know more about questions related to dashDB and R? Try "[dashdb][r]".
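
If you run such searches often, the search string can also be turned into a link. The sketch below assumes that the site's search page accepts the query in a "q" parameter at stackoverflow.com/search; the helper name so_search_url is made up.

```python
from urllib.parse import urlencode


def so_search_url(tags, keyword=""):
    """Build a search link from tags, e.g. ['db2'], and an optional keyword."""
    query = "".join("[{}]".format(tag) for tag in tags)
    if keyword:
        query += " " + keyword
    # Assumption: the search page reads the query from the "q" parameter.
    return "https://stackoverflow.com/search?" + urlencode({"q": query})


print(so_search_url(["db2"], "XML"))
print(so_search_url(["dashdb", "r"]))
```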

Questions related to programming belong on Stack Overflow, whereas topics around operating DB2 should go to the Stack Exchange site for Database Administrators.

I am sure there will be more questions coming up once DB2 V11 for Linux, UNIX, and Windows is released - hopefully next week.

Monday, May 30, 2016

New IBM Knowledge Center for DB2 and other products

New DB2 Knowledge Center

Maybe you have already seen this, but the Knowledge Center for DB2 and other IBM products just changed. When you go to your bookmarked link for the DB2 10.5 Knowledge Center you will notice a slightly different layout for the known content. However, there are more differences and I really like them.

The first thing I tried was the language picker. At the bottom right of each page you can now switch between supported languages. Something that is useful especially when your first language is not English and you want to check language-specific terms or clarify a feature description. On the welcome page for DB2 10.5 you can also switch between DB2 versions and going back to even DB2 9.5 is still supported.

Switching Languages in the IBM Knowledge Center

What I first missed due to layout changes was the navigation tree. It is visible after clicking the icon on the upper left and topics can be expanded much faster than in the old version of the Knowledge Center. What also is much faster is the search functionality. After clicking on "Search" on the upper right, a page with a search box comes up. It is possible to select the DB2 version to be searched and enter the search term. Suggestions for possible keywords are made and when you hit "enter" the search results appear. Everything as expected. However, what I find very useful is the option to preview individual search results by expanding them within the result list (see screenshot below). That way you can stay with the result page without switching back and forth between documentation and search results.

Once you are on a regular documentation page, you can again switch between different DB2 versions and thus easily compare what has changed or check out syntax for a specific version. There are also new forward and backward buttons on top of each page to walk through a topic or section split over multiple pages - less navigation and clicks required to consume the content.

That's my update for today. If you feel nostalgic, check out my blog entry from 2009 about changes to what was called "Information Center" at that time. And in case you are cloud-based already, the new Knowledge Center for IBM dashDB is here.

Simplified search in the DB2 Knowledge Center

Wednesday, May 4, 2016

Starter Kit for Data-Driven Cloud App with Access to On-prem Resources

Cloud App with On-prem Integration

Over the past week I have been trying to create a simple yet versatile sample to get developers started with data-driven cloud apps in Python. The application uses the Flask framework for the web and SQLAlchemy for the database part. The app is database-agnostic and can be used with dashDB, DB2, MySQL and other relational database systems. As many companies do not start from scratch, a bigger part of this (kind of) starter kit demonstrates how to integrate an existing database with a new cloud app on Bluemix.

To get quickly started and to avoid needing to deal with corporate security standards and administrators, the on-prem database is simulated by a virtual machine or Docker container. The app itself is a variation of a classic, displaying and adding to a list of bookmarks or reading material. It is stripped to the core and documented to focus on the "getting started" aspect.

The sample app is available on GitHub in the repository Bluemix-onprem-data. As usual, let me know if you have questions or something needs to be clarified.

My Bluemix Readlist

Monday, April 11, 2016

Data Protection, Privacy, Security and the Cloud

Protecting your bits

(This is the first post in a planned series on data protection, security, and privacy related to DB2/dashDB in the cloud and IBM Bluemix)

As a data/database guy from Germany, security, data protection and privacy have been high on my list of interests for many, many years. As a banking customer I would hate it if someone not authorized accessed my data. I also don't like to go through the hassle of replacing credit cards, changing passwords, taking up a new name (user name only :), or more because a system my data is or was on has been hacked. With more and more data being processed "in the cloud" it is great to know how much effort has been put into designing secure cloud computing platforms, into operating them according to the highest security standards, and how international and local data protection standards and laws are followed for legal compliance.

New Bluemix Data Sink Service Tackles Data Overload

IBM today announced a new experimental service for its Bluemix cloud platform that provides Data Sink capabilities to its users, helping companies to tackle data overload scenarios, enhancing data archiving throughput and solving data retention issues. Building on the global network of SoftLayer data centers the new data sink as a service will feature triple redundancy for high performance and increased fault tolerance, and hence tentatively is named DSaaSTR (Data Sink-as-a-Service with Triple Redundancy).

Initially the experimental Bluemix service is free and allows piping up to 1 TB (one terabyte) of data a month to the data sink. Customers already on direct network links will be able to utilize the full network bandwidth. This gives the opportunity to test the DSaaSTR offer for the following scenarios:

The abundance of sensors and their generated data, whether in Internet of Things (IoT) or Industry 4.0 scenarios, leaves companies struggling with data storage. Utilizing the new service they can leverage the DSaaSTR in the cloud to get rid of local data.
The more data and data storage options, the more intruders. By piping data to the Bluemix DSaaSTR it will become unavailable for attackers.
Local data archives require active data management, enforcement of retention policies, and rigorous disposal. DSaaSTR offers easy choices for data retention and disposal.
Many enterprises have learned that even Hadoop Clusters need actively managed storage. DSaaSTR can be configured to be part of a local or cloud-based Hadoop system (hybrid cloud), thus eliminating storage costs and simplifying the overall administration tasks for the cluster.

The new service is made available to select customers worldwide as part of the experimental Bluemix service catalog. If you are interested in signing up please send me an email or get in touch with your regular Bluemix seller or support contact.

Wednesday, March 16, 2016

CeBIT: Goldsmith in the Hybrid Cloud - How to Create Value from Enterprise Data

Gold Nuggets - Data as Gold

Data, data, data. There is a lot of data, data already stored or archived, and data about to be produced, generated, measured, or even missing data. But there is not always value in the accessible data, even though data is considered the new gold. Similar to the real gold and a goldsmith creating jewels, data first needs to be worked on, refined, transformed and made interesting to consumers, turned into information or insight.

Coincidence? CeBIT visitors and weather featuring Jupyter Notebooks, Spark and dashDB

Jupyter Notebook via Bluemix

Next week I am going to talk at the CeBIT fair in Hanover. As usual I am interested in how the weather will be. And with every conference or fair a common question is about attendance. Why not combine the two, analyse past CeBIT weather and visitor count for some Friday fun? Today I am going to look into Jupyter Notebooks on Apache Spark with some Open Data stored in dashDB, all available via IBM Bluemix.
(Note that I am in a hurry and don't have time for detailed steps today, but I share the sources and will add steps later on.)

The screenshot on the right is the result of what I am going to produce today. The source file for the notebook, the exported HTML file, input data, etc. can be found in this GitHub repository. If you came here for DB2 or dashDB you might wonder what Jupyter Notebooks are. Notebooks are interactive web-pages where you have sections ("cells") that contain text or code. The text can be in different input formats including Markdown. The code cells support various programming languages, can be edited inline and are executed on demand. Basically a notebook is an interactive, on-demand business/database report. And as you can see in the screenshot, the code is able to produce graphs.

The IBM Analytics for Apache Spark service on Bluemix provides those analytic notebooks and it is the service I provisioned for my tests. Once you launch the service you can start off with sample notebooks or create them from scratch. I started with samples to get up to speed and then composed my own (see my notebook source on GitHub). It has several cells written in Python to set up a connection to dashDB/DB2, execute queries, fetch data and process that data within the notebook. The data is used to plot a couple of graphs.

For my example I am using a dashDB (a DB2-based service) that I provisioned on Bluemix as a data store. I used the LOAD wizard to create and fill one table holding historic CeBIT dates and visitor counts and another table with historic weather data for Hanover, Germany (obtained from Deutscher Wetterdienst). Within the notebook those tables are queried and the data fetched into so-called data frames. The data frames are used to transform and shape the data as needed and as source for the generated graphs. Within the notebook it is possible to combine data frames, execute queries on them and more - something I didn't do today.
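The idea of fetching two tables and combining them can be sketched in plain Python. This is only an illustration under loud assumptions: the built-in sqlite3 module stands in for the dashDB connection, the table and column names are invented, the values are placeholders rather than real CeBIT or weather figures, and lists of dicts stand in for real data frames.

```python
import sqlite3

# Stand-in for the dashDB connection: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cebit_visitors (year INTEGER, visitors INTEGER);
    CREATE TABLE hanover_weather (year INTEGER, avg_temp_c REAL, rain_mm REAL);
    -- Placeholder values for illustration only, not real measurements.
    INSERT INTO cebit_visitors VALUES (1, 1000), (2, 1500), (3, 1200);
    INSERT INTO hanover_weather VALUES (1, 4.5, 10.0), (2, 7.0, 2.5), (3, 5.5, 0.0);
""")


def fetch_frame(connection, sql):
    """Run a query and return the result as a list of dicts, one per row."""
    cursor = connection.execute(sql)
    columns = [description[0].lower() for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def combine_on(left, right, key):
    """Inner-join two lists of row dicts on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]}
            for row in left if row[key] in right_by_key]


visitors = fetch_frame(conn, "SELECT year, visitors FROM cebit_visitors")
weather = fetch_frame(conn, "SELECT year, avg_temp_c, rain_mm FROM hanover_weather")

# One combined row per year, ready to be used as input for a plot.
combined = combine_on(visitors, weather, "year")
for row in combined:
    print(row)
```

In a real notebook a data frame library would take over the role of `fetch_frame` and `combine_on`; the sketch only shows the shape of the data flow from two queries to one combined structure.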

To get to my dashDB-based graphs in a Jupyter Notebook on IBM Analytics for Apache Spark I needed to get around some issues I ran into, including data type casts, naming of result columns, labeling of graphs, sourcing columns as input for a graph and more. For time reasons I refer to the comments in the source code for my notebook.
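Two of these issues, type casts and result column names, can be illustrated with a small helper. This is a hypothetical sketch and not the code from my notebook: the function name, the sample rows and the assumption that an unnamed expression column arrives labeled by its position are all made up for illustration.

```python
from decimal import Decimal


def tidy_rows(rows, renames):
    """Rename result columns and cast Decimal values to float for plotting."""
    tidy = []
    for row in rows:
        clean = {}
        for name, value in row.items():
            if isinstance(value, Decimal):
                value = float(value)  # plotting code usually expects floats
            # Use the explicit new name if given, else a lowercase version.
            clean[renames.get(name, name.lower())] = value
        tidy.append(clean)
    return tidy


# Hypothetical raw rows, roughly as a database driver might hand them over.
# The column "2" is assumed to be an unnamed expression labeled by position.
raw = [
    {"YEAR": 1, "2": Decimal("4.50")},
    {"YEAR": 2, "2": Decimal("7.25")},
]

rows = tidy_rows(raw, renames={"2": "avg_temp_c"})
print(rows)
```

Naming the column directly in the query with `AS` avoids the renaming step altogether; the helper is only useful when the query cannot be changed.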

After all that introduction, here is the resulting graph. It shows that during a sunny and warm week with close to no rain there were fewer CeBIT attendees. A little rain, some sun and average temperature yielded a high visitor count. So could it be that the weather to attendee relationship is bogus for computer fairs and may only hold for museums? Anyway, it was fun learning Jupyter Notebooks on Bluemix. Now I need to plot my weekend plans...

Historic CeBIT Weather and Attendance

Tuesday, March 1, 2016

Mom, I joined the cloud! (or: Use old stuff with new stuff - DB2 federation)

Everybody is talking about Hybrid Clouds, combining on-premises resources like database systems and ERMs with services in the public or dedicated cloud. Today I am showing you exactly that, how I combined my on-prem DB2 with a cloud-based DB2 that I provisioned via Bluemix. The interesting thing is that really old technology can be used for that purpose: database federation. So relax, sit back, and follow my journey in joining the cloud...

Database Services in the Bluemix Catalog

For my small adventure I used a local DB2 10.5 and a Bluemix-based SQLDB service. The steps I followed are an extended version of what I wrote in 2013 about using three-part names in DB2 to easily access Oracle or DB2 databases. Similar to that entry, I started by enabling my DB2 instance for federation (FEDERATED is the configuration parameter).
[hloeser@mymachine] db2 update dbm cfg using federated yes
DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed
successfully.

Building a Solution? The Cloud Architecture Center has Blueprints

Cloud Architecture Center

Remember the days when a simple text client and a small database server were the core enterprise solution? These days data flows from various endpoints to data lakes or data reservoirs, and data streams are analyzed in real time to trade stocks, to prevent fraud, or to react to sensor data. How are other companies building their solutions or what are best practices? What products or services can be used? Great that the new Cloud Architecture Center offers blueprints.

Right now the IBM developerWorks Cloud Architecture Center features an architecture gallery where you can filter the available blueprints by overall area like data & analytics, Internet of Things (IoT), Mobile or Web Application. Other filter criteria are industry and capability, i.e., you could look for a sample solution for the insurance industry or a use case featuring a Hybrid Cloud scenario.

Partial view: Architecture for Cloud Solution

For the selected architecture and solution you are presented with the overall blueprint (as partially shown in the screenshot), are offered information about the flow and about the included components, deployed services and products, and get an overview of the functional and non-functional requirements. Depending on the solution there are links to sample applications, code repositories on GitHub, and more reading material. See the Personality Insights as a good example.

The Architecture Center offers great material for enterprise architects and solution designers and the linked samples and demos are also a good start for developers.

(Update 2016-02-21): There is a new and good overview article with focus on Big Data in the cloud and possible architecture.

Thursday, February 4, 2016

How to Navigate Bluemix - My Starter Guide

Bluemix Account

Coming from a product like DB2 with its focus on operational and feature stability, consistency and high availability, working with a product or, better, platform like Bluemix feels like an entirely different world. At least at first sight. Truth is that I felt at home once I learned to navigate it. Here is the first installment of my "how to bluemix"...

Bluemix by Region and Organization
Once you have logged into IBM Bluemix, in most cases the dashboard should be shown. Its content and also the services offered to you in the Bluemix catalog depend on the selected region and the organization (see screenshot on the right). Bluemix and its services are hosted in different data centers around the world (regions) and not all services are available in each data center. You can find out which regions are available and which services are supported in a region by checking out the Bluemix status page (also see the section below).

Parse shutting down, move your data

Parse shutting down

This week Parse.com, Facebook’s Mobile Backend as a Service offering, surprised their users. The service will shut down next year and all users are asked to move on. The Parse backend server has been released as an open source project, and a tool has been made available to migrate data. My Bluemix colleagues have created migration guides.

Mike Elsmore has created a quick overview of how to provision the required services on Bluemix to move over your data from Parse.com. Reading his instructions probably requires more time than the actual migration process. If you are not that deep into Bluemix, want more details, or a simple click of a button to deploy the required components, I would recommend reading the extended tutorial that Andrew Trice wrote. He walks you through the process, step by step and screenshot by screenshot: how to provision and configure the services, how to move the data, and finally how to test the migrated application.

Parse is using the NoSQL MongoDB to store the data. You can take a look at the DatabaseAdapter.js and ExportAdapter.js files to see how Parse is using the database and, if you like, write your own adapter for Cloudant/CouchDB or maybe even a relational database like MySQL or DB2.

Given that several Cloud service providers and PaaS hosters have announced shutting down, it is an interesting time. It seems that a new chapter in the Cloud story has begun: market consolidation has started.

Friday, January 29, 2016

Combining Bluemix, Open Data on Tourism and Watson Analytics for some Friday Insight

Inbound and Outbound Tourism, Watson Analytics

Yup, it is Friday again and the weekend is coming closer and closer. The Carnival or Fasnet/Fastnacht season is close to its peak and some school holidays and unofficial holidays are coming up late next week. Tourists are pouring into carnival strongholds. Why not take some time today to test drive the Bluemix Analytics Exchange and Watson Analytics with tourism data and try to get some insight?

A Cache of Identities and a Sequence of Events

Bits and Bytes of Sequences

Recently I received an interesting question: DB2 and other database systems have a feature to automatically generate numbers in a specified order. For DB2 this generator is called a sequence. When it is directly used to assign values to a column of a table, the term identity column is used. The next value of a sequence or identity column is derived by incrementing the last value by a specified amount. This works well, but sometimes there is a gap. Why?
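One common source of such gaps is caching: for performance, a block of values can be reserved in memory ahead of time, and unused values from that block are lost when the in-memory state goes away. The following toy model sketches that effect. The class, its methods and its parameters are invented for illustration and do not reflect how DB2 implements sequences internally.

```python
class CachedSequence:
    """Toy model of a sequence generator that pre-allocates values in memory."""

    def __init__(self, start=1, increment=1, cache=20):
        self.increment = increment
        self.cache = cache
        self.next_uncached = start  # "persisted": first value not yet reserved
        self.cached = []            # in memory only: values ready to be assigned

    def next_value(self):
        if not self.cached:
            # Reserve a whole block at once; only the block end is remembered
            # durably, the individual values live in memory.
            self.cached = [self.next_uncached + i * self.increment
                           for i in range(self.cache)]
            self.next_uncached += self.cache * self.increment
        return self.cached.pop(0)

    def restart(self):
        """Simulate a deactivation or crash: the in-memory cache is lost."""
        self.cached = []


seq = CachedSequence(cache=5)
assigned = [seq.next_value() for _ in range(3)]   # uses 1, 2, 3; 4 and 5 stay cached
seq.restart()                                     # cached values 4 and 5 are discarded
assigned += [seq.next_value() for _ in range(2)]  # continues with 6 and 7
print(assigned)  # prints [1, 2, 3, 6, 7]
```

Rolled-back transactions are another source of gaps, because a value that was handed out is not returned to the generator. A smaller cache reduces the size of possible gaps at the cost of more frequent bookkeeping.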

The Cloud, Mood-Enhancing Substances, World Economic Forum, and More

DataWorks and Connect & Compose

Right now, the Winter sky is mostly covered by some low hanging clouds, giving way only for some random rays of sun. The past weeks I have been plagued by a cold which drew most of my energy. Now I am back, thanks to some mood-enhancing substances (a.k.a. lots of dark chocolate) and some rest. So what else, in addition to the usual World Economic Forum, is going on?
