Monday, August 25, 2014

Setting up and using a DB2 in-memory database on IBM Bluemix

[Update 2014-11-04: The Analytics Warehouse service on Bluemix is now called dashDB.]
Last Friday I was on the way back from some customer visits. While traveling in a German highspeed train I used the Wifi service, connected to IBM Bluemix and created a DB2 in-memory database. Let me show you how I set it up, what you can do with it and how I am connecting to the cloud-based database from my laptop.

Unbound DB2 service on Bluemix
The first thing to know is that on Bluemix the DB2 in-memory database service is called IBM Analytics Warehouse. To create a database, you select "Add service" and leave it unbound if you want, i.e., it is not directly associated with any Bluemix application. That is ok because at this time we are only interested in the database. Once the service is added and the database itself created, you can lauch the administration console.

The console supports several administration and development tasks as show in the picture. It includes loading data, to develop analytic scripts in R, to execute queries and link the data with Microsoft Excel for processing in a spreadsheet, and it has a section to connect external tools or applications to the database.
Administration/development task in DB2 BLU console on Bluemix
One of the offered task is very interesting and I twittered about it on Friday, too:

You can set up replication from a Cloudant JSON database to DB2, so that the data stream is directly fed in for in-memory analyses. I didn't test it so far, but plan to do so with one of my other Bluemix projects.

A task that I used is to (up)load data. For this I took some historic weather data (planning ahead for a vacation location), let the load wizard extract the metadata to create a suitable data, and ran some queries.

Uploading data to DB2 on Bluemix

Specify new DB2 table and column names

For executing (simple) selects there is a "Run Query" dialogue. It allows to choose a table and columns and then generates a basic query skeleton. I looked into whether a specific German island had warm nights, i.e., a daily minimum temperature of over 20 degrees Celsius. Only 14 days out of several decades and thousands of data points qualified.

Last but not least, I connected my local DB2 installation and tools to the Bluemix/Softlayer-based instance. The "CATALOG TCPIP NODE" is needed t make the remote server and communication port known. Then the database is added. If you already have a database with the same name cataloged on the local system, it will give an error message as shown below. You can work around it by specifying an alias. So instead of calling the database BLUDB, I used BLUDB2. The final step was to connect to DB2 with BLU Acceleration in the cloud. And surprise, it uses a fixpack version that officially is not available yet for download...

DB:  => catalog tcpip node bluemix remote server 50000
DB20000I  The CATALOG TCPIP NODE command completed successfully.
DB21056W  Directory changes may not be effective until the directory cache is
DB:  => catalog db bludb at node bluemix
SQL1005N  The database alias "bludb" already exists in either the local
database directory or system database directory.
DB:  => catalog db bludb as bludb2 at node bluemix
DB20000I  The CATALOG DATABASE command completed successfully.
DB21056W  Directory changes may not be effective until the directory cache is
DB:  => connect to bludb2 user blu01xxx
Enter current password for blu01xxx:

   Database Connection Information

 Database server        = DB2/LINUXX8664 10.5.4
 SQL authorization ID   = BLU01xxx
 Local database alias   = BLUDB2

I will plan to develop a simple application using the DB2 in-memory database (BLU Acceleration / Analytics Warehouse) and then write about it. Until then read more about IBM Bluemix in my other related blog entries.

Tuesday, August 19, 2014

Sample IBM Bluemix Application: Extended IP logging with node.js and DB2

As I wrote in my last blog article I have been playing around with node.js and DB2 on IBM Bluemix. Now I am done with my first server-side JavaScript application which I named "db2visitors".

Screenshot of db2visitors index page
When accessing the main page (index page), the app logs the visitor's IP address. Then a free geo location service is called to obtain extended information, such as country, region and city information of the visitor. This is how Google Analytics and other services know more about a website user, with the exception that many have their own database. That extended IP record is then inserted into a DB2 table and the information displayed on a "DB2 got you" welcome page. To report back about from where the visitors to my demo application came, I created a second page that returns the aggregated country and region information for all visitors so far. How does it work? Let's look into the details.

First of all, I uploaded my sample code to my Github "db2visitors" repository, so that you can get an impression of the directory structure and files. It is pretty simple and you will notice that I am new to node.js. The application consists of a main file "app.js" that sets up the general infrastructure and defines the handling of so-called routes. A route basically determines what needs to be executed for a specific request to the Web server. When a user visits the main or index page, the node/Express framework would serve the request by calling my new function getIP(). That function is implemented in a file "db2access.js" in the "routes" directory. The current version as of now is shown below.

SQLDB Control Center with Table DDL
The function getIP obtains the visitor's IP address. Then it uses that information to provide it to an external IP address geo location service. That service returns a JSON structure with country, region, city and other data it has on file for the IP address or the related address block. Sometimes it nails it, (fortunately) in many cases only the country is correct. Anyway, it is a free service and you get an idea of Google and other advertising companies can do with your data. Once that extended record is available, we call into the function insertIP(). There, a DB2 connection is opened, a SQL statement prepared and executed to insert the IP address data into a relational table IP.VISITORS (schema shown on the right, it was created using the Bluemix SQLDB Control Center). Some form of basic IP logging. Last in the function getIP, a function render() is called with the parameter "index". It tells the rendering engine to process the HTML templates and page segments for the view "index". All the view-related files are in exactly that directory (see my Github repository).

The last function in the file db2access.js is listCountries(). It demonstrates some basic reporting, i.e., how to issue a DB2 query from node.js using the ibm_db API, fetch the result and pass it to a template page. That page is displayed when going to the path "/visits" (as defined in app.js). You can test it here. The query uses a simple GROUP BY to aggregate the country and region data.

That is all for today. Let me know if you have questions (comment, tweet or email) and read my other database on Bluemix-related blog entries.

1:  // Written by Henrik Loeser  
2:  // Some functions to retrieve extended information for a IP address,  
3:  // to insert that extended IP record into a DB2 table, and to  
4:  // report some basic statistics as part of a second HTML page.  
8:  // Insert the JSON record with the IP information as SQL record into DB2.  
9:  exports.insertIP = function(ibmdb,connString,ipinfo) {  
10:            console.log("insertIP called",ipinfo);      
11:  , function(err, conn) {  
12:               if (err ) {  
13:                res.send("error occurred " + err.message);  
14:               }  
15:               else {  
16:                // prepare the SQL statement  
17:                conn.prepare("INSERT INTO IP.VISITORS(vtime,ip,country_code,country,region_code,region,city,zip,latitude,longitude,metro,area) VALUES (current timestamp,?,?,?,?,?,?,?,?,?,?,?)", function(err, stmt) {  
18:                  if (err) {  
19:                   //could not prepare for some reason  
20:                   console.log(err);  
21:                   return conn.closeSync();  
22:                  }  
24:                  //Bind and Execute the statment asynchronously  
25:                  stmt.execute([ipinfo["ip"],ipinfo["country_code"],ipinfo["country_name"],ipinfo["region_code"],ipinfo["region_name"],ipinfo["city"],ipinfo["zipcode"], ipinfo["latitude"], ipinfo["longitude"],ipinfo["metro_code"],ipinfo["area_code"]], function (err, result) {  
26:                   console.log(err);  
27:                   // Close the connection to the database  
28:                   conn.close(function(){  
29:                    console.log("Connection Closed");  
30:                   });  
31:                  });  
32:                });  
33:              }  
34:          })};  
37:  // Get the caller's IP address from runtime environment and call  
38:  // geo location service to obtain extended data  
39:  exports.getIP=function(request,ibmdb,connString) {  
40:   return function(req, res) {  
41:   var ip = req.headers['x-client-ip'] || req.connection.remoteAddress;  
42:   var ipurl = '' + ip;  
43:   console.log("yes, called");  
44:   // fetch ip info   
45:   request.get( {  
46:    url: ipurl,  
47:    json : true},  
48:    function(error, response, body) {  
49:       var ipinfo;  
50:      if (!error) {  
51:        ipinfo=body;  
52:       // insert IP info into DB2  
53:       exports.insertIP(ibmdb,connString,ipinfo);  
54:       // finish by rendering the HTML page  
55:       res.render('index',{ ipinfo : ipinfo});  
56:       }  
57:     });  
58:  }  
59:  };  
63:  // Very simple country/region-based reporting done using GROUP BY.  
64:  exports.listCountries = function(ibmdb,connString) {  
65:          return function(req, res) {  
67:  , function(err, conn) {  
68:                  if (err ) {  
69:                   res.send("error occurred " + err.message);  
70:                  }  
71:                  else {  
72:                      conn.query("SELECT country, region, count(region) as rcount FROM ip.visitors group by country,region order by 3 desc", function(err, tables, moreResultSets) {  
75:                      if ( !err ) {   
76:                          res.render('tablelist', {  
77:                              "tablelist" : tables  
79:                           });  
82:                      } else {  
83:                        res.send("error occurred " + err.message);  
84:                      }  
86:                      /*  
87:                          Close the connection to the database  
88:                          param 1: The callback function to execute on completion of close function.  
89:                      */  
90:                      conn.close(function(){  
91:                          console.log("Connection Closed");  
92:                          });  
93:                      });  
94:                  }  
95:              } );  
97:          }  
98:      }  

Monday, August 18, 2014

Accessing DB2 from node.js on IBM Bluemix and locally

Some days ago I started experimenting with node.js. Other than JSON and some click functions on webpages I don't have much experience with JavaScript. The reason for this "adventure" is that node.js is offered as one of several programming languages on IBM Bluemix, IBM's platform-as-a-service (PaaS). I wanted to find out how complex or easy it would be to bring both node.js and DB2 (IBM's relational and in-memory database system) together.

When I start with some new language I typically produce errors. Thus I wanted to avoid pushing my app to Bluemix all the time, but instead wanted to test it locally first. Hence I downloaded and installed a local node.js environment, including the so-called node.js package manager "npm" first. npm allows you to install additional modules/code libraries. They are placed into the directory "node_modules". Within the program (or script), the modules are included and referenced via the "require" statement:

var express = require('express');

The above binds the "ExpressJS Web Application Framework for node". That framework is part of the node.js starter application on IBM Bluemix. With that basic application which is offered for download you can easily test whether the local installation work ok:

node app.js

The command which I executed in a regular shell launches the node.js runtime with the sample application. Based on the configuration it provides a small web application available on my laptop on port 3000. Accessing "" in my web browser shows the demo page. All ok.

To combine node.js and DB2 I require the DB2 database driver:

var ibmdb = require('ibm_db');

Just running the application again would return an error because the module has not been installed. Hence my next step in the command shell is:

npm install ibm_db

This invokes the node package manager and instructs it to download and install the IBM database client driver and related node.js API. Waiting for a minute it returned an error because it couldn't find the file "sqlcli1.h". This is an indicator that my local DB2 was missing the application development environment. Running "db2setup" again (as root), selecting "work with existing" and marking the application development package for installation solved the issue. After db2setup finished, I ran "npm install ibm_db" again and it was able to download, build and install that module.

To test my small app both locally and on Bluemix, I needed to obtain user and DB2 instance information for either the local environment or the Bluemix SQLDB service (DB2). This is done with the following code snippet (not that beauty as I just started...):

1:  // get DB2 SQLDB service information  
2:  function findKey(obj,lookup) {  
3:    for (var i in obj) {  
4:     if (typeof(obj[i])==="object") {  
5:       if (i.toUpperCase().indexOf(lookup) > -1) {  
6:        // Found the key  
7:        return i;  
8:       }  
9:       findKey(obj[i],lookup);  
10:     }  
11:    }  
12:    return -1;  
13:  }  
14:  var env = null;  
15:  var key = -1;  
16:  var db2creds=null;  
17:  if (process.env.VCAP_SERVICES) {  
18:     env = JSON.parse(process.env.VCAP_SERVICES);  
19:     key = findKey(env,'SQLDB');  
20:  }  
21:  if (!env) {  
22:    console.log("We are local");  
23:    var file = __dirname + '/db2cred.json';  
24:    try {  
25:     db2creds = require(file);  
26:    } catch(err) {  
27:     return {};  
28:    }  
29:  //  db2creds = JSON.parse(fileJSON);  
30:    console.log(db2creds);    
31:  } else {  
32:  var db2creds = env[key][0].credentials;  
34:  }  
35:  var connString = "DRIVER={DB2};DATABASE=" + db2creds.db + ";UID=" + db2creds.username + ";PWD=" + db2creds.password + ";HOSTNAME=" + db2creds.hostname + ";port=" + db2creds.port;  
37:  app.get('/db2', routes.db2test(ibmdb,connString));  

In the code I first search for the object with the Bluemix environment information. If it is not found the code assumes it is a local invocation. In that case the DB2 access information is loaded from the file "db2cred.json". It is a file I created in the application directory with a content like here:
Logo for my DB2 node.js app

    "hostname": "",
    "host": "",
    "port": 50000,
    "username": "hloeser",
    "password": "mytopsecretpassword",
    "db": "CLOUDDB"

The code uses the information about the hostname, port, and the user/password combination to create a connection string. That information together with the IBM Database Driver interface can be passed to a request handler in the node.js/Express runtime infrastructure (the "app.get()" call).

My small test application runs successfully both on my laptop as well as on IBM Bluemix. I plan to write more about it over the next days and to upload the code to my Github account. Bluemix-related posts can be accessed by this link.

Update: The follow-up article has been published here, showing geo IP lookup and logging into DB2.

Wednesday, August 13, 2014

Using some Workload Management for free in non-Advanced Editions of DB2

One of the new features of DB2 10.5 is BLU Acceleration. In introduces a couple of default Workload Management objects that are intended to control heavy queries running against column-organized tables. The objects are automatically created with every database, independent of the product edition. They are only enabled when DB2_WORKLOAD has been set to ANALYTICS before creating the database, i.e., a database for in-memory analytics is set up. But what is available for the regular guy like myself? What can be used for free and as foundation for some monitoring and understanding the system workload? Let's take a look.

Typically I use a DB2 Developer Edition which includes all features including WLM. So I removed the db2de license and organized (being IBMer has some benefits!) a Workgroup Server Edition (db2wse) which I added to the system using db2licm. I also turned on hard license enforcement, so that any attempts of using an unlicensend feature are directly blocked. Here is what db2licm returned thereafter:

mymachine> db2licm -l

Product name:                     "DB2 Workgroup Server Edition"
License type:                     "Authorized User Single Install"
Expiry date:                      "Permanent"
Product identifier:               "db2wse"
Version information:              "10.5"
Max amount of memory (GB):        "128"
Enforcement policy:               "Hard Stop"
Number of licensed authorized users: "25"

With that in place I created a new database named WLMTEST and connected to it. My first test was to create a workload object which should not be possible given my DB2 edition:

DB: WLMTEST => create workload freeride applname('xploit')
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL8029N  A valid license key was not found for the requested functionality.
Reference numbers: "".

Ok, this look right. I don't have a license to use DB2 WLM (Workload Manager). My next query was intended to check what service classes are present in my system.

DB: WLMTEST => select varchar(serviceclassname,30), varchar(parentserviceclassname,30), enabled from syscat.serviceclasses

1                              2                              ENABLED
------------------------------ ------------------------------ -------
SYSDEFAULTSYSTEMCLASS          -                              Y     
SYSDEFAULTMAINTENANCECLASS     -                              Y     
SYSDEFAULTUSERCLASS            -                              Y     

  7 record(s) selected.

The DB2 Knowledge Center has an overview of related default WLM objects and which parts can be modified with DBADM or WLMADM authority. Having the names of the system objects I tried my luck altering a work class set to reduce the cost barrier for the managed heavy queries (SYSMANAGEDQUERIES):

DB: WLMTEST => alter work class set sysdefaultuserwcs alter work class SYSMANAGEDQUERIES for timeroncost from 1000   
DB20000I  The SQL command completed successfully.

The threshold SYSDEFAULTCONCURRENT defines how many of those queries can run concurrently in the system. Why not change that threshold definition?

DB: WLMTEST => alter threshold SYSDEFAULTCONCURRENT when sqlrowsreturned > 20 stop execution
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL4721N  The threshold "SYSDEFAULTCONCURRENT" cannot be created or altered 
(reason code = "7").  SQLSTATE=5U037

Well, it seems that you cannot modify the entire threshold to your liking. However, following the documentation on what can be done, I successfully reduced the number of parallel activities.

DB20000I  The SQL command completed successfully.

To test the impact of my changes, I opened 4 different shells, connected to DB2 in each window, and more or less simultaneously executed the following query:

select * from syscat.tables,syscat.columns

I have to restate that I tried to execute it in all four windows. It only ran in three of them. Why? Because the threshold kicked in for this heavy query and stopped the execution for the 4th session ("concurrentdbcoordactivities> 3 stop execution"). So some basic workload management seems to work even without a license.

Can I change the threshold to force the application off, i.e., to not allow running the query?

DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL4721N  The threshold "SYSDEFAULTCONCURRENT" cannot be created or altered 
(reason code = "13").  SQLSTATE=5U037

No, changing the entire definition of the threshold is not possible, but at least parts of it can be modified. You can then use the adapted default WLM objects to better understand what work is running on your system, e.g., testing what would fall into the category of "heavy queries". As a last step, I used a monitoring function to return the CPU time spent by service subclass. Most was in the managed subclass into which my queries from above were mapped:

DB: WLMTEST => SELECT varchar(service_superclass_name,30) as service_superclass, varchar(service_subclass_name,30) as service_subclass, sum(total_cpu_time) as total_cpu, sum(app_rqsts_completed_total) as total_rqsts FROM TABLE(MON_GET_SERVICE_SUBCLASS('','',-2)) AS t GROUP BY service_superclass_name, service_subclass_name ORDER BY total_cpu desc

SERVICE_SUPERCLASS             SERVICE_SUBCLASS               TOTAL_CPU            TOTAL_RQSTS        
------------------------------ ------------------------------ -------------------- --------------------
SYSDEFAULTUSERCLASS            SYSDEFAULTSUBCLASS                          1207794                  552
SYSDEFAULTUSERCLASS            SYSDEFAULTMANAGEDSUBCLASS                    852547                    0
SYSDEFAULTMAINTENANCECLASS     SYSDEFAULTSUBCLASS                           466436                 1374
SYSDEFAULTSYSTEMCLASS          SYSDEFAULTSUBCLASS                                0                    0

  4 record(s) selected.

With that I leave more testing to you. Happy monitoring!

BTW: The same tests can also be done on the SQL DB service on IBM Bluemix. That service is a DB2 Enterprise Server Edition.

Friday, August 8, 2014

Manage a DB2 database on IBM Bluemix

I recently showed you how to set up a Python-based application that makes use of the SQL DB service (a.k.a. DB2) on IBM Bluemix. Today I give you a quick introduction into what you can do with the DB2 administration interface that is provided as part of the IBM Managed Database Service.

In short, the web interface offers to manage and monitor the database.
IBM Managed Database Service - Interface
Manage means that it is possible to drill down into existing database objects or, by running uploaded SQL scripts, execute DDL statements. Data can be loaded into tables, backup and restore are also offered. The monitor part focuses on watching which applications are connected to the database, what SQL statements have been executed and what is the status of existing tablespaces. A special monitor feature is to run Guardium reports.

Run DB2 DDL statements after uploading them

I tested the interface in order to create a couple tables for an application I plan to write, then load data. After I succeeded I verified that the objects were in the database, then took a quick tour through the monitoring options. All worked well, but I am missing the feature to run SQL statements or have more monitoring options like offered by the DB2 Technology Explorer. Based on the forum comments there is more to come.

You can try out the interface yourself by signing up for a free trial of IBM Bluemix. What did you do with DB2 on Bluemix so far?

Load data into DB2 tables

Tuesday, August 5, 2014

Using DB2 with Python on Cloud Foundry-Based IBM Bluemix

Wow, what a title for this article. Today I will describe exactly that, nothing more. Recently I created a SQLDB service (DB2) in my IBM Bluemix account. Looking at the documentation for the SQLDB service, it describes how to use DB2 for programming languages like Java, Ruby, and Node.js and how to bind the SQLDB service to Liberty instances. There is no word on Python which I wanted to use like in my previous tests. I got it to work nonetheless and here is what I did.

SQLDB / DB2 service on IBM Bluemix
Adding DB2 as a service to a Bluemix application is quite simple. Just click on "add service" and then pick "SQLDB". It is possible to bind it to an application if present. To access DB2 from Python IBM provides a library named "ibm_db". When testing with the buildpack I described in a previous article for my weather app, I ran into an error because installation of ibm_db requires the DB2 drivers accessible locally. Reading over the SQLDB documentation for Ruby I found that a special buildpack for DB2 is referenced. Digging into it I discovered that a similar builpack for Python and DB2 is provided on Github, but not really referenced. I succeeded in pushing my app to Bluemix using that special DB2/Python buildpack, the ibm_db library could be download and installed:
That library needs to be added as dependency to the file "requirements.txt". Cloud Foundry buildpacks for Python then try to download and install it to the runtime environment.

cf push welcomedb2 -b

I continued my test in trying to connect to my DB2 service from my Python script. I used similar code as in my CouchDB test to obtain the user/server credentials from the Bluemix runtime environment. To demonstrate successful access to DB2 the script utilizes the administrative function ENV_SYS_INFO to obtain information about the DB2 server environment, e.g., which operating system is used and what CPU and memory resources are available. That information is filled into a simple webpage as shown here:
DB2 server environment on IBM Bluemix

Last, here is the entire test script to access DB2 from Python in the IBM Bluemix environment:
 import os  
 from flask import Flask,redirect  
 import urllib  
 import datetime  
 import json  
 import ibm_db  
 app = Flask(__name__)  
 # get service information if on Bluemix  
 if 'VCAP_SERVICES' in os.environ:  
   db2info = json.loads(os.environ['VCAP_SERVICES'])['sqldb'][0]  
   db2cred = db2info["credentials"]  
 # connect to DB2  
 db2conn = ibm_db.connect("DATABASE="+db2cred['db']+";HOSTNAME="+db2cred['hostname']+";PORT="+str(db2cred['port'])+";UID="+db2cred['username']+";PWD="+db2cred['password']+";","","")  
 # main page to dump some environment information  
 def index():  
   page = '<title>Welcome DB2!</title>'  
   page += '<h1>Welcome DB2!</h1>'  
   if db2conn:  
    # we have a DB2 connection, so obtain system information via ENV_SYS_INFO:  
    stmt = ibm_db.exec_immediate(db2conn,"select host_name,os_name,os_version,total_cpus,configured_cpus, total_memory,os_kernel_version,os_arch_type, os_release,os_full_version from sysibmadm.env_sys_info")  
    # fetch the result  
    result = ibm_db.fetch_assoc(stmt)  
    page += "OS Name: "+result["OS_NAME"]+"<br/>OS Version: "+result["OS_VERSION"]   
    page += "<br/>Hostname: "+result["HOST_NAME"]+"<br/> Total CPUs: "+str(result["TOTAL_CPUS"])  
    page += "<br/>Configured CPUs: "+str(result["CONFIGURED_CPUS"])+"<br/>Total memory: "+str(result["TOTAL_MEMORY"])+" MB"  
    page += "<br/>OS Kernel Version: "+result["OS_KERNEL_VERSION"]+"<br/>OS Architecture Tpye: "+result["OS_ARCH_TYPE"]  
    page += "<br/>OS Release: "+result["OS_RELEASE"]+"<br/>OS full version: "+result["OS_FULL_VERSION"]  
   return page  
 port = os.getenv('VCAP_APP_PORT', '5000')  
 if __name__ == "__main__":'', port=int(port))