Tuesday, August 19, 2014

Sample IBM Bluemix Application: Extended IP logging with node.js and DB2

As I wrote in my last blog article, I have been playing around with node.js and DB2 on IBM Bluemix. Now I am done with my first server-side JavaScript application, which I named "db2visitors".

Screenshot of db2visitors index page
When someone accesses the main page (index page), the app logs the visitor's IP address. It then calls a free geo location service to obtain extended information, such as the visitor's country, region, and city. This is roughly how Google Analytics and similar services learn more about a website user, except that many of them maintain their own databases. The extended IP record is then inserted into a DB2 table and the information is displayed on a "DB2 got you" welcome page. To report where the visitors to my demo application came from, I created a second page that returns the aggregated country and region information for all visitors so far. How does it work? Let's look into the details.

First of all, I uploaded my sample code to my Github "db2visitors" repository, so that you can get an impression of the directory structure and files. It is pretty simple, and you will notice that I am new to node.js. The application consists of a main file "app.js" that sets up the general infrastructure and defines the handling of so-called routes. A route basically determines what needs to be executed for a specific request to the Web server. When a user visits the main or index page, the node/Express framework serves the request by calling my new function getIP(). That function is implemented in the file "db2access.js" in the "routes" directory; the current version is shown below.
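The post does not list app.js itself, but roughly, the route wiring there might look like the following sketch (simplified: the middleware and view engine setup of the Bluemix starter application is omitted, the connection string uses placeholder values as in my last blog article, and the port handling is quoted from memory):

// Sketch only: how the routes could be wired up in app.js (not the complete starter code).
var express = require('express');
var request = require('request');   // HTTP client used to call the geo location service
var ibmdb   = require('ibm_db');    // DB2 driver
var db2access = require('./routes/db2access');

var app = express();

// In the real app the connection string is built from VCAP_SERVICES or a local
// config file (see my last blog article); placeholder values here.
var connString = "DRIVER={DB2};DATABASE=CLOUDDB;UID=hloeser;PWD=secret;" +
                 "HOSTNAME=127.0.0.1;port=50000";

// index page: log the visitor's IP and render the "index" view
app.get('/', db2access.getIP(request, ibmdb, connString));
// aggregated per-country/region report
app.get('/visits', db2access.listCountries(ibmdb, connString));

// port from the Bluemix environment (VCAP_APP_PORT at the time), 3000 locally
app.listen(process.env.VCAP_APP_PORT || 3000);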

SQLDB Control Center with Table DDL
The function getIP() obtains the visitor's IP address and passes it to an external geo location service. That service returns a JSON structure with country, region, city, and other data it has on file for the IP address or the related address block. Sometimes it nails it; (fortunately) in many cases only the country is correct. Anyway, it is a free service, and it gives you an idea of what Google and other advertising companies can do with your data. Once that extended record is available, we call the function insertIP(). There, a DB2 connection is opened and an SQL statement is prepared and executed to insert the IP address data into the relational table IP.VISITORS (its schema is shown on the right; it was created using the Bluemix SQLDB Control Center). Some form of basic IP logging. At the end of getIP(), the function render() is called with the parameter "index". It tells the rendering engine to process the HTML templates and page segments for the view "index". All the view-related files are in exactly that directory (see my Github repository).
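The CREATE TABLE statement itself is not in the post (only the screenshot of the Control Center DDL above), but the column list can be derived from the INSERT statement in db2access.js. A possible one-time setup via ibm_db could look like the sketch below; the data types and lengths are my guesses, not necessarily what the Control Center generated:

// Hypothetical setup script: create IP.VISITORS via ibm_db.
// Column names follow the INSERT in db2access.js; types/lengths are assumptions.
var ibmdb = require('ibm_db');
var connString = "DRIVER={DB2};DATABASE=CLOUDDB;UID=hloeser;PWD=secret;" +
                 "HOSTNAME=127.0.0.1;port=50000";   // placeholder values

var ddl = "CREATE TABLE IP.VISITORS (" +
          " vtime TIMESTAMP," +
          " ip VARCHAR(40)," +
          " country_code VARCHAR(10)," +
          " country VARCHAR(100)," +
          " region_code VARCHAR(10)," +
          " region VARCHAR(100)," +
          " city VARCHAR(100)," +
          " zip VARCHAR(20)," +
          " latitude DOUBLE," +
          " longitude DOUBLE," +
          " metro VARCHAR(20)," +
          " area VARCHAR(20))";

ibmdb.open(connString, function(err, conn) {
  if (err) { return console.log(err); }
  conn.query(ddl, function(err) {
    if (err) { console.log(err); }
    conn.close(function() { console.log("Connection Closed"); });
  });
});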

The last function in the file db2access.js is listCountries(). It demonstrates some basic reporting, i.e., how to issue a DB2 query from node.js using the ibm_db API, fetch the result and pass it to a template page. That page is displayed when going to the path "/visits" (as defined in app.js). You can test it here. The query uses a simple GROUP BY to aggregate the country and region data.

That is all for today. Let me know if you have questions (comment, tweet, or email) and check out my other Bluemix-related database blog entries.

// Written by Henrik Loeser
// Some functions to retrieve extended information for an IP address,
// to insert that extended IP record into a DB2 table, and to
// report some basic statistics as part of a second HTML page.


// Insert the JSON record with the IP information as SQL record into DB2.
exports.insertIP = function(ibmdb, connString, ipinfo) {
  console.log("insertIP called", ipinfo);
  ibmdb.open(connString, function(err, conn) {
    if (err) {
      // no response object is available here, so just log the error
      console.log("error occurred " + err.message);
    } else {
      // prepare the SQL statement
      conn.prepare("INSERT INTO IP.VISITORS(vtime,ip,country_code,country,region_code,region,city,zip,latitude,longitude,metro,area) VALUES (current timestamp,?,?,?,?,?,?,?,?,?,?,?)", function(err, stmt) {
        if (err) {
          // could not prepare for some reason
          console.log(err);
          return conn.closeSync();
        }

        // Bind and execute the statement asynchronously
        stmt.execute([ipinfo["ip"], ipinfo["country_code"], ipinfo["country_name"],
                      ipinfo["region_code"], ipinfo["region_name"], ipinfo["city"],
                      ipinfo["zipcode"], ipinfo["latitude"], ipinfo["longitude"],
                      ipinfo["metro_code"], ipinfo["area_code"]], function(err, result) {
          if (err) {
            console.log(err);
          }
          // Close the connection to the database
          conn.close(function() {
            console.log("Connection Closed");
          });
        });
      });
    }
  });
};


// Get the caller's IP address from the runtime environment and call the
// geo location service to obtain extended data.
exports.getIP = function(request, ibmdb, connString) {
  return function(req, res) {
    var ip = req.headers['x-client-ip'] || req.connection.remoteAddress;
    var ipurl = 'http://freegeoip.net/json/' + ip;
    console.log("getIP called");
    // fetch the extended IP information
    request.get({
      url: ipurl,
      json: true
    }, function(error, response, body) {
      var ipinfo;
      if (!error) {
        ipinfo = body;
        // insert IP info into DB2
        exports.insertIP(ibmdb, connString, ipinfo);
        // finish by rendering the HTML page
        res.render('index', { ipinfo: ipinfo });
      }
    });
  };
};


// Very simple country/region-based reporting done using GROUP BY.
exports.listCountries = function(ibmdb, connString) {
  return function(req, res) {
    ibmdb.open(connString, function(err, conn) {
      if (err) {
        res.send("error occurred " + err.message);
      } else {
        conn.query("SELECT country, region, count(region) AS rcount FROM ip.visitors GROUP BY country, region ORDER BY 3 DESC",
          function(err, tables, moreResultSets) {
            if (!err) {
              res.render('tablelist', {
                "tablelist": tables
              });
            } else {
              res.send("error occurred " + err.message);
            }

            // Close the connection to the database.
            // Parameter: callback to execute once the close has completed.
            conn.close(function() {
              console.log("Connection Closed");
            });
          });
      }
    });
  };
};


Monday, August 18, 2014

Accessing DB2 from node.js on IBM Bluemix and locally

Some days ago I started experimenting with node.js. Other than JSON and some click functions on webpages I don't have much experience with JavaScript. The reason for this "adventure" is that node.js is offered as one of several programming languages on IBM Bluemix, IBM's platform-as-a-service (PaaS). I wanted to find out how complex or easy it would be to bring both node.js and DB2 (IBM's relational and in-memory database system) together.

When I start with a new language I typically produce errors. Thus I wanted to avoid pushing my app to Bluemix all the time and instead test it locally first. Hence I downloaded and installed a local node.js environment, including the node.js package manager "npm". npm lets you install additional modules/code libraries; they are placed into the directory "node_modules". Within the program (or script), a module is included and referenced via the "require" statement:

var express = require('express');

The above binds the "ExpressJS Web Application Framework for node". That framework is part of the node.js starter application on IBM Bluemix. With that basic application, which is offered for download, you can easily test whether the local installation works ok:

node app.js

The command, which I executed in a regular shell, launches the node.js runtime with the sample application. Based on the configuration it provides a small web application on my laptop on port 3000. Accessing "http://127.0.0.1:3000" in my web browser shows the demo page. All ok.
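For reference, a stripped-down app.js for such a local test could look roughly like the lines below; the real Bluemix starter application does more (view engine, static files, logging), and I am quoting the port handling from memory:

// Minimal node.js/Express app for the local smoke test (sketch, not the starter code).
var express = require('express');
var app = express();

app.get('/', function(req, res) {
  res.send('Hello from node.js');
});

// On Bluemix the port comes from the environment; locally we fall back to 3000.
var port = process.env.VCAP_APP_PORT || 3000;
app.listen(port);
console.log('Listening on port ' + port);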

To combine node.js and DB2 I require the DB2 database driver:

var ibmdb = require('ibm_db');

Just running the application again would return an error because the module has not been installed. Hence my next step in the command shell is:

npm install ibm_db

This invokes the node package manager and instructs it to download and install the IBM database client driver and the related node.js API. After a minute it returned an error because it couldn't find the file "sqlcli1.h", an indicator that my local DB2 installation was missing the application development environment. Running "db2setup" again (as root), selecting "work with existing", and marking the application development package for installation solved the issue. After db2setup finished, I ran "npm install ibm_db" again and this time it was able to download, build, and install the module.

To test my small app both locally and on Bluemix, I needed to obtain the user and DB2 instance information for either the local environment or the Bluemix SQLDB service (DB2). This is done with the following code snippet (not that beautiful, as I just started...):


// get DB2 SQLDB service information
function findKey(obj, lookup) {
  for (var i in obj) {
    if (typeof(obj[i]) === "object") {
      if (i.toUpperCase().indexOf(lookup) > -1) {
        // found the key
        return i;
      }
      // search nested objects and propagate a hit
      var found = findKey(obj[i], lookup);
      if (found !== -1) {
        return found;
      }
    }
  }
  return -1;
}

var env = null;
var key = -1;
var db2creds = null;
if (process.env.VCAP_SERVICES) {
  env = JSON.parse(process.env.VCAP_SERVICES);
  key = findKey(env, 'SQLDB');
}
if (!env) {
  console.log("We are local");
  var file = __dirname + '/db2cred.json';
  try {
    db2creds = require(file);
  } catch (err) {
    // could not load the local credentials file
    return {};
  }
  console.log(db2creds);
} else {
  db2creds = env[key][0].credentials;
}
var connString = "DRIVER={DB2};DATABASE=" + db2creds.db + ";UID=" + db2creds.username +
                 ";PWD=" + db2creds.password + ";HOSTNAME=" + db2creds.hostname +
                 ";port=" + db2creds.port;

app.get('/db2', routes.db2test(ibmdb, connString));

In the code I first search for the object with the Bluemix environment information. If it is not found, the code assumes it is a local invocation. In that case the DB2 access information is loaded from the file "db2cred.json", a file I created in the application directory with content like this:
Logo for my DB2 node.js app

{
    "hostname": "127.0.0.1",
    "host": "127.0.0.1",
    "port": 50000,
    "username": "hloeser",
    "password": "mytopsecretpassword",
    "db": "CLOUDDB"
}


The code uses the hostname, port, and user/password combination to create a connection string. That information, together with the IBM database driver interface, can then be passed to a request handler in the node.js/Express runtime infrastructure (the "app.get()" call).
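The handler behind "/db2" is not shown here; a minimal db2test, using the same ibm_db open/query/close pattern, might look like this sketch (the catalog query is just my example):

// routes/db2test.js (sketch): open a connection, run a trivial query, return the rows.
exports.db2test = function(ibmdb, connString) {
  return function(req, res) {
    ibmdb.open(connString, function(err, conn) {
      if (err) {
        return res.send("error occurred " + err.message);
      }
      // simple sanity check against the DB2 catalog
      conn.query("SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES FETCH FIRST 5 ROWS ONLY",
        function(err, rows) {
          if (err) {
            res.send("error occurred " + err.message);
          } else {
            res.json(rows);
          }
          // close the connection once the response has been sent
          conn.close(function() {
            console.log("Connection Closed");
          });
        });
    });
  };
};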

My small test application runs successfully both on my laptop and on IBM Bluemix. I plan to write more about it over the next few days and to upload the code to my Github account. Bluemix-related posts can be accessed via this link.

Update: The follow-up article has been published here, showing geo IP lookup and logging into DB2.

Wednesday, August 13, 2014

Using some Workload Management for free in non-Advanced Editions of DB2

One of the new features of DB2 10.5 is BLU Acceleration. It introduces a couple of default Workload Management objects that are intended to control heavy queries running against column-organized tables. The objects are automatically created with every database, independent of the product edition. They are only enabled when DB2_WORKLOAD has been set to ANALYTICS before creating the database, i.e., when a database for in-memory analytics is set up. But what is available for a regular guy like myself? What can be used for free, as a foundation for some monitoring and for understanding the system workload? Let's take a look.


Typically I use a DB2 Developer Edition, which includes all features including WLM. So I removed the db2de license and organized (being an IBMer has some benefits!) a Workgroup Server Edition (db2wse) license, which I added to the system using db2licm. I also turned on hard license enforcement, so that any attempt to use an unlicensed feature is directly blocked. Here is what db2licm returned thereafter:

mymachine> db2licm -l

Product name:                     "DB2 Workgroup Server Edition"
License type:                     "Authorized User Single Install"
Expiry date:                      "Permanent"
Product identifier:               "db2wse"
Version information:              "10.5"
Max amount of memory (GB):        "128"
Enforcement policy:               "Hard Stop"
Number of licensed authorized users: "25"

With that in place I created a new database named WLMTEST and connected to it. My first test was to create a workload object which should not be possible given my DB2 edition:

DB: WLMTEST => create workload freeride applname('xploit')
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL8029N  A valid license key was not found for the requested functionality.
Reference numbers: "".


Ok, this looks right. I don't have a license to use DB2 WLM (Workload Manager). My next query was intended to check what service classes are present in my system:

DB: WLMTEST => select varchar(serviceclassname,30), varchar(parentserviceclassname,30), enabled from syscat.serviceclasses

1                              2                              ENABLED
------------------------------ ------------------------------ -------
SYSDEFAULTSUBCLASS             SYSDEFAULTSYSTEMCLASS          Y     
SYSDEFAULTSUBCLASS             SYSDEFAULTMAINTENANCECLASS     Y     
SYSDEFAULTSUBCLASS             SYSDEFAULTUSERCLASS            Y     
SYSDEFAULTMANAGEDSUBCLASS      SYSDEFAULTUSERCLASS            Y     
SYSDEFAULTSYSTEMCLASS          -                              Y     
SYSDEFAULTMAINTENANCECLASS     -                              Y     
SYSDEFAULTUSERCLASS            -                              Y     

  7 record(s) selected.


The DB2 Knowledge Center has an overview of the related default WLM objects and which parts can be modified with DBADM or WLMADM authority. Having the names of the system objects, I tried my luck altering a work class set to reduce the cost barrier for the managed heavy queries (SYSMANAGEDQUERIES):

DB: WLMTEST => alter work class set sysdefaultuserwcs alter work class SYSMANAGEDQUERIES for timeroncost from 1000   
DB20000I  The SQL command completed successfully.


The threshold SYSDEFAULTCONCURRENT defines how many of those queries can run concurrently in the system. Why not change that threshold definition?

DB: WLMTEST => alter threshold SYSDEFAULTCONCURRENT when sqlrowsreturned > 20 stop execution
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL4721N  The threshold "SYSDEFAULTCONCURRENT" cannot be created or altered 
(reason code = "7").  SQLSTATE=5U037


Well, it seems that you cannot modify the entire threshold to your liking. However, following the documentation on what can be done, I successfully reduced the number of parallel activities.

DB: WLMTEST => alter threshold SYSDEFAULTCONCURRENT when CONCURRENTDBCOORDACTIVITIES > 3 stop execution
DB20000I  The SQL command completed successfully.


To test the impact of my changes, I opened 4 different shells, connected to DB2 in each window, and more or less simultaneously executed the following query:

select * from syscat.tables,syscat.columns

To be precise, I tried to execute it in all four windows, but it only ran in three of them. Why? Because the threshold kicked in for this heavy query and stopped the execution for the 4th session ("concurrentdbcoordactivities > 3 stop execution"). So some basic workload management seems to work even without a license.

Can I change the threshold to force the application off, i.e., to not allow running the query?

DB: WLMTEST => alter threshold SYSDEFAULTCONCURRENT when CONCURRENTDBCOORDACTIVITIES > 2 force application
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL4721N  The threshold "SYSDEFAULTCONCURRENT" cannot be created or altered 
(reason code = "13").  SQLSTATE=5U037

No, changing the entire definition of the threshold is not possible, but at least parts of it can be modified. You can then use the adapted default WLM objects to better understand what work is running on your system, e.g., to test what would fall into the category of "heavy queries". As a last step, I used a monitoring function to return the CPU time spent by service subclass. Most of it was in the managed subclass into which my queries from above were mapped:

DB: WLMTEST => SELECT varchar(service_superclass_name,30) as service_superclass, varchar(service_subclass_name,30) as service_subclass, sum(total_cpu_time) as total_cpu, sum(app_rqsts_completed_total) as total_rqsts FROM TABLE(MON_GET_SERVICE_SUBCLASS('','',-2)) AS t GROUP BY service_superclass_name, service_subclass_name ORDER BY total_cpu desc

SERVICE_SUPERCLASS             SERVICE_SUBCLASS               TOTAL_CPU            TOTAL_RQSTS        
------------------------------ ------------------------------ -------------------- --------------------
SYSDEFAULTUSERCLASS            SYSDEFAULTSUBCLASS                          1207794                  552
SYSDEFAULTUSERCLASS            SYSDEFAULTMANAGEDSUBCLASS                    852547                    0
SYSDEFAULTMAINTENANCECLASS     SYSDEFAULTSUBCLASS                           466436                 1374
SYSDEFAULTSYSTEMCLASS          SYSDEFAULTSUBCLASS                                0                    0

  4 record(s) selected.


With that I leave more testing to you. Happy monitoring!

BTW: The same tests can also be done on the SQL DB service on IBM Bluemix. That service is a DB2 Enterprise Server Edition.
