Monday, September 1, 2014

What a plot: DB2, R, and Bluemix help with vacation weather

Last week I reported on how I set up a in-memory DB2 database on IBM Bluemix and loaded some historic weather data. Over the last couple days I used some spare time to play with the Cloud-based analytic capabilities that are provided as part of the Softlayer/Bluemix/DB2 combination. Most of the time went into learning (some basics of) R, an environment for statistical computing and graphics. As an example I wanted to find out what temperatures to expect for a possible September vacation on the German island of Norderney.

For my small exercise I used data provided by the German Meteorological Service "Deutscher Wetterdienst". It allows to freely download and use (under some conditions) data from several decades of weather observation. I uploaded the data to DB2/Bluemix as described in my previous post.
Bluemix: Change DB2 column name and type
While playing with the data I noticed that the column names required escaping of quotes and the observation dates were stored as integer values (yyyymmdd). In a second upload I simplified the column names and adapted the column data type using the DB2 load wizard (see picture). Thereafter I was set for my experiments with R.

The DB2 Cloud environment provides several examples for programming in R, a special function library "bluR" to easily connect R with DB2-based data, and it features the RStudio to develop, test, and execute code in R. Within RStudio it is possible to execute several demos to learn more about analytics, graphing, and data processing. For the DB2 in-memory database API for R there is a demo as well. You can invoke it using the "demo(blur)" command:

DB2 API demo for R in RStudio
The demo shows how to connect to DB2, execute a query and use the fetched data for analytic processing in R. Last week I already tweeted about how I tested escaping of quote characters (use C style, not SQL style):



The data set which I uploaded to DB2 has daily minimum and maximum temperatures (and lots of other meteorological) for about 70 years. I used a SQL query and then the ggplot2 library to create a graphic. It shows the band for the minimum temperatures for each September day as well as the band for the maximum daily temperatures.
DB2 with R: Historic September temperatures
The code for this graphic is pretty simple (and I started last week looking at R and DB2):
1:  ########### R script to analyze historic weather data for min/max values  
2:  ## Written by Henrik Loeser  
3:  ## Connection handle con to BLU for Cloud data warehouse is provided already  
4:  ## For plotting, we are using ggplot2 package  
5:  ##   
6:  library(ggplot2)  
7:  library(bluR)  
8:    
9:  ## initialize DB2 connection and environment  
10:  con <- bluConnect("BLUDB","","")  
11:  bluAnalyticsInit(con)  
12:    
13:  ## query DB2 weather data and fetch min/max values of min/max values  
14:  ## (lower/upper boundary each)   
15:  query<-paste('select max(lufttemperatur_maximum) as maxmax,min(lufttemperatur_minimum) as minmin,min(lufttemperatur_maximum) as minmax,max(lufttemperatur_minimum) as maxmin,tag from (select lufttemperatur_maximum, lufttemperatur_minimum, day(mdatum) as tag from blu01023.klima where month(mdatum)=9) group by tag order by tag asc')   
16:  df <- bluQuery(query,as.is=F)  
17:    
18:  ## Some plotting needs to be done  
19:  jpeg(type='cairo',"tempe.jpg",width=800,height=600)   
20:  ggplot(df, aes(x = TAG))+ylab("Temperature")+xlab("Day")+          
21:     geom_ribbon(aes(ymin = MINMIN, ymax=MAXMIN), fill='blue')+  
22:     geom_ribbon(aes(ymin = MAXMAX, ymax=MINMAX), fill='red')+  
23:     geom_ribbon(aes(ymin = MAXMIN, ymax=MINMAX), fill='white')+  
24:     geom_line(aes(y = MINMIN), colour = 'black') +  
25:     geom_line(aes(y = MAXMIN), colour = 'black') +  
26:     geom_line(aes(y = MINMAX), colour = 'black') +  
27:     geom_line(aes(y = MAXMAX), colour = 'black')   
28:    
29:  sink('/dev/null')   
30:    
31:  bluClose(con)  
32:  ## connection is closed, we are done  


Pretty cool (my opinion)! I am already turning into a data scientist. And you can test this yourself on IBM Bluemix with the Analytics Warehouse service (DB2 in-memory database feature).



Monday, August 25, 2014

Setting up and using a DB2 in-memory database on IBM Bluemix

Last Friday I was on the way back from some customer visits. While traveling in a German highspeed train I used the Wifi service, connected to IBM Bluemix and created a DB2 in-memory database. Let me show you how I set it up, what you can do with it and how I am connecting to the cloud-based database from my laptop.


Unbound DB2 service on Bluemix
The first thing to know is that on Bluemix the DB2 in-memory database service is called IBM Analytics Warehouse. To create a database, you select "Add service" and leave it unbound if you want, i.e., it is not directly associated with any Bluemix application. That is ok because at this time we are only interested in the database. Once the service is added and the database itself created, you can lauch the administration console.

The console supports several administration and development tasks as show in the picture. It includes loading data, to develop analytic scripts in R, to execute queries and link the data with Microsoft Excel for processing in a spreadsheet, and it has a section to connect external tools or applications to the database.
Administration/development task in DB2 BLU console on Bluemix
One of the offered task is very interesting and I twittered about it on Friday, too:



You can set up replication from a Cloudant JSON database to DB2, so that the data stream is directly fed in for in-memory analyses. I didn't test it so far, but plan to do so with one of my other Bluemix projects.

A task that I used is to (up)load data. For this I took some historic weather data (planning ahead for a vacation location), let the load wizard extract the metadata to create a suitable data, and ran some queries.

Uploading data to DB2 on Bluemix

Specify new DB2 table and column names

For executing (simple) selects there is a "Run Query" dialogue. It allows to choose a table and columns and then generates a basic query skeleton. I looked into whether a specific German island had warm nights, i.e., a daily minimum temperature of over 20 degrees Celsius. Only 14 days out of several decades and thousands of data points qualified.







Last but not least, I connected my local DB2 installation and tools to the Bluemix/Softlayer-based instance. The "CATALOG TCPIP NODE" is needed t make the remote server and communication port known. Then the database is added. If you already have a database with the same name cataloged on the local system, it will give an error message as shown below. You can work around it by specifying an alias. So instead of calling the database BLUDB, I used BLUDB2. The final step was to connect to DB2 with BLU Acceleration in the cloud. And surprise, it uses a fixpack version that officially is not available yet for download...

DB:  => catalog tcpip node bluemix remote 50.97.xx.xxx server 50000
DB20000I  The CATALOG TCPIP NODE command completed successfully.
DB21056W  Directory changes may not be effective until the directory cache is
refreshed.
DB:  => catalog db bludb at node bluemix
SQL1005N  The database alias "bludb" already exists in either the local
database directory or system database directory.
DB:  => catalog db bludb as bludb2 at node bluemix
DB20000I  The CATALOG DATABASE command completed successfully.
DB21056W  Directory changes may not be effective until the directory cache is
refreshed.
DB:  => connect to bludb2 user blu01xxx
Enter current password for blu01xxx:

   Database Connection Information

 Database server        = DB2/LINUXX8664 10.5.4
 SQL authorization ID   = BLU01xxx
 Local database alias   = BLUDB2

I will plan to develop a simple application using the DB2 in-memory database (BLU Acceleration / Analytics Warehouse) and then write about it. Until then read more about IBM Bluemix in my other related blog entries.

Tuesday, August 19, 2014

Sample IBM Bluemix Application: Extended IP logging with node.js and DB2

As I wrote in my last blog article I have been playing around with node.js and DB2 on IBM Bluemix. Now I am done with my first server-side JavaScript application which I named "db2visitors".

Screenshot of db2visitors index page
When accessing the main page (index page), the app logs the visitor's IP address. Then a free geo location service is called to obtain extended information, such as country, region and city information of the visitor. This is how Google Analytics and other services know more about a website user, with the exception that many have their own database. That extended IP record is then inserted into a DB2 table and the information displayed on a "DB2 got you" welcome page. To report back about from where the visitors to my demo application came, I created a second page that returns the aggregated country and region information for all visitors so far. How does it work? Let's look into the details.

First of all, I uploaded my sample code to my Github "db2visitors" repository, so that you can get an impression of the directory structure and files. It is pretty simple and you will notice that I am new to node.js. The application consists of a main file "app.js" that sets up the general infrastructure and defines the handling of so-called routes. A route basically determines what needs to be executed for a specific request to the Web server. When a user visits the main or index page, the node/Express framework would serve the request by calling my new function getIP(). That function is implemented in a file "db2access.js" in the "routes" directory. The current version as of now is shown below.

SQLDB Control Center with Table DDL
The function getIP obtains the visitor's IP address. Then it uses that information to provide it to an external IP address geo location service. That service returns a JSON structure with country, region, city and other data it has on file for the IP address or the related address block. Sometimes it nails it, (fortunately) in many cases only the country is correct. Anyway, it is a free service and you get an idea of Google and other advertising companies can do with your data. Once that extended record is available, we call into the function insertIP(). There, a DB2 connection is opened, a SQL statement prepared and executed to insert the IP address data into a relational table IP.VISITORS (schema shown on the right, it was created using the Bluemix SQLDB Control Center). Some form of basic IP logging. Last in the function getIP, a function render() is called with the parameter "index". It tells the rendering engine to process the HTML templates and page segments for the view "index". All the view-related files are in exactly that directory (see my Github repository).

The last function in the file db2access.js is listCountries(). It demonstrates some basic reporting, i.e., how to issue a DB2 query from node.js using the ibm_db API, fetch the result and pass it to a template page. That page is displayed when going to the path "/visits" (as defined in app.js). You can test it here. The query uses a simple GROUP BY to aggregate the country and region data.

That is all for today. Let me know if you have questions (comment, tweet or email) and read my other database on Bluemix-related blog entries.

1:  // Written by Henrik Loeser  
2:  // Some functions to retrieve extended information for a IP address,  
3:  // to insert that extended IP record into a DB2 table, and to  
4:  // report some basic statistics as part of a second HTML page.  
5:    
6:    
7:    
8:  // Insert the JSON record with the IP information as SQL record into DB2.  
9:  exports.insertIP = function(ibmdb,connString,ipinfo) {  
10:            console.log("insertIP called",ipinfo);      
11:            ibmdb.open(connString, function(err, conn) {  
12:               if (err ) {  
13:                res.send("error occurred " + err.message);  
14:               }  
15:               else {  
16:                // prepare the SQL statement  
17:                conn.prepare("INSERT INTO IP.VISITORS(vtime,ip,country_code,country,region_code,region,city,zip,latitude,longitude,metro,area) VALUES (current timestamp,?,?,?,?,?,?,?,?,?,?,?)", function(err, stmt) {  
18:                  if (err) {  
19:                   //could not prepare for some reason  
20:                   console.log(err);  
21:                   return conn.closeSync();  
22:                  }  
23:    
24:                  //Bind and Execute the statment asynchronously  
25:                  stmt.execute([ipinfo["ip"],ipinfo["country_code"],ipinfo["country_name"],ipinfo["region_code"],ipinfo["region_name"],ipinfo["city"],ipinfo["zipcode"], ipinfo["latitude"], ipinfo["longitude"],ipinfo["metro_code"],ipinfo["area_code"]], function (err, result) {  
26:                   console.log(err);  
27:                   // Close the connection to the database  
28:                   conn.close(function(){  
29:                    console.log("Connection Closed");  
30:                   });  
31:                  });  
32:                });  
33:              }  
34:          })};  
35:    
36:    
37:  // Get the caller's IP address from runtime environment and call  
38:  // geo location service to obtain extended data  
39:  exports.getIP=function(request,ibmdb,connString) {  
40:   return function(req, res) {  
41:   var ip = req.headers['x-client-ip'] || req.connection.remoteAddress;  
42:   var ipurl = 'http://freegeoip.net/json/' + ip;  
43:   console.log("yes, called");  
44:   // fetch ip info   
45:   request.get( {  
46:    url: ipurl,  
47:    json : true},  
48:    function(error, response, body) {  
49:       var ipinfo;  
50:      if (!error) {  
51:        ipinfo=body;  
52:       // insert IP info into DB2  
53:       exports.insertIP(ibmdb,connString,ipinfo);  
54:       // finish by rendering the HTML page  
55:       res.render('index',{ ipinfo : ipinfo});  
56:       }  
57:     });  
58:  }  
59:  };  
60:    
61:    
62:    
63:  // Very simple country/region-based reporting done using GROUP BY.  
64:  exports.listCountries = function(ibmdb,connString) {  
65:          return function(req, res) {  
66:                  
67:            ibmdb.open(connString, function(err, conn) {  
68:                  if (err ) {  
69:                   res.send("error occurred " + err.message);  
70:                  }  
71:                  else {  
72:                      conn.query("SELECT country, region, count(region) as rcount FROM ip.visitors group by country,region order by 3 desc", function(err, tables, moreResultSets) {  
73:        
74:                                    
75:                      if ( !err ) {   
76:                          res.render('tablelist', {  
77:                              "tablelist" : tables  
78:                                
79:                           });  
80:    
81:                            
82:                      } else {  
83:                        res.send("error occurred " + err.message);  
84:                      }  
85:    
86:                      /*  
87:                          Close the connection to the database  
88:                          param 1: The callback function to execute on completion of close function.  
89:                      */  
90:                      conn.close(function(){  
91:                          console.log("Connection Closed");  
92:                          });  
93:                      });  
94:                  }  
95:              } );  
96:              
97:          }  
98:      }  


LinkWithin

Related Posts with Thumbnails