Data Henrik: 2014

Thursday, December 18, 2014

Data and 2014: Security and privacy

some bits

With security and privacy being one of the top topics for data processing in this year (and probably the next year too), I wanted my last post for this year to be spot on. Here is some input for DB2, dashDB, or the SQLDB service on Bluemix, the output is for you:

values(decrypt_char(cast (x'08D030FFB804A5D560F6F794046C11F329C9004B80EC3159' as varchar(50) for bit data),'santaclaus'));

BTW: This is not a quiz... :)

Friday, December 12, 2014

New fixpack for DB2 10.5 brings in-memory analytics to Windows and zLinux

The new DB2 10.5 Fixpack 5 is available since today. A high-level overview of new features and enhancements can be found in the fixpack summary in the DB2 Knowledge Center. The list of all available DB2 fixpacks is available in the IBM Support Portal for DB2. There you will also find the links to download this new fixpack and a list of fixed bugs.

After this introduction I would like to point out two product enhancements that are included in this fixpack:

As you may know, "BLU Acceleration" is the technology codename for highly optimized in-memory analytics that is deeply integrated into the supported platforms. It is not just another column store, but optimizes the data flow from disk to the CPU registers to efficiently use the available processing power and memory resources. DB2 is also exploiting special CPU instruction sets, e.g., on the POWER platform, for faster data processing. With the fixpack 5 this technology is available now on Microsoft Windows and for Linux on zSeries.

Another feature enhancement is the new ability to specify which network interface cards (NICs) DB2 should use, if you have multiple. A new file nicbinding.cfg can be used to set up the bindings. If you had to deal with db2nodes.cfg before, then the syntax will look familiar.

That's all for my quick summary. Enjoy the weekend AND DB2.

Wednesday, December 3, 2014

Introduction and resources for migrating from Oracle to DB2

Well, a boring headline for an interesting topic. Originally I had planned to title today's blog entry "Forget about Black Friday and Cyber Monday - Save big by moving to DB2". But then writing this entry has dragged on for some days...

Database Conversion Workbench

In the past weeks I have been asked several times about how to migrate off Oracle and on to DB2. Let me give you a quick overview of the technical side, for the financial part you have to ask an IBM business partner or an IBM seller. Although you can move to DB2 from database systems like Microsoft SQL Server, MySQL, and others, I will focus on Oracle because of the compatibility features built into DB2.

When moving off Oracle this could be for a SAP system (or other vendor application) or other applications ("non-SAP"). For SAP environments and applications from several other software vendors there is a (kind of) standardized process to migrate a system. The reason is that there are database-specific definitions and feature exploitations. A great example is how SAP is making use of the DB2-only BLU Acceleration to boost performance for analytic environments. Many software vendors provide tools for database migration and related processes or services.

For custom scenarios where the application code is available, a manual migration applies. The traditional barrier to a migration, the (more or less) incompatibility of products, has been torn down by adding compatibility features to DB2. Some of those features come ready to use by any user, some require specific preparation of DB2 because they may impact the traditional handling/"look and feel". The DB2 Knowledge Center has a good introduction and overview into that topic: "DB2 Compatibility Features". If you are comping to DB2 with a background in Oracle then use the Terminology Mapping to discover how products, features, and commands are named in the world of DB2.

From release to release there have been several enhancements to the SQL compatibility with database vendors such as Oracle. An overview by release can be found in the related "What's New" section of each of the recent DB2 releases:

I have to point out that the features mentioned in the linked documents are only related to the SQL language, but that there have been several other features dedicated to making a transition from Oracle to DB2 as smooth as possible. Some of them are covered in the section "Application development enhancements":

New features and enhancements related to application development in DB2 9.7
Application development improvements in DB2 10.1
The related section for DB2 10.5 (only JSON support listed)

If you prefer a book instead of the DB2 Knowledge Center, then I recommend the IBM Redbook "Oracle to DB2 Conversion Guide: Compatibility Made Easy". It gives an overview of DB2, the tools needed for a migration in a non-SAP environment, and the conversion approach. In the appending you will also find a nice terminology mapping, i.e., explaining how Oracle commands and features are named in the world of DB2.

A key tool for custom migrations is the Database Conversion Workbench (DCW). It is a plugin into the IBM Data Studio, a free tool for database administration, design, and SQL development. The DCW allows to analyze a database schema with respect to DB2 compatibility. The result is a valuable foundation for estimating the overall conversion effort. Once that is done (or not needed), the Database Conversion Workbench helps in the process of moving the database schema, database-side SQL packages, and thereafter the data from another database system to DB2. DCW also includes a package visualizer to help understand package dependencies which simplifies the code conversion. See this page for an overview and more resources around the Database Conversion Workbench.

An important DB2 feature related to the compatibility is the ability to run PL/SQL code. Several administrative PL/SQL packages ship with DB2 which can be found in the "built-in modules" section. Moreover, there are also some PL/SQL packages that can be used "as is" and are available from IBM developerWorks in the database conversion community: See here for the add-on PL/SQL packages.
That's all for now with my overview of resources for the Oracle to DB2 migration. I hope that it provides a good introduction into that (sometimes confusing) topic.

BTW: I have been covering many migration-related topics in my blog. Serge Rielau and now Rick Swagerman have provided many SQL tips for DB2 in their blog.

Thursday, November 20, 2014

Useful DB2 administrative functions and views

Did you know that there are about 80 (eight-zero) administrative views in the SYSIBMADM schema in DB2 that are ready for use? I have used several of them and also looked into the documentation, but 80 is quite a lot. (Almost) All of them are documented in the DB2 Knowledge Center in the "Built-in routines and views" section.

The routines live in the SYSPROC schema, administrative views can be found in the schema SYSIBMADM. Given that insight it is easy to construct a simple query to find all available views:

SELECT viewname from syscat.views where viewschema='SYSIBMADM'

Depending on your version and fixpack level of DB2 the result will vary. Speaking of fixpack level, do you know how to find out what your system is running by using SQL? The view ENV_INST_INFO may help in that case because it returns instance-related information such as the instance name, the DB2 version, fixpack, and build level:

SELECT * FROM SYSIBMADM.ENV_INST_INFO

Are you connected to, e.g., an Advanced Workgroup Server Edition (AWSE) of DB2 or an Enterprise Server Edition (ESE)? Find out by querying the product information using the view ENV_PROD_INFO. It returns the installed product, the kind of active licenses, and more:

SELECT * FROM SYSIBMADM.ENV_PROD_INFO

Next in the list of useful views with system information is ENV_SYS_INFO. It can be utilized to find out more about the operating system, the type of hardware, installed CPU and memory, etc.:

SELECT * from SYSIBMADM.ENV_SYS_INFO

Last, but not least in my list of views with basic system information are DBMCFG and DBCFG. As the name implies can these views help to retrieve the current instance (database manager / dbm) or the current database (db) configuration. So it is easy to find out whether the self-tuning memory manager (STMM) is active or where diagnostic logs are stored.

That's it for today, I am back to playing with more of those views (and routines)...

Monday, November 17, 2014

A quick look at dashDB and a happy SQuirreL

dashDB slogan on its website

This morning I took some time to take a look at dashDB, a new IBM DWaaS (Data Warehouse as a Service) offering. When you go to the dashDB website, you are offered two choices: Use the dashDB service available on IBM Bluemix or use a Cloudant account to add a warehouse to your JSON database. Let me give you a brief overview of what you can do with dashDB and how I connected a local (open source) SQuirreL SQL client to my new dashDB database.

Cloudant Warehousing (dashDB)

dashDB is a cloud-based analytics database ("analytics in a dash")) with roots in Netezza and DB2 with BLU Acceleration. Data is stored in table (rows and columns) format. It is ready to connect all kinds analytic tools, local or cloud-based, and is already set up for geo-spatial data analysis (instructions on how to use the ESRI ArcGIS Desktop are provided). The best is that your regular SQL database/analytic tools continue to work, see below for details.

dashDB: schema discovery

I started my journey by logging into my existing Cloudant account. There, on the dashboard menu is a new item "Warehousing". When clicking on the "New Warehouse" button, you can select the Cloudant databases that you want to import into the warehouse. Because multiple databases can be associated with a Cloudant account or a Bluemix Cloudant service, this step let's you pick the data of choice. After the source data is chosen, the dashDB database is created and so-called schema discovery turns the JSON documents into rows of tables. Thereafter, the data is ready to have analytics applied. That is the time to launch the dashDB control center, another so-called "dashboard".

The welcome screen shows some of the analytic options, e.g., the database is ready to be used with either Cognos, SPSS, InfoSphere DataStage, R scripts, or all of them and more:

Analytis for dashDB: Cognos, SPSS, DataStage, R

SQuirrel SQL client - dashDB connected

Because some time ago I already tested and blogged about a predecessor of dashDB (see here: how to set it up and how to use R), I was more interested in trying out a JDBC-based client with my new cloud-based data warehouse. Included as part of the dashboard are several sections that help you with the application setup. So it was easy for me to obtain the JDBC URL and configure it and the listed userid/password in my local SQuirrel SQL client (it will work in IBM Data Studio and the Optim tool, too). As you can see from the screenshot, the database connection from my laptop to the cloud-based dashDB succeeded. Ready for some SQL.

My lessons learned from testing database queries on the converted data (JSON to relational) will be part of another blog entry. Stay tuned...

Friday, October 24, 2014

Apply DB2 skills to Hadoop by using Big SQL on Bluemix

IBM Analytics for Hadoop

One of the many services offered on the Platform-as-a-Service (PaaS) IBM Bluemix is "IBM Analytics for Hadoop", basically InfoSphere BigInsights as cloud service. Because it is a Big Data service and it offers SQL capabilities I was eager to test it. Here is a first report how I got some queries running against data on my Hadoop file system in the cloud.

Software for download

After creating the Analytics for Hadoop service (which I left unbound) I launched the dashboard.
The first I noticed is a link to download software. I brought up a long list of drivers, Eclipse plugins, API packages and so forth. If you want to explore the full power of BigInsights / Hadoop you don't need to search for all the relevant and compatible software, it's just a click away. I passed on that offer and instead created new directories in the HDFS. I read in the BigInsights tutorials that the directories are needed to hold my data and to create workbooks or so-called BigSheets to transform data from files to something processable.

Creating the directories is done by clicking on the appropriate icon on top of the directory tree structure, picking the right parent directory and specifying a name. Been there, done that. I then used another icon to invoke the GUI for file upload. Few minutes later I had uploaded two files with historic weather data (see my older blog entries) and was in the BigSheets section of the dashboard, ready to create a workbook. I named it "weather" and also chose a file. Because the input file is not a CSV by definition but uses a semicolon to separate data, I had to apply the "Character Delimited Data with Text Qualifier" reader to it. I specified the semicolon as delimiter and also checked the "header included", i.e., the first row included the column names.

New BigInSights / Hadoop directory and uploaded files

After creating the workbook I clicked on the option "Create Table" and then experienced a "lesson learned":

Lesson learned: creating a table over a workbook in #BigInsights can be done for directories, not files. "sheet.weather" (not shit weather)
— Henrik Loeser (@data_henrik) October 23, 2014

So I created another directory, moved the files in there, deleted the workbook and wanted to create it again. However, I got an error message. Another lesson learned:

Lesson learned: When deleting a workbook from #BigSheets, first purge, then recreate. #bigdata #hadoop #bluemix
— Henrik Loeser (@data_henrik) October 23, 2014

Ok, purge the deleted workbook, then recreate it. Done. I had read that I can share the data of a workbook for use by Big SQL and Hive by creating a table based on the workbook. In my test I created "sheets.weather" as a first table. I also created a second workbook based on my first, removed some columns, and created a second table "sheets.myweather" (not too creative in naming stuff...). Anyway, the main purpose of this exercise was to import data and to get ready for SQL queries.

Next in my journey was to actually using what is called Big SQL, running SQL queries against data in my Hadoop file system. The Analytics for Hadoop service on Bluemix offers a basic SQL web console for this. As an alternative you can also issue queries from applications via JDBC (see software download above) or use other tools. As I was short on time and didn't want install any software, I decided to use the provided SQL console. When executing SQL statements you have the choice of "Big SQL" and "Big SQL V1". I picked the new "Big SQL" as it is based on the DB2 query compiler and runtime infrastructure. This comes in handy when you are unsure about available tables, column names etc. As you can see from the first screenshot below, I started by querying the Big SQL (DB2) system catalog to retrieve my available table names. The second screenshot shows a simple SQL query against the uploaded weather data.

Overall, given my short amount of time I had, it was a nice experience. Having the opportunity to apply DB2 skills towards data in a Hadoop file system reduced the barrier. Given that the data is already uploaded, I am sure I will try out other features of InfoSphere BigInsights / IBM Analytics for Hadoop, too. Have you tested it or Bluemix in general? You can sign up for a free trial (no credit card required) at http://bluemix.net.

DB2 catalog tables for Hadoop

Basic SQL query against Hadoop data

Monday, October 13, 2014

Node-RED: Simple "phoney" JSON entries in Cloudant

Node-RED Starter
on IBM Bluemix

Want to automatically store data about who called you on which phone number? Last Friday tried exactly that on IBM Bluemix and was amazed how simple it is, no real programming involved. All I needed was the Node-RED starter boilerplate (icon is on the right) on Bluemix and a new API of my mobile service provider.

The Node-RED boilerplate automatically creates a Node.js runtime environment on Bluemix and installs the Node-RED tool into it. In addition, a Cloudant JSON database is created. Once everything is deployed I opened the Node-RED tool in a Web browser. It offers a basic set of different input and output methods, processing nodes, and the ability to connect them in a flow graph. One of the input nodes is a listener for http requests. They help to react to Web service requests. I placed such http input node on the work sheet and labeled it "phone" (see screenshot).

Node-RED tool on Bluemix

How did I obtain information about callers and the numbers they called? ~~Well, I tapped my friends at a security agency.~~ I used a new free service offering at one of my phone service providers. It is called "sipgate.io" and allows to configure an URL/Web address that is accessed whenever someone calls one of the account's phone numbers (you could have multiple phone numbers). An http POST request is sent to the configured URL and the caller's phone number and the called number are included as payload. In my Node-RED application the "phone" node would answer to this request.

What I needed now was the data processing flow of the Web services request. On Friday I already tweeted the entire flow:

Easy. Cool. #NodeRED used to store #caller info coming from #sipgate API into #Cloudant on #Bluemix pic.twitter.com/MFkqFAdp5D
— Henrik Loeser (@data_henrik) October 10, 2014

On the left we see "phone" node as http input. Connected to it is the "ok" node which sends an http response back, telling the phone company's Web services that we received the information. The other connected node is a "json" processor which translates the payload (who called which number) into a meaningful JSON object. That object is then moved on to the "calls" node, a Cloudant output node. All we needed was to select the Cloudant service on Bluemix and to configure the database name.

Cloudant Output Node, Node-RED on Bluemix

The magic itself was pressing the "Deploy" button on top of my Node-RED worksheet. It created the Node.js code for my Bluemix app. I was a little bit nervous about testing the app because I didn't code anything, just clicking. For my test I took the phone in my home office and dialed the number of my mobile phone number. The result? A new, nice and shiny, "phoney" JSON document in Cloudant:

"Phoney" record in Cloudant

Almost too easy. Great stuff, but unfortunately no code to share... :) (You can find the flow on Github)

Tuesday, October 7, 2014

Starvation: Electronic books, DRM, the local library, and database locks

Over the past days I ran into an interesting database problem. It boils down to resource management and database locks. One of my sons is an avid reader and thus we have an ongoing flow of hardcopy and electronic books, most of them provided by the local public library (THANK YOU!). Recently, my son used the electronic library to place a reservation on a hard-to-get ebook. Yesterday, he received the email that the book was available exclusively to him (intention lock) and to be checked out within 48 hours (placing the exclusive lock). And so my problems began...

Trouble lending an ebook

There is a hard limit on the maximum number of checked out ebooks per account. All electronic books are lent for 14 days without a way to return them earlier because of Digital Rights Management (DRM). If the account is maxed out, lending a reserved book does not work. Pure (teenage) frustration. However, there is an exclusive lock on the book copy and nobody else can lend it either, making the book harder to get and (seemingly) even more popular. As consequence more reservation requests are placed, making the book even harder to lend. In database theory this is called starvation effect or resource starvation. My advise of "read something else" is not considered a solution.

How could this software problem be solved? A change to DRM to allow earlier returns seems to be too complex. As there is also a low limit for open reservation requests per account, temporarily bumping up the number of books that can be lent per account would both solve the starvation effect and enhance the usability. It would even increase the throughput (average books out to readers), would reduce lock waits (trying to read a certain book), and customer feedback.

BTW: The locklist configuration in DB2 (similar to the number of books lent per account) is adapted automatically by the Self Tuning Memory Manager (STMM), for easy of use, for great user/customer feedback.

Monday, September 22, 2014

Enforce backup encryption with encrlib and encropts

What may sound like sore throat, "encrlib/encropts", in reality is a really useful combination of new configuration parameters for DB2 in the Cancun release. They allow the security administrator (SECADM) to enforce encryption of database backups.

The database configuration parameter "encrlib" can be pointed to the encryption library by providing the file path. Only the security administrator is allowed to change the configuration. Once set, the library is automatically used for every database backup. The configuration variable "encropts" can hold additional parameters needed for the encryption (library). Again, only SECADM can change the value.

If you have a database encryption toolkit such as InfoSphere Guardium Data Encryption in use, then the new options provide a simple, auditable way for the security administrator to make sure, database backups are secure, too.

Tuesday, September 2, 2014

New DB2 Cancun Release (Version 10.5 Fixpack 4) offers many enhancements

The fixpack 4 for DB2 10.5 is available since end of last week. Because it has an unusually long list of product enhancements and new features the fixpack even has the codename or nickname "Cancun Release". For those of you not too familiar with Northern American vacation culture, Cancún is a favorite vacation/tourist destination in Mexico, located at the Carribean Sea. So "Cancun Release" may suggest relaxation, recreation, and a dream come true because of ease of use, simplification and major performance enhancements for the DB2 in-memory database feature (BLU Acceleration), the broadened pureScale support, and other nice to haves.

A good start for approaching the new DB2 Cancun release is the fixpack summary in the Knowledge Center. It lists new features by category, my personal highlights are:

For the in-memory database support (referred to as "column-organized tables" and known as "BLU Acceleration" some bigger items include so-called shadow table to improve analytic queries in an OLTP environment, lifting of several DDL restrictions, and major performance improvement by adding CHAR and VARCHAR columns to the synopsis table. An in-memory database can be made highly available with the HADR feature.
DB2 pureScale clusters can be deployed in virtualized environments (VMware ESXi, KVM), on low-cost solutions without the RDMA requirement, and geographically dispersed cluster (2 data centers) can be implemented on AIX, Red Hat, SuSE with just RoCE as requirement.
As part of the SQL compatibility DB2 now supports string length definitions by characters, not just by bytes as before.
Installation of DB2 in so-called thin server instances.
A SECADM can enforce encryption of backups.
db2audit can be used to transfer audit records to syslog for simpler analyzation with, e.g., Splunk.
db2look has been improved to generate the CREATE DATABASE statement and export the configuration (see my earlier blog article on that db2look improvement in DB2 10.1).
Official support for POWER8.

I plan to blog about some of the new functionality over the next weeks. Until then you can take a look at the new items yourself. Fixpacks can be downloaded from this IBM support website. If you have an IBM Bluemix account or plan to create one, you can use the improved DB2 as part of the Bluemix Analytics Warehouse service. Check out my earlier post about how to set it up and connect to it using a local DB2CLP.

Last but not least: What is your favorite vacation destinations? Suggest new codenames as comment and don't forget new DB2 features you want to see...

Monday, September 1, 2014

What a plot: DB2, R, and Bluemix help with vacation weather

Last week I reported on how I set up a in-memory DB2 database on IBM Bluemix and loaded some historic weather data. Over the last couple days I used some spare time to play with the Cloud-based analytic capabilities that are provided as part of the Softlayer/Bluemix/DB2 combination. Most of the time went into learning (some basics of) R, an environment for statistical computing and graphics. As an example I wanted to find out what temperatures to expect for a possible September vacation on the German island of Norderney.

[Update 2014-11-04: The Analytics Warehouse service on Bluemix is now called dashDB]

For my small exercise I used data provided by the German Meteorological Service "Deutscher Wetterdienst". It allows to freely download and use (under some conditions) data from several decades of weather observation. I uploaded the data to DB2/Bluemix as described in my previous post.

Bluemix: Change DB2 column name and type

While playing with the data I noticed that the column names required escaping of quotes and the observation dates were stored as integer values (yyyymmdd). In a second upload I simplified the column names and adapted the column data type using the DB2 load wizard (see picture). Thereafter I was set for my experiments with R.

The DB2 Cloud environment provides several examples for programming in R, a special function library "bluR" to easily connect R with DB2-based data, and it features the RStudio to develop, test, and execute code in R. Within RStudio it is possible to execute several demos to learn more about analytics, graphing, and data processing. For the DB2 in-memory database API for R there is a demo as well. You can invoke it using the "demo(blur)" command:

DB2 API demo for R in RStudio

The demo shows how to connect to DB2, execute a query and use the fetched data for analytic processing in R. Last week I already tweeted about how I tested escaping of quote characters (use C style, not SQL style):

For escaping quotes in #SQL statements for #DB2 in #RStudio/#rstats use C style, not SQL. (from my #Bluemix tests) pic.twitter.com/XslDgxZU7L
— Henrik Loeser (@data_henrik) August 29, 2014

The data set which I uploaded to DB2 has daily minimum and maximum temperatures (and lots of other meteorological) for about 70 years. I used a SQL query and then the ggplot2 library to create a graphic. It shows the band for the minimum temperatures for each September day as well as the band for the maximum daily temperatures.

DB2 with R: Historic September temperatures

The code for this graphic is pretty simple (and I started last week looking at R and DB2) and available from my Github account:

1:  ########### R script to analyze historic weather data for min/max values  
2:  ## Written by Henrik Loeser  
3:  ## Connection handle con to BLU for Cloud data warehouse is provided already  
4:  ## For plotting, we are using ggplot2 package  
5:  ##   
6:  library(ggplot2)  
7:  library(bluR)  
8:    
9:  ## initialize DB2 connection and environment  
10:  con <- bluConnect("BLUDB","","")  
11:  bluAnalyticsInit(con)  
12:    
13:  ## query DB2 weather data and fetch min/max values of min/max values  
14:  ## (lower/upper boundary each)   
15:  query<-paste('select max(lufttemperatur_maximum) as maxmax,min(lufttemperatur_minimum) as minmin,min(lufttemperatur_maximum) as minmax,max(lufttemperatur_minimum) as maxmin,tag from (select lufttemperatur_maximum, lufttemperatur_minimum, day(mdatum) as tag from blu01023.klima where month(mdatum)=9) group by tag order by tag asc')   
16:  df <- bluQuery(query,as.is=F)  
17:    
18:  ## Some plotting needs to be done  
19:  jpeg(type='cairo',"tempe.jpg",width=800,height=600)   
20:  ggplot(df, aes(x = TAG))+ylab("Temperature")+xlab("Day")+          
21:     geom_ribbon(aes(ymin = MINMIN, ymax=MAXMIN), fill='blue')+  
22:     geom_ribbon(aes(ymin = MAXMAX, ymax=MINMAX), fill='red')+  
23:     geom_ribbon(aes(ymin = MAXMIN, ymax=MINMAX), fill='white')+  
24:     geom_line(aes(y = MINMIN), colour = 'black') +  
25:     geom_line(aes(y = MAXMIN), colour = 'black') +  
26:     geom_line(aes(y = MINMAX), colour = 'black') +  
27:     geom_line(aes(y = MAXMAX), colour = 'black')   
28:    
29:  sink('/dev/null')   
30:    
31:  bluClose(con)  
32:  ## connection is closed, we are done

Pretty cool (my opinion)! I am already turning into a data scientist. And you can test this yourself on IBM Bluemix with the Analytics Warehouse service (DB2 in-memory database feature).

Monday, August 25, 2014

Setting up and using a DB2 in-memory database on IBM Bluemix

[Update 2014-11-04: The Analytics Warehouse service on Bluemix is now called dashDB.]
Last Friday I was on the way back from some customer visits. While traveling in a German highspeed train I used the Wifi service, connected to IBM Bluemix and created a DB2 in-memory database. Let me show you how I set it up, what you can do with it and how I am connecting to the cloud-based database from my laptop.

Sitting in #train with 300 km/h and creating #DB2 in-memory #database with #Bluemix http://t.co/OzJqLzNDFr
— Henrik Loeser (@data_henrik) August 22, 2014

Unbound DB2 service on Bluemix

The first thing to know is that on Bluemix the DB2 in-memory database service is called IBM Analytics Warehouse. To create a database, you select "Add service" and leave it unbound if you want, i.e., it is not directly associated with any Bluemix application. That is ok because at this time we are only interested in the database. Once the service is added and the database itself created, you can lauch the administration console.

The console supports several administration and development tasks as show in the picture. It includes loading data, to develop analytic scripts in R, to execute queries and link the data with Microsoft Excel for processing in a spreadsheet, and it has a section to connect external tools or applications to the database.

Administration/development task in DB2 BLU console on Bluemix

One of the offered task is very interesting and I twittered about it on Friday, too:

Great! You can even import #JSON #data from @Cloudant into #DB2 in-memory db service in #Bluemix pic.twitter.com/BoLS0YM7nH
— Henrik Loeser (@data_henrik) August 22, 2014

You can set up replication from a Cloudant JSON database to DB2, so that the data stream is directly fed in for in-memory analyses. I didn't test it so far, but plan to do so with one of my other Bluemix projects.

A task that I used is to (up)load data. For this I took some historic weather data (planning ahead for a vacation location), let the load wizard extract the metadata to create a suitable data, and ran some queries.

Uploading data to DB2 on Bluemix

Specify new DB2 table and column names

For executing (simple) selects there is a "Run Query" dialogue. It allows to choose a table and columns and then generates a basic query skeleton. I looked into whether a specific German island had warm nights, i.e., a daily minimum temperature of over 20 degrees Celsius. Only 14 days out of several decades and thousands of data points qualified.

Last but not least, I connected my local DB2 installation and tools to the Bluemix/Softlayer-based instance. The "CATALOG TCPIP NODE" is needed t make the remote server and communication port known. Then the database is added. If you already have a database with the same name cataloged on the local system, it will give an error message as shown below. You can work around it by specifying an alias. So instead of calling the database BLUDB, I used BLUDB2. The final step was to connect to DB2 with BLU Acceleration in the cloud. And surprise, it uses a fixpack version that officially is not available yet for download...

DB: => catalog tcpip node bluemix remote 50.97.xx.xxx server 50000
DB20000I The CATALOG TCPIP NODE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is
refreshed.
DB: => catalog db bludb at node bluemix
SQL1005N The database alias "bludb" already exists in either the local
database directory or system database directory.
DB: => catalog db bludb as bludb2 at node bluemix
DB20000I The CATALOG DATABASE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is
refreshed.
DB: => connect to bludb2 user blu01xxx
Enter current password for blu01xxx:

   Database Connection Information

Database server        = DB2/LINUXX8664 10.5.4
SQL authorization ID   = BLU01xxx
Local database alias   = BLUDB2

I will plan to develop a simple application using the DB2 in-memory database (BLU Acceleration / Analytics Warehouse) and then write about it. Until then read more about IBM Bluemix in my other related blog entries.

Tuesday, August 19, 2014

Sample IBM Bluemix Application: Extended IP logging with node.js and DB2

As I wrote in my last blog article I have been playing around with node.js and DB2 on IBM Bluemix. Now I am done with my first server-side JavaScript application which I named "db2visitors".

Screenshot of db2visitors index page

When accessing the main page (index page), the app logs the visitor's IP address. Then a free geo location service is called to obtain extended information, such as country, region and city information of the visitor. This is how Google Analytics and other services know more about a website user, with the exception that many have their own database. That extended IP record is then inserted into a DB2 table and the information displayed on a "DB2 got you" welcome page. To report back about from where the visitors to my demo application came, I created a second page that returns the aggregated country and region information for all visitors so far. How does it work? Let's look into the details.

First of all, I uploaded my sample code to my Github "db2visitors" repository, so that you can get an impression of the directory structure and files. It is pretty simple and you will notice that I am new to node.js. The application consists of a main file "app.js" that sets up the general infrastructure and defines the handling of so-called routes. A route basically determines what needs to be executed for a specific request to the Web server. When a user visits the main or index page, the node/Express framework would serve the request by calling my new function getIP(). That function is implemented in a file "db2access.js" in the "routes" directory. The current version as of now is shown below.

SQLDB Control Center with Table DDL

The function getIP obtains the visitor's IP address. Then it uses that information to provide it to an external IP address geo location service. That service returns a JSON structure with country, region, city and other data it has on file for the IP address or the related address block. Sometimes it nails it, (fortunately) in many cases only the country is correct. Anyway, it is a free service and you get an idea of Google and other advertising companies can do with your data. Once that extended record is available, we call into the function insertIP(). There, a DB2 connection is opened, a SQL statement prepared and executed to insert the IP address data into a relational table IP.VISITORS (schema shown on the right, it was created using the Bluemix SQLDB Control Center). Some form of basic IP logging. Last in the function getIP, a function render() is called with the parameter "index". It tells the rendering engine to process the HTML templates and page segments for the view "index". All the view-related files are in exactly that directory (see my Github repository).

The last function in the file db2access.js is listCountries(). It demonstrates some basic reporting, i.e., how to issue a DB2 query from node.js using the ibm_db API, fetch the result and pass it to a template page. That page is displayed when going to the path "/visits" (as defined in app.js). You can test it here. The query uses a simple GROUP BY to aggregate the country and region data.

That is all for today. Let me know if you have questions (comment, tweet or email) and read my other database on Bluemix-related blog entries.

1:  // Written by Henrik Loeser  
2:  // Some functions to retrieve extended information for a IP address,  
3:  // to insert that extended IP record into a DB2 table, and to  
4:  // report some basic statistics as part of a second HTML page.  
5:    
6:    
7:    
8:  // Insert the JSON record with the IP information as SQL record into DB2.  
9:  exports.insertIP = function(ibmdb,connString,ipinfo) {  
10:            console.log("insertIP called",ipinfo);      
11:            ibmdb.open(connString, function(err, conn) {  
12:               if (err ) {  
13:                res.send("error occurred " + err.message);  
14:               }  
15:               else {  
16:                // prepare the SQL statement  
17:                conn.prepare("INSERT INTO IP.VISITORS(vtime,ip,country_code,country,region_code,region,city,zip,latitude,longitude,metro,area) VALUES (current timestamp,?,?,?,?,?,?,?,?,?,?,?)", function(err, stmt) {  
18:                  if (err) {  
19:                   //could not prepare for some reason  
20:                   console.log(err);  
21:                   return conn.closeSync();  
22:                  }  
23:    
24:                  //Bind and Execute the statment asynchronously  
25:                  stmt.execute([ipinfo["ip"],ipinfo["country_code"],ipinfo["country_name"],ipinfo["region_code"],ipinfo["region_name"],ipinfo["city"],ipinfo["zipcode"], ipinfo["latitude"], ipinfo["longitude"],ipinfo["metro_code"],ipinfo["area_code"]], function (err, result) {  
26:                   console.log(err);  
27:                   // Close the connection to the database  
28:                   conn.close(function(){  
29:                    console.log("Connection Closed");  
30:                   });  
31:                  });  
32:                });  
33:              }  
34:          })};  
35:    
36:    
37:  // Get the caller's IP address from runtime environment and call  
38:  // geo location service to obtain extended data  
39:  exports.getIP=function(request,ibmdb,connString) {  
40:   return function(req, res) {  
41:   var ip = req.headers['x-client-ip'] || req.connection.remoteAddress;  
42:   var ipurl = 'http://freegeoip.net/json/' + ip;  
43:   console.log("yes, called");  
44:   // fetch ip info   
45:   request.get( {  
46:    url: ipurl,  
47:    json : true},  
48:    function(error, response, body) {  
49:       var ipinfo;  
50:      if (!error) {  
51:        ipinfo=body;  
52:       // insert IP info into DB2  
53:       exports.insertIP(ibmdb,connString,ipinfo);  
54:       // finish by rendering the HTML page  
55:       res.render('index',{ ipinfo : ipinfo});  
56:       }  
57:     });  
58:  }  
59:  };  
60:    
61:    
62:    
63:  // Very simple country/region-based reporting done using GROUP BY.  
64:  exports.listCountries = function(ibmdb,connString) {  
65:          return function(req, res) {  
66:                  
67:            ibmdb.open(connString, function(err, conn) {  
68:                  if (err ) {  
69:                   res.send("error occurred " + err.message);  
70:                  }  
71:                  else {  
72:                      conn.query("SELECT country, region, count(region) as rcount FROM ip.visitors group by country,region order by 3 desc", function(err, tables, moreResultSets) {  
73:        
74:                                    
75:                      if ( !err ) {   
76:                          res.render('tablelist', {  
77:                              "tablelist" : tables  
78:                                
79:                           });  
80:    
81:                            
82:                      } else {  
83:                        res.send("error occurred " + err.message);  
84:                      }  
85:    
86:                      /*  
87:                          Close the connection to the database  
88:                          param 1: The callback function to execute on completion of close function.  
89:                      */  
90:                      conn.close(function(){  
91:                          console.log("Connection Closed");  
92:                          });  
93:                      });  
94:                  }  
95:              } );  
96:              
97:          }  
98:      }

Monday, August 18, 2014

Accessing DB2 from node.js on IBM Bluemix and locally

Some days ago I started experimenting with node.js. Other than JSON and some click functions on webpages I don't have much experience with JavaScript. The reason for this "adventure" is that node.js is offered as one of several programming languages on IBM Bluemix, IBM's platform-as-a-service (PaaS). I wanted to find out how complex or easy it would be to bring both node.js and DB2 (IBM's relational and in-memory database system) together.

When I start with some new language I typically produce errors. Thus I wanted to avoid pushing my app to Bluemix all the time, but instead wanted to test it locally first. Hence I downloaded and installed a local node.js environment, including the so-called node.js package manager "npm" first. npm allows you to install additional modules/code libraries. They are placed into the directory "node_modules". Within the program (or script), the modules are included and referenced via the "require" statement:

var express = require('express');

The above binds the "ExpressJS Web Application Framework for node". That framework is part of the node.js starter application on IBM Bluemix. With that basic application which is offered for download you can easily test whether the local installation work ok:

node app.js

The command which I executed in a regular shell launches the node.js runtime with the sample application. Based on the configuration it provides a small web application available on my laptop on port 3000. Accessing "http://127.0.0.1:3000" in my web browser shows the demo page. All ok.

To combine node.js and DB2 I require the DB2 database driver:

var ibmdb = require('ibm_db');

Just running the application again would return an error because the module has not been installed. Hence my next step in the command shell is:

npm install ibm_db

This invokes the node package manager and instructs it to download and install the IBM database client driver and related node.js API. Waiting for a minute it returned an error because it couldn't find the file "sqlcli1.h". This is an indicator that my local DB2 was missing the application development environment. Running "db2setup" again (as root), selecting "work with existing" and marking the application development package for installation solved the issue. After db2setup finished, I ran "npm install ibm_db" again and it was able to download, build and install that module.

To test my small app both locally and on Bluemix, I needed to obtain user and DB2 instance information for either the local environment or the Bluemix SQLDB service (DB2). This is done with the following code snippet (not that beauty as I just started...):

1:  // get DB2 SQLDB service information  
2:  function findKey(obj,lookup) {  
3:    for (var i in obj) {  
4:     if (typeof(obj[i])==="object") {  
5:       if (i.toUpperCase().indexOf(lookup) > -1) {  
6:        // Found the key  
7:        return i;  
8:       }  
9:       findKey(obj[i],lookup);  
10:     }  
11:    }  
12:    return -1;  
13:  }  
14:  var env = null;  
15:  var key = -1;  
16:  var db2creds=null;  
17:  if (process.env.VCAP_SERVICES) {  
18:     env = JSON.parse(process.env.VCAP_SERVICES);  
19:     key = findKey(env,'SQLDB');  
20:  }  
21:  if (!env) {  
22:    console.log("We are local");  
23:    var file = __dirname + '/db2cred.json';  
24:    try {  
25:     db2creds = require(file);  
26:    } catch(err) {  
27:     return {};  
28:    }  
29:  //  db2creds = JSON.parse(fileJSON);  
30:    console.log(db2creds);    
31:  } else {  
32:  var db2creds = env[key][0].credentials;  
33:    
34:  }  
35:  var connString = "DRIVER={DB2};DATABASE=" + db2creds.db + ";UID=" + db2creds.username + ";PWD=" + db2creds.password + ";HOSTNAME=" + db2creds.hostname + ";port=" + db2creds.port;  
36:    
37:  app.get('/db2', routes.db2test(ibmdb,connString));  
38:    
39:

In the code I first search for the object with the Bluemix environment information. If it is not found the code assumes it is a local invocation. In that case the DB2 access information is loaded from the file "db2cred.json". It is a file I created in the application directory with a content like here:

Logo for my DB2 node.js app

{
    "hostname": "127.0.0.1",
    "host": "127.0.0.1",
    "port": 50000,
    "username": "hloeser",
    "password": "mytopsecretpassword",
    "db": "CLOUDDB"
}

The code uses the information about the hostname, port, and the user/password combination to create a connection string. That information together with the IBM Database Driver interface can be passed to a request handler in the node.js/Express runtime infrastructure (the "app.get()" call).

My small test application runs successfully both on my laptop as well as on IBM Bluemix. I plan to write more about it over the next days and to upload the code to my Github account. Bluemix-related posts can be accessed by this link.

Update: The follow-up article has been published here, showing geo IP lookup and logging into DB2.

Wednesday, August 13, 2014

Using some Workload Management for free in non-Advanced Editions of DB2

One of the new features of DB2 10.5 is BLU Acceleration. In introduces a couple of default Workload Management objects that are intended to control heavy queries running against column-organized tables. The objects are automatically created with every database, independent of the product edition. They are only enabled when DB2_WORKLOAD has been set to ANALYTICS before creating the database, i.e., a database for in-memory analytics is set up. But what is available for the regular guy like myself? What can be used for free and as foundation for some monitoring and understanding the system workload? Let's take a look.

Typically I use a DB2 Developer Edition which includes all features including WLM. So I removed the db2de license and organized (being IBMer has some benefits!) a Workgroup Server Edition (db2wse) which I added to the system using db2licm. I also turned on hard license enforcement, so that any attempts of using an unlicensend feature are directly blocked. Here is what db2licm returned thereafter:

mymachine> db2licm -l

Product name:                     "DB2 Workgroup Server Edition"
License type:                     "Authorized User Single Install"
Expiry date:                      "Permanent"
Product identifier:               "db2wse"
Version information:              "10.5"
Max amount of memory (GB):        "128"
Enforcement policy:               "Hard Stop"
Number of licensed authorized users: "25"
With that in place I created a new database named WLMTEST and connected to it. My first test was to create a workload object which should not be possible given my DB2 edition:

DB: WLMTEST => create workload freeride applname('xploit')
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL8029N A valid license key was not found for the requested functionality.
Reference numbers: "".

Ok, this look right. I don't have a license to use DB2 WLM (Workload Manager). My next query was intended to check what service classes are present in my system.

DB: WLMTEST => select varchar(serviceclassname,30), varchar(parentserviceclassname,30), enabled from syscat.serviceclasses

1                              2                              ENABLED
------------------------------ ------------------------------ -------
SYSDEFAULTSUBCLASS             SYSDEFAULTSYSTEMCLASS          Y
SYSDEFAULTSUBCLASS             SYSDEFAULTMAINTENANCECLASS     Y
SYSDEFAULTSUBCLASS             SYSDEFAULTUSERCLASS            Y
SYSDEFAULTMANAGEDSUBCLASS      SYSDEFAULTUSERCLASS            Y
SYSDEFAULTSYSTEMCLASS          -                              Y
SYSDEFAULTMAINTENANCECLASS     -                              Y
SYSDEFAULTUSERCLASS            -                              Y

7 record(s) selected.

The DB2 Knowledge Center has an overview of related default WLM objects and which parts can be modified with DBADM or WLMADM authority. Having the names of the system objects I tried my luck altering a work class set to reduce the cost barrier for the managed heavy queries (SYSMANAGEDQUERIES):

DB: WLMTEST => alter work class set sysdefaultuserwcs alter work class SYSMANAGEDQUERIES for timeroncost from 1000
DB20000I The SQL command completed successfully.

The threshold SYSDEFAULTCONCURRENT defines how many of those queries can run concurrently in the system. Why not change that threshold definition?

DB: WLMTEST => alter threshold SYSDEFAULTCONCURRENT when sqlrowsreturned > 20 stop execution
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL4721N The threshold "SYSDEFAULTCONCURRENT" cannot be created or altered
(reason code = "7"). SQLSTATE=5U037

Well, it seems that you cannot modify the entire threshold to your liking. However, following the documentation on what can be done, I successfully reduced the number of parallel activities.

DB: WLMTEST => alter threshold SYSDEFAULTCONCURRENT when CONCURRENTDBCOORDACTIVITIES > 3 stop execution
DB20000I The SQL command completed successfully.

To test the impact of my changes, I opened 4 different shells, connected to DB2 in each window, and more or less simultaneously executed the following query:

select * from syscat.tables,syscat.columns

I have to restate that I tried to execute it in all four windows. It only ran in three of them. Why? Because the threshold kicked in for this heavy query and stopped the execution for the 4th session ("concurrentdbcoordactivities> 3 stop execution"). So some basic workload management seems to work even without a license.

Can I change the threshold to force the application off, i.e., to not allow running the query?

DB: WLMTEST => alter threshold SYSDEFAULTCONCURRENT when CONCURRENTDBCOORDACTIVITIES > 2 force application
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL4721N The threshold "SYSDEFAULTCONCURRENT" cannot be created or altered
(reason code = "13"). SQLSTATE=5U037
No, changing the entire definition of the threshold is not possible, but at least parts of it can be modified. You can then use the adapted default WLM objects to better understand what work is running on your system, e.g., testing what would fall into the category of "heavy queries". As a last step, I used a monitoring function to return the CPU time spent by service subclass. Most was in the managed subclass into which my queries from above were mapped:

DB: WLMTEST => SELECT varchar(service_superclass_name,30) as service_superclass, varchar(service_subclass_name,30) as service_subclass, sum(total_cpu_time) as total_cpu, sum(app_rqsts_completed_total) as total_rqsts FROM TABLE(MON_GET_SERVICE_SUBCLASS('','',-2)) AS t GROUP BY service_superclass_name, service_subclass_name ORDER BY total_cpu desc

SERVICE_SUPERCLASS             SERVICE_SUBCLASS               TOTAL_CPU            TOTAL_RQSTS
------------------------------ ------------------------------ -------------------- --------------------
SYSDEFAULTUSERCLASS            SYSDEFAULTSUBCLASS                          1207794                  552
SYSDEFAULTUSERCLASS            SYSDEFAULTMANAGEDSUBCLASS                    852547                    0
SYSDEFAULTMAINTENANCECLASS     SYSDEFAULTSUBCLASS                           466436                 1374
SYSDEFAULTSYSTEMCLASS          SYSDEFAULTSUBCLASS                                0                    0

4 record(s) selected.

With that I leave more testing to you. Happy monitoring!

BTW: The same tests can also be done on the SQL DB service on IBM Bluemix. That service is a DB2 Enterprise Server Edition.

Pages