Data Henrik: couchDB


Monday, July 20, 2015

Bluemix: Simple cron-like service for my Python code

Schedule background tasks in Bluemix

This morning I had some time to fiddle around with Bluemix and Python. I wanted to try creating a cron-like service, i.e., something that runs in the background and kicks off tasks at given intervals or at a given time. For Python there is a package called "schedule" that is easy to set up and use. So how do you create a background service in Bluemix?

The trick is in the file manifest.yml, and the Cloud Foundry documentation has all the needed details (Bluemix is built on this open standard). The attribute "no-route" is set to true, indicating that this is not a Web application and that we don't need a subdomain name. In addition, the attribute "command" is set to invoke the Python interpreter with my script as parameter. Basically, this starts my background task:

applications:
- name: hltimer
  memory: 256M
  instances: 1
  no-route: true
  command: python mytimer.py
  path: .

The cron-like service script is pretty simple. It uses the "schedule" package to set up recurring jobs. I tested it with the Twilio API to send me SMS messages at given times. You can also use it to scrape webpages at given intervals, kick off feed aggregators, read out sensors and update databases like Cloudant or DB2, and more. See this GitHub repository for the full source.

import schedule
import time

def job():
    # put the task to execute here
    pass

def anotherJob():
    # another task can be defined here
    pass

# run job() every 10 minutes and anotherJob() once a day at 10:30
schedule.every(10).minutes.do(job)
schedule.every().day.at("10:30").do(anotherJob)

# check once per second whether a job is due
while True:
    schedule.run_pending()
    time.sleep(1)
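To make the polling loop above more tangible, here is a minimal sketch of the same idea using only the Python standard library. It is not how the "schedule" package is implemented; the names IntervalJob, run_pending and main_loop are made up for this illustration. The clock and sleep functions are passed in so that the logic can be tried without actually waiting.

```python
import time


class IntervalJob:
    """A job that becomes due every `interval` seconds (hypothetical helper)."""

    def __init__(self, interval, func):
        self.interval = interval
        self.func = func
        self.next_run = None  # set once the job has been scheduled

    def schedule_from(self, now):
        # the next run is one full interval after the given time
        self.next_run = now + self.interval


def run_pending(jobs, now):
    """Run every job whose next_run time has passed; return how many ran."""
    ran = 0
    for job in jobs:
        if job.next_run is not None and now >= job.next_run:
            job.func()
            job.schedule_from(now)
            ran += 1
    return ran


def main_loop(jobs, ticks, sleep_seconds=1.0, clock=time.time, sleep=time.sleep):
    """Poll the job list `ticks` times, sleeping between polls."""
    start = clock()
    for job in jobs:
        job.schedule_from(start)
    for _ in range(ticks):
        run_pending(jobs, clock())
        sleep(sleep_seconds)
```

A background service would call main_loop with a very large number of ticks (or loop forever, as in the script above). The short sleep keeps the process from using CPU time between checks.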

Monday, November 17, 2014

A quick look at dashDB and a happy SQuirreL

dashDB slogan on its website

This morning I took some time to take a look at dashDB, a new IBM DWaaS (Data Warehouse as a Service) offering. When you go to the dashDB website, you are offered two choices: Use the dashDB service available on IBM Bluemix or use a Cloudant account to add a warehouse to your JSON database. Let me give you a brief overview of what you can do with dashDB and how I connected a local (open source) SQuirreL SQL client to my new dashDB database.

Cloudant Warehousing (dashDB)

dashDB is a cloud-based analytics database ("analytics in a dash") with roots in Netezza and DB2 with BLU Acceleration. Data is stored in table format (rows and columns). It is ready to connect to all kinds of analytic tools, local or cloud-based, and is already set up for geo-spatial data analysis (instructions on how to use ESRI ArcGIS Desktop are provided). Best of all, your regular SQL database and analytic tools continue to work; see below for details.

dashDB: schema discovery

I started my journey by logging into my existing Cloudant account. There, the dashboard menu has a new item "Warehousing". After clicking on the "New Warehouse" button, you can select the Cloudant databases that you want to import into the warehouse. Because multiple databases can be associated with a Cloudant account or a Bluemix Cloudant service, this step lets you pick the data of choice. After the source data is chosen, the dashDB database is created and so-called schema discovery turns the JSON documents into rows of tables. Thereafter, the data is ready to have analytics applied. That is the time to launch the dashDB control center, another so-called "dashboard".
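To give a rough idea of what turning a JSON document into a table row can mean, here is a small sketch. It is only an illustration under my own assumptions and not the algorithm dashDB's schema discovery uses: the function flatten and the underscore-based column naming are made up, and arrays are simply kept as they are.

```python
def flatten(doc, prefix=""):
    """Flatten nested JSON objects into a single dict of column -> value."""
    row = {}
    for key, value in doc.items():
        column = prefix + key
        if isinstance(value, dict):
            # nested object: prefix its keys with the parent key
            row.update(flatten(value, column + "_"))
        else:
            # scalars (and, in this sketch, arrays) are stored unchanged
            row[column] = value
    return row


# made-up sample document, only to show the shape of the result
sample_doc = {"type": "city", "city": "Friedrichshafen", "main": {"temp": 290.1}}
sample_row = flatten(sample_doc)
```

Each key of the resulting dict would correspond to a column name, and each document to one row.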

The welcome screen shows some of the analytic options, e.g., the database is ready to be used with either Cognos, SPSS, InfoSphere DataStage, R scripts, or all of them and more:

Analytics for dashDB: Cognos, SPSS, DataStage, R

SQuirreL SQL client - dashDB connected

Because I already tested and blogged about a predecessor of dashDB some time ago (see here: how to set it up and how to use R), I was more interested in trying out a JDBC-based client with my new cloud-based data warehouse. The dashboard includes several sections that help you with the application setup. So it was easy for me to obtain the JDBC URL and configure it, together with the listed userid/password, in my local SQuirreL SQL client (it will work in IBM Data Studio and the Optim tools, too). As you can see from the screenshot, the database connection from my laptop to the cloud-based dashDB succeeded. Ready for some SQL.

My lessons learned from testing database queries on the converted data (JSON to relational) will be part of another blog entry. Stay tuned...

Monday, August 25, 2014

Setting up and using a DB2 in-memory database on IBM Bluemix

[Update 2014-11-04: The Analytics Warehouse service on Bluemix is now called dashDB.]
Last Friday I was on the way back from some customer visits. While traveling on a German high-speed train I used the Wi-Fi service, connected to IBM Bluemix and created a DB2 in-memory database. Let me show you how I set it up, what you can do with it and how I am connecting to the cloud-based database from my laptop.

Sitting in #train with 300 km/h and creating #DB2 in-memory #database with #Bluemix http://t.co/OzJqLzNDFr
— Henrik Loeser (@data_henrik) August 22, 2014

Unbound DB2 service on Bluemix

The first thing to know is that on Bluemix the DB2 in-memory database service is called IBM Analytics Warehouse. To create a database, you select "Add service" and can leave it unbound if you want, i.e., it is not directly associated with any Bluemix application. That is ok because at this time we are only interested in the database. Once the service is added and the database itself created, you can launch the administration console.

The console supports several administration and development tasks, as shown in the picture. They include loading data, developing analytic scripts in R, executing queries, and linking the data with Microsoft Excel for processing in a spreadsheet. The console also has a section for connecting external tools or applications to the database.

Administration/development tasks in DB2 BLU console on Bluemix

One of the offered tasks is very interesting and I tweeted about it on Friday, too:

Great! You can even import #JSON #data from @Cloudant into #DB2 in-memory db service in #Bluemix pic.twitter.com/BoLS0YM7nH
— Henrik Loeser (@data_henrik) August 22, 2014

You can set up replication from a Cloudant JSON database to DB2, so that the data stream is fed directly into the database for in-memory analyses. I haven't tested it so far, but plan to do so with one of my other Bluemix projects.

A task that I did use is (up)loading data. For this I took some historic weather data (planning ahead for a vacation location), let the load wizard extract the metadata to create a suitable table, and ran some queries.

Uploading data to DB2 on Bluemix

Specify new DB2 table and column names

For executing (simple) selects there is a "Run Query" dialog. It lets you choose a table and columns and then generates a basic query skeleton. I looked into whether a specific German island had warm nights, i.e., a daily minimum temperature of over 20 degrees Celsius. Only 14 days out of several decades and thousands of data points qualified.
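The kind of query behind that warm-night check can be sketched as follows. This is only an illustration: it uses SQLite from the Python standard library as a stand-in for DB2, the table name weather and the columns obs_date and min_temp are hypothetical, and the four rows are invented sample values, not my actual weather data.

```python
import sqlite3


def count_warm_nights(rows, threshold=20.0):
    """Count the days whose minimum temperature is above the threshold."""
    conn = sqlite3.connect(":memory:")
    try:
        # hypothetical table layout: one row per day
        conn.execute("CREATE TABLE weather (obs_date TEXT, min_temp REAL)")
        conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT COUNT(*) FROM weather WHERE min_temp > ?", (threshold,)
        )
        return cursor.fetchone()[0]
    finally:
        conn.close()


# invented sample rows, for illustration only
sample_rows = [
    ("2000-07-01", 18.5),
    ("2000-07-02", 20.4),
    ("2000-07-03", 21.0),
    ("2000-07-04", 20.0),
]
warm_nights = count_warm_nights(sample_rows)
```

The WHERE clause uses a strict comparison, so a night with a minimum of exactly 20 degrees does not count as warm.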

Last but not least, I connected my local DB2 installation and tools to the Bluemix/SoftLayer-based instance. The "CATALOG TCPIP NODE" command is needed to make the remote server and communication port known. Then the database is added. If you already have a database with the same name cataloged on the local system, you will get an error message as shown below. You can work around it by specifying an alias. So instead of calling the database BLUDB, I used BLUDB2. The final step was to connect to DB2 with BLU Acceleration in the cloud. And surprise, it uses a fixpack version that is not yet officially available for download...

DB: => catalog tcpip node bluemix remote 50.97.xx.xxx server 50000
DB20000I The CATALOG TCPIP NODE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is
refreshed.
DB: => catalog db bludb at node bluemix
SQL1005N The database alias "bludb" already exists in either the local
database directory or system database directory.
DB: => catalog db bludb as bludb2 at node bluemix
DB20000I The CATALOG DATABASE command completed successfully.
DB21056W Directory changes may not be effective until the directory cache is
refreshed.
DB: => connect to bludb2 user blu01xxx
Enter current password for blu01xxx:

   Database Connection Information

Database server        = DB2/LINUXX8664 10.5.4
SQL authorization ID   = BLU01xxx
Local database alias   = BLUDB2

I plan to develop a simple application using the DB2 in-memory database (BLU Acceleration / Analytics Warehouse) and then write about it. Until then, read more about IBM Bluemix in my other related blog entries.

Wednesday, July 2, 2014

Nice Cloud, no rain: Using Cloudant/couchDB with Python on Bluemix

My last two blog entries were about getting started with Python on IBM Bluemix and how to use a custom domain with my Bluemix weather application. Today I am going to show how I added Cloudant and couchDB to my application, both locally and on Bluemix.

Storing the weather data locally doesn't make sense because I can query much more historical data on OpenWeatherMap. So I am going to use a database to log which city was requested and when. That information, in aggregated form, could then be reported as a fun fact to each user of the app. I chose Cloudant because it is simple to use, adequate for the intended purpose, has free usage plans on Bluemix, and I can use and test it locally as couchDB.

Add Cloudant as new service

The code itself is relatively simple and I added comments to it (it is shown at the end of the article). The interesting part is how to add a Cloudant service to my application on Bluemix, how to bind the two together, and the preparation work for the database itself. So let's take a look at those steps.

Cloudant is offered as one of several services in the "Data Management" category on Bluemix. While on the Dashboard you simply click on the "Add a service" button as shown on the right. Navigate to the Data Management section and choose Cloudant.

It will bring up a screen showing information about the service itself, on usage terms, and on the right side of it a dialog "Add Service" for adding the service to your account. Here you can already bind the new database service to your application by selecting an existing application from a dropdown list. I did that and gave my new Cloudant service the name "cloudantWeather" as shown:

Bind Cloudant to your application

Once the service is added you can bring up the Cloudant administration interface. I have used Cloudant and couchDB before, so that isn't anything new. To avoid dealing with creation of a database as part of the actual program I decided to create a "weather" database through the administration interface for the hosted Cloudant and my local couchDB servers. An interesting but not too tricky part is how to access both servers depending on where the application is running. Information with the username, password, server address and other details is provided in an environment variable VCAP_SERVICES when run on Bluemix. Thus, in the program I am testing for the presence of that variable and then either retrieve the server information from it or access my local couchDB:

# get service information if on Bluemix
if 'VCAP_SERVICES' in os.environ:
  couchInfo = json.loads(os.environ['VCAP_SERVICES'])['cloudantNoSQLDB'][0]
  couchServer = couchInfo["credentials"]["url"]
  couch = couchdb.Server(couchServer)
# we are local
else:
  couchServer = "http://127.0.0.1:5984"
  couch = couchdb.Server(couchServer)
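To try this lookup without deploying to Bluemix, the logic can be wrapped in a small function that takes the environment as a parameter. The following is a sketch using only the standard library: the function name couch_url and the sample environment are made up, and the URL in it is a placeholder, not a real credential. The shape of the JSON follows the access path used in the snippet above.

```python
import json

LOCAL_COUCH_URL = "http://127.0.0.1:5984"


def couch_url(environ):
    """Return the Cloudant URL from VCAP_SERVICES, or the local couchDB URL."""
    raw = environ.get("VCAP_SERVICES")
    if raw is None:
        # not running on Bluemix: fall back to the local server
        return LOCAL_COUCH_URL
    services = json.loads(raw)
    # take the first bound Cloudant instance, as in the snippet above
    return services["cloudantNoSQLDB"][0]["credentials"]["url"]


# made-up example of the JSON shape; host and credentials are placeholders
sample_env = {
    "VCAP_SERVICES": json.dumps({
        "cloudantNoSQLDB": [{
            "name": "cloudantWeather",
            "credentials": {"url": "https://user:secret@example.cloudant.com"},
        }]
    })
}
```

In the application itself the function would be called as couch_url(os.environ) and the result passed to couchdb.Server.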

Storing new documents is simple and is shown in the full code listing. For the queries I am using the MapReduce feature of couchDB. In the "map" function I return the city name as key (and just the integer value 1); in the "reduce" function I aggregate (sum up) the values by city. Both functions could be defined in the Python script and then passed to Cloudant as part of the query, or be predefined for better performance. I chose the latter. So I created a so-called "secondary index" in my Cloudant database; in couchDB it is called a "view". Both are stored as part of a "design document" (shown here for Cloudant):

Secondary index / permanent view
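What the map and reduce steps compute can be simulated in plain Python. This sketch is not the view definition itself (view functions in couchDB are normally written in JavaScript and live in the design document); the function names and the sample documents are made up for illustration. It mimics a grouped query, i.e., one result per city.

```python
from collections import defaultdict


def map_city(doc):
    """Map step: emit (city, 1) for each logged request document."""
    if doc.get("type") == "city":
        yield doc["city"], 1


def reduce_sum(values):
    """Reduce step: sum up the emitted values for one key."""
    return sum(values)


def query_grouped(docs):
    """Simulate a grouped view query: one aggregated value per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_city(doc):
            grouped[key].append(value)
    return {key: reduce_sum(values) for key, values in sorted(grouped.items())}


# made-up documents in the structure the weather app stores
sample_docs = [
    {"type": "city", "c_by": "bm", "city": "Friedrichshafen"},
    {"type": "city", "c_by": "bm", "city": "Berlin"},
    {"type": "city", "c_by": "bm", "city": "Friedrichshafen"},
]
city_counts = query_grouped(sample_docs)
```

The result maps each city name to the number of times it was requested, which is what the app later prints in its "Requests so far" section.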

With that I finish my Python application, add some calls to the couchDB Python API (which I needed to add to the file "requirements.txt" as dependency) and test it locally. The final step is to deploy the application to Bluemix using the Cloud Foundry tool "cf push". Done, seems to work:

Bluemix weather app with Cloudant stats

Last but not least, here is the code I used for my little app:

 import os  
 from flask import Flask,redirect  
 import urllib  
 import datetime  
 import json  
 import couchdb  
   
 BASE_URL = "http://api.openweathermap.org/data/2.5/weather?q="  
 BASE_URL_fc ="http://api.openweathermap.org/data/2.5/forecast/daily?cnt=1&q="  
 app = Flask(__name__)  
   
 # couchDB/Cloudant-related global variables  
 couchInfo=''  
 couchServer=''  
 couch=''  
   
 #get service information if on Bluemix  
 if 'VCAP_SERVICES' in os.environ:  
   couchInfo = json.loads(os.environ['VCAP_SERVICES'])['cloudantNoSQLDB'][0]  
   couchServer = couchInfo["credentials"]["url"]  
   couch = couchdb.Server(couchServer)  
 #we are local  
 else:  
   couchServer = "http://127.0.0.1:5984"  
   couch = couchdb.Server(couchServer)  
   
 # access the database which was created separately  
 db = couch['weather']  
   
 @app.route('/')  
 def index():  
   return redirect('/weather/Friedrichshafen')  
   
 @app.route('/weather/<city>')  
 def weather(city):  
   # log city into couchDB/Cloudant  
   # basic doc structure  
   doc= { "type" : "city",  
     "c_by" : "bm",  
   }  
   # we store the city and the current timestamp  
   doc["city"]=city  
   doc["timestamp"]=str(datetime.datetime.utcnow())  
   # and store the document  
   db.save (doc)  
   
   # Time to grab the weather data and to create the resulting Web page  
   # build URIs and query current weather data and forecast  
   # JSON data needs to be converted  
   # the base URLs already end in "q=", so the city is appended directly
   url = "%s%s" % (BASE_URL, city)
   wdata = json.load(urllib.urlopen(url))
   url_fc = "%s%s" % (BASE_URL_fc, city)
   wdata_fc = json.load(urllib.urlopen(url_fc))
   
   # build up result page  
   page='<title>current weather for '+wdata["name"]+'</title>'  
   page +='<h1>Current weather for '+wdata["name"]+' ('+wdata["sys"]["country"]+')</h1>'  
   page += '<br/>Min Temp. '+str(wdata["main"]["temp_min"]-273.15)  
   page += '<br/>Max Temp. '+str(wdata["main"]["temp_max"]-273.15)  
   page += '<br/>Current Temp. '+str(wdata["main"]["temp"]-273.15)+'<br/>'  
   page += '<br/>Weather: '+wdata["weather"][0]["description"]+'<br/>'  
   page += '<br/><br/>'  
   page += '<h2>Forecast</h2>'  
   page += 'Temperatures'  
   page += '<br/>Min: '+str(wdata_fc["list"][0]["temp"]["min"]-273.15)  
   page += '<br/>Max: '+str(wdata_fc["list"][0]["temp"]["max"]-273.15)  
   page += '<br/>Morning: '+str(wdata_fc["list"][0]["temp"]["morn"]-273.15)  
   page += '<br/>Evening: '+str(wdata_fc["list"][0]["temp"]["eve"]-273.15)  
   page += '<br/><br/>Weather: '+wdata_fc["list"][0]["weather"][0]["description"]  
   page += '<br/><br/>'  
   
   # Gather information from database about which city was requested how many times  
   page += '<h3>Requests so far</h3>'  
   # We use an already created view  
   for row in db.view('weatherQueries/cityCount',group=True):  
    page += row.key+': '+str(row.value)+'<br/>'  
   
   # finish the page structure and return it  
   page += '<br/><br/>Data by <a href="http://openweathermap.org/">OpenWeatherMap</a>'  
   return page  
   
 port = os.getenv('VCAP_APP_PORT', '5000')
 if __name__ == "__main__":
   app.run(host='0.0.0.0', port=int(port))
