Architecture: GitHub Traffic Analytics
In a new solution tutorial, I show you how to automatically retrieve and store GitHub traffic data the serverless way with IBM Cloud Functions and Db2. The data can then be analyzed in a web app deployed to Cloud Foundry on IBM Cloud. The app is secured with App ID using OpenID Connect. The new IBM Cloud service Dynamic Dashboard Embedded visualizes the views and clones of GitHub repositories.
Tutorial overview
Many of my open source projects are hosted on GitHub. In the "Insights" section of each repository, I can see statistics on views and clones of my repositories. This is great, but GitHub only provides access to the traffic data for the last 14 days. If you want to analyze statistics over a longer period of time, you need to download and store that data yourself. In this new tutorial, you deploy a serverless action that retrieves the traffic data and stores it in a SQL database. In addition, a Cloud Foundry app is used to manage repositories and to provide access to the statistics for data analytics. The app and the serverless action discussed in the tutorial are designed as a multi-tenant-ready solution; the initial feature set supports single-tenant mode.
The code is available on GitHub.
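To illustrate what the serverless action has to fetch: the GitHub REST API exposes the 14-day traffic data through the `/repos/{owner}/{repo}/traffic/views` and `/repos/{owner}/{repo}/traffic/clones` endpoints, which require a token with push access to the repository. The following minimal sketch is not the tutorial's code; it fetches one endpoint with the Python standard library and flattens the response into per-day rows. The function names `fetch_traffic` and `to_daily_rows` and the `(date, count, uniques)` row layout are hypothetical choices for this example.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def fetch_traffic(owner, repo, kind, token):
    """Fetch the last 14 days of traffic data; kind is "views" or "clones".

    Hypothetical helper: the token must have push access to the repository.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/traffic/{kind}"
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_daily_rows(payload, kind):
    """Flatten a traffic payload into (date, count, uniques) tuples.

    The per-day entries are listed under the key named after the kind,
    i.e. payload["views"] or payload["clones"].
    """
    rows = []
    for entry in payload.get(kind, []):
        # Timestamps look like "2018-09-03T00:00:00Z"; keep only the date.
        day = entry["timestamp"][:10]
        rows.append((day, entry["count"], entry["uniques"]))
    return rows
```

For example, `to_daily_rows({"count": 5, "uniques": 3, "views": [{"timestamp": "2018-09-03T00:00:00Z", "count": 5, "uniques": 3}]}, "views")` returns `[("2018-09-03", 5, 3)]`. Keeping parsing separate from the HTTP call makes the transformation easy to test without network access.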
Automated, scheduled data retrieval
To automatically retrieve the GitHub traffic data and merge it into the Db2 database, the solution relies on the built-in alarms package of IBM Cloud Functions. The package can fire triggers on a regular basis and supports cron-like syntax. On a weekly basis, a trigger invokes an action written in Python that makes the necessary GitHub API calls, retrieves the data, and merges it into the database. The access token that authorizes the Python action, as well as the repository metadata, is managed in the same database. The database could be shared by multiple tenants, i.e., different users, each with their own set of GitHub repositories and access token.
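Because the action runs weekly while GitHub returns a 14-day window, consecutive runs overlap, so the merge step has to update days that are already stored instead of inserting duplicates. The sketch below shows that idea under stated assumptions: it uses Python's built-in `sqlite3` with an `ON CONFLICT ... DO UPDATE` upsert as a stand-in for a Db2 `MERGE` statement, and the `traffic` table, its columns, and the `merge_traffic` function are hypothetical. The `tenant_id` column reflects the multi-tenant design; it is not taken from the tutorial's schema.

```python
import sqlite3

# Hypothetical schema: one row per tenant, repository, traffic kind, and day.
SCHEMA = """
CREATE TABLE IF NOT EXISTS traffic (
    tenant_id TEXT NOT NULL,
    repo      TEXT NOT NULL,
    kind      TEXT NOT NULL,      -- 'views' or 'clones'
    day       TEXT NOT NULL,      -- ISO date, e.g. '2018-09-03'
    total     INTEGER NOT NULL,
    uniques   INTEGER NOT NULL,
    PRIMARY KEY (tenant_id, repo, kind, day)
)
"""

# SQLite upsert used here in place of a Db2 MERGE statement.
UPSERT = """
INSERT INTO traffic (tenant_id, repo, kind, day, total, uniques)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (tenant_id, repo, kind, day)
DO UPDATE SET total = excluded.total, uniques = excluded.uniques
"""


def merge_traffic(conn, tenant_id, repo, kind, rows):
    """Merge (day, total, uniques) rows for one tenant's repository.

    Days already stored are updated with the latest numbers and new days
    are inserted, so overlapping weekly runs never create duplicates.
    """
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            UPSERT,
            [(tenant_id, repo, kind, day, total, uniques)
             for day, total, uniques in rows],
        )
```

Running `merge_traffic` twice with windows that share a day leaves a single row for that day holding the newer numbers, while rows of other tenants stay untouched because the tenant is part of the primary key.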
Raw data as table
Web-based data analytics
A Python Flask app provides access to the traffic statistics. It also allows users to add or delete repositories. The app is protected by the App ID service. The tutorial uses Cloud Directory, with the users managed by App ID; however, social logins (Google, Facebook) could easily be used as well. Within the app, an OpenID Connect module interacts with App ID as the authentication provider. After a successful login, users have access only to the functionality of their role. Administrators cannot access the data, and the notions of a tenant and a read-only tenant viewer are supported.
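The role-specific access described above can be pictured as a mapping from roles to permissions that is checked before each request handler runs. This is a minimal sketch, not the tutorial's implementation: the role names `admin`, `tenant`, and `tenantviewer`, the permission names, and the `require_permission` decorator are all hypothetical. In the Flask app, the user and role would come from the session established by the OpenID Connect login; here the decorator takes a plain `user` dict so the example runs without Flask.

```python
from functools import wraps

# Hypothetical role-to-permission mapping modeled on the roles named above:
# administrators manage the system but cannot read traffic data, tenants
# manage their repositories, and tenant viewers have read-only access.
PERMISSIONS = {
    "admin": {"manage_system"},
    "tenant": {"view_stats", "manage_repos"},
    "tenantviewer": {"view_stats"},
}


def require_permission(permission):
    """Reject calls whose user role does not grant the given permission."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            role = user.get("role")
            # Unknown or missing roles get an empty permission set.
            if permission not in PERMISSIONS.get(role, set()):
                raise PermissionError(f"role {role!r} lacks {permission!r}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@require_permission("view_stats")
def show_stats(user):
    return f"statistics for tenant {user['tenant']}"


@require_permission("manage_repos")
def add_repository(user, repo):
    return f"added {repo} for tenant {user['tenant']}"
```

With this mapping, a tenant viewer can call `show_stats` but gets a `PermissionError` from `add_repository`, and an administrator is denied `show_stats`. Denying by default for unknown roles keeps a misconfigured account from seeing any data.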
The web client uses the jQuery plugin DataTables to display and filter the raw data. For data visualization, the client embeds dashboards of the new IBM Cloud service Dynamic Dashboard Embedded. Depending on the mode, the app either shows a so-called canned dashboard with predefined visual elements, or users assemble their own dashboard from a set of given visualizations. Newly defined dashboards can be exported and reused in future sessions.
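DataTables can load its rows through the `ajax` option and, by default, reads them from the `data` property of the returned JSON object. A Flask route serving the raw traffic data could therefore return a payload like the one built below. This is a sketch under assumptions: the `datatables_payload` helper and the column names are hypothetical, and the tutorial's app may structure its responses differently.

```python
import json


def datatables_payload(rows, columns):
    """Serialize query rows as the JSON object DataTables loads via `ajax`.

    DataTables reads the row array from the "data" property by default;
    each row becomes an object keyed by column name, which the client maps
    with the `columns.data` option.
    """
    return json.dumps({"data": [dict(zip(columns, row)) for row in rows]})
```

On the client, a table would then be initialized with something like `$('#traffic').DataTable({ajax: '/api/traffic', columns: [{data: 'repo'}, {data: 'day'}, {data: 'views'}, {data: 'uniques'}]})`, where the element ID and the URL are placeholders. Returning objects rather than plain arrays keeps the client's column mapping independent of the SQL column order.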
Dynamic dashboard to visualize Db2 data
Conclusions
By combining serverless and Cloud Foundry, it is possible to overcome the limited availability of GitHub traffic statistics. The data is automatically downloaded and stored in a database. A web app provides access to the data and enables analytics, either by filtering data tables or by using dashboards with data visualizations.
Read all the details in the tutorial and check out the code provided in this repository.
If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.