Learn to write SQL |
SQL for working with data
SQL, the Structured Query Language, is used by most database systems (Db2, MySQL, Oracle, PostgreSQL, SQL Server, SQLite, ...) as query language. Moreover, it can be directly applied to data, using programming libraries to in-memory data or something like the SQL Query service to analyze data managed with Cloud Object Storage.
Plot with COVID-19 data for Germany, showing rolling average |
SQL Analytics
Most of us know how to write a simple SELECT ... FROM ... WHERE statement. But how do you easily put data from different days side by side? How do you compute rolling averages to smooth out reported values, e.g., to account for "administrative weekends"? See the above plot for raw vs. smoothed data for Germany. How do you make sure that all countries are represented, even if they did not report at the latest available date (MAX(date))? See the table below with mortality rates computed today based on the latest available data, not the current date.
The answer is to apply SQL window functions and to utilize common table expressions. They allow to partition and order data, then apply additional functions on top. I have created a public gist on GitHub with sample SQL statements to dig into COVID-19 data. It is too long to embed it here. Feel free to comment there or add to it. The two screenshots in this post are based on data processed with such SQL statements.
Mortality rate per 100,000 population |
If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.