Visualising database #1 by makaroni4
Let me dilute all our programming posts and questions with article mostly related to startups. On example of simple application we will depict some of the project's KPIs with Google Charts and some tricky ActiveRecord queries.

Intro
Almost every project (including not only web ones) has its own database (and is not necessarily SQL or noSQL, it could be any notebook, whatever). And since we want to run our projects in lean way we need to measure a lot of metrics along the way and I think most of them could be built with this database data.
Example application
Imagine that we have a simple application where user write comments. So we have
class User < ActiveRecord::Base
has_many :comments
end
class Comment < ActiveRecord::Base
belongs_to :user
end
KPIs
Key Performance Indicators, also known as KPI or Key Success Indicators (KSI), help an organization define and measure progress toward organizational goals.
Let's consider our simple application successful if we have a lot of user accounts and huge part of them are active. And by "active" I mean that users write a lot of comments. So our KPIs are:
- user registration per period of time
- user activity (comments per user)
Measurements
User registrations
To come up with user registration numbers we need first to choose period of time and after it group user records in our database by created_at field. Sounds simple but we can do it by many ways. So let's obtaion our daily registration statisticts:
User.where(:created_at => 20.days.ago..Time.now).count(:group => 'DATE(created_at)')
# SELECT COUNT(*) AS count_all, DATE(created_at) AS date_created_at FROM `users` WHERE (`users`.`created_at` BETWEEN '2013-01-15 20:57:24' AND '2013-02-04 20:57:24') GROUP BY DATE(created_at)
# => {Tue, 15 Jan 2013=>123, Wed, 16 Jan 2013=>4519, Mon, 21 Jan 2013=>2, Fri, 25 Jan 2013=>1, Tue, 29 Jan 2013=>1, Thu, 31 Jan 2013=>1, Fri, 01 Feb 2013=>1, Mon, 04 Feb 2013=>1}
In the next chapter we will visualise this data, but for now lets think what can we do more. Sometimes daily statistics is too approximate so lets use other #sql time functions to get more interesting data. First lets take a look at monthly statistics:
User.where(:created_at => Time.now.beginning_of_year..Time.now.end_of_year).count(:group => 'MONTH(created_at)')
# => {1=>1006, 12=>1}
NOTE! That MONTH returns number from 1 to 12 so if we don't specify range of current year we can pollute our data with users from past year.
But what if we want distribution for each month for all years? To solve this task we can run previous query for each year and then collect the results but easy way to handle this is to use DATE_FORMAT() function:
User.count(:group => 'DATE_FORMAT(created_at, "%Y-%m")')
# "2011-08"=>1433, "2011-09"=>2972, "2011-10"=>2480, "2011-11"=>2456, "2011-12"=>2832, "2012-01"=>4002, "2012-02"=>3820, "2012-03"=>4211, "2012-04"=>4023, "2012-05"=>4302, "2012-06"=>2876, "2012-07"=>2629, "2012-08"=>2366, "2012-09"=>2087, "2012-10"=>2424, "2012-11"=>2092, "2012-12"=>1938, "2013-01"=>1006}
Also we can build distribution by day of week using DAY() function, HOUR() to find out the part of day people use website and more. Take a look at list of all MySQL date and time functions here: http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html
User activity
First lets find out how many users have not left any comment:
User.count
# => 64485
User.includes(:comments).where(comments: { user_id: nil }).count
# SELECT COUNT(DISTINCT `users`.`id`) FROM `users` LEFT OUTER JOIN `comments` ON `comments`.`user_id` = `users`.`id` WHERE `comments`.`user_id` IS NULL
# => 62924
Now let's find out a distribution of number of comments per user:
Comment.count(group: :user_id)
# => {1=>7, 2=>914, 20=>155, 22=>1, 24=>5, 25=>61, 26=>37, 27=>162, 28=>942, 31=>3, 32=>204, 34=>1, 35=>10, 36=>28 .. 67617=>1, 67726=>1}
ActiveRecord::Base.connection.execute('SELECT comments_count, COUNT(*) comments_count FROM (SELECT user_id, COUNT(*) comments_count FROM comments AS comments GROUP BY user_id) AS comments_with_count GROUP BY comments_count')
# => [[1, 726], [2, 227], [3, 129], [4, 73] .. [914, 1], [943, 1], [1333, 1], [1450, 1]]
As you see in the first query we have a hash of user ids and values of comments count. The second query groups these counts and aggregates them so we see that 726 users left one comment, 227 users left 2 comments and so on.
Visualisation
Now lets visualize all these data using Google Charts. We will need three types of charts for each statistics:
- line chart to depict user registrations vs time
- pie chart to display portion of users that left comments
- column chart to depict distribution of comments counts
To convert result hash from the first query we just need to call Hash#to_a method and pass this data to arrayToDataTable method in Google Chart:
<div id="chart_div" style="width: 900px; height: 500px;"></div>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['Month', 'Registrations'],
["2011-08", 1433], ["2011-09", 2972], ["2011-10", 2480], ["2011-11", 2456], ["2011-12", 2832], ["2012-01", 4002], ["2012-02", 3820], ["2012-03", 4211], ["2012-04", 4023], ["2012-05", 4302], ["2012-06", 2876], ["2012-07", 2629], ["2012-08", 2366], ["2012-09", 2087], ["2012-10", 2424], ["2012-11", 2092], ["2012-12", 1938], ["2013-01", 1006]]
]);
var options = {
title: 'User registrations'
};
var chart = new google.visualization.LineChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>

If there is no registration in some month you need to interpolate null values like these:
var options = {
interpolateNulls: true
}
Pie charts for users with comments will look like this:
<div id="chart_div" style="width: 900px; height: 500px;"></div>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['Kind of users', 'Number'],
['Without comments', 1561],
['With comments', 62924]
]);
var options = {
title: 'Users'
};
var chart = new google.visualization.PieChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>

And our final distribution of comments counts:
<div id="chart_div" style="width: 900px; height: 500px;"></div>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['Amount of comments', 'Number of users'],
[1, 726], [2, 227], [3, 129], [4, 73], [5, 52], [6, 33], [7, 34], [8, 24], [9, 13], [10, 14], [11, 13], [12, 11], [13, 10], [14, 10], [15, 9], [16, 3], [17, 8], [18, 5], [19, 2], [20, 6], [21, 4], [22, 1], [23, 3], [24, 5], [25, 4]
]);
var options = {
title: 'Comment count distribution',
vAxis: {title: 'Number of users'},
hAxis: {title: 'Amount of comments'}
};
var chart = new google.visualization.ColumnChart(document.getElementById('chart_div'));
chart.draw(data, options);
}
</script>

As you see it is quite easy to get and visualize project statistics, I encourage you to do this and make your projects even better!
Comments
SeTeM commented 4 months ago
Thanks for Google Charts!
kirs commented 4 months ago
It's one hundred times simpler than it's described: https://github.com/saulabs/reportable
And anyway, if you write
function drawChart()it's better to wrap it in some class, for instance UserChartDrawer. Just imagine what can happen if you have a plenty ofdrawChartfunctions in your project :)makaroni4 commented 4 months ago
@kirs it is one hundred times more important to be able to construct statistics queries yourself, but I agree
gem 'reportable'seems nice.mulderp commented 3 months ago
great post! Especially interesting how you apply Ruby array to select ranges in SQL queries! Looking forward to learn some more tricks here.
asterite commented 3 months ago
Nice post! A simpler way to make the queries generate google charts is to use the rgviz and rgviz-rails gems: https://github.com/asterite/rgviz-rails/wiki/showing-a-visualization-in-a-view
makaroni4 commented 3 months ago
@asterite thanks! rgviz looks really good!
mulderp commented 3 months ago
I was using Morris for visualization and had to transform your collections a bit, I wrote a post here on my experiments: http://thinkingonthinking.com/map-reduce-in-ruby/
makaroni4 commented 3 months ago
I tried Morris either but when you have more than 5 lines on the plot Morris gets messy with its labels - it does't looks good for my case. For less lines Morris is great I think