Let me dilute all our programming posts and questions with article mostly related to startups. On example of simple application we will depict some of the project's KPIs with Google Charts and some tricky ActiveRecord queries.

visualising database

Intro

Almost every project (including not only web ones) has its own database (and is not necessarily SQL or noSQL, it could be any notebook, whatever). And since we want to run our projects in lean way we need to measure a lot of metrics along the way and I think most of them could be built with this database data.

Example application

Imagine that we have a simple application where user write comments. So we have

class User < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :user
end

KPIs

Key Performance Indicators, also known as KPI or Key Success Indicators (KSI), help an organization define and measure progress toward organizational goals.

Let's consider our simple application successful if we have a lot of user accounts and huge part of them are active. And by "active" I mean that users write a lot of comments. So our KPIs are:

  • user registration per period of time
  • user activity (comments per user)

Measurements

User registrations

To come up with user registration numbers we need first to choose period of time and after it group user records in our database by created_at field. Sounds simple but we can do it by many ways. So let's obtaion our daily registration statisticts:

User.where(:created_at => 20.days.ago..Time.now).count(:group => 'DATE(created_at)')

# SELECT COUNT(*) AS count_all, DATE(created_at) AS date_created_at FROM `users` WHERE (`users`.`created_at` BETWEEN '2013-01-15 20:57:24' AND '2013-02-04 20:57:24') GROUP BY DATE(created_at)

# => {Tue, 15 Jan 2013=>123, Wed, 16 Jan 2013=>4519, Mon, 21 Jan 2013=>2, Fri, 25 Jan 2013=>1, Tue, 29 Jan 2013=>1, Thu, 31 Jan 2013=>1, Fri, 01 Feb 2013=>1, Mon, 04 Feb 2013=>1}

In the next chapter we will visualise this data, but for now lets think what can we do more. Sometimes daily statistics is too approximate so lets use other #sql time functions to get more interesting data. First lets take a look at monthly statistics:

User.where(:created_at => Time.now.beginning_of_year..Time.now.end_of_year).count(:group => 'MONTH(created_at)')

# => {1=>1006, 12=>1}

NOTE! That MONTH returns number from 1 to 12 so if we don't specify range of current year we can pollute our data with users from past year.

But what if we want distribution for each month for all years? To solve this task we can run previous query for each year and then collect the results but easy way to handle this is to use DATE_FORMAT() function:

User.count(:group => 'DATE_FORMAT(created_at, "%Y-%m")')

# "2011-08"=>1433, "2011-09"=>2972, "2011-10"=>2480, "2011-11"=>2456, "2011-12"=>2832, "2012-01"=>4002, "2012-02"=>3820, "2012-03"=>4211, "2012-04"=>4023, "2012-05"=>4302, "2012-06"=>2876, "2012-07"=>2629, "2012-08"=>2366, "2012-09"=>2087, "2012-10"=>2424, "2012-11"=>2092, "2012-12"=>1938, "2013-01"=>1006}

Also we can build distribution by day of week using DAY() function, HOUR() to find out the part of day people use website and more. Take a look at list of all MySQL date and time functions here: http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html

User activity

First lets find out how many users have not left any comment:

User.count
# => 64485

User.includes(:comments).where(comments: { user_id: nil }).count
# SELECT COUNT(DISTINCT `users`.`id`) FROM `users` LEFT OUTER JOIN `comments` ON `comments`.`user_id` = `users`.`id` WHERE `comments`.`user_id` IS NULL
# => 62924 

Now let's find out a distribution of number of comments per user:

Comment.count(group: :user_id)
# => {1=>7, 2=>914, 20=>155, 22=>1, 24=>5, 25=>61, 26=>37, 27=>162, 28=>942, 31=>3, 32=>204, 34=>1, 35=>10, 36=>28 .. 67617=>1, 67726=>1}

ActiveRecord::Base.connection.execute('SELECT comments_count, COUNT(*) comments_count FROM (SELECT user_id, COUNT(*) comments_count FROM comments AS comments GROUP BY user_id) AS comments_with_count GROUP BY comments_count')

# => [[1, 726], [2, 227], [3, 129], [4, 73] .. [914, 1], [943, 1], [1333, 1], [1450, 1]]

As you see in the first query we have a hash of user ids and values of comments count. The second query groups these counts and aggregates them so we see that 726 users left one comment, 227 users left 2 comments and so on.

Visualisation

Now lets visualize all these data using Google Charts. We will need three types of charts for each statistics:

  • line chart to depict user registrations vs time
  • pie chart to display portion of users that left comments
  • column chart to depict distribution of comments counts

To convert result hash from the first query we just need to call Hash#to_a method and pass this data to arrayToDataTable method in Google Chart:

<div id="chart_div" style="width: 900px; height: 500px;"></div>

<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
  google.load("visualization", "1", {packages:["corechart"]});
  google.setOnLoadCallback(drawChart);
  function drawChart() {
    var data = google.visualization.arrayToDataTable([
      ['Month', 'Registrations'],
      ["2011-08", 1433], ["2011-09", 2972], ["2011-10", 2480], ["2011-11", 2456], ["2011-12", 2832], ["2012-01", 4002], ["2012-02", 3820], ["2012-03", 4211], ["2012-04", 4023], ["2012-05", 4302], ["2012-06", 2876], ["2012-07", 2629], ["2012-08", 2366], ["2012-09", 2087], ["2012-10", 2424], ["2012-11", 2092], ["2012-12", 1938], ["2013-01", 1006]]
    ]);

    var options = {
      title: 'User registrations'
    };

    var chart = new google.visualization.LineChart(document.getElementById('chart_div'));
    chart.draw(data, options);
  }
</script>

line chart

If there is no registration in some month you need to interpolate null values like these:

var options = {
  interpolateNulls: true
}

Pie charts for users with comments will look like this:

<div id="chart_div" style="width: 900px; height: 500px;"></div>

<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
  google.load("visualization", "1", {packages:["corechart"]});
  google.setOnLoadCallback(drawChart);
  function drawChart() {
    var data = google.visualization.arrayToDataTable([
      ['Kind of users', 'Number'],
      ['Without comments', 1561],
      ['With comments', 62924]
    ]);

    var options = {
      title: 'Users'
    };

    var chart = new google.visualization.PieChart(document.getElementById('chart_div'));
    chart.draw(data, options);
  }
</script>

pie chart

And our final distribution of comments counts:

<div id="chart_div" style="width: 900px; height: 500px;"></div>

<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
  google.load("visualization", "1", {packages:["corechart"]});
  google.setOnLoadCallback(drawChart);
  function drawChart() {
    var data = google.visualization.arrayToDataTable([
      ['Amount of comments', 'Number of users'],
      [1, 726], [2, 227], [3, 129], [4, 73], [5, 52], [6, 33], [7, 34], [8, 24], [9, 13], [10, 14], [11, 13], [12, 11], [13, 10], [14, 10], [15, 9], [16, 3], [17, 8], [18, 5], [19, 2], [20, 6], [21, 4], [22, 1], [23, 3], [24, 5], [25, 4]
    ]);

    var options = {
      title: 'Comment count distribution',
      vAxis: {title: 'Number of users'},
      hAxis: {title: 'Amount of comments'}
    };

    var chart = new google.visualization.ColumnChart(document.getElementById('chart_div'));
    chart.draw(data, options);
  }
</script>

column chart

As you see it is quite easy to get and visualize project statistics, I encourage you to do this and make your projects even better!

#rails #mysql #activerecord #startup #plot #chart