In this post I will describe how we built GitFM in just 48 hours for RailsRumble 2012, what its architecture looks like, and which technologies it uses.

Idea

On Friday night a couple of my colleagues and I decided to take part in RailsRumble 2012, a competition for Rails developers in which a team of up to 4 people has 48 hours to create an app from scratch, configure a Linode VPS server, and deploy the app. Then 65 judges select 10 winners, plus there is one people's choice award.

To pick an idea for the project we brainstormed: each of us spoke in turn for 30 seconds, pointing out the good and bad sides of an idea. After rejecting a personal checklist of useful todos and an aggregator of product reviews, we came up with GitFM. I should also mention that our team is geographically distributed: we live in Minsk (Belarus), Moscow (Russia), Prague (Czech Republic) and Helsinki (Finland).

In the Minsk time zone the rumble started at 3 am. By 7 am we were discussing the architecture and tools for a recommendation service. We spent a couple of hours reading about collaborative filtering algorithms, which are used by Amazon, LastFM and many others to recommend relevant items. That is how we found out about Apache Mahout, and since it is built on top of Hadoop we decided to use it. With Hadoop we can calculate recommendations really fast, which is a key factor for our app.

Setup

At the moment our server setup looks like this:

server_setup

Rails app

The core of our project is a Rails application. It is responsible for fetching user and repo data from GitHub. To make this really fast we chose EM-Synchrony by @igrigorik. Here are a couple of useful links to help you start using it right away:

http://gistflow.com/posts/213-check-pages-status-with-em-synchrony-crawler
http://stackoverflow.com/questions/11000029/nested-iterators-for-em-synchrony-in-ruby

jRuby daemon and Apache Mahout

When we need to generate recommendations for a user, the Rails app pushes the user id to a Redis queue, which is processed by a jRuby daemon. Why jRuby? Apache Mahout is written in #java, so to make communicating with it easy we wrote a jRuby script that pulls the user_id and then calls a method that writes the recommendations to a PostgreSQL database. To create the jRuby daemon we used this generator; it is really good, great respect to @junegunn.
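
The hand-off between the Rails app and the daemon can be sketched in plain Ruby. This is only an illustration under loud assumptions, not the actual GitFM code: an in-process Queue stands in for the Redis queue, a Hash stands in for the PostgreSQL table, and the names RecommendationWorker, STOP and the recommender block are hypothetical.

```ruby
# Sketch only: Queue stands in for the Redis queue, a Hash for PostgreSQL.
# All names here are hypothetical and chosen for illustration.
class RecommendationWorker
  STOP = :stop # sentinel used to end the loop in this example

  def initialize(queue, store, &recommender)
    @queue = queue
    @store = store
    @recommender = recommender
  end

  # Blocks until a user id arrives, much like a daemon waiting on a queue.
  def run
    loop do
      user_id = @queue.pop
      break if user_id == STOP
      # The real daemon would call into Mahout here and write to the database.
      @store[user_id] = @recommender.call(user_id)
    end
  end
end

queue = Queue.new
store = {}
worker = RecommendationWorker.new(queue, store) { |user_id| ["repo-for-#{user_id}"] }

daemon = Thread.new { worker.run }
queue << 42                          # what the web app does on a recommendation request
queue << RecommendationWorker::STOP
daemon.join

p store
```

The point of this design is that the web app never waits for Mahout: it only enqueues an id, and the daemon does the slow work on its own schedule.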

Apache Mahout implements collaborative filtering algorithms, and it can be configured in many ways. For the first version of GitFM we used probably the simplest setup: we use the Tanimoto coefficient for similarity calculation and a binary model of user preference (i.e. a user gives each repo a mark of 0 or 1). Over these couple of days I fell in love with collaborative filtering techniques, so let me share some links with you:

http://www.igvita.com/2007/01/15/svd-recommendation-system-in-ruby/
http://www.igvita.com/2009/09/01/collaborative-filtering-with-ensembles/
http://jaydonnell.com/blog/2011/10/21/collaborative-filtering-using-jruby-and-mahout/
http://code.google.com/p/unresyst/wiki/CreateMahoutRecommender
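
To make the Tanimoto coefficient with binary preferences concrete, here is a minimal plain-Ruby sketch. It is not how Mahout is called from our daemon; the data and the method names tanimoto and most_similar are made up for illustration. With 0/1 marks each user reduces to the set of repos marked 1, and the similarity of two users is the size of the intersection of their sets divided by the size of the union.

```ruby
require "set"

# Tanimoto similarity for binary preferences:
# (repos both users like) / (repos at least one of them likes).
def tanimoto(a, b)
  union = (a | b).size
  return 0.0 if union.zero? # two empty sets: define similarity as 0
  (a & b).size.to_f / union
end

# Hypothetical data: user => set of repos marked with 1.
PREFERENCES = {
  "alice" => Set["rails", "sinatra", "redis"],
  "bob"   => Set["rails", "redis", "mahout"],
  "carol" => Set["django"]
}

# Returns the name of the user whose repo set is closest to the given user's.
def most_similar(user, prefs)
  others = prefs.reject { |name, _| name == user }
  others.max_by { |_, repos| tanimoto(prefs[user], repos) }.first
end

puts tanimoto(PREFERENCES["alice"], PREFERENCES["bob"]) # 2 shared / 4 total = 0.5
puts most_similar("alice", PREFERENCES)                 # bob
```

A user-based recommender would then suggest repos that the most similar users like and the current user has not seen yet.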

WebSocket

In fact GitFM is three applications communicating through #Redis. To notify users that their recommendations are ready we wrote a dedicated server: when a signed-in user visits the recommendations page, we open a connection to our server, which listens to Redis via the messages:* pattern and stores id-connection pairs. When the recommendations are ready, a message with the user id is sent and the user receives the recommendations via WebSocket. To use these techniques you need to be familiar with:

https://github.com/igrigorik/em-websocket
https://github.com/mloughran/em-hiredis
https://github.com/gimite/web-socket-js
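
The id-connection bookkeeping described above can be sketched without any of these libraries. The code below is a hedged illustration, not our server: ConnectionRegistry and FakeConnection are hypothetical names, the send_text method is an assumed interface of the stand-in connection, and the channel layout messages:<user_id> is one assumed reading of the messages:* pattern.

```ruby
# Sketch only: keeps user id => connection pairs and routes messages to them.
class ConnectionRegistry
  def initialize
    @connections = {}
  end

  # Called when a signed-in user opens the recommendations page.
  def register(user_id, connection)
    @connections[user_id.to_s] = connection
  end

  # Called when the user's connection closes.
  def unregister(user_id)
    @connections.delete(user_id.to_s)
  end

  # Called for each message on a channel matching messages:*.
  # Returns true if the payload was delivered to an open connection.
  def dispatch(channel, payload)
    user_id = channel[/\Amessages:(.+)\z/, 1]
    connection = user_id && @connections[user_id]
    return false unless connection
    connection.send_text(payload)
    true
  end
end

# Stand-in for a WebSocket connection; it just records what it was sent.
class FakeConnection
  attr_reader :received

  def initialize
    @received = []
  end

  def send_text(message)
    @received << message
  end
end

registry = ConnectionRegistry.new
socket = FakeConnection.new
registry.register(42, socket)
registry.dispatch("messages:42", "recommendations ready")
p socket.received
```

Messages for users without an open connection are simply dropped in this sketch, since those users will load their recommendations the next time they open the page.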

Conclusion

At the moment we have 3128 users in our database (probably many more already) and more than 95000 repos. We have processed more than 4000 recommendation requests.

Here is the distribution of languages in GitFM:

languages

Our plans

We want to make our repo preference model smarter: we need to take forks, watched repos, followers and so on into account. Also, after RailsRumble we are going to set up another server for generating recommendations, because we want relativistic-fast recommendations.

Finally, take a look at the #octocat made by @v1515 specially for GitFM:

octocat

GitFM was created at Evrone by @makaroni4, @releu, @ognevsky and @kirs.

Vote for us!

If you like GitFM, please vote for us in RailsRumble:

http://railsrumble.com/entries/421-gitfm

  • you need to sign in with Twitter
  • press Favorite