Have you ever faced a situation where uniqueness validations in your #rails app didn't work? Below is a short story about how this happened in a project I maintain, and how the broken records (duplicates) were fixed.

Uniqueness validation vs Race conditions

In your #rails app, when you want a field to be unique, you usually do this:

class User < ActiveRecord::Base
  validates :email, uniqueness: true
end

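But as the documentation describes, ActiveRecord does not guarantee that there will be no duplicates, because of race conditions. The validation runs a SELECT to look for an existing record and only then performs the INSERT, so two concurrent requests can both pass the check before either of them writes. And this does happen in practice.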

To demonstrate this, let's create a simple application with a User model and imitate concurrent requests. We will add a create action for users and then make a number of requests using #em_synchrony.

rails g model User email:string
rake db:migrate

# app/controllers/users_controller.rb
class UsersController < ApplicationController
  def create
    User.create email: 'blah@blah.blah'
    render :nothing => true
  end
end

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  # protect_from_forgery
end

# config/unicorn.rb
worker_processes 5

# run server with
RACK_ENV=none RAILS_ENV=development unicorn -c config/unicorn.rb -p 3000

NOTICE that since we will just make POST HTTP requests, I disabled the CSRF token check by commenting out protect_from_forgery. Never do this in production.

A good tool for imitating many users making concurrent requests is #em_synchrony:

require "em-synchrony"
require "em-synchrony/em-http"

URL = 'http://localhost:3000/users'

EM.synchrony do
  CONCURRENCY = 10

  results = EM::Synchrony::Iterator.new((1..500), CONCURRENCY).map do |index, iter|
    http = EventMachine::HttpRequest.new(URL).apost
    http.callback do
      puts "SUCCESS #{index}"
      iter.return(http)
    end

    http.errback do 
      puts "ERROR #{http.response_header.status}"
      iter.return(http)
    end
  end


  EM.stop
end

After running this script, you can see that we have 5 users in our database with the same email:

User.count
# => 5
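
One way to picture what happened: the uniqueness validation first asks the database whether the email exists, and only afterwards inserts the row. The plain-Ruby sketch below models that sequence with no Rails or database involved. FakeTable and Request are hypothetical names made up for this illustration, and the interleaving is written out by hand instead of relying on real concurrency.

```ruby
# Plain-Ruby model of "validate, then insert". Not ActiveRecord code.
class FakeTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Stands in for the SELECT issued by the uniqueness validation.
  def exists?(email)
    @rows.include?(email)
  end

  # Stands in for the INSERT; nothing here rejects duplicates.
  def insert(email)
    @rows << email
  end
end

class Request
  def initialize(table, email)
    @table = table
    @email = email
    @valid = false
  end

  # Step 1: the validation looks for an existing row.
  def validate
    @valid = !@table.exists?(@email)
  end

  # Step 2: the row is written only if step 1 passed.
  def save
    @table.insert(@email) if @valid
  end
end

table = FakeTable.new
a = Request.new(table, 'blah@blah.blah')
b = Request.new(table, 'blah@blah.blah')

# Two workers interleave: both validate before either one saves.
a.validate
b.validate
a.save
b.save

puts table.rows.size # prints 2
```

Run strictly one after another (validate, save, validate, save), the second request would be rejected, which is why the problem rarely shows up when you test by hand.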

Fighting race conditions

To prevent duplicates, the recipe is quite simple: create a unique index and reject duplicates at the database level:

add_index :users, :email, unique: true
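
The index helps because the database performs the check and the write as one atomic step. Here is a rough plain-Ruby model of that idea, using threads and a Mutex in place of a real database. UniqueIndexedTable and DuplicateKeyError are hypothetical names for this sketch only. In a Rails app the violation surfaces as ActiveRecord::RecordNotUnique, which the create action would need to rescue.

```ruby
# Hypothetical stand-in for a table with a unique index on email.
class DuplicateKeyError < StandardError; end

class UniqueIndexedTable
  def initialize
    @rows = []
    @lock = Mutex.new
  end

  # The check and the write happen inside one critical section,
  # so no other thread can slip in between them.
  def insert(email)
    @lock.synchronize do
      raise DuplicateKeyError, "duplicate email: #{email}" if @rows.include?(email)
      @rows << email
    end
  end

  def size
    @lock.synchronize { @rows.size }
  end
end

table = UniqueIndexedTable.new

# Ten concurrent "requests" try to create the same user.
threads = 10.times.map do
  Thread.new do
    begin
      table.insert('blah@blah.blah')
      :created
    rescue DuplicateKeyError
      :rejected
    end
  end
end

results = threads.map(&:value)

puts table.size               # prints 1
puts results.count(:created)  # prints 1
puts results.count(:rejected) # prints 9
```

The validation is still worth keeping for friendly error messages. The index is what actually enforces uniqueness.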

Consequences of duplications

If User has many associations, different records can end up belonging to different duplicates of the same user: payments, comments, anything. And if you try to create the index at this point, it will throw an exception because there are duplicates in your users table. What we need to do is move all the associated records to one user and delete the duplicates. Checking how many duplicates you have is quite simple:

User.count(group: :email).select { |k,v| v > 1 }

# => {"blah@blah.blah"=>10}

Merging duplicates

Algorithm is pretty simple:

  • create fresh backup (always create backups :) )
  • find a group of users with the same email
  • choose one user to keep (let it be the most recently updated one)
  • find all associated records whose user_id belongs to this group of users
  • replace that user_id with the id of the chosen user
  • remove the duplicate users
  • run the unique index migration

Here is a rake task for has_one and has_many associations:

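namespace :users do
  task :merge_duplicates => :environment do
    # All has_one and has_many associations declared on User.
    # Assumes each associated table references users through a user_id column.
    reflections = [:has_one, :has_many].flat_map do |macro|
      User.reflect_on_all_associations(macro)
    end

    duplicate_emails = User.count(group: :email).select { |_email, count| count > 1 }.keys

    duplicate_emails.each do |email|
      # Most recently updated record first; that is the one we keep.
      users = User.where(:email => email).order('updated_at DESC').to_a
      current_user = users.shift
      duplicate_ids = users.map(&:id)

      # Repoint associated rows with one UPDATE per association. This also
      # works for has_one, where user.send(association) returns a single
      # record that has no update_all method.
      reflections.each do |reflection|
        reflection.klass.where(:user_id => duplicate_ids).
          update_all(:user_id => current_user.id)
      end

      # Only the duplicates are left in users at this point.
      users.each(&:destroy)
    end
  end
end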

The only trouble you may run into is when the project is under high load: if you fix a user's associations while that user is active, new associated records can be created for one of the duplicates in the meantime. So it is probably better to first save the user_ids somewhere, delete all the duplicates from users, run the unique index migration, and only then fix the associations.
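
That alternative order can be sketched on in-memory rows. This is only an illustration of the idea, not the rake task itself: UserRow, CommentRow and id_map are hypothetical names, the sample data is made up, and in a real app the mapping would have to be stored somewhere durable before anything is deleted.

```ruby
# Plain-Ruby sketch: record the mapping first, delete duplicates, fix associations last.
UserRow = Struct.new(:id, :email, :updated_at)
CommentRow = Struct.new(:id, :user_id)

users = [
  UserRow.new(1, 'blah@blah.blah', 100),
  UserRow.new(2, 'blah@blah.blah', 300),
  UserRow.new(3, 'blah@blah.blah', 200),
  UserRow.new(4, 'other@blah.blah', 50)
]
comments = [CommentRow.new(1, 1), CommentRow.new(2, 3), CommentRow.new(3, 4)]

# Step 1: save "duplicate id => kept id" before touching any data.
id_map = {}
users.group_by(&:email).each_value do |group|
  next if group.size < 2
  keeper = group.max_by(&:updated_at)
  group.each { |user| id_map[user.id] = keeper.id unless user.id == keeper.id }
end

# Step 2: delete the duplicates. The unique index can be added right after this.
users.reject! { |user| id_map.key?(user.id) }

# Step 3: repoint associated rows using the saved mapping.
comments.each { |comment| comment.user_id = id_map.fetch(comment.user_id, comment.user_id) }

p users.map(&:id)         # prints [2, 4]
p comments.map(&:user_id) # prints [2, 2, 4]
```

Because the index is in place before step 3, no new duplicates can appear while the associations are being fixed.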

Hope that helps. The rule of thumb here: whenever you create a uniqueness validation or an association, always create the corresponding index.

#ruby #rails #active_record