Where did my Enum go?

Enum is probably one of the things I miss most in Ruby. I'm not sure why, but since it is a fairly common design pattern, I expected to see it more often in Ruby's stdlib (or maybe even core) and in Ruby and Rails code in the wild.

An Enum (or a Factor, if you will) is basically a set of values, called members (or elements), that act as constants and each represent a numeric (usually integer) value. It is mainly used to represent a string constant numerically, which makes the data easier to serialize and saves you from repeatedly typing long, annoying strings. Java, C#, and even our beloved neighbor Python all provide some kind of Enum implementation as part of their toolbelt. Some databases, like MySQL (http://dev.mysql.com/doc/refman/5.0/en/enum.html) and Postgres, provide a "by the book" enum type, while others, like SQL Server (yes, Microsoft again), need some black magic to achieve an Enum representation:

mycol VARCHAR(10) NOT NULL CHECK (mycol IN('Oranges', 'Apples', 'Pineapples'))

But a decent Enum is still missing in Ruby.

Quick draw Enum in Ruby

If you ignore all the fancy words around the Enum definition, you are probably saying to yourself - "Eh, it's a hash".

Right.

The simplest way is to create a frozen hash constant that holds all your values:

class Canvas
  COLORS = {
    0 => :black,
    1 => :white,
    2 => :green,
    # ...
  }.freeze
end

Testing it out:

$> Canvas::COLORS[0]
:black

This implementation is simple and convenient, but it is not easy to maintain and carries absolutely no syntactic sugar.
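One way to ease the maintenance part is to keep only an ordered list of names and let Ruby number them. Adding a color then means appending a symbol, with no keys to renumber by hand. This is a sketch of that idea, not part of the original approach:

```ruby
class Canvas
  # Only the order of the names matters: each_with_index numbers them, so
  # adding a color means appending a symbol, not renumbering keys by hand.
  COLORS = %i[black white green].each_with_index.map { |color, i| [i, color] }.to_h.freeze
end

p Canvas::COLORS[0]  # => :black
p Canvas::COLORS[2]  # => :green
```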

Note that if you want to look up the numeric value for a given name (the 0 for :black, for example), you'll need to write your own access methods for the frozen hash, because Canvas::COLORS[:black] simply returns nil. Another problem I have with this implementation is that it is too specific: if you want another class to use the COLORS "enum", you'll need to define it again.
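For the reverse direction, Ruby's built-in Hash#key can serve as that access method. If you want a complete reverse table, Hash#invert builds one. A minimal sketch follows. The color_value method and the COLOR_VALUES constant are hypothetical names, used here only for illustration:

```ruby
class Canvas
  COLORS = { 0 => :black, 1 => :white, 2 => :green }.freeze

  # Hypothetical helper, name -> number. Hash#key scans the values and returns
  # the first matching key, or nil when the name is unknown (a linear search).
  def self.color_value(name)
    COLORS.key(name)
  end

  # Alternatively, build the reverse table once: { black: 0, white: 1, green: 2 }.
  COLOR_VALUES = COLORS.invert.freeze
end

p Canvas.color_value(:white)    # => 1
p Canvas::COLOR_VALUES[:green]  # => 2
```

This still doesn't fix the reuse problem, because each class needs its own copy of the constant.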

Reusable Module Enum

The reusable module implementation is very similar to what you'll find in C#, Python and Java: a standalone module that lets the enum be reused in multiple classes and is basically all about defining constants:

module ColorsEnum
  BLACK = 0
  WHITE = 1
  GREEN = 2
end

class Canvas
  include ColorsEnum
end

class Wall
  include ColorsEnum
end

This form allows code reuse, of course, but it still lacks the ability to access an enum member both by its numeric value and by its constant name.
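You can still go from a number back to a name by searching the module's constants with Module#constants and Module#const_get. This sketch is only an illustration. The name_of helper is hypothetical, and the search is linear:

```ruby
module ColorsEnum
  BLACK = 0
  WHITE = 1
  GREEN = 2

  # Hypothetical helper: return the name of the constant holding `value`.
  # constants(false) lists only constants defined directly in this module.
  def self.name_of(value)
    constants(false).find { |const| const_get(const) == value }
  end
end

class Canvas
  include ColorsEnum
end

p Canvas::GREEN          # => 2, reachable through the including class
p ColorsEnum.name_of(1)  # => :WHITE
```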

Enum by method

Before tackling the dual-access problem, let's add some syntactic sugar to the constant definition process (isn't it annoying to declare constants manually?).

class Object
  def self.enumify(*args)
    args.flatten.each_with_index do | const, i |
      const_set(const, i)
    end
  end
end

class Bowl
  enumify "ORANGES", "APPLES", "PINEAPPLES"
end

p Bowl::ORANGES # => 0

Now we are getting closer. We can use this syntactic sugar anywhere in our app to define enums like the big boys do, but it still doesn't solve our dual-access problem.
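Reopening Object makes enumify available to every class. If that feels too global, the same helper can live in a module, and only the classes that extend it get the method. This is a sketch, and the Enumify module name is made up for illustration:

```ruby
# Hypothetical opt-in module: only classes that `extend Enumify` get the helper.
module Enumify
  def enumify(*names)
    names.flatten.each_with_index do |name, i|
      const_set(name, i)  # self is the extending class, so constants land there
    end
  end
end

class Bowl
  extend Enumify
  enumify "ORANGES", "APPLES", "PINEAPPLES"
end

p Bowl::APPLES      # => 1
p Bowl::PINEAPPLES  # => 2
```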

Enum with dual-access

The feature that almost all the Enum implementations above are missing is the ability to access an enum member either by its value or by its name. The following implementation (original from here) is a more robust, module-based one. It adds an object for each enum member to store the member's attributes (its index, for example), and it mixes the Enumerable module into the enum base class.

I used this base class and added some more attributes to the Enum::Member class to also store the actual value and allow dual access to both the key and the value of the member:

class Enum < Module 
  class Member < Module 
    attr_reader :enum, :index, :syme

    def initialize(enum, index, syme) 
      @enum, @index, @syme = enum, index, syme 
      # Allow Color::Red.is_a?(Color) 
      extend enum 
    end 

    # Allow use of enum members as array indices 
    alias :to_int :index 
    alias :to_i :index 

    alias :to_sym :syme

    def name
      self.syme.to_s
    end

    # Alias after #name is defined; aliasing earlier would bind to Module#name
    alias :to_s :name

    # Allow comparison by index 
    def <=>(other)
      @index <=> other.index if other.respond_to?(:index)
    end

    include Comparable
  end

  def initialize(*symbols, &block)
    @members = []    
    symbols.each_with_index do |symbol, index|
      # Allow Enum.new(:foo)
      symbol = symbol.to_s.sub(/^[a-z]/){|letter| letter.upcase}.to_sym
      member = Enum::Member.new(self, index, symbol)
      const_set(symbol, member)
      @members << member 
    end 
    super(&block) 
  end 

  def all
    all = {}
    @members.each_with_index do |member, index|
      all[index] = member
    end
    all # return the hash, not the result of each_with_index
  end

  def [](val)
    if val.is_a?(Numeric) 
      @members[val] 
    elsif val.is_a?(Symbol)
      @members.select {|member| member.syme == val }.first
    elsif val.is_a?(String)
      @members.select {|member| member.name == val }.first
    end
  end 

  def size 
    @members.size 
  end 

  alias :length :size 

  def first(*args) 
    @members.first(*args) 
  end 

  def last(*args) 
    @members.last(*args) 
  end 

  def each(&block) 
    @members.each(&block) 
  end 

  include Enumerable 
end

Let's focus on the Enum#[] method:

def [](val)
  if val.is_a?(Numeric)
    @members[val]
  elsif val.is_a?(Symbol)
    @members.select {|member| member.syme == val }.first
  elsif val.is_a?(String)
    @members.select {|member| member.name == val }.first
  end
end

This method provides the dual-access mode I discussed before:

  • When given a Numeric value, it uses it as a position in @members, which matches each member's #index attribute.
  • When given a Symbol, it compares it to each member's #syme attribute.
  • When given a String, it compares it to the #name attribute.
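The same three-way dispatch can be written more compactly. A case expression matches on the argument's class, and Enumerable#find stops at the first match instead of building an intermediate array the way select(...).first does. The Member struct and the lookup method below are hypothetical stand-ins for illustration. They are not part of the original class:

```ruby
# Stand-in for Enum::Member: only the attributes Enum#[] looks at.
Member = Struct.new(:index, :syme) do
  def name
    syme.to_s
  end
end

MEMBERS = [Member.new(0, :Red), Member.new(1, :Green), Member.new(2, :Blue)].freeze

# Dual-access lookup by position, symbol or string; nil for anything else.
def lookup(members, val)
  case val
  when Numeric then members[val]
  when Symbol  then members.find { |m| m.syme == val }
  when String  then members.find { |m| m.name == val }
  end
end

p lookup(MEMBERS, 1).syme       # => :Green
p lookup(MEMBERS, :Blue).index  # => 2
p lookup(MEMBERS, "Red").index  # => 0
```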

Voila!

This implementation stores the member list in an instance variable named @members and manually delegates size, first, last and each to it, with Enumerable building the rest on top of #each. It seems like we could simply inherit from Array and drop all these delegation methods.
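Note that the Enum above already subclasses Module, and a Ruby class has only one superclass. Another way to cut the delegation boilerplate is to keep the array in an instance variable and use the standard library's Forwardable module, whose def_delegators generates the forwarding methods. This sketch uses a hypothetical MemberList class, not the original Enum:

```ruby
require 'forwardable'

# Hypothetical container: forwards collection methods to @members
# instead of defining each one by hand.
class MemberList
  extend Forwardable
  include Enumerable

  # Generated methods pass arguments and blocks through to @members.
  def_delegators :@members, :size, :length, :first, :last, :each

  def initialize(*names)
    @members = names.map(&:to_sym)
  end
end

list = MemberList.new("Red", "Green", "Blue")
p list.size         # => 3
p list.first(2)     # => [:Red, :Green]
p list.map(&:to_s)  # => ["Red", "Green", "Blue"]
```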

Ruby Set based Enum

Set implements a collection of unordered values with no duplicates. By using Set as the parent class, we inherit its ability to traverse members. We also no longer need to implement and manage delegation to an array, and we don't need to create constants (more on that later in this article):

require 'set'

class Enum < Set

  class Member

    include Comparable
    attr_reader :symbol
    attr_reader :index
    attr_reader :name

    def initialize(value, index = 0, name = nil)
      @symbol = value.to_sym
      @index = index.to_i
      @name = name || value
    end

    def <=>(other_member)
      self.symbol <=> other_member.symbol
    end
  end

  def initialize(*members)
    super()
    populate(members)
  end

  def to_a
    super.sort_by(&:index)
  end

  def each
    block_given? or return enum_for(__method__)
    self.to_a.each { |o| yield(o) }
    self
  end

  def [](index_or_symbol)
    if index_or_symbol.is_a?(Symbol)
      return self.select {|member| member.symbol == index_or_symbol }.first
    elsif index_or_symbol.is_a?(Integer)
      return self.select {|member| member.index == index_or_symbol }.first
    end
  end

  protected

  def populate(members)
    members.each_with_index do |member, index|
      self.add(Enum::Member.new(member, index)) unless self.to_a.collect(&:name).include?(member)
    end
  end
end

First, I'll focus on the #populate method. It is called from #initialize right after Set's own initializer runs, and it adds every value passed to the Enum to the set. The line inside its loop is where all the magic happens. It creates a new Enum::Member and adds it to the set, unless a member with that name already exists.

Wait! Aren't set members already unique by default?

Yes, they are. But this implementation runs into a Ruby brick wall. Ruby's Set keeps an internal hash (named, surprisingly, @hash) that stores each member as a key with a value of true. Set#include? is delegated to that hash, and Hash finds keys through #hash and #eql?. Object's default implementations of both compare object identity, not attributes.

In our case it will result in something like this:

Enum::Member.new(:apples).object_id == Enum::Member.new(:apples).object_id

This returns false, even though both members have the same attributes.

Set#include? never uses the <=> (or the == derived from it) that the Comparable module gives Enum::Member, so we need to do the comparison on our own.
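An alternative to the manual check is to give Member the two methods Hash actually relies on, eql? and hash. Set can then detect duplicates by itself. This is a sketch, not the original code, and it assumes that members with the same symbol should count as the same member:

```ruby
require 'set'

class Member
  include Comparable
  attr_reader :symbol, :index

  def initialize(value, index = 0)
    @symbol = value.to_sym
    @index = index.to_i
  end

  def <=>(other)
    symbol <=> other.symbol if other.is_a?(Member)
  end

  # Hash (and therefore Set) matches keys with eql? and hash, not ==.
  def eql?(other)
    other.is_a?(Member) && symbol == other.symbol
  end

  def hash
    symbol.hash
  end
end

set = Set.new
set << Member.new(:apples, 0)
set << Member.new("apples", 1)       # same symbol, so Set treats it as a duplicate
p set.size                           # => 1
p set.include?(Member.new(:apples))  # => true
```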

And why didn't you use Constants?

Constants are an easy and smart choice when you can live with Ruby's naming conventions and constant-name limitations. In our case, though, we allow Enum::Member to be initialized with any kind of string, including ones that are not valid constant names. For example:

Enum::Member.new("888betonline", 0).symbol #=> :"888betonline"

# Same in a constant
Object.const_set("888BETONLINE", 0)
NameError: wrong constant name 888BETONLINE

Since we don't know our enum value names in advance, and don't want to restrict them, dropping constants altogether seemed like the right thing to do.

Update: A clean, simple and perfect Hash

After this post was published, I talked to my friend Daniel about the Set implementation. He pointed out that the Set isn't really needed. Because of the #include? issue I have to write the presence test myself, and I traverse the members myself anyway, so the Set adds nothing.

He later came up with this suggestion:

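class Enum < Hash
  def initialize(*members)
    super()
    @rev = {} # reverse lookup: name => index
    # Store names as symbols so enum[:Apples] works
    # even when the members are given as strings.
    members.each_with_index { |m, i| self[i] = m.to_sym }
  end

  # Try the index => name map first, then fall back to name => index.
  def [](k)
    super || @rev[k]
  end

  # Keep both directions in sync on every assignment.
  def []=(k, v)
    @rev[v] = k
    super
  end
end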

enum  = Enum.new("Apples", "Oranges")

enum[:Apples] # => 0
enum[0]       # => :Apples 

This solution not only answers the dual-access requirement, it also drops the need for an internal Enum::Member class. Definitely a winner.
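One caveat: this Enum is still a mutable Hash. Callers can add entries later, and Hash methods such as store or merge! bypass the overridden []=, which leaves @rev out of sync. If that matters, an immutable variant can build both directions once and freeze them. This is a sketch. FrozenEnum is a hypothetical name, and it wraps two hashes instead of subclassing Hash:

```ruby
# Hypothetical immutable variant: both lookup tables are built once and frozen.
class FrozenEnum
  include Enumerable

  def initialize(*members)
    names = members.map(&:to_sym)
    @by_index = names.each_with_index.map { |name, i| [i, name] }.to_h.freeze
    @by_name  = @by_index.invert.freeze
    freeze
  end

  # Integer -> name; Symbol or String -> integer; anything else -> nil.
  def [](key)
    case key
    when Integer then @by_index[key]
    when Symbol, String then @by_name[key.to_sym]
    end
  end

  # Yields [name, index] pairs in definition order.
  def each(&block)
    @by_name.each(&block)
  end
end

fruits = FrozenEnum.new("Apples", "Oranges")
p fruits[:Apples]    # => 0
p fruits["Oranges"]  # => 1
p fruits[1]          # => :Oranges
p fruits.frozen?     # => true
```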

Conclusion

Ruby lacks a real enum implementation. I would be happy to see an official one in one of the next releases. Until then, the Hash implementation is what I will be using.

Source available here, fork away.