Arel with Wharel

Object Relational Mappers (ORM) are complex things. As the term implies, an ORM provides a mapping from objects in an object-oriented language like Ruby to a relational persistence layer. In theory, this makes it possible to abstract away details of this layer behind objects; in practice, of course, things are not so simple.

In this post, I’ll show you how I’ve leveraged a couple very simple features of Ruby’s object model to simplify building queries in ActiveRecord, in the form of a teeny tiny gem called Wharel. Wharel allows you to create Arel predicates in blocks passed to ActiveRecord query methods, without all the standard boilerplate.

Before I explain Wharel, though, let’s take a few steps back and consider the problem this gem tries to solve.

Querying with ActiveRecord

Part of the challenge of building and extending an ORM is striking a good balance between convenience and control. ActiveRecord offers you by default a fairly minimal interface for making queries which (I suppose) satisfies about 90% of users' basic needs. That interface is made up of the handful of methods like where and not, which allow you to create queries built up of simple equality predicates, optionally negated:

Post.where(title: "foo").where.not(content: "bar")
#=> SELECT "posts".* FROM "posts" WHERE "posts"."title" = 'foo' AND "posts"."content" != 'bar'

ActiveRecord also supports SELECT, HAVING, GROUP BY, along with the combinators AND and (recently) OR, as well as inner and outer JOIN, and a few others I forget just now.

Which is all fine and good, and may be adequate to power queries in your little CRUD app. But very quickly you will find that where and friends are not enough. Suppose, for example, you want to search for partial matches on string-valued data. A Rails developer would typically drop down to SQL to do this:

Post.where("title LIKE ?", "%#{params[:title]}%")
#=> SELECT `posts`.* FROM `posts` WHERE (title LIKE '%foo%')

… where params[:title] is a title from user input, and the application is using MySQL as its database.

In this case, we’re not really using the ORM much at all, other than to sanitize the data (params[:title]) interpolated into the SQL string. We can’t extend or re-use this query in other contexts. We can’t take it apart, build it up, or compose it. We can’t use it with a different database which uses a different name for LIKE (like PostgreSQL, which uses ILIKE for case-insensitive matching).

In short, we can either use this SQL snippet as-is, or re-write it to be something different.

Enter Arel

For one-off situations, this may well be enough. But for anything else, ActiveRecord offers another layer in the form of Arel, a relational algebra for building queries abstracted from the particulars of any particular database. If ActiveRecord is a mapping between Ruby model objects and a relational database, Arel is the fine-grained mapping between Ruby query objects and SQL query fragments.

Let’s get concrete. Convert the LIKE query above to Arel and we get:

posts = Post.arel_table
Post.where(posts[:title].matches("%#{params[:title]}%"))
#=> SELECT `posts`.* FROM `posts` WHERE (`posts`.`title` LIKE '%foo%')

Here, we needed to first call Post.arel_table to get an instance of Arel::Table corresponding to the posts table for the model. We can then call posts[:title] to get an arel node (actually, an Arel::Attribute) for the title column.

We can then use this title node to build a query using the matches method, which corresponds to LIKE in MySQL.

This may not seem like such a big win, since anyway you could have written the query in raw SQL with less lines of code. But what if you wanted to re-use that snippet in combination with another condition?

You could certainly do this with a new inline SQL, but a more scalable way is to use Arel. For example:

class Post < ApplicationRecord
  scope :with_title,             ->(str) { where(matches_title(str)) }
  scope :with_title_or_subtitle, ->(str) { where(matches_title(str).or(matches_subtitle(str))) }

  class << self
    def matches_title(title)
      arel_table[:title].matches("%#{title}%")
    end

    def matches_subtitle(subtitle)
      arel_table[:subtitle].matches("%#{subtitle}%")
    end
  end
end

In this case, we’ve used the arel predicates returned by matches_title and matches_subtitle to build a combined predicate using the pattern a.or(b), which generates SQL like:

# Post.with_title_or_subtitle("foo")
SELECT `posts`.* FROM `posts`
WHERE (`posts`.`title` LIKE '%foo%' OR `posts`.`subtitle` LIKE '%foo%')

As you can imagine, with more complex queries, using Arel building blocks can be quite useful. But it should also be apparent that Arel’s “wordiness” can be a real turn-off, and partly explains why Rails developers tend to prefer one-off SQL fragments over Arel predicates.

Where + Arel = Wharel

Recently I’ve been looking a lot at the internals of Sequel, a Ruby ORM developed by Jeremy Evans that is superficially similar in many ways to ActiveRecord, but which has a vastly different internal implementation and much broader feature set.

One of the things that fascinated me about Sequel is its interface for building queries, which allows you to use blocks in the form of so-called “virtual rows”, like this:

Artist.where{id > 5}

Here, we are querying on a column id. But notice that rather than passing a hash of keys and values to where, we are instead passing a block which references the method id and builds it into a predicate with id > 5.

This type of query would not be possible passing a hash to ActiveRecord’s where, since a hash can only generate equality predicates. Instead, we would need to build the appropriate predicate with Arel, like this:

artists = Artist.arel_table
Artist.where(artists[:id].gt(5))

For such a simple query, I find the ActiveRecord/Arel code to be much harder to read. Unsurprisingly, others have felt the same way.

Many years ago, Ernie Miller set out to improve ActiveRecord with an interface similar to Sequel, in the form of a gem called Squeel. Squeel was popular, but ultimately the monkey-patching magic it required to bend ActiveRecord into shape burnt out its maintainer, and the gem is now abandoned. More recently, a gem named BabySqueel has attempted to recreate some of this interface without Squeel’s monkey-patching baggage.

While working on Mobility, I recently had a need to incorporate Arel predicates with ActiveRecord relations in a way that went beyond AR’s standard query method interface. I decided to implement something like the block-based query interface from Sequel, but vastly simplified.

In the process, it became apparent to me how simple it is to do this, so I extracted the core logic into a separate gem I’ve called Wharel, a portmanteau of “Where” and “Arel”.

What does Wharel do? Well, add it to your Gemfile and you can simplify queries like the one above to:

Artist.where { id.gt(5) }

Wharel also supports combining nodes from different models by optionally using a block argument, like this:

Artist.joins(:posts).where { |a| Post.where { |p| p.title.matches(a.name) } }

(This query matches artists who have at least one post with a title that matches their name.)

As you can see, Wharel adds an optional block format like Sequel, within which any attributes called as methods are converted to their corresponding Arel node. In this case, id becomes the arel node Artist.arel_table[:id].

That’s it. There’s no additional magic to support predicates like id > 5, which Sequel, Squeel and BabySqueel offer, because I wanted to keep Wharel to an absolute minimum level of complexity.

And Wharel is indeed very simple: its core is a mere 31 lines of code. How can that be? Let’s have a look inside.

Virtual Rows and BasicObjects

Here is the Wharel module:

module Wharel
  module QueryMethods
    def where(opts = :chain, *rest, &block)
      block_given? ? super(VirtualRow.build_query(self, &block)) : super
    end

    module WhereChain
      def not(*args, &block)
        block_given? ? super(VirtualRow.build_query(@scope, &block)) : super
      end
    end
  end

  class VirtualRow < BasicObject
    def initialize(klass)
      @klass = klass
    end

    def method_missing(m, *)
      @klass.column_names.include?(m.to_s) ? @klass.arel_table[m] : super
    end

    class << self
      def build_query(klass, &block)
        row = new(klass)
        query = block.arity.zero? ? row.instance_eval(&block) : block.call(row)
        ::ActiveRecord::Relation === query ? query.arel.constraints.inject(&:and) : query
      end
    end
  end
end

There is really not much here. The QueryMethods module is required to patch ActiveRecord to handle blocks; nothing special there. What is special is the VirtualRow class, which subclasses BasicObject.

Why BasicObject? You may have seen this esoteric, minimalist object somewhere in a diagram of Ruby’s object model, and recall that every Object inherits from it. Indeed, BasicObject is a very simple object with almost no methods at all, and for most contexts not something you’d really want to lean on too much.

But in this case, the fact that BasicObject is so minimal happens to be very useful, because it allows us to create what’s called a Clean Room. Take a look at build_query, which you’ll see we call from where or not to convert a block to an Arel predicate:

VirtualRow.build_query(self, &block)

Here, self is the class or relation (for not it is @scope, which is basically the same), and block is the block passed to where or not.

In build_query, we first create an instance of VirtualRow with a reference to the class (or relation) (this is the new(klass) line). We then check the arity of the block (don’t be scared, “arity” is just another word for “number of arguments”).

If the block has no arguments (block.arity.zero?), like this:

Post.where { title.eq("foo") }

… then we call instance_eval on the virtual row and pass it the block.

Now what happens?

Well, instance_eval is a bit like that movie, Being John Malkovich, when you go through the door and enter the mind of actor John Malkovich. Except in this case, the door takes you to the mind of whatever object you’re calling instance_eval on. Any method you call from this context is received by that object, just like everything you see when you go though the door is what John Malkovich is seeing.

Here, the object is the virtual row. And this is where the fact that VirtualRow subclasses BasicObject comes in useful. Since VirtualRow has virtually no methods (no pun intended), anything you call on it will fall through to method_missing. So if you call title, method_missing will be called with :title as its first argument.

What do we do then? Well, just define a method_missing to catch these method calls and “route” them to the model’s Arel table, which Wharel does in three lines of code:

def method_missing(m, *)
  @klass.column_names.include?(m.to_s) ? @klass.arel_table[m] : super
end

Here, @klass would be Artist, so we’re calling Artist.arel_table[:title], which will get us the Arel node for title. Perfect! That’s exactly what we wanted.

Since this works for any attribute (as long as the attribute name does not overlap with a method on BasicObject, which is exceedingly unlikely), we can build up any arbitrary Arel predicate in this block without ever having to explicitly call arel_table on any model.

That Object Model Thing

Although this post has Arel in its title, the real point I’m trying to make with Wharel is to show how some of Ruby’s seemingly esoteric features, like BasicObject and instance_eval, can actually be very powerful. With only thirty lines of code, we’ve created an interface that eliminates a huge amount of Arel boilerplate code in any Rails app. I find this really remarkable.

Although I don’t think many Rubyists are very familiar with it, Ruby’s Object Model is actually very powerful and I highly recommend digging deeper into its details. Books like Metaprogramming Ruby by Paolo Perrotta are a great place to start, but it’s also a good idea to just pry open your favourite gem (like I did with Sequel) and see what it’s doing under the hood. If it’s doing something powerful, it’s probably leveraging this model to do it.

Posted on May 30th, 2018. Filed under: activerecord, arel, metaprogramming, ruby, rails.
blog comments powered by Disqus