Reducing Leaky Abstractions Introduced by ActiveRecord

Rails’ ActiveRecord provides a comprehensive interface for querying the database. Left unchecked, direct use of that interface throughout an application becomes unwieldy as the domain changes.

The Setup

Imagine an application domain where a team of people publishes a technical blog.

class Person < ApplicationRecord
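  # Post refers to its Person as :author, so the foreign key is author_id.
  has_many :posts, foreign_key: :author_id, inverse_of: :author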
end

class Post < ApplicationRecord
  belongs_to :author, class_name: "Person"
end

In addition to the author association and other standard post data attributes, the Post model contains a boolean flag named published.

A Rails controller showing the newest published posts might look like:

class PostsController < ApplicationController
  def index
    @newest_posts = Post.where(published: true).order(created_at: :desc).limit(10)
  end
end

Let’s go one step further and create a page dedicated to the list of published authors:

class AuthorsController < ApplicationController
  def index
    @published_authors = Person.distinct.joins(:posts).where(posts: { published: true })
  end
end

The Pain

A new feature request comes in: teammates want to enqueue posts to be published in the future.

This could be modeled by adjusting the published boolean to a published_at timestamp that allows for three states:

  • unpublished (published_at is set to nil)
  • published (published_at is set to a timestamp less than or equal to now)
  • enqueued (published_at is set to a timestamp in the future)
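
The three states above can be sketched as a small plain-Ruby helper. The method name publication_state and its now: keyword are hypothetical, introduced only for illustration; inside a Rails app the default would more likely be Time.current.

```ruby
# Hypothetical helper (not from the application above): classifies a
# published_at value into one of the three states.
def publication_state(published_at, now: Time.now)
  # nil means the post has never been given a publication time.
  return :unpublished if published_at.nil?

  # A timestamp at or before "now" is published; anything later is enqueued.
  published_at <= now ? :published : :enqueued
end
```

Note that a timestamp exactly equal to now counts as published, which is the boundary any query for published posts needs to agree with.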

While this is a relatively small change in the database and corresponding migration (which we won’t go into here), the necessary changes across these different controllers point to a code smell: Shotgun Surgery, where one conceptual change forces edits in many separate places.

While this example is small, in larger codebases, changes like this can add up to a sizeable PR quickly. Most often, a shift in data like this touches:

  • controllers
  • service objects
  • query objects
  • jobs
  • factories
  • tests (especially acceptance tests or anything that touches the database)

The Underlying Issue

The underlying issue here is that ActiveRecord can act as a leaky abstraction.

ActiveRecord abstracts over a database by referring directly to columns, and where can be called from anywhere: directly on Post within a controller or, even worse, by reaching through an association to find published authors, as in the second controller example. Together, these mean that knowledge of what makes a post published (the contents of the where clause) ends up littered across several files within the application (currently, the model and two separate controllers).

The Suggested Fix

Although the right approach depends on the complexity of the queries, I’d first lean on a class method on Post:

class Post < ApplicationRecord
  # other methods

  def self.published
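    # <= matches the "less than or equal to now" definition of published;
    # the table name keeps the condition unambiguous when merged into a join.
    where("posts.published_at <= ?", Time.current)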
  end
end

With this, changes to the controllers are trivial:

class PostsController < ApplicationController
  def index
-   @newest_posts = Post.where(published: true).order(created_at: :desc).limit(10)
+   @newest_posts = Post.published.order(created_at: :desc).limit(10)
  end
end

class AuthorsController < ApplicationController
  def index
-   @published_authors = Person.distinct.joins(:posts).where(posts: { published: true })
+   @published_authors = Person.distinct.joins(:posts).merge(Post.published)
  end
end

Is there still coupling at the controller level between a person and their corresponding posts? Yep! Changing that relationship, however, is the sort of change that ought to be a breaking one, whereas the notion of a post being published should hold, generally speaking, whether we’re using a published boolean, a published_at timestamp, or some sort of state machine.

Worth highlighting in this second change is the merge method, which does the heavy lifting of combining the conditions from Post.published with the Person.distinct.joins(:posts) query.

Caveats and Considerations

In larger applications, where is not the only ActiveRecord method that can signal this kind of leak, nor is its use always problematic. Using where with associations, for example, falls into the “coupling association” category, which is usually innocuous.

It’s also worth noting that problematic use of where is not limited to Rails controllers; service objects, jobs, and any other area of the application that queries against the “guts” of an ActiveRecord object are susceptible.

Finally, while we used a class method in the example above, for larger queries, consider a dedicated query object that encapsulates the logic in one place.
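
A minimal sketch of what such a query object might look like follows. The class name PublishedPostsQuery, its constructor arguments, and the call method are all hypothetical choices for illustration, not something the application above defines. It only assumes the object passed in responds to where, as an ActiveRecord relation does.

```ruby
# Hypothetical query object: keeps the definition of "published" in one
# place, independent of the controller or job that needs it.
class PublishedPostsQuery
  # relation: anything that responds to `where` (e.g. Post.all in Rails).
  # now:      injectable clock value; a Rails app would likely pass Time.current.
  def initialize(relation, now: Time.now)
    @relation = relation
    @now = now
  end

  # Returns whatever `where` returns, so with an ActiveRecord relation
  # the result can still be chained with order, limit, and so on.
  def call
    @relation.where("posts.published_at <= ?", @now)
  end
end
```

In a Rails app this might be used as PublishedPostsQuery.new(Post.all, now: Time.current).call.order(created_at: :desc).limit(10). Passing the relation and the clock in explicitly also makes the object straightforward to exercise in tests without touching the database.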