Wooden file card drawers

image by Jan Antonin Kolar

Know what your scopes are doing: where.first vs find_by

There are numerous benefits to using an Object-Relational Mapper (ORM), such as Active Record, to generate your SQL. You gain from clear repeatable abstractions, saving time and improving readability.

RailsConf 2024

I'm co-chairing RailsConf 2024 in Detroit May 7–9. Come and join us 

Unfortunately the very abstractions that make an ORM useful can also have unintended consequences.

It is very easy to introduce performance problems because the actual SQL that runs on your database server is generated by the framework, sometimes this means the SQL isn’t as efficient as it could be.

We recently came across an issue at CoverageBook that highlighted this very point.

Instead of…

…using a where condition then first.

User.where(email: "andy@goodscary.com").first

Use…

find_by.

User.find_by(email: "andy@goodscary.com")
User.find_by_email("andy@goodscary.com")

Why?

This is one of those cases where the ORM (and the tooling around it) get in the way and introduce unforseen performance issues.

The .where scope has an implicit ORDER scope on the primary key that isn’t obvious at first glance.

User.where(email: "andy@goodscary.com")
# SELECT "users".*
# FROM "users"
# WHERE "users"."email" = "andy@goodscary.com"

User.where(email: "andy@goodscary.com").first
# SELECT "users".*
# FROM "users"
# WHERE "users"."email" = "andy@goodscary.com"
# ORDER BY "users"."id" ASC
# LIMIT 1

User.find_by(email: "andy@goodscary.com")
# SELECT "users".*
# FROM "users"
# WHERE "users"."email" = "andy@goodscary.com"
# LIMIT 1

Straightforward indexes on our database didn’t help us as in our—more complex—case. We were querying using an index, but because we were using .where().first we were inadvertently doing a non-indexed scan to establish the order, which caused enormous performance problems.

Additionally, we were writing many thousands of rows per second and, even with a monstrously powerful database, we were seeing issues because the entire table was being sorted to then pick only one record.

Debugging this issue was tricky because it is not possible to call .to_sql on the results of .find_by or .where().first as the query executes and you have to use logging to work out the exact SQL that is being generated.

Knowing the exact SQL Active Record is generating from methods that might seem the same on the surface can be very important.

Why not?

In small tables, under light load, the performance impact of using where().first would be negligible.

Brighton Ruby 2024

Still running UK’s friendliest, Ruby event on Friday 28th June. Ice cream + Ruby 


Last updated on March 14th, 2022 by @andycroll

An email newsletter, with one Ruby/Rails technique delivered with a ‘why?’ and a ‘how?’ every two weeks. It’s deliberately brief, focussed & opinionated.