ActiveRecord For Databases Without Unique Ids

We don't get to choose our families. Some of us don't even get to choose our databases. What do you do if you have the bad luck to inherit a database with non-unique ids? In this case study, Regan Ryan shows us how his team faced the challenge.

  • author photo By Regan Ryan
  • #ruby
  • Feb 15, 2021

Sometimes unique situations and things out of our control lead to wildly unorthodox requirements. Recently, I had an experience where I needed to use ActiveRecord without relying on the database ID for any records. If anyone is considering doing the same, I highly recommend finding another way! But, let's move on to the rest of the story.

Decisions were made. Smaller databases (clones in structure but not in data) needed to be merged. I joined the project just as the team was putting the finishing touches on a script that copies and pastes database records from one database to another. It copied everything exactly as-is, including IDs.

Database A

id fruit user_id
... ... ...
123 orange 456
... ... ...

Database B

id fruit user_id
... ... ...
123 banana 74
... ... ...

Database A after merge

id fruit user_id
... ... ...
123 orange 456
123 banana 74
... ... ...

This breaks the fundamental reason for having IDs: unique identification. I didn't know the specifics, but I felt like all kinds of problems would show up once duplicate IDs were introduced into the system. I tried to say something, but I was new to the project, and others seemed certain this was the best path forward. In a few days, we were going to deploy the code and start handling data with duplicate IDs. The question was no longer, "should we do this?"; instead, the questions were, "how do we do this?" and "how much longer will this take?"

Working with duplicate IDs

So, how do you handle data with duplicate IDs? The solution was to make a composite ID of several fields. Most of our DB fetches looked like this:

# This doesn't work, there may be 2 users with id: 123
FavoriteFruit.find(123)

# Multiple IDs scope the query to the correct record
FavoriteFruit.find_by(id: 123, user_id: 456)

All the ActiveRecord calls were updated in this way, and as I glanced through the code, it seemed to make sense. Until we deployed it.

All hell breaks loose

Shortly after we deployed the code, the phones started ringing. Customers were seeing numbers that didn't add up. They couldn't update their own records. All kinds of features were breaking.

What should we do? We didn't just deploy code; we also moved data from one database to another (and new data were created/updated after we deployed). It was not a simple rollback situation. We needed to fix things fast.

What is Rails doing?

The first step in debugging was to see what the current behavior was and how to reproduce the error. I took a clone of the production data and started a Rails console. Depending on your setup, you may not automatically see the SQL queries Rails runs when you execute an ActiveRecord query. Here's how to ensure SQL statements are visible on your console:

ActiveRecord::Base.logger = Logger.new(STDOUT)

After that, I tried some common Rails queries:

$ FavoriteFruit.find_by(id: 123, user_id: 456)

FavoriteFruit Load (0.6ms)
SELECT  "favorite_fruits".*
FROM "favorite_fruits"
WHERE "favorite_fruits"."id" = $1
AND "favorite_fruits"."user_id" = $2
[["id", "123"], ["user_id", "456"]]

find_by seemed to work fine, but then I saw some code like this:

fruit = FavoriteFruit.find_by(id: 123, user_id: 456)
...
...
fruit.reload

That reload made me curious, so I tested that too:

$ fruit.reload

FavoriteFruit Load (0.3ms)
SELECT  "favorite_fruits".*
FROM "favorite_fruits"
WHERE "favorite_fruits"."id" = $1
LIMIT $2
[["id", 123], ["LIMIT", 1]]

Uh oh. So, even though we initially fetched the correct record with find_by, whenever we called reload, it would take the ID of the record and do a simple find-by-id query, which, of course, would often give incorrect data due to our duplicate IDs.

Why did it do that? I examined the Rails source code for clues. This is a great aspect of coding with Ruby on Rails, the source code is plain Ruby and freely available to access. I simply googled "ActiveRecord reload" and quickly found this:

# File activerecord/lib/active_record/persistence.rb, line 602
def reload(options = nil)
  self.class.connection.clear_query_cache

  fresh_object =
    if options && options[:lock]
      self.class.unscoped { self.class.lock(options[:lock]).find(id) }
    else
      self.class.unscoped { self.class.find(id) }
    end

  @attributes = fresh_object.instance_variable_get("@attributes")
  @new_record = false
  self
end

This shows that reload is, more or less, a wrapper for self.class.find(id). Querying only by an ID was hardwired into this method. For us to work with duplicate IDs, we'd need to either override core Rails methods (never recommended) or stop using reload altogether.

Our Solution

So, we decided to go through every reload in the code and change it to find_by to get the database fetching via multiple keys.

However, that was only some of the bugs resolved. After more digging, I decided to test our update calls:

$ fruit = FavoriteFruit.find_by(id: 123, user_id: 456)
$ fruit.update(last_eaten: Time.now)

FavoriteFruit Update (43.3ms)
UPDATE "favorite_fruits"
SET "last_eaten" = $1
WHERE "favorite_fruits"."id" = $2
[["updated_at", "2020-04-16 06:24:57.989195"], ["id", 123]]

Uh oh. You can see that even though find_by scoped the record by specific fields, when we called update on the Rails record, it created a simple WHERE id = x query, which also breaks with duplicate IDs. How did we get around this?

We made a custom update method, update_unique, which looks like this:

class FavoriteFruit
  def update_unique(attributes)
    run_callbacks :save do
      self.class
        .where(id: id, user_id: user_id)
        .update_all(attributes)
    end
    self.class.find_by(id: id, user_id: user_id)
  end
end

Which let us update records scoped to more than IDs:

$ fruit.update_unique(last_eaten: Time.now)

FavoriteFruit Update All (3.2ms)
UPDATE "favorite_fruits"
SET "last_eaten" = '2020-04-16 06:24:57.989195'
WHERE "favorite_fruits"."id" = $1
AND "favorite_fruits"."user_id" = $2
[["id", "123"], ["user_id", "456"]]

This code ensured a narrow scope for updating records, but by calling the class's update_all method, we lost the callbacks that normally come with updating a record. Therefore, we had to manually run the callbacks and do another database call to retrieve the updated record since update_all doesn't return the updated record. The final product isn't too messy, but it's definitely more difficult to read than fruit.update.

The Real Solution

Due to sunken costs, management, and time constraints, our solution was to monkey patch Rails into using multiple keys for all database calls. This worked, in the sense that customers would still buy and use the product, but it was a bad idea for several reasons:

  • Any future development might inadvertently reintroduce bugs by using common Rails methods. New developers will need strict training to keep the code free of hidden bugs, such as using the reload method.
  • The code is more complicated, less clear, and less maintainable. This is technical debt that slows down development speed more and more as the project goes on.
  • Testing slows down a lot. You need to test not only that a function works but also that it works when various objects have duplicate IDs. It takes more time to write tests, and then each time the test suite is run, it takes more time to run through all the extra tests. Testing can also easily miss bugs if each developer on the project doesn't carefully test all possible scenarios.

The real solution to this problem is to never have duplicate IDs in the first place. If data need to be transferred from one database to another, then the script doing that should collect and insert the data without IDs, allowing the receiving database to use its standardized auto-increment counter to give each record its own unique ID.

Another solution would be to use UUIDs for all records. This type of ID is a long string of characters created at random (instead of step-by-step counting, as with an integer ID). Then, moving data to other databases would have no conflicts or issues.

The bottom line is that Rails was built with the understanding that IDs are unique per record and a quick and easy way to manipulate specific data in the database. Rails is an opinionated framework, and the beauty of this is how smoothly everything runs, as long as you stick to the Rails way of doing things. This applies not only to Rails but also to many other aspects of programming. When things get complicated, we should know how to identify the problem; however, if we write clear, maintainable, and conventional code, we can avoid many of these complications in the first place.

What to do next:
  1. Try Honeybadger for FREE
    Honeybadger helps you find and fix errors before your users can even report them. Get set up in minutes and check monitoring off your to-do list.
    Start free trial
    Easy 5-minute setup — No credit card required
  2. Get the Honeybadger newsletter
    Each month we share news, best practices, and stories from the DevOps & monitoring community—exclusively for developers like you.
    author photo

    Regan Ryan

    Regan is a full-stack Rails developer with a fondness for regular expressions and cleaning up messy code. In recent years he has been drinking hot cocoa in the jungles of Ecuador, swimming around the coast of Vanuatu, and climbing mountains in Germany. In his spare time he is an absolute cabbage, enjoying the little and quiet things of life.

    Stop wasting time manually checking logs for errors!

    Try the only application health monitoring tool that allows you to track application errors, uptime, and cron jobs in one simple platform.

    • Know when critical errors occur, and which customers are affected.
    • Respond instantly when your systems go down.
    • Improve the health of your systems over time.
    • Fix problems before your customers can report them!

    As developers ourselves, we hated wasting time tracking down errors—so we built the system we always wanted.

    Honeybadger tracks everything you need and nothing you don't, creating one simple solution to keep your application running and error free so you can do what you do best—release new code. Try it free and see for yourself.

    Start free trial
    Simple 5-minute setup — No credit card required

    Learn more

    "We've looked at a lot of error management systems. Honeybadger is head and shoulders above the rest and somehow gets better with every new release."
    — Michael Smith, Cofounder & CTO of YvesBlue

    Honeybadger is trusted by top companies like:

    “Everyone is in love with Honeybadger ... the UI is spot on.”
    Molly Struve, Sr. Site Reliability Engineer, Netflix
    Start free trial
    Are you using Sentry, Rollbar, Bugsnag, or Airbrake for your monitoring? Honeybadger includes error tracking with a whole suite of amazing monitoring tools — all for probably less than you're paying now. Discover why so many companies are switching to Honeybadger here.
    Start free trial
    Stop digging through chat logs to find the bug-fix someone mentioned last month. Honeybadger's built-in issue tracker keeps discussion central to each error, so that if it pops up again you'll be able to pick up right where you left off.
    Start free trial