
5 ways we’ve improved flakey test debugging


We use RSpec as our test framework, and it has a ton of hooks and features that allow us to extend it in cool and useful ways. So here are the top 5 things we did that turned out to be awesome.

Change #1 — Saving database state for failed tests

One of the trickier types of test failure to debug is when you expect the database to be in one state, but it's weirdly in another. Perhaps there are extra or missing records in the database that are making your tests fail. These failures are even harder to debug on CI, where the database state is wiped after every test and destroyed at the end of the test suite.

To combat this type of error, when a test fails we capture the state of the database at that moment and store it as a simple JSON file. We then have a mechanism to load that same state into a temporary database on our development machines, so we can see exactly what was in the database when the test failed.

Here’s a gist of the code we use to save and load database state from RSpec: https://gist.github.com/keithpitt/c124eec848c6b40ee4a5f1f1ec9f9cc9
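If you don't want to dig through the gist, the core idea looks roughly like the sketch below. It isn't the exact code from the gist, and the tmp/database-state directory and file naming are assumptions:

# A minimal sketch of the idea (not the gist itself): dump every
# non-empty table to JSON whenever an example fails. The output
# directory and file naming here are assumptions.
require "fileutils"
require "json"

RSpec.configure do |config|
  config.after do |example|
    next unless example.exception

    connection = ActiveRecord::Base.connection

    state = connection.tables.each_with_object({}) do |table, dump|
      next if table == "schema_migrations"

      rows = connection.select_all("SELECT * FROM #{connection.quote_table_name(table)}").to_a
      dump[table] = rows unless rows.empty?
    end

    FileUtils.mkdir_p("tmp/database-state")
    File.write("tmp/database-state/#{example.full_description.parameterize}.json", JSON.pretty_generate(state))
  end
end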

Once you’ve downloaded a dump of the database state, loading it into a temporary database is super easy with script/load_test_database_state.
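We won't reproduce the whole script here, but the loading side is roughly this shape. It's only a sketch: the JSON format, the temporary database name, and the template database are all assumptions, not the actual script/load_test_database_state.

# A rough sketch of what a loading script could do. Assumes the dump is
# a JSON hash of table name => array of row hashes, and that a
# my_app_test database exists to copy the schema from.
require "json"
require "pg"

dump = JSON.parse(File.read(ARGV.fetch(0)))

# Recreate a throwaway database with the same schema as the test database
admin = PG.connect(dbname: "postgres")
admin.exec("DROP DATABASE IF EXISTS test_state_inspection")
admin.exec("CREATE DATABASE test_state_inspection TEMPLATE my_app_test")

connection = PG.connect(dbname: "test_state_inspection")

# Insert every captured row back into its table
dump.each do |table, rows|
  rows.each do |row|
    columns = row.keys.map { |column| connection.quote_ident(column) }.join(", ")
    placeholders = (1..row.size).map { |i| "$#{i}" }.join(", ")

    connection.exec_params(
      "INSERT INTO #{connection.quote_ident(table)} (#{columns}) VALUES (#{placeholders})",
      row.values
    )
  end
end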

Now I can inspect the database state at the time of the error

As a bonus, we upload the database state as a Buildkite Artifact so we can quickly download the state for each failed test.
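The upload itself can be as simple as shelling out to the agent once the suite finishes. Here's a sketch, assuming the dumps live in the hypothetical tmp/database-state directory from the earlier sketch:

# Upload any captured database state as build artifacts when running on
# Buildkite. The tmp/database-state path is an assumption; buildkite-agent
# is already on the CI host's PATH.
RSpec.configure do |config|
  config.after(:suite) do
    if ENV["BUILDKITE"] && Dir.exist?("tmp/database-state")
      system("buildkite-agent", "artifact", "upload", "tmp/database-state/*.json")
    end
  end
end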

Uploading the database state to Buildkite as an artifact

Change #2 — Our own simple database cleaner

Now that we had the database state saver, I decided to have a go at fixing some of the flakey tests in our test suite. The next time I saw one fail, I downloaded the database state for that test to figure out what was going on.

Looking through the downloaded data, I noticed that there were extra records in the database from previous test runs that shouldn’t have been there. After more digging, it turned out that database_rewinder (the gem we use to clean the database between test runs) had a pretty major caveat: it doesn’t clean tables that you insert data into using ActiveRecord::Base.connection.execute %{INSERT ...} (which our application does a lot of for performance reasons).

After spending some time trying to figure out how to fix the issue within database_rewinder, I decided it might be easier to just write our own database cleaning code. It ended up being much simpler than I thought it would be — and fewer lines too!

# Add this file to `spec/support/simple_database_cleaner.rb`

class SimpleDatabaseCleaner
  def clean(connection)
    @cached_sql ||= {}

    sql = @cached_sql[connection] ||= begin
      "".tap do |sql|
        connection.tables.each do |table_name|
          next if table_name == "schema_migrations".freeze

          sql << %{DELETE FROM #{connection.quote_table_name(table_name)};\n}
        end
      end
    end

    connection.execute sql
  end
end

RSpec.configure do |config|
  database_cleaner = SimpleDatabaseCleaner.new

  config.before do |example|
    database_cleaner.clean(ActiveRecord::Base.connection)
    database_cleaner.clean(RecordWithOtherDatabase.connection)
  end
end

Change #3 — Extracting logs from failed tests

When a test fails, it’s hard to find the specific place in the logs where that spec ran. We made it easier for ourselves by splitting out the portion of the log that belongs to the failed test.

You can find the code we use to split logs here: https://gist.github.com/keithpitt/74950281d01b11bac7b076fb1d3eda1a
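The gist does the heavy lifting, but the basic trick is just writing a recognisable marker into the Rails log around every example, something like this sketch (the marker format is an assumption):

# A sketch of the general idea (not the gist itself): bracket every
# example in log/test.log with markers so the log can be sliced up
# per spec afterwards.
RSpec.configure do |config|
  config.before do |example|
    Rails.logger.info "=== BEGIN SPEC #{example.full_description} (#{example.location}) ==="
  end

  config.after do |example|
    Rails.logger.info "=== END SPEC #{example.full_description} ==="
  end
end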

The Rails Log Splitter also shows the name of the current spec

Just like we did with the database state files, we upload all log/**/*.log files as Buildkite Artifacts so we can easily download them from the Buildkite UI.

Easy access to the log of a failed spec

Change #4 — Tracing ActiveRecord IDs in logs

A common thing we found ourselves doing locally when trying to debug database-related test failures was this:

it "returns does something" do user = User.create(...) Rails.logger.info "User created and had ID: #{user.id}" ... end

We found this a little repetitive, and we wanted a mechanism that worked not only when debugging failures locally, but also when debugging tests that failed on CI.

To replace this repetitive manual step, we invented ActiveRecordIDTracer, which outputs the ID of each newly inserted record to log/test.log.

ActiveRecordIDTracer as seen in log/test.log

# Add this file to `spec/support/active_record_id_tracer.rb`

module ActiveRecordIDTracer
  def insert(*args)
    super(*args).tap do |returned_id|
      Rails.logger.debug("\033[1m\033[33mActiveRecordIDTracer\033[0m #{klass.name} insert returned ID #{returned_id}")
    end
  end
end

ActiveRecord::Relation.prepend ActiveRecordIDTracer

Change #5 — Resetting database ID sequences

We once came across a weird test failure where the test would only fail if the two records we were testing had the same ID (yeah, weird…). The annoying part was that the first time we ran the test locally it would fail, but subsequent runs would succeed.

The first step to debugging any weird test failure is to try and get it to fail every time. Once we realised what the problem was, we whipped up this little class that would reset all the ID sequences in PostgreSQL after each test run.

class IDSequenceResetter
  def reset(connection)
    @cached_sql ||= {}

    sql = @cached_sql[connection] ||= begin
      sequences = connection.execute(%{SELECT c.relname FROM pg_class c WHERE c.relkind = 'S'}).values.flatten

      "".tap do |sql|
        sequences.each do |name|
          next unless name.ends_with?("id_seq")

          sql << %{ALTER SEQUENCE #{connection.quote_table_name(name)} RESTART WITH 1;\n}
        end
      end
    end

    connection.execute sql
  end
end

RSpec.configure do |config|
  id_sequence_resetter = IDSequenceResetter.new

  config.before do |example|
    id_sequence_resetter.reset(ActiveRecord::Base.connection)
    id_sequence_resetter.reset(RecordWithOtherDatabase.connection)
  end
end

Takeaway: Treat your test code just like production code

We change production code all the time, but tests end up rotting away without much improvement. Our light bulb moment was when we stopped treating our tests as this weird ball of code that couldn’t be shaped.

After we started treating test code like production code, debugging test failures got a whole lot easier (how cool is it to download a failed test's database state to your dev machine!?).