Updating Rails: where's a memcache gone?

Can you imagine that updating a Rails version may become a real chase for performance? Keep reading to know the whole story.

The first try

Things started to go weird one morning when I have deployed my pull-request updating Rails version from 5.0 to 5.1 (tested both locally and on staging environment) to production. Everything went pretty well in terms of functionality until I discovered that the application works slower and slower as the request rate increases. I decided to take a break and rolled back to the previous release.

Here’s how the NewRelic’s application performance chart looked like:

NewRelic chart

As we can see, Ruby code execution time and Postgres requests time have grown dramatically while Memcached time has almost disappeared. I tried to reproduce the issue by running application stress tests, but without any luck: tests passed smoothly and with no such slowdowns.

The investigation

Later in the night, I deployed the updated version again and started to investigate the application’s performance charts.

The CPU chart shows a significant increase in CPU utilization:

Rails CPU chart

But the traffic chart is much more interesting. It shows 3 times increase in inbound application traffic!

Rails Inbound Traffic chart

It couldn’t come from web site visitors, so we should find its source somewhere inside our infrastructure. Let’s take a look at PostgreSQL’s charts.

PostgreSQL CPU chart PostgreSQL Traffic chart PostgreSQL Queries chart

CPU consumption, outbound traffic and queries count, all have increased on the PostgreSQL server. Great! We’ve found the source of extra traffic, but it’s strange. Where did these extra queries come from?

Having no answers, I kept looking through the charts. And something interesting came up at the memcached traffic chart:

Memcached traffic chart Memcached new connections

As we can see there, memcached traffic has crumbled while new connections count has significantly increased. This means the application is experiencing lots of connection problems and performs reconnects.

The time has come to set the debug application log level up and dive into logs.

Here’s what stands out first:

down_retry_delay not reached for memcached:11211 (10.528 seconds left)
DalliError: No server available
Cache write: hotels/beach_type_name_cache/0/24 ({:expires_in=>2 days})
down_retry_delay not reached for memcached:11211 (10.528 seconds left)
DalliError: No server available
Cache read: all_grouped_hotel_features ({:expires_in=>1 hour})
down_retry_delay not reached for memcached:11211 (10.521 seconds left)
DalliError: No server available
Cache read: gon_depart_cities
down_retry_delay not reached for memcached:11211 (55.690 seconds left)
DalliError: No server available

After doing some research of Dalli gem’s internal I’ve discovered that the message down_retry_delay not reached means Dally’s decided, the memcached server was down and excluded it from the list of available servers temporarily.

Looking further I’ve discovered the error which forced Dalli to do so:

Unexpected exception during Dalli request: NameError: uninitialized constant ActiveRecord::PredicateBuilder::ClassHandler
Did you mean?  ActiveRecord::PredicateBuilder::BaseHandler
/app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/inflector/methods.rb:271:in `const_get'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/inflector/methods.rb:271:in `block in constantize'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/inflector/methods.rb:267:in `each'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/inflector/methods.rb:267:in `inject'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/inflector/methods.rb:267:in `constantize'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/core_ext/string/inflections.rb:66:in `constantize'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/core_ext/marshal.rb:8:in `rescue in load'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/core_ext/marshal.rb:3:in `load'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:442:in `deserialize'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:513:in `generic_response'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:272:in `get'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:70:in `request'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/options.rb:19:in `block in request'
       /usr/local/rvm/rubies/ruby-2.5.3/lib/ruby/2.5.0/monitor.rb:226:in `mon_synchronize'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/options.rb:18:in `request'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/client.rb:368:in `perform'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/client.rb:59:in `get'
       /app/vendor/bundle/ruby/2.5.0/gems/newrelic_rpm-6.5.0.357/lib/new_relic/agent/instrumentation/memcache.rb:49:in `block (3 levels) in instrument_methods'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:306:in `block in read_entry'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/dalli/client.rb:266:in `with'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:87:in `with'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:306:in `read_entry'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/cache/strategy/local_cache.rb:111:in `block in read_entry'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/cache/strategy/local_cache.rb:65:in `block in fetch_entry'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/cache/strategy/local_cache.rb:65:in `fetch'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/cache/strategy/local_cache.rb:65:in `fetch_entry'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/cache/strategy/local_cache.rb:111:in `read_entry'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:420:in `read_entry'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:107:in `block in fetch'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:389:in `block in instrument_with_log'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:398:in `block in instrument'
       /app/vendor/bundle/ruby/2.5.0/gems/activesupport-5.1.6.2/lib/active_support/notifications.rb:168:in `instrument'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:397:in `instrument'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:389:in `instrument_with_log'
       /app/vendor/bundle/ruby/2.5.0/gems/dalli-2.7.10/lib/active_support/cache/dalli_store.rb:106:in `fetch'
       /app/app/controllers/api/v1/searches/results_controller.rb:105:in `suggested_subregions'
        ...
memcached:11211 is down
#<Dalli::NetworkError: memcached:11211 is down: NameError uninitialized constant ActiveRecord::PredicateBuilder::ClassHandler
Did you mean?  ActiveRecord::PredicateBuilder::BaseHandler>
retrying request with new server
down_retry_delay not reached for memcached:11211 (60.000 seconds left)
DalliError: No server available

I can see two problems here:

  1. Where did the ActiveRecord::PredicateBuilder::ClassHandler come from?
  2. What does NameError have to do with retrying request with new server?

First problem: caching ActiveRecord::Relation

Application code causing the error is the following (simplified a bit):

def suggested_subregions
  if destination.is_a?(Region)
    Rails.cache.fetch [:suggested_subregions, destination.id], expires_in: 24.hours do
      destination.descendants
                 .where(excluded: false)
                 .order_by_popularity
                 .limit(3)
      end
  else
    []
  end
end

Taking into account the Region is an ActiveRecord model class, you can easily discover, that we’re caching an ActiveRecord::Relation object here. It is a bad idea to serialize and deserialize such a complex objects, and now we can see a good reason why.

Due to some changes in the internal structure of ActiveRecord::Relation class between Rails version 5.0 and 5.1, the object serialized in Rails 5.0 can’t be deserialized in Rails 5.1.

Thus, the fast solution would be to clear all cached suggested_subregions keys. A more thoughtful fix would include changing the code to cache only plain Ruby objects and basic classes.

Second problem: NameError is not a server’s failure

The second problem lies deep in the Dalli gem when it is working in an environment with ActiveSupport extensions loaded.

As we have already discovered, dalli had run into NameError while deserializing object, and mistakenly treated this error as memcached server error.

Reproducing the error

I’ve made a simple code snippet to reproduce the error.

First, let’s define a simple test class:

require 'dalli'
require 'active_support'
require 'active_support/core_ext'

module M
  class C
    def initialize(v)
      @v = v
    end

    def to_s
      @v.to_s
    end
  end
end

Next, instantiate an object and save it to memcache:

[5] pry(main)> o = M::C.new('test')
=> #<M::C:0x00005591884f3b38 @v="test">
[6] pry(main)> dalli = Dalli::Client.new
=> #<Dalli::Client:0x000055918852aed0 @options={}, @ring=nil, @servers=["127.0.0.1:11211"]>
[7] pry(main)> dalli.set :my_object, o
=> 12397565349220843520
[8] pry(main)> dalli.get :my_object
=> #<M::C:0x00005591885b28f8 @v="test">

Things look good so far. But let’s do the trick: remove M::C class and try again.

[9] pry(main)> M.instance_eval { remove_const :C }
=> M::C
[10] pry(main)> dalli.get :my_object
E, [2019-07-26T09:44:26.341917 #27755] ERROR -- : Unexpected exception during Dalli request: NameError: uninitialized constant M::C
E, [2019-07-26T09:44:26.342009 #27755] ERROR -- : /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/inflector/methods.rb:285:in `const_get'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/inflector/methods.rb:285:in `block in constantize'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/inflector/methods.rb:281:in `each'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/inflector/methods.rb:281:in `inject'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/inflector/methods.rb:281:in `constantize'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/core_ext/string/inflections.rb:68:in `constantize'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/core_ext/marshal.rb:10:in `rescue in load'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/core_ext/marshal.rb:5:in `load'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:442:in `deserialize'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:513:in `generic_response'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:272:in `get'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:70:in `request'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/options.rb:19:in `block in request'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/2.5.0/monitor.rb:226:in `mon_synchronize'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/options.rb:18:in `request'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/client.rb:368:in `perform'
    /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/client.rb:59:in `get'
    (pry):20:in `__pry__'
    ...
W, [2019-07-26T09:44:26.342183 #27755]  WARN -- : 127.0.0.1:11211 is down
Dalli::RingError: No server available

We got it! Unexpected exception NameError led us to WARN -- : 127.0.0.1:11211 is down. And the one to blame is ActiveSupport’s internal call for const_get.

Redemption

The solution to this problem is simple: we should handle this kind on NameError as it’s already done to ArgumentError in Dalli::Server#deserialize.

After applying the patch I’ve proposed in my pull-request running the same test snippet ends up like this:

[10] pry(main)> dalli.get :my_object
Dalli::UnmarshalError: Unable to unmarshal value: uninitialized constant M::C
from /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/dalli-2.7.10/lib/dalli/server.rb:452:in `rescue in deserialize'
Caused by NameError: uninitialized constant M::C
from /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/inflector/methods.rb:285:in `const_get'
Caused by ArgumentError: undefined class/module M::C
from /home/mike/.rbenv/versions/2.5.3/lib/ruby/gems/2.5.0/gems/activesupport-5.2.3/lib/active_support/core_ext/marshal.rb:6:in `load'

Conclusion

As you can see now, updating Rails version may bring you a lot of challenging yet fascinating problems. What started as an application performance degradation then evolved into catching lurky bugs caused by usage of disputable caching technics.

Follow the guidelines and may the force be with you!

Useful links

Comment

Has something on your mind? Tweet it!
Photo by freestocks.org