How to Reduce Memory Usage by Tuning Gemfile

Rails is known for many things but memory efficiency is not one of them. By default, it loads every gem in the Gemfile, used or unused, which inflates the overall memory footprint. Fortunately, we can easily eliminate this waste without touching application code.

image_magick is a perfect example of this problem. If we manipulate images only in workers then there’s no reason to require it on web servers. Conversely, there’s no need to require web-related stuff on workers.

Let’s start by understanding how Rails apps manage their dependencies.

Bundler in Action

Rails manages dependencies with Bundler. Its responsibilities are:

  1. Resolving gem specifications to concrete gem versions.
  2. Installing the gems.
  3. Requiring the gems during the boot process.

We’re all familiar with steps 1 and 2 but not necessarily 3. We’ll focus on the last step as that’s where extraneous gems get loaded.

The idea is to split gems into groups like web and worker, keep shared gems in the default group (no explicit group required), and make Rails require the right groups depending on where it runs. A complex app may need more groups, especially if there’s more than one type of worker, but to keep things simple we’ll assume just web and worker.

Let’s do some wishful thinking. We’d like to make the following Gemfile do the trick:

gem 'rails' # Used on both web and worker servers.
gem 'sidekiq' # Same here.

gem 'pundit', group: :web # Web-only gem.
gem 'image_magick', group: :worker # Worker-only gem.

Obviously, Rails is unaware of our specialized groups so it won’t work. In order to figure out the best implementation, we’ll turn our attention to the boot process.
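To make the goal concrete, here is a toy model in plain Ruby of how group selection decides which gems get required. It is only a sketch for illustration, not Bundler's actual implementation; the GEM_GROUPS table and the gems_for helper are hypothetical names.

```ruby
# Toy model of group-based requiring; NOT Bundler's real implementation.
# GEM_GROUPS mirrors the wishful Gemfile above: gems without an explicit
# group belong to :default.
GEM_GROUPS = {
  "rails"        => [:default],
  "sidekiq"      => [:default],
  "pundit"       => [:web],
  "image_magick" => [:worker]
}.freeze

# Returns the names of gems that belong to at least one requested group.
def gems_for(*groups)
  GEM_GROUPS.select { |_name, gem_groups| (gem_groups & groups).any? }.keys
end

p gems_for(:default, :web)    # => ["rails", "sidekiq", "pundit"]
p gems_for(:default, :worker) # => ["rails", "sidekiq", "image_magick"]
```

Web servers never see image_magick and workers never see pundit, which is exactly the behavior we want from the real boot process.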

How Rails Applications Boot

At a high level, the Rails boot process looks like this:

  1. The boot process is initiated by config.ru
  2. which requires config/environment.rb
  3. which requires config/application.rb
  4. which requires config/boot.rb
  5. which requires bundler/setup and then …
  6. config/application.rb calls Bundler.require(*Rails.groups).

We’re interested in the last two steps. Requiring bundler/setup in step 5 adds all gems from Gemfile to $LOAD_PATH without requiring them yet so it doesn’t increase memory use.
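The difference between being on $LOAD_PATH and being required can be seen with the standard library alone. In this minimal sketch, a temporary directory stands in for a gem's lib directory, and HeavyGemSketch is a made-up name, not a real gem.

```ruby
require "tmpdir"

loaded_before = nil
loaded_after = nil

Dir.mktmpdir do |dir|
  # A stand-in for a gem's lib directory containing a single file.
  File.write(File.join(dir, "heavy_gem_sketch.rb"), "module HeavyGemSketch; end\n")

  begin
    # Roughly what bundler/setup does for each gem: make it findable.
    $LOAD_PATH.unshift(dir)
    loaded_before = defined?(HeavyGemSketch) # nil: nothing has been loaded yet

    # Only an explicit require evaluates the file and defines its constants.
    require "heavy_gem_sketch"
    loaded_after = defined?(HeavyGemSketch) # "constant"
  ensure
    # Remove the temporary entry so later requires are unaffected.
    $LOAD_PATH.delete(dir)
  end
end

p loaded_before # => nil
p loaded_after  # => "constant"
```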

After setting up $LOAD_PATH, dependencies are required in config/application.rb with a single line of code:

Bundler.require(*Rails.groups)

What Bundler.require does is self-evident – it requires gems in the specified groups.

Rails.groups is more interesting. It returns an array of groups to load. With RAILS_ENV set to production and no other configuration, they’re default and production. In general, the array depends on three factors:

  1. RAILS_ENV.
  2. RAILS_GROUPS, a comma-separated list of extra groups to include.
  3. A hash passed as an argument. It maps group names to the environments (as defined by RAILS_ENV) in which these groups should be included. For example, { :frontend => [:web, :legacy_web] } means frontend should be required when RAILS_ENV is either web or legacy_web.
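The three factors can be combined in a few lines of plain Ruby. This is an approximation written for illustration, not a copy of the Rails source; groups_sketch and its keyword arguments are hypothetical names.

```ruby
# Approximates how the list of groups is assembled from the three factors.
# A sketch only; the real logic lives in Rails.groups.
def groups_sketch(env:, rails_groups: nil, conditional: {})
  groups = [:default, env]
  # RAILS_GROUPS is a comma-separated string; nil becomes an empty list.
  groups.concat(rails_groups.to_s.split(","))
  # Conditional groups are added only when the current env is listed.
  conditional.each do |group, envs|
    groups << group if envs.map(&:to_s).include?(env)
  end
  groups.uniq
end

p groups_sketch(env: "production")
# => [:default, "production"]
p groups_sketch(env: "production", rails_groups: "web")
# => [:default, "production", "web"]
p groups_sketch(env: "web", conditional: { frontend: [:web, :legacy_web] })
# => [:default, "web", :frontend]
```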

RAILS_GROUPS sounds like exactly what we need!

Teaching Rails New Groups

You may think it’s enough to set RAILS_GROUPS in production and call it a day but we also need to make it work in development, test and CI. Let’s take a look at each of these environments in turn.

Development Environment

The right setting depends on the project but for maximum convenience, developers may set RAILS_GROUPS to web,worker. This would keep the default Rails behavior of loading everything and side-step questions about the correct worker configuration. Persisting this setting is a matter of adding it to .rbenv-vars or a similar file.

Let’s discuss two risks before moving on to the next environment:

  1. Developers may get confused when RAILS_GROUPS is missing or incorrect.
  2. If we add or remove a group then all developers will need to update or they’ll run into the problem above.

Are these risks serious? It’s up to you to decide. If you’re concerned then the following snippet (to be used in config/application.rb) may be a good trade-off between safety and complexity:

DEVELOPMENT_RAILS_GROUPS = 'web,worker'

if ENV['RAILS_GROUPS'].blank?
  ENV['RAILS_GROUPS'] = DEVELOPMENT_RAILS_GROUPS
  warn "RAILS_GROUPS is unset; defaulting to #{DEVELOPMENT_RAILS_GROUPS}"
elsif ENV['RAILS_GROUPS'] != DEVELOPMENT_RAILS_GROUPS
  warn "RAILS_GROUPS is set to #{ENV['RAILS_GROUPS']} instead of #{DEVELOPMENT_RAILS_GROUPS}"
end

Bundler.require(*Rails.groups)

In addition to telling the developer explicitly which groups are loaded, it keeps production working when RAILS_GROUPS is missing, because both groups are then loaded by default.

Test Environment

All test files are usually run within one process which means we need web and worker to make all dependencies available.

We need to keep in mind the following risk: if we add a gem to the wrong group then the tests will pass but production will break. For example, if we add image_magick to web instead of worker then the test suite will pass because it loads both groups. However, production workers are configured to only load the worker group so image_magick won’t be available there.

We can eliminate this risk in several ways but the most convenient one is detecting it on the continuous integration server. We don’t add new gems frequently enough to push this burden to developers.

Continuous Integration

As discussed in the section above, we need to split test runs across the groups. Specifically, instead of:

RAILS_ENV=test bundle exec rails test

we should be running:

RAILS_ENV=test RAILS_GROUPS=web bundle exec rails test --exclude test/jobs
RAILS_ENV=test RAILS_GROUPS=worker bundle exec rails test test/jobs

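In general, each specialized gem group should have a separate test run. This ensures our code actually works in production. The exact commands depend on the test runner: Minitest’s --exclude option, for instance, matches test names rather than file paths, so the web run may need to list its test directories explicitly.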
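One way to keep the per-group runs in sync is to generate them from a single table. The sketch below lists each group's test paths explicitly instead of relying on an exclusion flag. TEST_PATHS_BY_GROUP, its directory names, and test_command are assumptions made for illustration; adjust them to the project's layout.

```ruby
# Hypothetical mapping from gem group to the test paths that exercise it.
TEST_PATHS_BY_GROUP = {
  "web"    => %w[test/controllers test/models],
  "worker" => %w[test/jobs]
}.freeze

# Builds arguments suitable for Kernel#system: an environment hash
# followed by the command and its arguments.
def test_command(group)
  env = { "RAILS_ENV" => "test", "RAILS_GROUPS" => group }
  [env, "bundle", "exec", "rails", "test", *TEST_PATHS_BY_GROUP.fetch(group)]
end

TEST_PATHS_BY_GROUP.each_key do |group|
  env, *command = test_command(group)
  puts "#{env.map { |key, value| "#{key}=#{value}" }.join(" ")} #{command.join(" ")}"
  # On CI, run the command and fail the build on a non-zero exit status:
  # system(env, *command) || abort("#{group} test run failed")
end
```

Adding a third group then only requires a new entry in the table.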

Production

Last but not least, we need to set RAILS_GROUPS in production or we won’t see any reduction in memory usage. To prevent misconfiguration, we may add a modified version of the snippet from the Development Environment section:

DEVELOPMENT_RAILS_GROUPS = 'web,worker'

if ENV['RAILS_GROUPS'].blank?
  ENV['RAILS_GROUPS'] = DEVELOPMENT_RAILS_GROUPS
  warn "RAILS_GROUPS is unset; defaulting to #{DEVELOPMENT_RAILS_GROUPS}"
elsif !Rails.env.production? && ENV['RAILS_GROUPS'] != DEVELOPMENT_RAILS_GROUPS
  # We don't emit this warning in production as it's expected to see RAILS_GROUPS
  # set to a different value than the one for development.
  warn "RAILS_GROUPS is set to #{ENV['RAILS_GROUPS']} instead of #{DEVELOPMENT_RAILS_GROUPS}"
end

Bundler.require(*Rails.groups)
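If you’d like this decision to be unit-testable, the same logic can be expressed as a pure function that doesn’t touch ENV or Rails. This is a minimal sketch; resolve_rails_groups, its keyword arguments, and DEFAULT_RAILS_GROUPS are hypothetical names, not part of Rails.

```ruby
DEFAULT_RAILS_GROUPS = "web,worker"

# Returns [groups, warning]. warning is nil when there is nothing to report.
def resolve_rails_groups(value, production:, default: DEFAULT_RAILS_GROUPS)
  if value.nil? || value.strip.empty?
    [default, "RAILS_GROUPS is unset; defaulting to #{default}"]
  elsif !production && value != default
    # Outside production, a non-default value is worth pointing out.
    [value, "RAILS_GROUPS is set to #{value} instead of #{default}"]
  else
    [value, nil]
  end
end

p resolve_rails_groups(nil, production: true)
# => ["web,worker", "RAILS_GROUPS is unset; defaulting to web,worker"]
p resolve_rails_groups("worker", production: true)
# => ["worker", nil]
```

In config/application.rb, the returned groups would be assigned to ENV['RAILS_GROUPS'] and the warning, if any, passed to warn before calling Bundler.require.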

Next Steps

These are all the boot process modifications we need to make. We’re ready to split gems into groups. Obviously, this is project-specific but here are a few rules of thumb:

  • API clients are frequently used by workers, as it’s an anti-pattern to make third-party API calls during the request-response cycle, so they belong to the worker group.
  • Frontend tooling is likely unused on workers and can be safely put in the web group.
  • Processing libraries that require lots of CPU are another candidate for the worker group.

Summary

The default Rails dependency management can easily lead to a large memory footprint because it loads all gems even if they are unused. Splitting them into web- and worker-related groups and enhancing the boot process is a simple countermeasure that can be applied to any Rails project.
