Unraveling String Key Performance in Ruby 2.2

Everyone wants their programs to be faster and to take up less memory. However, it’s not often that this is achieved without having to modify source code. This post introduces optimizations added in Ruby 2.2.0 for working with a Hash that has string keys. To understand the optimization, you need to know the current behavior.

In Ruby 2.1 and prior, if you write something like this:

hash["access_string"]

Ruby will allocate the string "access_string" in order to look for an entry with a key by that name. Similarly, Ruby has to allocate a string every time you create a hash literal:

hash = { "foo" => @variable }

Here, we’re allocating a hash and the "foo" string every time this code is called. This might not seem like a big deal, but if you’re Rack and you create a hash with the same string keys and access them with the same string keys on EVERY request, this can add up, creating a ton of junk objects.

For example, "PATH_INFO" is accessed 36 times across 15 files in the Rack project, like so:

path = env["PATH_INFO"]

Doing this on every request creates hundreds of one-off strings that never needed to be allocated. This slows down programs and uses more memory.
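To make the cost concrete, here is a minimal sketch that counts allocations around a string literal. The helper names allocations_during and path_key are made up for this illustration, and it assumes a Ruby where GC.stat accepts the :total_allocated_objects key (2.2 or later) and no frozen_string_literal magic comment is in effect. The literal is wrapped in a method because newer Rubies optimize the direct hash lookup, while a plain literal still produces a new object each time it is evaluated.

```ruby
# Hypothetical helper: number of objects allocated while the block runs.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical helper: every call evaluates the literal again and
# returns a fresh String (assuming literals are not frozen by default).
def path_key
  "PATH_INFO"
end

first  = path_key
second = path_key

DISTINCT_OBJECTS = !first.equal?(second) # two separate objects...
SAME_CONTENT     = first == second       # ...holding the same characters

ALLOCATED = allocations_during { 1_000.times { path_key } }

puts "distinct objects: #{DISTINCT_OBJECTS}"
puts "objects allocated during 1,000 calls: #{ALLOCATED}"
```

The exact count varies by Ruby version, but it should be at least one object per call, all of which are thrown away immediately.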

Strings vs Symbols

It is possible to get around the string allocation in a hash by using symbols. With a simple benchmark, we can see that this type of hash access is much faster:

require 'benchmark/ips'

STRING_HASH = { "foo" => "bar" }
SYMBOL_HASH = { :foo => "bar"  }

Benchmark.ips do |x|
  x.report("string") { STRING_HASH["foo"] }
  x.report("symbol") { SYMBOL_HASH[:foo]  }
end

Results:

Calculating -------------------------------------
              string   129.746k i/100ms
              symbol   152.476k i/100ms
-------------------------------------------------
              string      4.619M (± 5.0%) i/s -     23.095M
              symbol      8.587M (± 5.4%) i/s -     42.846M

Using a symbol is about twice as fast as a string, because we don’t have to do the allocation. So, why do we ever use string keys? Well, symbols are garbage collected in Ruby 2.2, but not before. Therefore, if you’re creating a hash based on user input, it’s possible that you’re opening up your app to a symbol denial of service attack.

Ruby 2.2 added symbol GC, so the security concern is diminished. However, if the key already exists as a string, calling to_sym on it is extra work on top of the string allocation you have already paid for.

String Pools for Fun and Performance

Rack uses a ton of string keys, and while we can’t simply change the API to use symbols, we can squeeze out some improved performance. With the String#freeze optimization, we can tell Ruby to reuse a string that already exists instead of allocating a new one, and that gives us some gains.

We can optimize the string-based hash lookups and creations inside of Rack by adding freeze:

path = env["PATH_INFO".freeze]

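Ruby understands that calling String#freeze on a string literal means the string will not be modified. While it looks like a method call, this is an optimization done at compile time, so the same frozen string is reused instead of a new one being allocated.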
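A quick way to see the effect is to compare object identity. This is only a sketch, assuming Ruby 2.1 or later; frozen_key is a hypothetical helper name used for illustration.

```ruby
# Hypothetical helper: returns the frozen literal.
def frozen_key
  "PATH_INFO".freeze
end

a = frozen_key
b = frozen_key

SAME_OBJECT = a.equal?(b) # both calls hand back the very same object
IS_FROZEN   = a.frozen?

# The trade-off: a frozen string cannot be modified in place.
MUTATION_ERROR =
  begin
    frozen_key << "/extra"
    nil
  rescue => e
    e.class # FrozenError on newer Rubies, RuntimeError on older ones
  end

puts "same object: #{SAME_OBJECT}"
puts "frozen: #{IS_FROZEN}"
puts "mutation raised: #{MUTATION_ERROR}"
```

Since hash keys like "PATH_INFO" are only ever read, giving up mutability costs nothing here.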

require 'benchmark/ips'
HASH = { "PATH_INFO" => "bar" }

Benchmark.ips do |x|
  x.report("freeze") { HASH["PATH_INFO".freeze] }
  x.report("normal") { HASH["PATH_INFO"] }
end

Result:

Calculating -------------------------------------
              freeze   144.130k i/100ms
              normal   127.572k i/100ms
-------------------------------------------------
              freeze      7.326M (± 3.2%) i/s -     36.609M
              normal      4.566M (± 4.1%) i/s -     22.835M

Using the String#freeze method is MUCH faster. I used a similar technique to give Rack a 2 to 4% speed bump and reduce the amount of memory being used.

As a bonus, we also decrease the required running memory of Rack.

Ruby 2.2 String Hash Performance Boost

While that was a pretty easy optimization and resulted in a pretty good perf boost, it’s not very Ruby-esque. Why can’t Ruby do this for us? Well…it can.

Thanks to patch r43870, Ruby master (i.e. Ruby 2.2.0+) now applies this performance improvement for you automatically. I was told about this optimization by Eric Wong in the Rack Devel group.

Previously, we saw that the comparison of string versus symbol access heavily favors the symbol:

# Ruby 2.1.5

Calculating -------------------------------------
              string   129.746k i/100ms
              symbol   152.476k i/100ms
-------------------------------------------------
              string      4.619M (± 5.0%) i/s -     23.095M
              symbol      8.587M (± 5.4%) i/s -     42.846M

Running this same benchmark in Ruby 2.2:

# Ruby 2.2.0

Calculating -------------------------------------
              string    141371 i/100ms
              symbol    143241 i/100ms
-------------------------------------------------
              string  7475494.0 (±7.6%) i/s -   37039202 in   5.001749s
              symbol  8128373.4 (±10.6%) i/s -   40107480 in   5.011651s

Using a symbol is still faster than using a string, but the two are now much closer. Instead of being half the speed, string access is in the same ballpark. Which is amazing!

String De-duplication

While researching these optimizations, I found it interesting that Java has only recently implemented a string de-duplication feature similar to this hash key optimization. The Ruby VM does not de-duplicate every string. It focuses only on the hotspot caused by hashes with string keys. The ability to do this will improve the performance of most large Ruby applications.
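One reason sharing string keys is safe is that a Hash never stores your mutable string directly. The sketch below does not exercise the r43870 patch itself. It only shows the long-standing behavior of Hash#[]=, which stores a frozen copy of an unfrozen String key. The constant names are made up for this example.

```ruby
# String.new guarantees an unfrozen, mutable string on every Ruby version.
MUTABLE_KEY = String.new("PATH_INFO")

ENV_HASH = {}
ENV_HASH[MUTABLE_KEY] = "/articles"

STORED_KEY = ENV_HASH.keys.first

KEY_FROZEN = STORED_KEY.frozen?             # the hash holds a frozen key
KEY_COPIED = !STORED_KEY.equal?(MUTABLE_KEY) # and it is not our object

# Changing our string afterwards does not disturb the hash.
MUTABLE_KEY << "_COPY"
LOOKUP = ENV_HASH["PATH_INFO"]

puts "stored key frozen: #{KEY_FROZEN}"
puts "stored key is a copy: #{KEY_COPIED}"
puts "lookup after mutation: #{LOOKUP}"
```

Because the stored key can never change, the VM is free to reuse one frozen string wherever the same key text appears.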

It’s also worth noting that, while Ruby does this by default, if you absolutely need the fastest possible code, it is still faster to use the freeze method manually. This is similar to Java 8, where they recommend that you manually intern strings (similar to freeze). You can see some benchmarks here of the speeds of different access patterns in Ruby 2.2.0.

The thing I love about this feature is that it allows us to write Ruby code and focus on intent. The VM can see when it can optimize this case and make our programs faster. We don’t need to do anything to take advantage of this performance boost in our programs.

Just one more reason Ruby makes me happy: free performance.


If you care about performance, strings, Ruby, or benchmarking, follow @schneems on twitter

Richard Schneeman

Ruby developer for Heroku. Climbs rocks in Austin & teaches Rails classes at the University of Texas. You can see more of Richard's work at http://schneems.com/

GlennG