Symbol GC in Ruby 2.2


There is a Japanese translation of this post here!

What is symbol GC and why should you care? Ruby 2.2 was just released and, in addition to incremental GC, one of the other big features is Symbol GC. If you’ve been around the Ruby landscape, you’ve heard the term “symbol DoS”. A symbol denial of service attack occurs when a system creates so many symbols that it runs out of memory. This is possible because, prior to Ruby 2.2, symbols lived forever. For example, in Ruby 2.1:

# Ruby 2.1

before = Symbol.all_symbols.size
100_000.times do |i|
  "sym#{i}".to_sym
end
GC.start
after = Symbol.all_symbols.size
puts after - before
# => 100001

Here we create 100,000 symbols and they’re still around, even though we’ve run GC and no variables reference those objects. This could easily be a problem if you wrote some code that accepted a user parameter and called to_sym on it:

def show
  step = params[:step].to_sym
end

In this case, someone could make many requests to example.com/step= and, since your application never clears out symbols, your program would eventually run out of memory and crash. This may sound like a fabricated example, but it was similar to code I actually had committed in my Wicked gem (don’t worry, it’s fixed now). It’s not an isolated case either:

The list goes on and on, but you get the point. Creating symbols from user input is dangerous, but only if symbols aren’t garbage collected, which was the case prior to Ruby 2.2.
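On any Ruby version, one common way to sidestep the problem is to compare the raw input against a fixed list of strings and only convert it once it matches. Here is a minimal sketch of that idea; ALLOWED_STEPS and safe_step are hypothetical names made up for this example, not part of Wicked or Rails:

```ruby
# Hypothetical list of step names; adjust to your application.
ALLOWED_STEPS = %w[cart shipping payment].freeze

# Returns the step as a symbol only when the input matches a known step.
# Unknown input is compared as a string and never reaches to_sym,
# so it cannot add anything to the symbol table.
def safe_step(raw)
  name = raw.to_s
  ALLOWED_STEPS.include?(name) ? name.to_sym : nil
end

puts safe_step("shipping").inspect  # a known step comes back as a symbol
puts safe_step("not-a-step").inspect # anything else comes back as nil
```

The trade-off is that every valid value has to be listed up front, which is usually fine for things like wizard steps.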

Symbol GC in Ruby 2.2

Starting with Ruby 2.2, symbols are garbage collected.

# Ruby 2.2
before = Symbol.all_symbols.size
100_000.times do |i|
  "sym#{i}".to_sym
end
GC.start
after = Symbol.all_symbols.size
puts after - before
# => 1

Since the symbols we create aren’t referenced by any other object or variable, they can be safely collected. This helps prevent us from accidentally creating a scenario where a program creates and retains so many objects that it crashes. However, Ruby doesn’t garbage collect ALL symbols.

WAT?

#not_all_symbols

Prior to Ruby 2.2, we couldn’t collect symbols because they were used internally by the Ruby interpreter. Basically, each symbol has a unique object ID. For example, :foo.object_id always needed to be the same value for the duration of the program execution. This is due to the way rb_intern works.
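The identity guarantee is easy to observe from Ruby itself. The following is a small sketch; same_symbol? and same_string? are hypothetical helper names used only for this illustration:

```ruby
# Converting the same name to a symbol twice yields the very same object.
def same_symbol?(name)
  name.to_sym.equal?(name.to_sym)
end

# Two separately allocated strings with equal contents are distinct objects.
def same_string?(name)
  String.new(name).equal?(String.new(name))
end

puts same_symbol?("foo") # true: one object per symbol name
puts same_string?("foo") # false: each String.new allocates a new object
puts :foo.object_id == "foo".to_sym.object_id
```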

In C Ruby (MRI), when you create a method, the interpreter stores a unique ID in a method table.

Slide from Nari’s talk on Symbol GC

Later, when you call the method, Ruby will look up the symbol of the method name, and then get the ID of that symbol. The ID of the symbol is used to point at the static memory of the function in C. The function in C is then called and that’s how Ruby executes methods.

If we garbage collected a symbol and that symbol was used to reference a method, then that method would no longer be callable. That would be bad.

To get around this problem Narihiro Nakamura introduced the idea of an “Immortal Symbol” in the C World and a “Mortal symbol” in the Ruby world.

Basically, all symbols created dynamically while Ruby is running (via to_sym, etc.) can be garbage collected because they are not being used behind the scenes inside the Ruby interpreter. However, symbols created as a result of creating a new method or symbols that are statically inside of code will not be garbage collected. For example :foo and def foo; end both will not be garbage collected, however "foo".to_sym would be eligible for garbage collection.

There are gotchas with this approach: it’s still possible to have a DoS if you’re accidentally creating methods based on user input.

define_method(params[:step].to_sym) do
  # ...
end

Because define_method calls rb_intern behind the scenes, even though we are passing in a dynamically defined (i.e. to_sym) symbol, it will be converted to an immortal symbol so it can be used for method lookup. Hopefully, you wouldn’t be doing that anyway, but it’s still good to point out dangerous bits in Ruby.

Variables and constants also use symbols behind the scenes.

before = Symbol.all_symbols.size
eval %Q{
  BAR = nil
  FOO = nil
}
GC.start
after = Symbol.all_symbols.size
puts after - before
# => 2

Even though each constant is set to nil, its name is a symbol that will never get garbage collected. In addition to avoiding randomly defining methods based on user input, also watch out for creating variables based on user input:

self.instance_variable_set( "@step_#{ params[:step] }".to_sym, nil )

To be truly safe, you should periodically check Symbol.all_symbols.size after running GC.start to ensure that the symbol table isn’t growing. Moving into the future, hopefully some good standards around what is and isn’t safe to do with symbols will become general knowledge. If you find another really common gotcha, reach out to me on Twitter and I’ll try to keep this section updated.
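That check can be wrapped in a small helper. This is only a sketch: symbol_growth is a hypothetical name, the loops are synthetic, and the exact numbers will vary with your Ruby version and whatever else is loaded:

```ruby
# Hypothetical helper: runs the block and reports how much the symbol
# table grew, measured after a full GC on both sides.
def symbol_growth
  GC.start
  before = Symbol.all_symbols.size
  yield
  GC.start
  Symbol.all_symbols.size - before
end

# Dynamic symbols that nothing references should mostly be collected.
mortal = symbol_growth do
  10_000.times { |i| "mortal_check_#{i}".to_sym }
end

# Method names are pinned as immortal symbols, so this growth sticks.
holder = Class.new
immortal = symbol_growth do
  1_000.times { |i| holder.send(:define_method, "immortal_check_#{i}") { } }
end

puts "to_sym growth:        #{mortal}"
puts "define_method growth: #{immortal}"
```

In a real application you would wrap a representative batch of requests rather than a loop, and watch whether the reported growth keeps climbing between runs.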

Thanks to @nari3 for reviewing this section and providing feedback. For more information about the internals and implementation, read Nari’s slides or listen to the presentation at Ruby Kaigi.

I Feel the Need for Speed

In addition to security, the biggest reason you should care about this feature is speed. There’s a ton of code written around turning symbols into strings to avoid accidentally allocating symbols from user input. Generally when you put the words “ton” and “code” together, the results aren’t fast.

The most common example of avoiding Symbol allocations is Rails’ (ActiveSupport’s) HashWithIndifferentAccess. Since I wrote about subclasses of Hash like Hashie being slow, you may not be surprised to find that this behavior in Rails comes with a huge performance penalty.

require 'benchmark/ips'

require 'active_support'
require 'active_support/hash_with_indifferent_access'

hash = { foo: "bar" }
INDIFFERENT = HashWithIndifferentAccess.new(hash)
REGULAR     = hash

Benchmark.ips do |x|
  x.report("indifferent-string") { INDIFFERENT["foo"] }
  x.report("indifferent-symbol") { INDIFFERENT[:foo] }
  x.report("regular-symbol")     { REGULAR[:foo] }
end

When we run this:

Calculating -------------------------------------
  indifferent-string   115.962k i/100ms
  indifferent-symbol    82.702k i/100ms
      regular-symbol   150.856k i/100ms
-------------------------------------------------
  indifferent-string      4.144M (± 4.4%) i/s -     20.757M
  indifferent-symbol      1.578M (± 3.7%) i/s -      7.939M
      regular-symbol      8.685M (± 2.4%) i/s -     43.447M

We see that an indifferent access hash with a string key is about half the speed of a regular hash with symbol keys (4.144M vs. 8.685M iterations per second). We also see that using a symbol to access the value in an indifferent access hash is a whopping 5 times slower than using a regular hash with symbol keys (1.578M vs. 8.685M). I wrote about how string key performance in Ruby 2.2 is getting a big improvement. However, accessing a hash with a symbol is still the fastest and, some might argue, the most aesthetically pleasing way to access a hash. Now with Ruby 2.2, we could use symbol keys in parameters in Rails. If we made that switch, we wouldn’t have to worry about symbol DoS, and we wouldn’t have to incur the overhead of the HashWithIndifferentAccess tax.
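As a rough illustration of what that switch could look like, here is a sketch that converts string keys to symbols once, up front, so that later lookups hit a plain Hash. The symbolize_keys method below is a local helper written for this example (ActiveSupport ships its own Hash#symbolize_keys), and the sample parameters are made up. Keep in mind that on Ruby 2.1 and earlier this pattern is exactly the symbol DoS described above:

```ruby
# Local helper for this example: returns a new Hash whose keys are symbols.
# The keys are created with to_sym, so on Ruby 2.2+ they are mortal symbols
# that can be collected once nothing references them.
def symbolize_keys(hash)
  hash.each_with_object({}) do |(key, value), result|
    result[key.to_sym] = value
  end
end

raw_params = { "step" => "shipping", "page" => "2" } # made-up input
params = symbolize_keys(raw_params)

puts params[:step]
puts params[:page]
```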

Note: You should do benchmarking at the application level before making any big performance changes, especially whenever it requires an API deprecation. Don’t ever submit a performance patch with the justification that “some blog said it was faster” even if that blog is mine. Always verify claims with a case by case benchmark.

Recap

Symbol GC saves your butt from DoS attacks and allows you the flexibility of using symbols wherever you want. Coupled with Ruby 2.2’s host of other performance features, including incremental GC and string de-duplication for Hash keys, there’s no reason not to upgrade right away. Install locally:

$ ruby-install ruby 2.2.0

Run in production (if you’re using Heroku):

$ echo "ruby '2.2.0'" >> Gemfile

Don’t wait, the future of Ruby is now!

@schneems writes on Ruby, performance, and symbols. Follow him for all that jazz.

Frequently Asked Questions (FAQs) about Ruby 2.2 Symbol GC

What is the significance of Symbol GC in Ruby 2.2?

Symbol GC, introduced in Ruby 2.2, lets the garbage collector reclaim symbols. Prior to Ruby 2.2, symbols were never garbage collected, which could lead to unbounded memory growth and, in extreme cases, crashes. With Symbol GC, dynamically created symbols can be collected, which improves memory management and removes a common denial of service vector.

How does Symbol GC work in Ruby 2.2?

In Ruby 2.2, symbols created dynamically at runtime (for example with to_sym) are garbage collected much like other Ruby objects. When such a symbol is no longer referenced by any other object, it becomes eligible for garbage collection, and the garbage collector can free the memory it occupies. Symbols that appear literally in code, or that name methods, constants, and variables, are used internally by the interpreter and are never collected.

What is the difference between symbols and strings in Ruby?

Symbols and strings in Ruby are both used to represent text. However, they are used in different contexts and have different behaviors. A symbol is a unique and immutable object that is used as an identifier or a name. On the other hand, a string is a mutable object that is used to manipulate text. Unlike symbols, strings can be modified after they are created.
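A short sketch of the mutability difference, using made-up values:

```ruby
# A string can be changed in place after it is created.
greeting = String.new("hello")
greeting << " world"
puts greeting

# A symbol is always frozen and has no in-place mutation methods.
puts :hello.frozen?          # true
puts :hello.respond_to?(:<<) # false
```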

How can I convert a string to a symbol in Ruby?

You can convert a string to a symbol in Ruby using the to_sym method. Here is an example:

string = "hello"
symbol = string.to_sym

In this example, the string “hello” is converted to the symbol :hello.

How can I convert a symbol to a string in Ruby?

You can convert a symbol to a string in Ruby using the to_s method. Here is an example:

symbol = :hello
string = symbol.to_s

In this example, the symbol :hello is converted to the string “hello”.

Can I use symbols as keys in a hash in Ruby?

Yes, you can use symbols as keys in a hash in Ruby. In fact, using symbols as keys is a common practice in Ruby because it is more efficient than using strings. Here is an example:

hash = {:key => "value"}

In this example, :key is a symbol used as a key in the hash.

What are the benefits of using symbols in Ruby?

Symbols in Ruby are beneficial for several reasons. First, they are immutable, which means they cannot be changed once they are created. This makes them ideal for use as identifiers or names. Second, they are unique, which means that every occurrence of a symbol represents the same object. This makes them more efficient than strings in certain contexts. Finally, with the introduction of Symbol GC in Ruby 2.2, symbols can now be garbage collected, which improves memory management.

How can I check if a symbol exists in Ruby?

You can inspect Ruby’s symbol table using the Symbol.all_symbols method, which returns an array of all symbols currently in the table. Be careful with how you search it: writing the literal :hello in your code creates that symbol when the file is parsed, so Symbol.all_symbols.include?(:hello) is always true. Comparing by name avoids creating the symbol you are looking for. Here is an example:

symbols = Symbol.all_symbols
exists = symbols.any? { |sym| sym.to_s == "hello" }

In this example, exists will be true if a symbol named hello is already in the symbol table, and false otherwise.

Can I create a symbol with spaces in Ruby?

Yes, you can create a symbol with spaces in Ruby by using quotes. Here is an example:

symbol = :"hello world"

In this example, :"hello world" is a valid symbol.

Can I use symbols in an array in Ruby?

Yes, you can use symbols in an array in Ruby. Here is an example:

array = [:hello, :world]

In this example, :hello and :world are symbols used in an array.

Richard Schneeman

Ruby developer for Heroku. Climbs rocks in Austin & teaches Rails classes at the University of Texas. You can see more of Richard's work at http://schneems.com/
