Profiling Ruby’s Memory Allocation with TCMalloc

How does memory allocation work in Ruby?

Ruby gets memory in chunks called pages, and new objects are stored there.

Then…

When these pages are full, more memory is needed.

Ruby requests more memory from the operating system with the malloc function.
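You can watch this page growth from inside Ruby itself. The following is a minimal sketch, assuming a recent MRI version where GC.stat exposes the :heap_allocated_pages key. The method name heap_pages_growth is made up for this example.

```ruby
# Report how many heap pages Ruby holds before and after creating
# a batch of objects. heap_pages_growth is a hypothetical helper name.
def heap_pages_growth(object_count)
  pages_before = GC.stat(:heap_allocated_pages)

  # Keep references in an array so the objects stay alive
  # and keep occupying their slots.
  objects = Array.new(object_count) { Object.new }

  pages_after = GC.stat(:heap_allocated_pages)
  [pages_before, pages_after, objects.size]
end

before, after, kept = heap_pages_growth(500_000)
puts "pages before: #{before}"
puts "pages after:  #{after}"
puts "objects kept: #{kept}"
```

The exact numbers depend on your Ruby version and page size, but the second count should be noticeably higher than the first.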

This malloc function is provided by your system’s C standard library, but there are alternative implementations you can use.

One of those implementations is Google’s tcmalloc.

TCMalloc is part of the Google Performance Tools suite (gperftools).

You can use these tools to explore exactly how Ruby allocates memory.

And thanks to the LD_PRELOAD environment variable (on Linux), you can replace your system’s malloc function with tcmalloc.

Like this:

LD_PRELOAD="/usr/lib/libtcmalloc.so" ruby -e "puts 123"

But that only loads the library; it doesn’t enable any data collection yet.
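If you want to confirm that the preload actually worked, one option on Linux is to look at the libraries mapped into the running process. This is only a sketch: it assumes /proc/self/maps is available and that the library’s filename contains "tcmalloc". Both method names are invented for this example.

```ruby
# Returns true if any mapped file in the given maps text mentions tcmalloc.
def tcmalloc_in_maps?(maps_text)
  maps_text.each_line.any? { |line| line.include?("tcmalloc") }
end

# Checks the current process. Returns false when the maps file
# is not readable (for example, on a system without procfs).
def tcmalloc_loaded?(maps_path = "/proc/self/maps")
  return false unless File.readable?(maps_path)

  tcmalloc_in_maps?(File.read(maps_path))
end

puts "tcmalloc loaded: #{tcmalloc_loaded?}"
```

Save it under any name you like and run it with the same LD_PRELOAD prefix as above. Without the prefix it should report false.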

Let’s see how that’s done.

Enabling The Profiler

You can enable tcmalloc’s profiler with an additional environment variable (HEAPPROFILE).

LD_PRELOAD="/usr/lib/libtcmalloc.so" HEAPPROFILE=/tmp/profile ruby -e "puts 123"

This will produce the following output:

Starting tracking the heap
123
Dumping heap profile to /tmp/profile.0001.heap (Exiting, 2 MB in use)

Here you see a confirmation that the profiler has been enabled.

Then we see:

  • The program’s output
  • The filename (profile.0001.heap)
  • The amount of memory used by our program (2 MB)
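If you run profiles often, you can wrap the two environment variables in a small Ruby script. The sketch below uses the library path from this article, which may be different on your distribution. The names heap_profile_env and run_with_heap_profile are hypothetical.

```ruby
# Library path as used in this article. It is an assumption about
# your system, so adjust it if tcmalloc lives somewhere else.
TCMALLOC_PATH = "/usr/lib/libtcmalloc.so"

# Build the environment that preloads tcmalloc and enables the heap profiler.
def heap_profile_env(prefix, lib: TCMALLOC_PATH)
  { "LD_PRELOAD" => lib, "HEAPPROFILE" => prefix }
end

# Run a command under the profiler and return the heap files it produced.
def run_with_heap_profile(prefix, *command, lib: TCMALLOC_PATH)
  raise ArgumentError, "tcmalloc not found at #{lib}" unless File.exist?(lib)

  system(heap_profile_env(prefix, lib: lib), *command)
  Dir.glob("#{prefix}.*.heap").sort
end
```

For example, run_with_heap_profile("/tmp/profile", "ruby", "-e", "puts 123") would run the same command as above and return the list of files matching /tmp/profile.*.heap.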

To read this file you will need pprof, another tool included with gperftools:

pprof --text `which ruby` /tmp/profile.0001.heap | head -10

This will produce the following output:

Total: 2.4 MB
   1.1  44.7%  44.7%      1.1  44.7% 0x00005570fa4df074
   0.7  27.8%  72.5%      0.7  27.8% 0x00005570fa4e0c09
   0.4  15.3%  87.8%      0.4  15.3% 0x00005570fa4db460
   0.1   5.9%  93.7%      0.1   5.9% 0x00005570fa4df19f
   0.1   3.2%  96.9%      0.1   3.2% 0x00005570fa6349a0
   0.0   1.4%  98.3%      0.0   1.4% 0x00005570fa589924
   0.0   0.3%  98.6%      0.0   0.3% 0x00005570fa59c4f2
   0.0   0.3%  98.8%      0.0   0.3% 0x00005570fa4db48a
   0.0   0.2%  99.0%      0.0   0.2% 0x00005570fa4dbaa5
   0.0   0.2%  99.1%      0.0   0.2% _dl_new_object

Well, that’s just a bunch of memory addresses! You need a version of Ruby with debugging symbols to be able to see the function names.

Then you will get this output:

Using local file ruby.
Using local file /tmp/profile.0001.heap.
Total: 2.9 MB
   1.0  36.2%  36.2%      1.0  36.2% objspace_xmalloc0
   0.7  26.1%  62.4%      0.7  26.1% aligned_malloc
   0.5  18.8%  81.1%      0.5  18.8% objspace_xcalloc
   0.3   9.9%  91.0%      0.3   9.9% stack_chunk_alloc
   0.1   3.7%  94.7%      0.1   3.7% objspace_xrealloc
   0.1   2.7%  97.4%      0.1   2.7% Init_Method
   0.0   1.3%  98.7%      0.0   1.7% onig_new_with_source
   0.0   0.4%  99.2%      0.8  26.6% heap_page_allocate
   0.0   0.2%  99.4%      0.0   0.2% add_bitset

What you see here is how much memory each of MRI’s C functions allocated.

It’s interesting to know what some of these functions do:

  • aligned_malloc allocates new pages for Ruby objects
  • stack_chunk_alloc is used by the GC itself during the marking phase
  • objspace_xmalloc0 / objspace_xcalloc allocate space for strings, arrays & any other data that doesn’t fit in an RVALUE struct
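The columns in the pprof text report are easy to misread. From left to right they are: memory allocated directly by the function, that amount as a percentage, the running total of those percentages, memory allocated by the function plus everything it calls, that amount as a percentage, and the function name. Here is a small sketch that parses a report into rows so you can sort by either column. The struct and method names are made up for this example, and the sample lines are copied from the output above.

```ruby
# One row of a pprof --text report. PprofRow is a hypothetical name.
PprofRow = Struct.new(:flat_mb, :flat_pct, :sum_pct, :cum_mb, :cum_pct, :name)

# Matches lines like "   1.0  36.2%  36.2%      1.0  36.2% objspace_xmalloc0".
PPROF_ROW = /\A\s*([\d.]+)\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)\s+([\d.]+)%\s+(\S.*)\z/

# Turn report text into an array of PprofRow, skipping header lines.
def parse_pprof_text(text)
  text.each_line.map do |line|
    m = PPROF_ROW.match(line.chomp)
    next unless m

    PprofRow.new(m[1].to_f, m[2].to_f, m[3].to_f, m[4].to_f, m[5].to_f, m[6].strip)
  end.compact
end

SAMPLE_REPORT = <<~REPORT
  Total: 2.9 MB
     1.0  36.2%  36.2%      1.0  36.2% objspace_xmalloc0
     0.7  26.1%  62.4%      0.7  26.1% aligned_malloc
     0.0   0.4%  99.2%      0.8  26.6% heap_page_allocate
REPORT

# Sort by cumulative memory to see which call paths matter most.
parse_pprof_text(SAMPLE_REPORT).sort_by { |row| -row.cum_mb }.each do |row|
  puts format("%-22s flat=%.1f MB cum=%.1f MB", row.name, row.flat_mb, row.cum_mb)
end
```

Notice that heap_page_allocate allocates almost nothing directly but has a large cumulative value. That is because the memory is requested further down the call chain.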

Now:

TCMalloc knows nothing about Ruby objects. All it does is track calls to malloc, calloc & realloc to find out how much memory is requested.

If you want a heap dump at the Ruby level, you can use ObjectSpace.dump_all (from the objspace library). This gives you a file with one JSON object per line for every live object in your application, including its memory size.
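As a rough illustration of the Ruby-level view, this sketch writes a dump with ObjectSpace.dump_all and then summarizes it by object type. The method name heap_summary and the output filename are assumptions made for this example. Lines that fail to parse are skipped.

```ruby
require "objspace"
require "json"
require "tmpdir"

# Dump all live objects to path, then count objects and bytes per type.
# heap_summary is a hypothetical helper name.
def heap_summary(path)
  File.open(path, "w") { |file| ObjectSpace.dump_all(output: file) }

  counts = Hash.new(0)
  bytes = Hash.new(0)

  File.foreach(path) do |line|
    obj = begin
      JSON.parse(line)
    rescue JSON::ParserError
      next # skip any line that is not valid JSON
    end

    type = obj["type"] || "UNKNOWN"
    counts[type] += 1
    bytes[type] += obj["memsize"].to_i # entries without memsize count as 0
  end

  [counts, bytes]
end

counts, bytes = heap_summary(File.join(Dir.tmpdir, "ruby-heap-#{Process.pid}.json"))

# Print the five types that use the most memory.
bytes.sort_by { |_type, size| -size }.first(5).each do |type, size|
  puts format("%-10s %8d objects %10d bytes", type, counts[type], size)
end
```

The numbers will not match the tcmalloc totals, because this only counts what Ruby knows about each object.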

But what tcmalloc can show you is a visualization of all the C functions that end up requesting memory.

pprof --web `which ruby` /tmp/profile.0001.heap

This generates an SVG call graph (pprof needs Graphviz’s dot installed for this) and opens it in your web browser. It looks like this:

[Image: ruby heap profile]

TCMalloc doesn’t just give you this nice profiling capability. Using it as your allocator can also improve your application’s performance (by around 4-9%, though the gain depends on your workload). You could also try jemalloc, another malloc implementation that includes a profiler.

Summary

You have learned how to use gperftools (Google Performance Tools) to visualize & analyze the memory usage of the Ruby interpreter.

Thanks for reading! 🙂