Tenderlove Making

Object ID in MRI

Objects in Ruby are 40 bytes. Objects are allocated in to pages (or arenas) that are 2^14 (2 to the 14th power) bytes. Pages are allocated with an aligned malloc where the divisor is the size of a page. The first object is allocated at the first address inside the page that is divisible by 40 (or page_start + (40 - page_start % 40) % 40). This means all Ruby object addresses are divisible by 40, and that some pages hold one more object than others.

page layout

The code to find the “first object inside the page” can be found here. If the page address isn’t divisible by 40, this calculates the first offset inside the page where an object can be allocated. Objects are allocated at 40 byte offsets in order to support tagged pointers.

Tagged pointers

Every number divisible by 40 when represented in binary always ends in three 0s:

irb(main):012:0> sprintf("%0#15b", 80)
=> "0b0000001010000"
irb(main):013:0> sprintf("%0#15b", 120)
=> "0b0000001111000"
irb(main):014:0> sprintf("%0#15b", 160)
=> "0b0000010100000"
irb(main):015:0> sprintf("%0#15b", 200)
=> "0b0000011001000"

MRI exploits this fact in order to represent some objects like integers without allocating anything. Pointers are just numbers, so if the number is not divisible by 40 (in other words doesn’t end in 000), then we know it is something special.

Integers are examples of tagged pointers. Integers are encoded by shifting the number left one, then setting the last bit to 1. So the number 1 will be encoded as 3 (or in binary 11), and the number 40 will be encoded as 81 (or 1010001 in binary). If the pointer we’re dealing with has a 1 in the last bit, we know it’s a Ruby integer, and we can decode it (convert to a C integer) by shifting right 1.

Just to reiterate, a Ruby integer 20 (0b10100), is encoded to 41 (0b101001) by shifting left, then adding 1.

So C integers are converted to Ruby integers by shifting left by one, then adding one. Ruby integers are converted to C integers by shifting right by one. This is one reason why Ruby can’t represent a full 64bit number as an “immediate” value. One bit is used for encoding (the other bit is used for the sign).

A diagram of the tagging scheme is here.

Calculating Object ID

“Non special objects” are objects that don’t use the 3 bits on the right side for any meaning. An example of a “non special object” would be Object.new.

Non special objects encode their object id as their address in memory + 1. The encoding code is here. Normally, to convert a C integer to a Ruby integer, the integer is shifted left, then add one. But the address of a non special Ruby object will always be divisible by 40, so we know that the last bit is 0. So this code simply changes the last bit from a 0 to a 1. Clobbering the last bit means that when Ruby side of the program see it, it will be the address of the object shifted right by one.

If object X is at memory location 40, then the object id (in Ruby) will be 20. The Ruby integer 20 (0b10100) is encoded by shifting left then adding one, which results in 41 (0b101001). Since this code simply takes the location (in this case 40) and adds one, the result is 41 (0b101001) which is the same as 20 on the Ruby side.

In other words, object_id returns the address of the object shifted right one and we can get the actual address of the object back by shifting left one.

Calling inspect shows the actual address as hex. We can see that shifting left one, then converting to hex will give us the same number:

irb(main):021:0> x = Object.new
=> #<Object:0x00000100827400>
irb(main):022:0> x.object_id.to_s(16)
=> "80413a00" # not what we want
irb(main):023:0> (x.object_id << 1).to_s(16)
=> "100827400" # this is it!

Calculating Page Number

We can use the address of an object to calculate the arena (or page) in to which the object was allocated. Pages are aligned at 2^14, so the page header will always be divisible by 2^14. The page size is never larger than 2^14, so any offset inside the page will not be evenly divisible by 2^14. Knowing this, we can remove the lower bits of the object by using a mask that is 14 bits wide:

irb(main):001:0> MASK = ~(0) << 14
=> -16384
irb(main):002:0> 10.times { p (Object.new.object_id << 1) & MASK }
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760
4305141760

Now we can group objects by what page they were allocated on.

The End.

« go back