Open Source Thursdays Expert Guest session with Aaron Patterson (tenderlove)

A class on Ruby Internals with TenderJIT with Aaron Patterson (tenderlove)

A class on Ruby Internals and TenderJIT. Aaron Patterson (tenderlove) joined us on an Open Source Thursdays Expert Guest session to teach you about compilers, interpreters, and how to contribute to TenderJIT.

Aaron Patterson is a Senior Staff Engineer working at Shopify where he is focusing on Ruby core and Rails core development.

He joined us on this Open Source Thursdays session to work on a Just-in-time (JIT) compiler for Ruby that’s written in Ruby. With the additional tenderlove we all love 💎

Get the slides for this session here 🖼️

In this summary, you’ll learn how Ruby’s virtual machine works, how CPUs execute code, and more. After reading it, we recommend watching the recording, in which Aaron teaches us how to build a JIT for Ruby in Ruby.

Do you want to know how Tenderlove is always having fun while working on cool projects to keep his skills sharp? You can learn all about it on Get to Senior. You will also get exercises to keep your skills sharp as well!

Compilers and interpreters

Compilers and interpreters are very similar things. They are programs that turn your source code into other representations: a tree structure, bytecode, or machine code.

In Ruby’s case, the output is bytecode, which gets executed by the Ruby virtual machine.

Ruby is a compiled language in the sense that “Ruby is compiled into bytecode”, but it’s not compiled into machine code.
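To make those representations concrete, here is a minimal sketch that takes one line of Ruby through a tree, then bytecode, then execution. It assumes CRuby (MRI), because the RubyVM constant is specific to that implementation, and RubyVM::AbstractSyntaxTree is documented as experimental, so treat the output format as subject to change.

```ruby
# Assumes CRuby (MRI): the RubyVM constant is implementation-specific.
source = "5 + 3"

# Stage 1: the source text is parsed into a tree structure.
tree = RubyVM::AbstractSyntaxTree.parse(source)

# Stage 2: the source text is compiled into bytecode (an instruction sequence).
iseq = RubyVM::InstructionSequence.compile(source)

# Stage 3: the virtual machine executes the bytecode.
result = iseq.eval

puts tree.type   # type of the root node of the parsed tree
puts iseq.disasm # human-readable listing of the bytecode
puts result      # => 8
```

Nothing here produces machine code: the bytecode is handed to the virtual machine, which is the gap a JIT compiler fills.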

YJIT and TenderJIT

TenderJIT is a project to help you learn how JIT compilers work. Although it was created as a demo project, it’s still a serious Just in Time (JIT) compiler.

That means its performance should be similar enough to a more robust JIT compiler such as YJIT. The JIT would just work and you wouldn’t know it’s doing anything. Any time you want your program to go faster, you want to use a JIT compiler.

YJIT is a JIT compiler built into Ruby that makes Ruby programs faster. It was merged into Ruby and ships as an official part of the Ruby 3.1 release.

Code Execution: how do machines work?

There are two machine types: stack machines and register machines.

Stack machine

Stack machines essentially manipulate a stack of values.

The machine pushes values onto the stack or pops values from the stack, and executes instructions (like adding two values and pushing the result back onto the stack).

The compiler converts the source code into instructions, and the stack machine executes them one by one by manipulating the stack, as in this example:

Code Execution by a Stack Machine


YARV (Yet Another Ruby VM) is a stack-based machine powering Ruby. That’s how the Ruby Machine works.
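The push, pop, and add cycle described above can be sketched as a toy stack machine in a few lines of Ruby. This is a hypothetical illustration only: the instruction names :push and :add and the method run_stack_machine are made up for the example and are not YARV’s real implementation.

```ruby
# Toy stack machine: a hypothetical illustration, not how YARV is implemented.
def run_stack_machine(instructions)
  stack = []

  instructions.each do |opcode, operand|
    case opcode
    when :push
      stack.push(operand)        # put a value on top of the stack
    when :add
      right = stack.pop          # take the two topmost values...
      left = stack.pop
      stack.push(left + right)   # ...and push their sum back
    else
      raise ArgumentError, "unknown instruction: #{opcode}"
    end
  end

  stack.pop                      # the result is whatever is left on top
end

program = [[:push, 5], [:push, 3], [:add]]
puts run_stack_machine(program) # => 8
```

Notice that :add names no operands. It always works on whatever is on top of the stack, which is what makes this a stack machine.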

Exercise time: run the following commands to check the instructions generated by a Ruby program:


$ cat thing.rb
5 + 3
$ ruby --dump=insns thing.rb
== disasm: #<ISeq:<main>@thing.rb:1 (1,0)-(1,5)> (catch: FALSE)
0000 putobject                              5                         (   1)[Li]
0002 putobject                              3
0004 opt_plus                               <calldata!mid:+, argc:1, ARGS_SIMPLE>
0006 leave
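If you would rather explore the instructions from inside Ruby than from the command line, something like the following sketch should work. It assumes CRuby (MRI), and it relies on the array format returned by RubyVM::InstructionSequence#to_a, which is an internal representation and may differ between Ruby versions.

```ruby
# Assumes CRuby (MRI); RubyVM is not available on other Ruby implementations.
iseq = RubyVM::InstructionSequence.compile("5 + 3")

# The last element of #to_a is the instruction body. It mixes line numbers,
# event symbols, and one array per instruction, so keep only the arrays.
instructions = iseq.to_a.last.grep(Array)

# Print each instruction name followed by its operands.
instructions.each do |name, *operands|
  puts "#{name} #{operands.inspect}"
end

# Just the instruction names, in execution order.
names = instructions.map(&:first)
```

Having the instructions as plain Ruby arrays makes it easy to count which ones your own programs use most.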

“Control-frame pointer” (CFP)

A CFP is a data structure representing the current stack frame that is being executed. It contains information about the current frame.

CFP contains these fields:

  • iseq: the instruction sequence being executed (a function, a method, a block, or a proc)
  • PC: the program counter (what operation are we running right now?)
  • SP: the stack pointer (what is on the stack?)

Program Counter (PC)

The compiled code is an array or list of instructions. The PC points to the instruction that will be executed next; as the stack machine executes, it advances the PC through the instruction sequence.

Stack Pointer (SP)

The SP points into the list that holds the values on the machine’s stack, marking where the top of the stack currently is.
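To see how the three fields work together, here is a minimal sketch of an interpreter loop driven by a control frame. Everything in it is hypothetical: ControlFrame, ISEQ, and execute are names invented for this example, the instruction sequence is simplified (the real opt_plus also carries a calldata operand), and this is not the layout of CRuby’s real control frame structure.

```ruby
# Hypothetical sketch: the field names mirror the list above, but this is
# not the layout of CRuby's real control frame structure.
ControlFrame = Struct.new(:iseq, :pc, :sp)

# Flat instruction sequence: each opcode is followed by its operands.
ISEQ = [:putobject, 5, :putobject, 3, :opt_plus, :leave].freeze

def execute(iseq)
  stack = Array.new(8)               # preallocated value stack
  cfp = ControlFrame.new(iseq, 0, 0) # pc and sp both start at 0

  loop do
    opcode = cfp.iseq[cfp.pc]        # the PC selects the current instruction
    case opcode
    when :putobject
      stack[cfp.sp] = cfp.iseq[cfp.pc + 1] # write the operand at the stack top
      cfp.sp += 1
      cfp.pc += 2                    # skip the opcode and its operand
    when :opt_plus
      right = stack[cfp.sp - 1]
      left = stack[cfp.sp - 2]
      cfp.sp -= 2                    # pop both inputs
      stack[cfp.sp] = left + right
      cfp.sp += 1                    # push the sum
      cfp.pc += 1
    when :leave
      return stack[cfp.sp - 1]       # the value on top of the stack
    else
      raise ArgumentError, "unknown opcode: #{opcode.inspect}"
    end
  end
end

puts execute(ISEQ) # => 8
```

The PC only ever indexes into the instruction sequence and the SP only ever indexes into the value stack, so the two never point into the same list.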

Register Machine (x86, ARM, etc)

A register machine is the other type of machine. Physical CPUs, such as x86 and ARM chips, are register machines.

These machines have a bunch of “registers”. Programs hold values in these registers and ask the CPU to perform operations on them:

Code Execution by a Register Machine


In this example, the register machine is performing an addition operation:

The mov instruction writes a value to a named register.

mov r1, 5 is an instruction to write the value 5 into register r1.

Similarly, mov r2, 3 instructs the machine to write the value 3 into register r2.

Then, add r1, r2 adds the values in registers r1 and r2 together and overwrites the value in r1 with the result, which is 8.
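The same three instructions can be simulated with a toy register machine in Ruby. This is a hypothetical sketch meant only to mirror the walkthrough above: run_register_machine is a made-up name, and the code does not model real x86 or ARM semantics such as flags or fixed register widths.

```ruby
# Toy register machine: a hypothetical illustration, not real x86 or ARM semantics.
def run_register_machine(instructions)
  registers = Hash.new(0) # every register starts at 0

  instructions.each do |opcode, dest, src|
    case opcode
    when :mov
      registers[dest] = src             # write an immediate value into dest
    when :add
      registers[dest] += registers[src] # dest = dest + src
    else
      raise ArgumentError, "unknown instruction: #{opcode}"
    end
  end

  registers
end

program = [[:mov, :r1, 5], [:mov, :r2, 3], [:add, :r1, :r2]]
registers = run_register_machine(program)
puts registers[:r1] # => 8
```

Unlike the stack machine, every instruction here names the registers it reads and writes, and r2 still holds 3 after the addition.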

Contributing to TenderJIT

TenderJIT doesn’t implement every instruction yet, so you can contribute by picking a missing instruction, implementing it, and running the tests.

By doing that, you will learn more about how these instructions work, and you can also help YJIT in the future as the knowledge is transferable between both projects.

There’s no need to implement all the instructions. Focus on the ones that are popular and frequently used.

Aaron’s goal is to get you working on TenderJIT so you learn how a JIT works, then get you to work on YJIT!

Resources

Here are some resources for you to learn more about compilers and interpreters:

Enjoy!