Ruby's Main Object Does What?

Last week, Mischa talked about debugging a problem with Rake tasks polluting the global namespace. This week, I'll talk about how that happens.

Spoiler: there will be a bunch of spelunking, first in Ruby and then in Ruby's C source code. We'll talk about how to find code in both places.

What Are We Looking For?

When we define methods on Ruby's top-level "main" object, they show up as instance methods on Object so they're callable absolutely everywhere. But "main" is mostly just a random instance of Object, and isn't part of the standard class hierarchies like, say, Kernel or Object. So why and how do those methods show up on Object?

First I checked what was going on in irb. You don't know what you're looking for if you can't write an example! Here's what I did:

2.5.0 :001 > def bobo; print "Bobo!"; end
=> :bobo
2.5.0 :002 > class NotBobo; def really; puts "I am pretty sure."; bobo; end
2.5.0 :003?>   end
=> :really
2.5.0 :004 > a = NotBobo.new
=> #<NotBobo:0x00007fc0fe031940>
2.5.0 :005 > a.really
I am pretty sure.
Bobo! => nil

That bit where it prints "Bobo!" instead of giving a no-method exception? That's the feature we're talking about.
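The same thing can be checked outside of irb. Here's a minimal sketch, assuming it runs as a plain Ruby script so that the def really is at the top level. The name shout_bobo and the returned string are mine, picked so the result is easy to inspect:

```ruby
# Assumes this runs as a plain script, so the "def" below is at the top level.
def shout_bobo
  "Bobo!"
end

class NotBobo
  def really
    shout_bobo # implicit receiver: found through NotBobo's ancestor, Object
  end
end

a = NotBobo.new
puts a.really

# Ask Ruby which class or module actually owns the method NotBobo sees.
puts NotBobo.instance_method(:shout_bobo).owner.inspect
```

If the feature works the way the irb session suggests, the owner reported on the last line should be Object rather than NotBobo or main.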

(I hadn't actually known this happened. How'd I miss that? If you already knew that, congratulations! This is a good time to feel superior about it ;-)

This is a good example of a feature that can be mysterious if you don't already know it. So let's talk about how you'd track this feature down, by walking through how I recently did exactly that.

Searching in Ruby

Ruby is a language with great reflection methods. So first, we'll check where that behavior could be coming from in Ruby.

The obvious places to look for weird behavior in Ruby are the object in question ("main" in this case), plus Object and Kernel.

Let's see what "main" actually is in irb:

2.5.0 :001 > self
 => main
2.5.0 :002 > self.class
 => Object
2.5.0 :003 > self.singleton_class
 => #<Class:#<Object:0x00007f9ea78ba2e8>>
2.5.0 :005 > self.methods.sort - Object.methods
 => [:bindings, :cb, :chws, :conf, :context, :cws, :cwws, :exit, :fg,
     :help, :install_alias_method, :irb, :irb_bindings, :irb_cb,
     :irb_change_binding, :irb_change_workspace, :irb_chws,
     :irb_context, :irb_current_working_binding,
     :irb_current_working_workspace, :irb_cwb, :irb_cws, :irb_cwws,
     :irb_exit, :irb_fg, :irb_help, :irb_jobs, :irb_kill, :irb_load,
     :irb_pop_binding, :irb_pop_workspace, :irb_popb, :irb_popws,
     :irb_print_working_binding, :irb_print_working_workspace,
     :irb_push_binding, :irb_push_workspace, :irb_pushb, :irb_pushws,
     :irb_pwb, :irb_pwws, :irb_quit, :irb_require, :irb_source,
     :irb_workspaces, :jobs, :kill, :popb, :popws, :pushb, :pushws,
     :pwws, :quit, :source, :workspaces]

Huh. That feels like a lot of instance methods on the top-level object, and a lot of them irb-flavored. Maybe this is somehow an irb thing? Let's check in a separate Ruby source file with no irb loaded.

# not_in_irb.rb
def method_on_main
  print "Bobo!"
end

class NotBobo
  def some_method
    print "Yup.\n"
    method_on_main
  end
end

a = NotBobo.new
a.some_method

print "\n\n\n"

print (self.methods.sort - Object.methods).inspect

Feel free to run it. But when I do, I get an empty list for the methods. In other words, that whole list of methods on main that aren't on Object came from irb, not from plain Ruby. So it seems there's nothing terribly special there. Hrm.

What about the singleton class (a.k.a. eigenclass) for main? Maybe it has something individually that works only on that one object? Again, I'll run this outside of irb.

dir noah.gibbs$ cat >> test2.rb
print self.singleton_class.instance_methods
dir noah.gibbs$ ruby test2.rb
[:inspect, :to_s, :instance_variable_set, :instance_variable_defined?,
 :remove_instance_variable, :instance_of?, :kind_of?, :is_a?, :tap,
 :instance_variable_get, :public_methods, :instance_variables, :method,
 :public_method, :define_singleton_method, :singleton_method,
 :public_send, :extend, :pp, :to_enum, :enum_for, :<=>, :===, :=~,
 :!~, :eql?, :respond_to?, :freeze, :object_id, :send, :display,
 :nil?, :hash, :class, :clone, :singleton_class, :itself, :dup,
 :taint, :yield_self, :untaint, :tainted?, :untrusted?, :untrust,
 :trust, :frozen?, :methods, :singleton_methods, :protected_methods,
 :private_methods, :!, :equal?, :instance_eval, :instance_exec, :==,
 :!=, :__id__, :__send__]

Ah. There's a lot there, and I'm not finding it terribly enlightening. Plus, it's really likely whatever method I care about is going to be defined in C, at least in CRuby. Let's bite the bullet and check Ruby's source code...
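One Ruby-side trick does trim that list: passing false to instance_methods leaves out everything inherited. Here's a small sketch, assuming a plain script rather than irb. TOPLEVEL_BINDING.receiver is just my way of grabbing main so the snippet doesn't depend on where it's pasted:

```ruby
# main is the top-level self; TOPLEVEL_BINDING.receiver returns that same object.
main = TOPLEVEL_BINDING.receiver

# false means "skip anything inherited from Object, Kernel and friends".
own_public = main.singleton_class.instance_methods(false).sort
puts own_public.inspect
```

This only covers public methods, and it says nothing about how they're implemented, so the trip into the C source is still worth making.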

No Luck - Let's Make More Luck

Okay. That pretty much exhausts our Ruby options. Beyond this point, it's time to check in C. I'll use the latest (August 2018) Ruby code, because this is a long-term feature and hasn't changed any time recently -- sometimes you'll need a very specific version of the Ruby code, depending on what you're looking for.

So: first we're looking for the "main" object. The word "main" is used in lots of places in Ruby, so that will be hard to track down. How else can we search?

Luckily, we know that if you print out that object, it says "main". Which means we should be able to find the string "main", quotes and all, in C. I'm going to use The Silver Searcher, a.k.a. "ag", for code search here - you can also use Ack, rgrep, or your favorite other tool.

We can ignore anything under "test", "spec", or "doc" here, so I won't list those out.
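If none of those search tools is handy, a rough stand-in can be written in Ruby itself. This is only a sketch: find_literal and its skip_dirs parameter are names I made up, it does a plain substring match, and it has none of ag's speed or ignore-file handling.

```ruby
require "find"

# Hypothetical helper (not part of ag, Ack or Ruby): collect every line under
# +root+ that contains +needle+, skipping directories named in +skip_dirs+.
def find_literal(root, needle, skip_dirs: %w[test spec doc])
  hits = []
  Find.find(root) do |path|
    if File.directory?(path)
      # Don't descend into test/spec/doc, but never prune the starting point.
      Find.prune if path != root && skip_dirs.include?(File.basename(path))
      next
    end
    next unless File.file?(path)

    # Read as binary so odd encodings in the tree don't raise.
    File.binread(path).each_line.with_index(1) do |line, lineno|
      hits << [path, lineno, line.strip] if line.include?(needle)
    end
  end
  hits
end

# Example call, run from a Ruby source checkout:
#   find_literal(".", '"main"').each { |path, n, text| puts "#{path}:#{n}: #{text}" }
```

Note that the needle includes the double quotes, same as the search described above, so it only matches the C string literal and not every use of the word.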

thread.c
4936:    rb_define_singleton_method(rb_cThread, "main", rb_thread_s_main, 0);

object.c
3692: *     String(self)        #=> "main"

vm.c
3177:    return rb_str_new2("main");

addr2line.c
782:    if (line->sname && strcmp("main", line->sname) == 0)

lib/rdoc/generator/template/darkfish/index.rhtml
15:<main role="main">

lib/rdoc/generator/template/darkfish/servlet_root.rhtml
16:<main role="main">

lib/rdoc/generator/template/darkfish/servlet_not_found.rhtml
13:<main role="main">

lib/rdoc/generator/template/darkfish/page.rhtml
15:<main role="main" aria-label="Page <%=h file.full_name%>">

lib/rdoc/generator/template/darkfish/table_of_contents.rhtml
2:<main role="main">

lib/rdoc/generator/template/darkfish/class.rhtml
19:<main role="main" aria-labelledby="<%=h klass.aref %>">

iseq.c
742:    const ID id_main = rb_intern("main");

(...)

Okay. The one in object.c is on a comment line. The ones in the rdoc generator are role attributes on HTML <main> elements -- not what we're looking for. The one in thread.c defines Thread.main, which is about the main thread rather than the main object. Let's investigate addr2line.c, iseq.c and vm.c as the promising choices.

In addr2line.c it's messing with printed text in dumping a backtrace, not defining methods on the top-level object:

void
rb_dump_backtrace_with_lines(int num_traces, void **traces)
{
  /* ... */
    /* output */
    for (i = 0; i < num_traces; i++) {
        line_info_t *line = &lines[i];
        uintptr_t addr = (uintptr_t)traces[i];
        /* ... */
        /* FreeBSD's backtrace may show _start and so on */
        if (line->sname && strcmp("main", line->sname) == 0)
            break;

(By the way - in C, anything between a slash-star and a star-slash is a comment.)

In iseq.c, it's naming the types of instruction sequences -- not what we want:

static enum iseq_type
iseq_type_from_sym(VALUE type)
{
    const ID id_top = rb_intern("top");
    const ID id_method = rb_intern("method");
    const ID id_block = rb_intern("block");
    const ID id_class = rb_intern("class");
    const ID id_rescue = rb_intern("rescue");
    const ID id_ensure = rb_intern("ensure");
    const ID id_eval = rb_intern("eval");
    const ID id_main = rb_intern("main");
    const ID id_plain = rb_intern("plain");

Finally, in vm.c we find something really promising:

/* top self */

static VALUE
main_to_s(VALUE obj)
{
    return rb_str_new2("main");
}

VALUE
rb_vm_top_self(void)
{
    return GET_VM()->top_self;
}

That's more like it! And it's next to something being referred to as "top self", which sounds pretty main-like. So let's see where main_to_s gets used. I checked with ag, and it's only in vm.c:

void
Init_top_self(void)
{
    rb_vm_t *vm = GET_VM();

    vm->top_self = rb_obj_alloc(rb_cObject);
    rb_define_singleton_method(rb_vm_top_self(), "to_s", main_to_s, 0);
    rb_define_alias(rb_singleton_class(rb_vm_top_self()), "inspect", "to_s");
}

Cool. That looks useful - it's defining the main object's method "to_s", so we have the right object. It's saying that main is an instance of Object (called "rb_cObject" here) and defining to_s and inspect on it. That doesn't tell us anything else special about main... But knowing that it's called "top_self" does let us find other files that use main, so let's look for it.
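For anyone who reads Ruby more easily than C, here is roughly what Init_top_self does, translated line by line. It's a sketch, not how CRuby builds the real main. fake_main is a stand-in name of mine, and I'm assuming Object.allocate is close enough to rb_obj_alloc for illustration:

```ruby
# Rough Ruby-level rendering of Init_top_self. fake_main is a stand-in name.
fake_main = Object.allocate                          # rb_obj_alloc(rb_cObject)
fake_main.define_singleton_method(:to_s) { "main" }  # rb_define_singleton_method(..., "to_s", ...)
fake_main.singleton_class.send(:alias_method, :inspect, :to_s) # rb_define_alias(..., "inspect", "to_s")

puts fake_main.inspect
puts fake_main.class
```

The object is a perfectly ordinary Object. Only its singleton class knows how to print itself as "main".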

Main, top_self, Ruby

If we search for 'top_self', we can see what we do with main. In fact, we can see everything Ruby does with the main object...

inits.c
24:    CALL(top_self);

eval.c
1940:    rb_define_private_method(rb_singleton_class(rb_vm_top_self()),
1942:    rb_define_private_method(rb_singleton_class(rb_vm_top_self()),

internal.h
1908:PUREFUNC(VALUE rb_vm_top_self(void));

vm_method.c
2131:    rb_define_private_method(rb_singleton_class(rb_vm_top_self()),
2133:    rb_define_private_method(rb_singleton_class(rb_vm_top_self()),

ruby.c
671:    VALUE self = rb_vm_top_self();

vm.c
466:    vm_push_frame(ec, iseq, VM_FRAME_MAGIC_TOP | VM_ENV_FLAG_LOCAL | VM_FRAME_FLAG_FINISH, rb_ec_thread_ptr(ec)->top_self,
2142:   rb_gc_mark(vm->top_self);
2443:    RUBY_MARK_UNLESS_NULL(th->top_self);
2574:    th->top_self = rb_vm_top_self();
3093:   th->top_self = rb_vm_top_self();
3101:   th->ec->cfp->self = th->top_self;
3181:rb_vm_top_self(void)
3183:    return GET_VM()->top_self;
3187:Init_top_self(void)
3191:    vm->top_self = rb_obj_alloc(rb_cObject);
3192:    rb_define_singleton_method(rb_vm_top_self(), "to_s", main_to_s, 0);
3193:    rb_define_alias(rb_singleton_class(rb_vm_top_self()), "inspect", "to_s");

vm_eval.c
1394:    return eval_string_with_cref(rb_vm_top_self(), rb_str_new2(str), NULL, file, 1);
1406:    return eval_string_with_cref(rb_vm_top_self(), arg->str, NULL, arg->filename, 1);
1474:    VALUE self = th->top_self;
1479:    th->top_self = rb_obj_clone(rb_vm_top_self());
1480:    rb_extend_object(th->top_self, th->top_wrapper);
1484:    th->top_self = self;
1516:       val = eval_string_with_cref(rb_vm_top_self(), cmd, NULL, 0, 0);

vm_core.h
596:    VALUE top_self;
870:    VALUE top_self;

lib/rdoc/parser/c.rb
476:      next if var_name == "ruby_top_self"

proc.c
3175:    rb_define_private_method(rb_singleton_class(rb_vm_top_self()),

load.c
577:    volatile VALUE self = th->top_self;
589:    th->top_self = rb_obj_clone(rb_vm_top_self());
591:    rb_extend_object(th->top_self, th->top_wrapper);
619:    th->top_self = self;
996:            handle = (long)rb_vm_call_cfunc(rb_vm_top_self(), load_ext,

mjit.c
1503:    mjit_add_class_serial(RCLASS_SERIAL(CLASS_OF(rb_vm_top_self())));

variable.c
2188:    state->result = rb_funcall(rb_vm_top_self(), rb_intern("require"), 1,

Since we're looking for what happens when we define a method on main, we can ignore some of what we see here.  Anything that's being initialized, we can skip. And of course, we can still ignore anything in doc, test or spec. Here's roughly how I'd summarize what I see in these files when I check around for top_self:

  • inits.c: initialization, like the name says
  • internal.h, vm_core.h: in C, files ending in ".h" are headers that declare structures and function signatures rather than implement behavior; ignore them
  • mjit.c: this is initialization, so we can skip it.
  • lib/rdoc/parser/c.rb: this is documentation.
  • load.c: this is interesting, and could be what we're looking for... except it happens only on require or load.
  • variable.c: this is only on autoload, and it just makes sure we autoload into the main object.
  • vm_method.c: this is defining "public" and "private" as methods on main. Neat, but not what we're looking for.
  • eval.c: this is defining "include" and "using" as methods. Also neat, also not quite it.
  • ruby.c: this takes some tracking down... But it's just requiring any library you pass with "-r" on the command line into main.
  • vm_eval.c: this is defining various flavors of "eval", which happen on the "main" object.
  • vm.c: there's a lot going on here - initialization and garbage collection, mostly. But in the end, it's not what we want.
  • proc.c: finally! This is what we're looking for.
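The vm_method.c, eval.c and proc.c bullets above can be cross-checked from Ruby, since those files all define private methods on main's singleton class. A small sketch, assuming a plain script on a reasonably recent CRuby (newer versions may list extra names):

```ruby
main = TOPLEVEL_BINDING.receiver # the top-level self, i.e. "main"

# Private methods defined directly on main's singleton class, nothing inherited.
own_private = main.singleton_class.private_instance_methods(false).sort
puts own_private.inspect
```

The names I'd expect to see in there include public, private, include, using and define_method, matching what the C files register.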

Any Ruby source file can be great for learning more about Ruby. It's worth your time to search for "top_self" in any of these files to see what's going on. But I'll skip to the actual thing we're looking for - proc.c.

The Hunt... And the Capture!

In proc.c, if you look for top_self, here's the relevant bit:

void
Init_Proc(void)
{
  /* ... */
  rb_define_private_method(rb_singleton_class(rb_vm_top_self()),
                          "define_method", top_define_method, -1);

And that's what we were looking for. It's in an init method, it turns out - d'oh! But it's also defining a private 'define_method' on main's singleton class, which is exactly what we're looking for.

What does top_define_method do? Here it is, the whole thing:

static VALUE
top_define_method(int argc, VALUE *argv, VALUE obj)
{
    rb_thread_t *th = GET_THREAD();
    VALUE klass;

    klass = th->top_wrapper;
    if (klass) {
        rb_warning("main.define_method in the wrapped load is effective only in wrapper module");
    }
    else {
        klass = rb_cObject;
    }
    return rb_mod_define_method(argc, argv, klass);
}

Translated from C and from Ruby internals, this is saying several things.

If you're doing a load-in-an-anonymous-module (that is, th->top_wrapper is set), it warns you that you probably won't get what you want: your top-level definition won't be properly global, since you asked it not to be, and the method goes on the wrapper module instead. Otherwise, it defines the method on Object, as an instance method, not just on main. And that is the feature we were looking for, defined in Ruby.
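Here's a sketch of that logic from the Ruby side. pick_definition_target is a hypothetical helper of mine that only mirrors the if/else in top_define_method. The second half calls main's real define_method, assuming a plain script with no wrapped load in effect. The method name bobo_via_define_method is also made up:

```ruby
# Hypothetical rendering of top_define_method's branch: use the wrapper module
# if there is one, otherwise fall back to Object.
def pick_definition_target(top_wrapper)
  top_wrapper || Object
end

wrapper = Module.new
puts pick_definition_target(wrapper).equal?(wrapper)
puts pick_definition_target(nil).inspect

# Now the real thing: main's private define_method, reached with send.
main = TOPLEVEL_BINDING.receiver
main.send(:define_method, :bobo_via_define_method) { "Bobo!" }

# Check who owns the new method, then call it on an unrelated object.
puts Object.instance_method(:bobo_via_define_method).owner.inspect
puts 42.send(:bobo_via_define_method)
```

I use send in both places so the sketch doesn't depend on whether the resulting method ends up public or private.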

Winding Up

We've succeeded! We tracked down a simple but well-hidden feature in the CRuby source. It's cocoa and schnapps all around!

Benoit Daloze of TruffleRuby points out that this is all much easier to read if you define your Ruby internals in Ruby, like they do. He's not wrong.

But in case you're still using CRuby for things like Rails... It may be worth your time to learn to look around in the internals! And I find that nothing makes it work better than practice.

Also, this tells us exactly what's up with last week's post about Rake! Now we know how that works, which may be important when we wish to fix that...