Lucas Luitjes


erb2builder

13 Mar 2022

Introducing erb2builder

As we were building a new kind of Ruby-based no-code platform, we ran into a problem. There is great tooling for parsing Ruby code, and there is great tooling for parsing HTML code. But when you deal with HTML ERB templates, you end up mostly ignoring either the Ruby or the HTML.

This is why editors, linters, and tools like RuboCop generally don’t do quite as well on HTML ERB templates.

We ended up combining Nokogumbo, Treetop, and the parser gem to build a custom HTML+ERB parser. This parser generates an abstract syntax tree that retains both the HTML structure and the Ruby structure.

Pretty cool, but of course there is zero tooling for our weird custom abstract syntax tree format. Then we remembered the excellent XML builder library. Turns out converting our strange format into builder is very simple. And because builder templates are pure Ruby, you can use any existing tooling that deals with Ruby code!

Long story short, if we have a user.html.erb file containing the following:

<h1>
  <% if @user %>
    Hello, <%= @user.name %>
  <% else %>
    Not logged in
  <% end %>
</h1>

We can convert it to builder:

> input = File.read("user.html.erb")
> output = Unparser.unparse(Erb2Builder.parse(input))
> puts output

xml << "      "
xml.h1 do
  xml << "\n        "
  xml << ""
  if @user
    xml << "\n          Hello, "
    xml << ""
    xml << @user.name
    xml << "\n        "
    xml << ""
  else
    xml << "\n          Not logged in\n        "
    xml << ""
  end
  xml << "\n      "
end
xml << "\n"

As you can see, we insert a lot of whitespace and the occasional empty string into the generated XML. This builder template won’t win any beauty prizes, but it will retain all indentation and formatting when converted back into the original ERB.

And we can convert it back into the original ERB:

> intermediate = Builder2Erb.parse(output)
> puts IntermediateView.reconstruct_erb(intermediate, true)

<h1>
  <% if @user %>
    Hello, <%= @user.name %>
  <% else %>
    Not logged in
  <% end %>
</h1>

Disclaimer

This is not how you’re supposed to write a parser. It’s a prototype. It’s hacky and weird and not very maintainable. It was, however, a quick way to get this to work in the limited time we had.

If you want to convert ERB to builder in a production system, you should have a look at the internal code. Then you should think long and hard about whether you ever want to be in a situation where you have to debug this codebase in a hurry. If you do decide to use it, it would be wise to at least refactor a bit.

Nowadays there exists the lovely Gammo. If you were to build erb2builder from scratch, it might be easier to use that as a starting point.

All that said, it works pretty well for the templates we’ve thrown at it.

How it works

First step: split up the HTML and Ruby code.

By defining a simple Treetop grammar we can easily grab all the ERB tags in a template. We store the Ruby code they contain in an array. Then we replace the ERB tags with HTML comments. The comments contain a long random string and an index. Later on we can use the index to find the original Ruby code in the array.

HTML with comments:

<h1>
  <!-- GXWPeq29fONtpJrjAMKQA-erb-eval-snippet_index-0-strip_whitespace-true -->
    Hello, <!-- GXWPeq29fONtpJrjAMKQA-erb-print-snippet_index-1-strip_whitespace-true -->
  <!-- GXWPeq29fONtpJrjAMKQA-erb-eval-snippet_index-2-strip_whitespace-true -->
    Not logged in
  <!-- GXWPeq29fONtpJrjAMKQA-erb-eval-snippet_index-3-strip_whitespace-true -->
</h1>

Array with Ruby code:

[
  "if @user",
  "@user.name",
  "else",
  "end"
]
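
To make the splitting step concrete, here is a minimal sketch. It is not the erb2builder code: a regular expression stands in for the Treetop grammar, the helper names split_erb and snippet_for are made up for illustration, and the strip_whitespace part of the placeholder is left out. It also ignores ERB variants such as <%# %> comments and <%- -%> trimming.

```ruby
require "securerandom"

# Assumption: a regex instead of a Treetop grammar. It matches <% ... %> and
# <%= ... %> only; ERB comments and trim markers are not handled.
ERB_TAG = /<%(=?)(.*?)%>/m

# Hypothetical helper: replaces each ERB tag with an HTML comment placeholder
# and returns the resulting HTML plus the array of Ruby snippets.
def split_erb(template, marker = SecureRandom.alphanumeric(21))
  snippets = []
  html = template.gsub(ERB_TAG) do
    kind = Regexp.last_match(1) == "=" ? "print" : "eval"
    snippets << Regexp.last_match(2).strip
    "<!-- #{marker}-erb-#{kind}-snippet_index-#{snippets.size - 1} -->"
  end
  [html, snippets]
end

# Hypothetical helper: given the text of an HTML comment, returns the Ruby
# snippet it stands for, or nil if the comment is not one of our placeholders.
def snippet_for(comment_text, marker, snippets)
  pattern = /#{Regexp.escape(marker)}-erb-(?:eval|print)-snippet_index-(\d+)/
  index = comment_text[pattern, 1]
  index && snippets[Integer(index)]
end

template = <<~ERB
  <h1>
    <% if @user %>
      Hello, <%= @user.name %>
    <% else %>
      Not logged in
    <% end %>
  </h1>
ERB

html, snippets = split_erb(template, "MARKER")
puts html
p snippets # prints ["if @user", "@user.name", "else", "end"]
```

The random marker is what keeps an ordinary HTML comment in the template from being mistaken for a placeholder.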

Now that all ERB tags have been replaced with HTML comments, our template is valid HTML. Time to parse that HTML! To be resilient against buggy templates, we use the Nokogumbo HTML5 parser.

As for the Ruby code, we join all the snippets in the array with newlines, then parse the result with the parser gem:

s(:if,
  s(:ivar, :@user),
  s(:send,
    s(:ivar, :@user), :name), nil)

Now we have two tree structures: an abstract syntax tree of the Ruby code, and the DOM of the HTML. Time to merge these trees into a single tree using my favorite method name: doubletree_recurse.

This method recursively walks both trees, and outputs a single tree. We use the line number data of the AST nodes to determine which ERB tags they come from. That way we know, for example, which part of the HTML lives between the if and else ERB tags.
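
The line number trick can be illustrated with Ruby's bundled Ripper lexer. This is only a sketch under assumptions: it looks at tokens rather than the parser gem's AST nodes, the helper name tokens_with_snippet_index is made up, and it assumes every snippet fits on a single line. A multi-line snippet would need extra bookkeeping to map lines back to indexes.

```ruby
require "ripper"

# Hypothetical helper: joins the snippets with newlines, lexes the result and
# pairs each keyword, instance variable and identifier with the index of the
# snippet it came from. With one snippet per line, that index is line - 1.
def tokens_with_snippet_index(snippets)
  tokens = Ripper.lex(snippets.join("\n"))
  interesting = tokens.select do |_position, event, _token|
    %i[on_kw on_ivar on_ident].include?(event)
  end
  interesting.map { |(line, _column), _event, token| [token, line - 1] }
end

snippets = ["if @user", "@user.name", "else", "end"]

tokens_with_snippet_index(snippets).each do |token, index|
  puts "#{token} comes from snippet #{index}"
end
```

Here the else keyword maps to snippet 2 and end to snippet 3, so any HTML between placeholders 0 and 2 belongs to the true branch.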

The tree should probably consist of instances of custom classes, but currently it’s just arrays and hashes. Internally we refer to this format as an intermediate view.

[
  {
     :type => :text,
    :value => "      "
  },
  {
          :type => :html_tag,
           :tag => [
      {
         :type => :text,
        :value => "h1"
      }
    ],
    :attributes => {},
      :children => [
      {
         :type => :text,
        :value => "\n        "
      },
      {
         :type => :text,
        :value => ""
      },
      {
                  :type => :if,
                :lvasgn => nil,
               :keyword => "if",
          :print_output => false,
         :true_children => [
          {
             :type => :text,
            :value => "\n          Hello, "
          },
          {
             :type => :text,
            :value => ""
          },
          {
                    :type => :code,
            :print_output => true,
                    :code => " @user.name "
          },
          {
             :type => :text,
            :value => "\n        "
          },
          {
             :type => :text,
            :value => ""
          }
        ],
        :false_children => [
          {
             :type => :text,
            :value => "\n          Not logged in\n        "
          },
          {
             :type => :text,
            :value => ""
          }
        ],
                  :code => "@user"
      },
      {
         :type => :text,
        :value => "\n      "
      }
    ]
  },
  {
     :type => :text,
    :value => "\n"
  }
]

Side note: textractor operates directly on this intermediate view format. Source code coming in a future post.

Now we can just walk the tree and output an abstract syntax tree of builder code:

s(:begin,
  s(:send,
    s(:send, nil, :xml), :<<,
    s(:str, "      ")),
  s(:block,
    s(:send,
      s(:send, nil, :xml), :h1),
    s(:args),
    s(:begin,
      s(:send,
        s(:send, nil, :xml), :<<,
        s(:str, "\n        ")),
      s(:send,
        s(:send, nil, :xml), :<<,
        s(:str, "")),
      s(:if,
        s(:ivar, :@user),
        s(:begin,
          s(:send,
            s(:send, nil, :xml), :<<,
            s(:str, "\n          Hello, ")),
          s(:send,
            s(:send, nil, :xml), :<<,
            s(:str, "")),
          s(:send,
            s(:send, nil, :xml), :<<,
            s(:send,
              s(:ivar, :@user), :name)),
          s(:send,
            s(:send, nil, :xml), :<<,
            s(:str, "\n        ")),
          s(:send,
            s(:send, nil, :xml), :<<,
            s(:str, ""))),
        s(:begin,
          s(:send,
            s(:send, nil, :xml), :<<,
            s(:str, "\n          Not logged in\n        ")),
          s(:send,
            s(:send, nil, :xml), :<<,
            s(:str, "")))),
      s(:send,
        s(:send, nil, :xml), :<<,
        s(:str, "\n      ")))),
  s(:send,
    s(:send, nil, :xml), :<<,
    s(:str, "\n")))

For our final step, we use the unparser gem to convert the AST into Ruby code. Here’s the result:

xml << "      "
xml.h1 do
  xml << "\n        "
  xml << ""
  if @user
    xml << "\n          Hello, "
    xml << ""
    xml << @user.name
    xml << "\n        "
    xml << ""
  else
    xml << "\n          Not logged in\n        "
    xml << ""
  end
  xml << "\n      "
end
xml << "\n"
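
The walk from intermediate view to builder code can be approximated without the AST and unparser steps. The sketch below emits source lines as strings directly, which is a simplification and not what erb2builder does. The to_builder name is made up, HTML attributes are ignored, and only the node types shown in this post are handled.

```ruby
# Hypothetical walker: turns an intermediate view (arrays and hashes) into
# lines of builder source. Assumption: strings are emitted directly instead
# of building an AST and unparsing it.
def to_builder(nodes, indent = 0)
  pad = "  " * indent
  nodes.flat_map do |node|
    case node[:type]
    when :text
      ["#{pad}xml << #{node[:value].inspect}"]
    when :code
      code = node[:code].strip
      [node[:print_output] ? "#{pad}xml << #{code}" : "#{pad}#{code}"]
    when :if
      lines = ["#{pad}if #{node[:code]}", *to_builder(node[:true_children], indent + 1)]
      false_children = Array(node[:false_children])
      lines += ["#{pad}else", *to_builder(false_children, indent + 1)] unless false_children.empty?
      lines << "#{pad}end"
    when :html_tag
      # The tag name is stored as a list of text nodes; attributes are skipped here.
      tag = node[:tag].map { |part| part[:value] }.join
      ["#{pad}xml.#{tag} do", *to_builder(node[:children], indent + 1), "#{pad}end"]
    else
      raise ArgumentError, "unsupported node type: #{node[:type].inspect}"
    end
  end
end

# A trimmed-down intermediate view, without the whitespace-only text nodes.
view = [
  { type: :html_tag, tag: [{ type: :text, value: "h1" }], attributes: {}, children: [
    { type: :if, code: "@user",
      true_children: [
        { type: :text, value: "Hello, " },
        { type: :code, print_output: true, code: " @user.name " }
      ],
      false_children: [{ type: :text, value: "Not logged in" }] }
  ] }
]

puts to_builder(view)
```

Going through an AST, as erb2builder does, leaves quoting and formatting to unparser; the string version has to get that right by hand.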

Going back from builder to ERB is more straightforward. Parse the builder code into an AST, walk the AST and output an intermediate view, walk the intermediate view and generate ERB.
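
The last leg of that trip, from intermediate view to ERB, can be sketched in a few lines. Again this is an illustration under assumptions rather than the real IntermediateView.reconstruct_erb: the to_erb name is made up, attributes are ignored, and only the node types shown in this post are supported.

```ruby
# Hypothetical generator: walks an intermediate view and concatenates ERB.
def to_erb(nodes)
  nodes.map do |node|
    case node[:type]
    when :text
      node[:value]
    when :code
      # The stored code keeps its surrounding spaces, so none are added here.
      opener = node[:print_output] ? "<%=" : "<%"
      "#{opener}#{node[:code]}%>"
    when :if
      erb = "<% if #{node[:code]} %>" + to_erb(node[:true_children])
      false_children = Array(node[:false_children])
      erb += "<% else %>" + to_erb(false_children) unless false_children.empty?
      erb + "<% end %>"
    when :html_tag
      tag = node[:tag].map { |part| part[:value] }.join
      "<#{tag}>#{to_erb(node[:children])}</#{tag}>"
    else
      raise ArgumentError, "unsupported node type: #{node[:type].inspect}"
    end
  end.join
end

view = [
  { type: :html_tag, tag: [{ type: :text, value: "h1" }], attributes: {}, children: [
    { type: :text, value: "\n  " },
    { type: :if, code: "@user",
      true_children: [
        { type: :text, value: "\n    Hello, " },
        { type: :code, print_output: true, code: " @user.name " },
        { type: :text, value: "\n  " }
      ],
      false_children: [{ type: :text, value: "\n    Not logged in\n  " }] },
    { type: :text, value: "\n" }
  ] }
]

puts to_erb(view)
```

Because the text nodes carry every newline and indent, the whitespace of the original template survives the round trip.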

By the time it occurred to us to output builder, we already had code for converting from ERB to the intermediate view format. Converting the intermediate view into builder was the quickest way for us to get this to work. But if you are building this all from scratch, you could probably skip the intermediate format and go straight from the two trees to the builder AST.

And that about wraps it up!

Our long-running side project has been on hold for more than a year due to time constraints. We are publishing some of the core technology now. The code is unpolished, but (to our knowledge) innovative. If you have a use for it, and want to spar with us, feel free to e-mail us at info@snootysoftware.com.