How To Make A DSL, Hygienically

In this article, I’m going to show you how to implement a DSL like this:

xml version: '1.0', encoding: 'UTF-8'

weather at: @time.iso8601 do
  description @description
  temperature "#{@temp} C"
  wind do
    velocity "#{@wind_vel} kts"
    direction @wind_direction
  end
end

That produces XML like this:

<?xml version="1.0" encoding="UTF-8"?>
<weather at="2016-11-29T22:54:15+11:00">
  <description>Bright &amp; sunny.</description>
  <temperature>18.3 C</temperature>
  <wind>
    <velocity>14 kts</velocity>
    <direction>SSE</direction>
  </wind>
</weather>

Using code like this:

template = RXT::Template.from_file('weather.rxt')
puts template.render(
  time: Time.now,
  description: 'Bright & sunny.',
  temp: 18.3,
  wind_vel: 14,
  wind_direction: 'SSE',
)

I’ve given this Ruby XML templating language a spicy, exotic name: Ruby XML Template (RXT).

I know, XML doesn’t make for the most exciting DSL, and the usefulness of this particular DSL is questionable. But it will do nicely for the purpose of demonstration. It gives me the opportunity to showcase some of the more bendy, flexible parts of Ruby.

All the code is available on GitHub: tomdalling/ruby_xml_template

What Is A DSL?

Domain Specific Languages (DSLs) are custom-made computer languages, designed to be convenient for specific tasks. The DSL in this article is a shorthand for making XML documents. Without it, you would have to write something like this:

document = Ox::Document.new(version: '1.0', encoding: 'UTF-8')

weather = Ox::Element.new('weather')
document << weather

description = Ox::Element.new('description')
description << 'Bright & sunny.'
weather << description

# and so on...

In most other programming communities, DSLs are truly custom languages – with custom syntax, parsers, interpreters, compilers, and all that heavy-duty stuff.

In Ruby, DSLs are typically just Ruby code. These are much easier to implement, because you don’t need to write a parser or anything like that – you just use the parser built in to Ruby. This limits the DSL to the syntax of Ruby, but this is often an acceptable trade-off given that Ruby is such a flexible language.

When Should You Make A DSL?

In my opinion, DSLs are best used for making repetitive programming tasks more convenient. If you’re implementing new kinds of XML documents every day, and your XML gem is a bit cumbersome to use, a DSL could be helpful.

But beware – there are hidden costs. A bad DSL is worse than no DSL at all.

DSLs add an extra layer of complexity to your code. The combination of X + DSL will always be more complicated than X alone. You need to weigh this complexity against the convenience you are gaining.

DSLs are also notorious for being inflexible. They are designed to accomplish specific, not generic, tasks. You may adopt or create a DSL only to find that it doesn’t quite meet your requirements, and there is no simple workaround. This is why I recommend that DSLs should be an optional layer built on top of a flexible API. Solid design should come first, and convenience second.

So, while DSLs are certainly cool from a programming language perspective, do keep in mind that they are not all cupcakes and rainbows. Use them sparingly. If it feels more like a hassle than a convenience, consider ditching the DSL.

Step One: Write The DSL You Want

Start by writing the DSL code you wish you had, even though it can’t run yet. It’s supposed to be convenient, so make sure to include all the niceties that you’re looking forward to.

Here is the template code that I’m aiming for in this article:

xml version: '1.0', encoding: 'UTF-8'

weather at: @time.iso8601 do
  description @description
  temperature "#{@temp} C"
  wind do
    velocity "#{@wind_vel} kts"
    direction @wind_direction
  end
end

Remember that the code must be syntactically correct Ruby. You can check the syntax with Ruby’s -c command line flag. This will parse the code and display any syntax errors, but will not run the code.

$ ruby -c weather.rxt
Syntax OK
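
If you would rather run the same check from inside Ruby, for example in a test suite, here is one possible sketch. It assumes MRI (CRuby), because it relies on RubyVM::InstructionSequence, and syntax_ok? is a hypothetical helper name that is not part of RXT.

```ruby
# Hypothetical helper: returns true when `source` parses as valid Ruby.
# Assumes MRI, where RubyVM::InstructionSequence.compile parses and compiles
# the source without running it, and raises SyntaxError on invalid input.
def syntax_ok?(source)
  RubyVM::InstructionSequence.compile(source)
  true
rescue SyntaxError
  false
end

VALID_TEMPLATE = <<~'TEMPLATE'
  weather at: @time.iso8601 do
    description @description
  end
TEMPLATE

# The same template with the closing `end` missing.
BROKEN_TEMPLATE = <<~'TEMPLATE'
  weather at: @time.iso8601 do
    description @description
TEMPLATE

puts syntax_ok?(VALID_TEMPLATE)
puts syntax_ok?(BROKEN_TEMPLATE)
```

The first call should report true and the second false, without any of the template code being run.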

This would be a good point to write a test. Take your dream DSL code, pass in some dummy input, and assert that the output is correct.

Note: The heredocs below are squiggly heredocs (<<~), which strip the leading indentation. With plain <<- heredocs, the indentation would become part of the expected output, and the comparison would fail.

RSpec.describe RXT do
  it 'is a template DSL for generating XML' do
    template = RXT::Template.new(<<~'END_TEMPLATE')
      xml version: '1.0', encoding: 'UTF-8'

      weather at: @time.iso8601 do
        description @description
        temperature "#{@temp} C"
        wind do
          velocity "#{@wind_vel} kts"
          direction @wind_direction
        end
      end
    END_TEMPLATE

    input = {
      time: Time.new(2016, 11, 30, 1, 2, 3, '+11:00'),
      description: 'Bright & sunny.',
      temp: 18.3,
      wind_vel: 14,
      wind_direction: 'SSE',
    }

    expected_output = <<~END_OUTPUT
      <?xml version="1.0" encoding="UTF-8"?>
      <weather at="2016-11-30T01:02:03+11:00">
        <description>Bright &amp; sunny.</description>
        <temperature>18.3 C</temperature>
        <wind>
          <velocity>14 kts</velocity>
          <direction>SSE</direction>
        </wind>
      </weather>
    END_OUTPUT

    expect(template.render(input)).to eq(expected_output)
  end
end

Step Two: Make The Wrapper API

Running the test above fails, complaining that RXT::Template does not exist, so let’s start there. This class is a wrapper for the DSL – it runs the DSL code, but does not implement the DSL itself. Here is the whole class:

module RXT
  class Template
    def self.from_file(path)
      new(File.read(path), path)
    end

    def initialize(rxt_source, filename='(rxt)', lineno=1)
      @block = CleanBinding.get.eval(<<-END_SOURCE, filename, lineno-1)
        Proc.new do
          #{rxt_source}
        end
      END_SOURCE
    end

    def render(instance_variables={})
      dsl = DSL.new

      instance_variables.each do |name, value|
        dsl.instance_variable_set("@#{name}", value)
      end

      dsl.instance_eval(&@block)

      root = dsl.__root
      Ox.dump(root, with_xml: root.attributes.any?)
    end

    module CleanBinding
      def self.get
        binding
      end
    end
  end
end

Precompilation

The initialize method is where things start to get interesting.

def initialize(rxt_source, filename='(rxt)', lineno=1)
  @block = CleanBinding.get.eval(<<-END_SOURCE, filename, lineno-1)
    Proc.new do
      #{rxt_source}
    end
  END_SOURCE
end

Here, all the template source code is being compiled into a Proc object using eval. The eval method takes a string, and parses it as Ruby code. This is how we use the built-in Ruby parser, instead of writing our own. Ruby DSLs that are loaded from their own files generally run code through eval, or a relative such as instance_eval, at some point.

In this particular case, we are precompiling the template. The template is parsed and stored as a callable function object (a Proc). We could instance_eval the template source code in render, but that would reparse the template source every time. Using this precompilation approach, the template is parsed just once, and can then be reused every time render is called. It’s a minor difference that gives slightly better performance.
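
As a rough illustration of the parse-once idea, the sketch below compiles a one-line template a single time and then runs it against two different objects. The Context class, the compile_template helper and the template text are all made up for this example, and are not part of RXT.

```ruby
# A made-up context class, standing in for the DSL object.
class Context
  def initialize(name)
    @name = name
  end
end

# Hypothetical helper: parses `source` once and returns it wrapped in a Proc.
# It uses the method's own binding for brevity, which is not a clean binding.
def compile_template(source)
  eval("Proc.new do\n#{source}\nend")
end

# Single quotes, so the interpolation is left for the template to perform.
GREETING = compile_template('"Hello, #{@name}!"')

# The same Proc is reused against different objects, without reparsing.
puts Context.new('Alice').instance_eval(&GREETING)
puts Context.new('Bob').instance_eval(&GREETING)
```

The string is only parsed inside compile_template. Each instance_eval call just runs the already-compiled block with a different self.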

File Names And Line Numbers

The eval method takes optional filename and lineno arguments. They are easy to leave out, but important. When they are provided, and an exception is raised from within a template, you will get nice errors like this:

weather.rxt:6:in `block (2 levels) in get': oops (RuntimeError)

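This error tells you which line (6) of which template file (weather.rxt) the error was raised from. The initialize method passes lineno-1 rather than lineno because the template source is wrapped in a Proc.new do line, which pushes every line of the template down by one.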

If you don’t provide a filename and line number, the same exception will give you an error like this:

rxt.rb:34:in `block (2 levels) in get': oops (RuntimeError)

Line 34 of rxt.rb is where eval was called, not where the exception was raised from. (Newer versions of Ruby report a generic (eval) location instead, which is no more helpful.) Good luck hunting down that bug!
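
Here is a small sketch of the effect, using a hypothetical first_backtrace_line helper and a made-up two-line template. Neither is part of RXT.

```ruby
# Hypothetical helper: evaluates `source` as if it came from `filename`,
# starting at line `lineno`, and returns the first backtrace line of the
# RuntimeError it raises (or nil if nothing was raised).
def first_backtrace_line(source, filename, lineno)
  binding.eval(source, filename, lineno)
  nil
rescue RuntimeError => e
  e.backtrace.first
end

# A made-up template that raises on its second line.
FAILING_TEMPLATE = "greeting = 'hi'\nraise 'oops'\n"

puts first_backtrace_line(FAILING_TEMPLATE, 'weather.rxt', 1)
```

The printed line should begin with weather.rxt:2, because the raise sits on the second line of the template. Passing a different starting lineno shifts the reported line number by the same amount.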

Clean Bindings

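Every call to eval runs inside a binding. If you don’t specify one, Kernel#eval uses the binding of the caller, local variables and all. A binding captures which local variables are accessible, and what the value of self is (which in turn determines which instance variables are visible).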

It doesn’t really matter what self is in this case, for reasons we will soon see.

However, I am concerned about local variables leaking into the templates. I don’t want variables that are magically accessible to every template. That sounds like a recipe for nasty bugs.

Ideally, the binding for the DSL should have no local variables. That’s where this little module comes in:

module CleanBinding
  def self.get
    binding
  end
end

Calling CleanBinding.get will return a binding object containing no local variables, where self is equal to CleanBinding, which is essentially an empty module. This stops variables from leaking into the templates, and limits the damage that templates could accidentally inflict via self.
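
To make the leak concrete, here is a short sketch that compares an ordinary method binding with the clean one. The leaky_eval method and its password variable are invented for the comparison.

```ruby
module CleanBinding
  def self.get
    binding
  end
end

# Evaluates `source` with the binding of this method, so the method's own
# local variables are visible to the evaluated code.
def leaky_eval(source)
  password = 'hunter2'
  binding.eval(source)
end

# The local variable leaks into the evaluated code.
puts leaky_eval('password')

# The clean binding has no local variables, and self is the module.
puts CleanBinding.get.eval('local_variables').inspect
puts CleanBinding.get.eval('self')
```

Code evaluated through leaky_eval can read (and reassign) password, while code evaluated through CleanBinding.get has nothing to stumble over.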

Running The Template

The final step is to actually run the precompiled template code.

def render(instance_variables={})
  dsl = DSL.new

  instance_variables.each do |name, value|
    dsl.instance_variable_set("@#{name}", value)
  end

  dsl.instance_eval(&@block)

  root = dsl.__root
  Ox.dump(root, with_xml: root.attributes.any?)
end

We start by creating a clean, new DSL object. We will implement this class shortly.

The template parameters are passed into render as a hash argument. Each one is assigned to an instance variable on the dsl object using instance_variable_set.

The precompiled template code is then run against the dsl object using instance_eval. This runs the template code as if it were a method on the dsl object. The block will have access to all the methods and instance variables available on the dsl object.
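
The same two steps can be tried in isolation, without any XML involved. In this sketch, render_with is a hypothetical helper and a plain Object stands in for the DSL object.

```ruby
# Hypothetical helper: copies each entry of `params` into an instance
# variable on a fresh object, then runs the block with that object as self.
def render_with(params, &template)
  context = Object.new

  params.each do |name, value|
    context.instance_variable_set("@#{name}", value)
  end

  context.instance_eval(&template)
end

puts render_with(temp: 18.3, unit: 'C') { "#{@temp} #{@unit}" }
```

Inside the block, @temp and @unit resolve against the context object rather than the caller, because instance_eval changes self for the duration of the block.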

After the template code has been run, the results (dsl.__root) are pulled out of the dsl object and converted into an XML string. You might be wondering why the __root method has two underscores, and why the dsl object isn’t responsible for creating the XML string itself. Let’s look at both of those points as we implement the final class: RXT::DSL.

Step Three: Make A DSL Class

The RXT::DSL class provides all of the DSL features accessible from the template files. The template source code is run as if it were a method defined on this class. Here is the entire class:

module RXT
  class DSL
    attr_reader :__root

    def initialize
      @__root = Ox::Document.new
      @__element_stack = [@__root]
    end

    def xml(attrs={})
      attrs.each do |key, value|
        @__root[key] = value
      end
    end

    def respond_to_missing?(method_name, include_private=false)
      true # responds to all methods
    end

    def method_missing(method_name, *args)
      elem = __make_element(method_name, *args)
      @__element_stack.last << elem
      @__element_stack.push(elem)

      yield if block_given?

      @__element_stack.pop
    end

    def __make_element(name, attributes_or_content={}, content=nil)
      if attributes_or_content.is_a?(Hash)
        attributes = attributes_or_content
      else
        attributes = {}
        content = attributes_or_content
      end

      Ox::Element.new(name.to_s).tap do |elem|
        attributes.each { |key, value| elem[key] = value }
        elem << content.to_s unless content.nil?
      end
    end
  end
end

Double Underscores

Instance variables are used as template parameters in this DSL, but that poses a problem. The RXT::DSL implementation needs a couple of instance variables in order to work at all, but these instance variables should not be used by templates. The double underscores indicate that these instance variables are private. They are still accessible from every template, because there is no easy alternative in Ruby, but this naming convention is a widely understood warning sign. It says, “do not touch!”

The underscores also serve to avoid name collisions. It’s entirely plausible for a template to use a parameter called @root. If @__root had no underscores, it would be overwritten by the template parameter.

The same goes for the __make_element and __root methods. What if the XML output is supposed to have a <make_element> or <root> element? Without the underscores, trying to create these elements would result in a bug.

The general idea here is to avoid polluting the DSL namespace as much as possible. This is why RXT::DSL is a separate class to RXT::Template. It’s also why the XML string generation is in RXT::Template#render instead of RXT::DSL. The DSL class should have as few private methods and instance variables as possible. If methods can be pulled out and placed somewhere else, then do so. The few that are left should have a naming convention that discourages their use, and avoids collisions.
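
The collision is easy to reproduce. In the sketch below, NaiveDSL, CarefulDSL and assign_params are made-up names that mimic how render assigns template parameters. They are not the real RXT classes.

```ruby
# A made-up DSL class whose internal state uses an obvious name.
class NaiveDSL
  attr_reader :root

  def initialize
    @root = [:internal, :state]
  end
end

# The same class, following the double-underscore convention.
class CarefulDSL
  attr_reader :__root

  def initialize
    @__root = [:internal, :state]
  end
end

# Mimics what render does with the template parameters.
def assign_params(dsl, params)
  params.each { |name, value| dsl.instance_variable_set("@#{name}", value) }
  dsl
end

naive = assign_params(NaiveDSL.new, root: '/var/www')
careful = assign_params(CarefulDSL.new, root: '/var/www')

puts naive.root.inspect      # the internal state has been overwritten
puts careful.__root.inspect  # the internal state is intact
```

A template parameter called root silently replaces the internal state of NaiveDSL, while CarefulDSL keeps both values side by side.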

Defining DSL Methods

The first method called from the example template is xml:

xml version: '1.0', encoding: 'UTF-8'

Here is the corresponding implementation on RXT::DSL:

def xml(attrs={})
  attrs.each do |key, value|
    @__root[key] = value
  end
end

The details aren’t important. They are more about how the ox gem works, than how to make a DSL.

The important thing to note is that all methods defined on this class will be callable from the template code.

method_missing

In this DSL, XML elements are made by calling a method of the same name. For example, the method call name "Tom" results in the XML <name>Tom</name>. But element names are arbitrary, with infinite possibilities. It’s impossible to implement a method for each one.

Normally when you call a method that doesn’t exist, Ruby raises a NoMethodError. But before it raises an exception, it gives the object an opportunity to handle the call within method_missing.

The strategy for this DSL is to let the templates call methods that don’t exist, and catch them all in method_missing. Here is the implementation:

def method_missing(method_name, *args)
  elem = __make_element(method_name, *args)
  @__element_stack.last << elem
  @__element_stack.push(elem)

  yield if block_given?

  @__element_stack.pop
end

Every time a non-existent method is called, we create a new element using the attempted method’s name. The new element is appended to its parent element. Then we call the block, if one was given, to create the child elements.

Again, the XML-specific details aren’t super important. The important part is the use of method_missing to make the DSL work.
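
To see the technique without the XML details, here is a stripped-down sketch that builds a tree of plain hashes instead of Ox elements. TreeDSL and build_tree are hypothetical names, and the hash layout is just an assumption made for this example.

```ruby
# A hypothetical DSL class that records every unknown method call as a node.
class TreeDSL
  attr_reader :__root

  def initialize
    @__root = { name: 'document', content: nil, children: [] }
    @__stack = [@__root]
  end

  def respond_to_missing?(_method_name, _include_private = false)
    true # responds to all methods
  end

  def method_missing(method_name, content = nil)
    node = { name: method_name.to_s, content: content, children: [] }
    @__stack.last[:children] << node # attach to the current parent
    @__stack.push(node)              # children of the block attach here

    yield if block_given?

    @__stack.pop
  end
end

# Hypothetical helper: runs the block against a fresh TreeDSL object.
def build_tree(&block)
  dsl = TreeDSL.new
  dsl.instance_eval(&block)
  dsl.__root
end

tree = build_tree do
  weather do
    temperature '18.3 C'
    wind do
      velocity '14 kts'
    end
  end
end

weather = tree[:children].first
puts weather[:name]
puts weather[:children].map { |child| child[:name] }.inspect
```

The stack is what makes nesting work: while a block is running, the element that owns the block sits on top of the stack, so every element created inside the block becomes its child.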

respond_to_missing?

Any time you implement method_missing you should also implement respond_to_missing?. Using method_missing alone breaks a few methods inherited from Object.

This is how Ruby objects are supposed to behave:

x = "hello"
x.length # works
x.respond_to?(:length) #=> true
x.method(:length) #=> #<Method: String#length>

If you can successfully call a method on an object, then respond_to? should return true, and method should return a method object.

But this is how an RXT::DSL object behaves without respond_to_missing? implemented:

dsl = RXT::DSL.new
dsl.whatever # works
dsl.respond_to?(:whatever) #=> false
dsl.method(:whatever) #=> NameError: undefined method `whatever' for class `RXT::DSL'

You can fix this discrepancy by implementing respond_to_missing?. It indicates which method names will be handled by method_missing.

This specific DSL handles all method calls, regardless of their name, so respond_to_missing? just returns true.

def respond_to_missing?(method_name, include_private=false)
  true # responds to all methods
end
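
The difference shows up clearly with two tiny classes side by side. WithoutHook and WithHook are made-up names for this sketch, and neither is part of RXT.

```ruby
# Handles every method call, but does not tell Ruby that it does.
class WithoutHook
  def method_missing(method_name, *_args)
    "handled #{method_name}"
  end
end

# Handles every method call, and says so via respond_to_missing?.
class WithHook
  def method_missing(method_name, *_args)
    "handled #{method_name}"
  end

  def respond_to_missing?(_method_name, _include_private = false)
    true
  end
end

puts WithoutHook.new.whatever                 # the call itself works
puts WithoutHook.new.respond_to?(:whatever)   # but respond_to? disagrees
puts WithHook.new.respond_to?(:whatever)      # consistent with the call
puts WithHook.new.method(:whatever).call      # method objects work too
```

With the hook in place, respond_to? and method agree with what actually happens when the method is called.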

Conclusion

That’s what a simple DSL implementation looks like, in under 100 lines of code.

DSLs are just Ruby code. They often don’t look like normal Ruby code, because they are implemented with the most dynamic, flexible parts of the language.

Use DSLs to make repetitive, cumbersome tasks more convenient. But beware, if used inappropriately, they can add unnecessary complexity to your codebase for little benefit.

When implementing a DSL, try to keep the DSL object clean.

  1. Keep unintended variables out of the binding.
  2. Pull functionality out into other objects, wherever possible.
  3. Use a naming convention to discourage the use of private methods and instance variables.

All the code is available on GitHub: tomdalling/ruby_xml_template

