Parse Ruby Objects in Haskell

2017-04-24

In 2015 I released my first Haskell project, ruby-marshal. It’s a package that uses the binary package to parse Ruby objects serialised with Marshal.dump. I wrote it in my spare time because I was curious whether I could devise a strategy for incrementally migrating legacy Ruby on Rails applications to Haskell without the risk associated with a full rewrite.

My hypothesis was that if I could decrypt and de-serialise Rails sessions, I’d be able to piggyback on the Rails application’s authentication mechanism. Not long after, I had the opportunity to use this package at work and put the theory to the test by writing a Haskell web application that shared sessions with Rails.

It has been running in production – without any issue – for almost two years.

Marshal

Ruby’s Marshal library serialises Ruby objects to a bytestring. For example, dumping true results in the bytes [4, 8, 84], where 4 and 8 are the major and minor Marshal version numbers and 84 (ASCII T) represents true.
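To make the byte layout concrete, here is a minimal sketch that reads the two version bytes and then a boolean, using only Data.Binary.Get. The names getMarshalledBool and decodeBool are made up for this illustration and are not part of ruby-marshal.

```haskell
import Data.Binary.Get (Get, getWord8, runGetOrFail)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- | Read the two-byte version header, then a single boolean payload.
-- Hypothetical helper, written only to illustrate the byte layout.
getMarshalledBool :: Get Bool
getMarshalledBool = do
  major <- getWord8
  minor <- getWord8
  if (major, minor) /= (4, 8)
    then fail "unsupported Marshal version"
    else do
      tag <- getWord8
      case tag of
        84 -> pure True   -- ASCII 'T'
        70 -> pure False  -- ASCII 'F'
        _  -> fail "not a boolean"

-- | Run the parser over a list of bytes, e.g. [4, 8, 84].
decodeBool :: [Word8] -> Either String Bool
decodeBool bytes =
  case runGetOrFail getMarshalledBool (BL.pack bytes) of
    Left (_, _, err) -> Left err
    Right (_, _, b)  -> Right b
```

With this sketch, decodeBool [4, 8, 84] evaluates to Right True, and a header other than 4.8 is rejected with a Left.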

Compound objects, e.g. hash maps, can also be serialised using Marshal.dump. This might explain why it was used as the default cookie serialiser in Rails until version 4.1, after which JSON serialisation became the default.

More information about the Marshal.dump binary format can be found in a series of blog posts by @jakegoulding or by reviewing the ruby-marshal source code.

Design

The ruby-marshal package allows us to transform this binary format into Haskell values and follows a pattern you’ll see elsewhere in the Haskell ecosystem. It consists of:

  • An abstract syntax tree (AST) that represents Ruby objects.
  • A collection of parser combinators to transform the Marshal binary representation into an AST.
  • A custom monad to enrich the underlying Get monad with additional effects.

AST

The Ruby AST represents a subset of values that can be encoded by Marshal.dump.
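As a rough sketch of what such an AST might look like, the type below covers a handful of Ruby values. The name RubyValue and its constructors are assumptions chosen for this illustration; the package's own type may be shaped differently.

```haskell
import qualified Data.ByteString as BS

-- | A small, hypothetical AST covering a subset of what Marshal.dump can encode.
data RubyValue
  = RNil                            -- ^ nil
  | RBool Bool                      -- ^ true / false
  | RFixnum Int                     -- ^ small integers
  | RSymbol BS.ByteString           -- ^ :symbol
  | RString BS.ByteString           -- ^ "string" (raw bytes, encoding ignored here)
  | RArray [RubyValue]              -- ^ [a, b, c]
  | RHash [(RubyValue, RubyValue)]  -- ^ { key => value }
  | Unsupported                     -- ^ anything this sketch does not model
  deriving (Eq, Show)

-- | The Ruby hash { :user_id => 42 } expressed in the sketch AST.
-- The bytes 117,115,101,114,95,105,100 spell "user_id".
exampleSession :: RubyValue
exampleSession =
  RHash [(RSymbol (BS.pack [117, 115, 101, 114, 95, 105, 100]), RFixnum 42)]
```

Keeping an Unsupported constructor lets a parser skip over values it does not understand instead of failing outright.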

This is a common pattern you’ll see in other packages e.g. msgpack:Object and aeson:Value.

Parser Combinators

Parsers are combined to build an AST e.g. parsing a raw bytestring is defined as follows.

It is then used by other parsing functions e.g. parsing a Ruby symbol.

These parsers are finally used in the top-level parsing function, which combines them and lifts their values into the Ruby AST.
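The layering described above can be sketched with plain Data.Binary.Get parsers: an integer parser for Marshal's length prefix, a raw bytestring parser built on it, and a top-level parser that dispatches on the type tag and lifts results into an AST. This is an illustration under my own naming (RubyValue, getFixnum, getRawString, getValue, decodeValue), not the package's source, and it ignores the symbol and object caches entirely.

```haskell
import Control.Monad (replicateM, when)
import Data.Binary.Get (Get, getByteString, getInt8, getWord8, runGetOrFail)
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Char (chr)

-- | Hypothetical cut-down AST for this sketch.
data RubyValue
  = RNil | RBool Bool | RFixnum Int
  | RSymbol BS.ByteString | RString BS.ByteString | Unsupported
  deriving (Eq, Show)

-- | Marshal's variable-length integer, also used as a length prefix.
getFixnum :: Get Int
getFixnum = do
  c <- fromIntegral <$> getInt8
  case c :: Int of
    0 -> pure 0
    _ | c >= 5    -> pure (c - 5)   -- small positive, stored with an offset of 5
      | c <= -5   -> pure (c + 5)   -- small negative, stored with an offset of 5
      | c > 0     -> unsignedLE c   -- 1 to 4 little-endian bytes follow
      | otherwise -> subtract (256 ^ negate c) <$> unsignedLE (negate c)
  where
    unsignedLE :: Int -> Get Int
    unsignedLE n = do
      bytes <- replicateM n getWord8
      pure (foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 bytes)

-- | A length prefix followed by that many bytes.
getRawString :: Get BS.ByteString
getRawString = do
  len <- getFixnum
  when (len < 0) (fail "negative length")
  getByteString len

-- | Dispatch on the type tag and lift the result into the AST.
getValue :: Get RubyValue
getValue = do
  tag <- getWord8
  case chr (fromIntegral tag) of
    '0' -> pure RNil
    'T' -> pure (RBool True)
    'F' -> pure (RBool False)
    'i' -> RFixnum <$> getFixnum
    ':' -> RSymbol <$> getRawString
    '"' -> RString <$> getRawString
    _   -> pure Unsupported

-- | Check the version header, then parse a single value.
decodeValue :: BL.ByteString -> Either String RubyValue
decodeValue input =
  case runGetOrFail (getHeader *> getValue) input of
    Left (_, _, err) -> Left err
    Right (_, _, v)  -> Right v
  where
    getHeader = do
      major <- getWord8
      minor <- getWord8
      when ((major, minor) /= (4, 8)) (fail "unsupported Marshal version")
```

For example, decodeValue (BL.pack [4, 8, 58, 8, 102, 111, 111]) yields Right (RSymbol "foo"): 58 is the ':' tag and the length byte 8 decodes to 3.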

Marshal Monad

A quirk of the Marshal format is that it saves space by encoding repeated objects as indexes into a symbol cache and an object cache. We use StateT to keep track of these during de-serialisation and enrich the underlying Get monad by creating a custom monad.

This allows us to write to and read from our cache during parsing without having to manually thread state through our parsing functions.
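A minimal sketch of the idea follows, assuming a type alias over StateT rather than a dedicated newtype and tracking only the symbol cache. Cache, Marshal, getSmallInt, getSymbol and runMarshal are names invented for this example, and the integer parser is deliberately limited to the single-byte form.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, gets, modify')
import Data.Binary.Get (Get, getByteString, getWord8, runGetOrFail)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- | Symbols seen so far, in the order they were first parsed.
newtype Cache = Cache { symbols :: [BS.ByteString] }

-- | Get enriched with a cache; a type alias keeps the sketch short.
type Marshal = StateT Cache Get

-- | Simplification: only the single-byte integer form (0, or value + 5).
getSmallInt :: Marshal Int
getSmallInt = do
  b <- fromIntegral <$> lift getWord8
  case b of
    0 -> pure 0
    _ | b >= 5 && b <= 127 -> pure (b - 5)
      | otherwise -> lift (fail "multi-byte and negative integers omitted in this sketch")

-- | Parse a symbol, or resolve a link to one that was parsed earlier.
getSymbol :: Marshal BS.ByteString
getSymbol = do
  tag <- lift getWord8
  case tag of
    58 -> do  -- ':' marks a symbol seen for the first time
      len <- getSmallInt
      sym <- lift (getByteString len)
      modify' (\c -> c { symbols = symbols c ++ [sym] })
      pure sym
    59 -> do  -- ';' marks an index into the symbol cache
      ix <- getSmallInt
      seen <- gets symbols
      if ix < length seen
        then pure (seen !! ix)
        else lift (fail "symbol link out of range")
    _ -> lift (fail "expected a symbol or a symbol link")

-- | Run a parser with an empty cache.
runMarshal :: Marshal a -> BL.ByteString -> Either String a
runMarshal m input =
  case runGetOrFail (evalStateT m (Cache [])) input of
    Left (_, _, err) -> Left err
    Right (_, _, a)  -> Right a
```

Running (,) <$> getSymbol <*> getSymbol over the bytes [58, 6, 97, 59, 0] parses :a once and resolves the second occurrence from the cache, with no state passed by hand between the two calls.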

Examples

File IO

Let’s take a simple example of a Ruby string, serialise it and dump it to the file system using irb.

Switching over to Haskell we set up our imports.

Define a function to read our example from the file system.

Define a function that uses the Rubyable typeclass to convert a RubyObject to a more convenient representation.
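The general shape of a typeclass-based conversion can be sketched as follows. This is not the Rubyable class itself: RubyValue, FromRuby and fromRubyValue are hypothetical stand-ins that show how instances can lower an AST into ordinary Haskell types.

```haskell
import qualified Data.ByteString as BS

-- | Hypothetical cut-down AST for this sketch.
data RubyValue
  = RNil | RBool Bool | RFixnum Int
  | RString BS.ByteString | RArray [RubyValue]
  deriving (Eq, Show)

-- | Convert an AST node into a Haskell type, or Nothing on a shape mismatch.
class FromRuby a where
  fromRubyValue :: RubyValue -> Maybe a

instance FromRuby Bool where
  fromRubyValue (RBool b) = Just b
  fromRubyValue _         = Nothing

instance FromRuby Int where
  fromRubyValue (RFixnum n) = Just n
  fromRubyValue _           = Nothing

instance FromRuby BS.ByteString where
  fromRubyValue (RString s) = Just s
  fromRubyValue _           = Nothing

-- | Arrays convert element by element; one bad element fails the whole list.
instance FromRuby a => FromRuby [a] where
  fromRubyValue (RArray xs) = traverse fromRubyValue xs
  fromRubyValue _           = Nothing
```

The caller picks the target type with an annotation, e.g. fromRubyValue (RArray [RFixnum 1, RFixnum 2]) :: Maybe [Int] gives Just [1,2].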

Finally, we put it all together to print the Ruby string to the console.

Memcache

Let’s take another example: de-serialising Ruby objects stored in memcache using the dalli gem.

We’ll reuse our existing Haskell code but add another import.

Define a function that creates a new memcache client.

Finally, we put it all together to pull the value out of memcache and print the Ruby string to the console.

Conclusion

By writing the ruby-marshal package, I was able to create a Haskell web application that coexisted with a Rails application. This approach has been a success at work and appears to be one way in which you could gradually migrate an existing web application written in Ruby over to Haskell without the risk associated with a full rewrite.