JSONify your Ruby Translations

JSON is a ubiquitous format for storing and transmitting complex nested data, originally in Javascript but today in every major programming language. JSON is also increasingly used as a persisted data type, notably by PostgreSQL with its json and jsonb types, and more recently MySQL also supporting JSON natively. Ruby ORM like Sequel and ActiveRecord1 offer interfaces for querying on this kind of JSON-formatted data.

JSON Translation Data

Given its usefulness, I’ve naturally been interested in applying JSON to the storage of i18n data, a topic which I have written about before. As an arbitrary-depth hash format, JSON is a good fit for this kind of data, which tends to assume a similar nested key/value structure. With databases now providing ways to search JSON data quickly, we’ve got everything we need for a powerful translation storage and querying engine.

In this post I’ll introduce a new strategy for leveraging JSON data formats to store translations in a single table column. The strategy is implemented in Mobility, the i18n gem I’ve written about here before, and is inspired by (among other things) a library named Trans for Elixir.

Before jumping into this new JSON-based storage strategy, however, I’d like to first take a step back and think about the broader “translation mapping” problem, and how JSON fits into it.

Translation Mapping

When you think of translation as a Ruby developer, you may think of a YAML file with lots of nested translation keys and their corresponding translated values, popularized in Ruby by the I18n gem. Or you may think of gettext and .po (portable object) files. Or you may think of some other similar file-based key/value storage.

A key property of these kinds of internationalization and localization systems is that the context in which translations are written is fundamentally different from the context in which translations are used. Your users, for example, can’t edit the YAML i18n files for your application directly through the application itself.

JSON Translation Data

Some YAML translations

That may be obvious to you, but it’s important because I want to contrast this role of i18n strings with another, fundamentally different role translation can play in a running application. The key characteristic of translation in this second role is that reading and writing happen in the same context.

An example of this second category would be a blog platform which allows authors to translate their articles into different languages, and query for articles in any language. Another example would be an event site that allows its users to translate their event listings. But it could also include any of many other contexts in which translation data is dynamically added, updated, deleted, and queried through the application itself.

Event Traslations on Doorkeeper.jp

Translation Interface on Doorkeeper.jp

If ActiveRecord and Sequel are “Object Relational Mappers” (ORM), I like to think of the libraries that help with this type of translation mapping as Translation Relational Mappers. They act as a layer between the ORM (data) and the interface for writing, reading and querying translations.

Mobility is unique among libraries I have seen (in Ruby and any other language) in that it takes the “mapping” part of this problem very seriously. While other libraries tend to support one storage strategy or another, Mobility supports many storage strategies through one common interface.

Which brings us to the question: what is a “storage strategy”? An example will help make things more concrete. In Ruby at least, there is one approach that is more popular than any other, and it does not use JSON.

Translations in Tables

Globalize is the most well-known gem for translating ActiveRecord i18n model data and has popularized a particular approach to storing persisted translation data.

In this strategy, translations for a model are stored in a model-specific table. If your model Post uses a table named posts, you would create a translation table named post_translations with a locale string column and other columns for each translated attribute. Then when you access post.title, the gem fetches the translation for the model in the current locale and returns the value of the title column on the translation.

Keeping all translations for a given model in the same place has some advantages. It means, among other things, that you can call post.translations.each and cycle through each of your translations as a separate record; SpinaCMS does this when generating its sitemap, for example. Mobility supports this type of translation strategy with its Table backend, which is what Spina uses.

Although it’s nice having translations in a separate table, there are also some significant downsides. Querying in particular becomes much more complicated since you need to constantly JOIN your translation table(s).

Here’s the kind of query you get with this strategy (taken from the Mobility Wiki):

SELECT "posts".* FROM "posts"
  INNER JOIN "post_translations" ON "post_translations"."post_id" = "posts"."id"
  AND "post_translations"."locale" = 'en'
  WHERE "post_translations"."title" = 'JSONify your Ruby Translations'

That’s not too bad, but you can imagine that once you start adding complexity you quickly hit edge cases which can push the limits of your ORM. This has prompted many people, myself included, to consider other options for storing translations. One of these is to store translations in JSON format.

Enter the B

A few years back, with the release of its version 9.4, PostgreSQL introduced a new datatype called JSONB. The “B” in “JSONB” signifies that JSON data are stored in this column format not as a string, but as binary data. This allows for fast lookups and more powerful search queries, through new operators like “contains” (@) and “exists” (?) which apply to JSON objects and keys, respectively. It also allows you to store a mix of integers, strings, arrays and hashes in the same column. This is a considerable step up from PostgreSQL’s Hstore format which, although also a key/value store, is limited to depth-1 string-valued data.

One of my original goals with Mobility was to support this type of innovative format alongside other strategies like the table-based one described in the last section. This, indeed, is Mobility’s killer feature: just like how you can swap MySQL for PostgreSQL as your database and keep your ActiveRecord code mostly unchanged, Mobility allows you to swap storage backends while retaining the same interface for accessing and querying translations.

The first implementation of JSON in Mobility was a backend called Jsonb. Using this backend looks like this:

class Post < ApplicationRecord
  extend Mobility
  translates :title, backend: :jsonb
end

Given that you have a title column of jsonb type, Mobility will allow you to call a virtual attribute named title and (if we’re in the English locale) will return to you the value on the en key of the JSON hash.

post = Post.create(title: "JSONify your Ruby Translations")
post.title
#=> "JSONify your Ruby Translations"
Mobility.with_locale(:ja) { post.title = "Rubyの翻訳をJSON化しよう" }
post.save
post = Post.first
post[:title]
#=> {"en"=>"JSONify your Ruby Translations", "ja"=>"Rubyの翻訳をJSON化しよう"}

It will also allow you to query on this attribute just like it was a table column (via a scope named i18n):

Post.i18n.where(title: "JSONify your Ruby Translations").to_sql
#=> SELECT "posts".* FROM "posts" WHERE (("posts"."title" -> 'en') = "JSONify your Ruby Translations")

So Mobility is mapping the query on title to a query format specific to jsonb, using the -> operator.2

Mobility actually does some more subtle things here. If you pass an array as the query value, for example, Mobility will treat this as a set of values to match, just like you would with a normal table column. So the query:

Post.i18n.find_by(title: ["foo", "bar"])

would generate the SQL:

SELECT "posts".* FROM "posts"
  WHERE (("posts"."title" -> 'en') = '"foo"' OR
         ("posts"."title" -> 'en') = '"bar"')

With the Table backend mentioned earlier, the same code would generate a very different query:

SELECT "posts".* FROM "posts"
  INNER JOIN "post_translations" ON "post_translations"."post_id" = "posts"."id"
  AND "post_translations"."locale" = 'en'
  WHERE "post_translations"."content" IN ('foo', 'bar')

Whereas the Jsonb backend uses OR, the Table backend uses IN to match translations that have a column value in ('foo', 'bar'). As a Mobility user you don’t need to really worry about these details; just write your ActiveRecord (or Sequel) code as you would for an untranslated attribute and the query should “just work”TM.

Keeping ‘em together

One of the nice things about supporting many storage strategies in a single gem is that common implementation patterns can be leveraged to re-use code and minimize duplication. While there are gems offering Hstore-based model translation and gems offering JSON-based model translation, the implementation for each one is different in arbitrary ways (translates/translatable etc). Rather than re-invent the wheel, Mobility uses shared modules for accessing and querying PostgreSQL data types to eliminate duplication.

Which brings us to the new storage strategy mentioned at the top of this article. While playing around with Elixir recently, I came upon a package named Trans which uses a translation strategy I hadn’t previously considered. Like the Jsonb backend mentioned in the last section, this strategy uses a json or jsonb column in a PostgreSQL database to store translations. However, rather than store each set of attribute translations on its own column, Trans uses the fact that json columns can be deeply nested to store all translation on a single column, named translations. This would not be possible using Hstore since it is a depth-1 data type.

An example of this is shown in the figure below.

JSON Translation Data

Model Translations as a depth-2 JSON hash

This is an interesting storage format with some notable advantages:

  • It minimizes the number of migrations needed to just one per translated model. Translated attributes can be added and removed from a model with only minimal code changes, since the database schema does not change when you add new keys to a json/jsonb hash.
  • It requires no complex JOIN queries since translations are all on the same table.
  • It allows cycling through model translations like with table translations (with post.translations.each), except without a JOIN.

Borrowing the name from Trans, I added this strategy as a new backend named “Container” in Mobility 0.4. In Mobility 0.5., I also added support for json-type columns in addition to jsonb (there are subtle differences between the two that are out of the scope of this post, but in general you probably want to use jsonb).

Just like the examples with other backends mentioned earlier, Mobility thus now supports both reading/writing and querying on translations stored in a single json or jsonb column3. Just specify the container backend in your model (or as a default in your Mobility configuration):

class Post < ApplicationRecord
  extend Mobility
  translates :title, backend: :container
  # Or if you want to set a different column name:
  # translates :title, backend: :container, column_name: :my_translations
end

Mobility will generate the appropriate query to find translations on the shared translations column. So:

Post.i18n.where(title: "JSONify your Ruby Translations")

will generate the SQL:

SELECT "posts".* FROM "posts"
  WHERE ((("posts"."translations" -> 'en') -> 'title') = "JSONify your Ruby Translations")

You will notice that this query is very similar to the earlier one for querying on a JSON/JSONB column for a single attribute. The difference is that we have changed the clause:

"posts"."title" -> 'en'

to the clause:

("post"."translations" -> 'en') -> 'title'

Mobility leverages this similarity such that only small tweaks are necessary to support querying on the Container backend. Indeed, although as of version 0.5 there are now three JSON-based backends in Mobility (Json, Jsonb and Container), the actual amount of code is relatively small given how many features are supported. This is exactly the benefits I was after when I designed Mobility in the first place.

JSONify it, even just a little bit

The JSON format offers many advantages in storing translations, but of course it’s not a silver bullet. Some downsides include:

  • Storing all translations on the same column can stress the database, triggering space reallocation and possibly slow down performance.
  • Parsing translation data may be more complex that simply grabbing the value on a string or text column.
  • The range of query options available is limited by database support for the column type (json or jsonb), which will generally lag query options for normal columns.
  • Only works on PostgreSQL currently (no Ruby ORM support yet for MySQL json columns).

If you look around the nets, you can find a lot of debate about the advantages and disadvantages of different storage strategies for i18n data. Some argue for storing each translation on its own column. Others argue for storing translations on a separate table. Others argue for the JSON data type strategies described here.

Well, Mobility supports all these strategies with the minimum code required to do so. It does so, moreover, in such a way that you can mix or match however you like. Want to give JSON a whirl on one attribute, but stick with another gem for the other attributes? You can do that. Want to use one backend for most attributes, but test a JSON backend on some other attributes? You can do that too.

The gem’s flexibility is why I gave Mobility its name. So if you’re considering storing translations in your application, I’d highly recommend giving JSON a try, alongside other options. No single solution will work best for everybody, but there are enough options to mix and match that something out there should fit your needs.

And of course, if you find JSON doesn’t work for you, translation tables don’t work for you, translatable columns don’t work for you, a polymorphic key/value store doesn’t work for you… well, you can always roll your own.

If you do, let me know what you find! 😉

Notes

1 Actually, ActiveRecord’s support is pretty dismal compared to Sequel.
2 Incidentally, this operator accepts any JSON object, so you can translate and query translated hash data in the same way, for example. See the Mobility Wiki for more details.
3 You don’t need to specify whether you are using a JSON column or a JSONB column, Mobility will check the db schema and figure it out based on the attribute name.

Posted on March 20th, 2018. Filed under: translation, mobility, ruby, rails, sequel, postgres, json.
blog comments powered by Disqus