Ruby 3.1’s incompatible changes to its YAML module (Psych 4)

Version 3.1 of the Ruby programming language has been released. Among the changes was a big update to Psych version 4.0, Ruby’s built-in YAML Ain’t Markup Language (YAML, a recursive acronym) interpreter. A major version change indicates incompatible changes, and version 4 sure does deliver on that promise.

It broke everything from modules in the Ruby standard library and major frameworks like Ruby on Rails, to Nanoc: the static-site generator powering this blog. Let’s dive in and explore the changes!

For over a decade, Psych’s load() and load_file() methods have (in practice) been aliases for the unsafe_load() and unsafe_load_file() methods. In version 4, they instead behave like their safe_-prefixed equivalents, safe_load() and safe_load_file(), by default.
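If you fully trust the YAML you are loading, one way to keep the old behavior across Psych versions is a small wrapper like the sketch below. The helper name load_trusted_yaml is hypothetical, and the respond_to? check assumes that Psych releases without unsafe_load() still have the permissive load().

```ruby
require "yaml"
require "date"

# Hypothetical helper: only use it for YAML you fully trust.
def load_trusted_yaml(yaml)
  if YAML.respond_to?(:unsafe_load)
    # Psych versions that ship unsafe_load(), including Psych 4.
    YAML.unsafe_load(yaml)
  else
    # Assumed fallback for older Psych versions, where load() is still permissive.
    YAML.load(yaml)
  end
end

post = load_trusted_yaml("published: 2022-01-15\n")
puts post["published"].class
```

This restores the unsafe behavior on purpose, so it is not a fix for code that reads untrusted input.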

A safe load is better than an unsafe one, right? Yeah, it’s probably a good call to change the default. However, it has also broken a large number of Ruby packages.

Developers use the load() and load_file() methods because the names are intuitive and they work. (Intuitive method names are one of the Ruby language’s core strengths.) If developers had been aware of the safe_load() and safe_load_file() methods, they’d probably have used them instead.

So, what’s the difference between the safe and unsafe methods? The former are stricter about what they accept and won’t deserialize as many data types as the latter do by default. The stricter methods are a better choice when loading untrusted and potentially malicious YAML files. (Interpreters are always at risk of misinterpreting unexpected data.)

Your use of YAML may be unaffected, or your program might fail after upgrading to Psych 4. Here are some highlights of the differences between safe_load() and unsafe_load() in Psych 4:

  • safe_load() disallows YAML aliases (a potentially recursive data structure) by default (override with the aliases: true argument).
  • safe_load() only deserializes a handful of default classes (below; extend the list with the permitted_classes: [Class] argument).
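
As a minimal sketch of the alias restriction, the example below loads a made-up document that reuses an anchor. Rescuing Psych::BadAlias is an assumption meant to cover the alias error across Psych versions (Psych 4 raises a subclass of it).

```ruby
require "yaml"

# Made-up configuration that reuses the &base anchor through an alias.
yaml = <<~YAML
  base: &base
    adapter: postgres
  development: *base
YAML

begin
  # Aliases are rejected unless explicitly enabled.
  YAML.safe_load(yaml)
  alias_rejected = false
rescue Psych::BadAlias
  alias_rejected = true
end

# Opt in to aliases for documents you expect to contain them.
config = YAML.safe_load(yaml, aliases: true)
puts config["development"]["adapter"]
```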

Here’s the default list of classes that can be deserialized when running in safe mode:

  • TrueClass
  • FalseClass
  • NilClass
  • Integer
  • Float
  • String
  • Array
  • Hash

That should have your basics covered, right? Well, you might not have noticed, but the class list is missing the Date and Time classes. Many programs also expect Psych to deserialize Symbol.
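
Here is a small sketch of the Symbol case, using a made-up settings document whose key is written in Ruby symbol notation. It only relies on safe_load() and the permitted_classes: argument described above.

```ruby
require "yaml"

# Made-up document: the leading colon makes Psych treat the key as a Symbol.
yaml = ":environment: production\n"

begin
  # Symbol is not on the default list, so this is rejected.
  YAML.safe_load(yaml)
  symbol_rejected = false
rescue Psych::DisallowedClass
  symbol_rejected = true
end

# Explicitly permit Symbol to get the symbol key back.
settings = YAML.safe_load(yaml, permitted_classes: [Symbol])
puts settings.keys.first.inspect
```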

Almost everything I’ve ever used YAML for in Ruby includes a date or time object. If you do try to load a YAML file containing a date, you’ll now get the following error message:

Psych::DisallowedClass: Tried to load unspecified class: Date

This is the same error you’d get with earlier versions if you explicitly used the safe_load() methods without allowing the classes. You can extend the list of allowed classes by adding a permitted_classes: [Date] argument. It’s not a big deal to add the argument once, but let’s just say I’ve had to add it to a lot of places after the update.
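
A minimal sketch of both outcomes, using a made-up front-matter snippet with a single date field:

```ruby
require "yaml"
require "date"

# Made-up document with an unquoted date value.
yaml = "published: 2022-01-15\n"

error_message = nil
begin
  # Date is not on the default list, so safe mode refuses to load it.
  YAML.safe_load(yaml)
rescue Psych::DisallowedClass => e
  error_message = e.message
end

# Permitting Date lets the same document load.
post = YAML.safe_load(yaml, permitted_classes: [Date])
puts post["published"].class
```

If a program loads YAML in many places, collecting the permitted classes in one shared constant or helper keeps the list from drifting between call sites.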

To further complicate matters, Psych 4 dropped support for the legacy positional arguments and now requires keyword arguments. If you used to rely on safe_load(yaml, [Date]), you now need to migrate to safe_load(yaml, permitted_classes: [Date]).
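
The migration can be sketched as follows. Whether the positional call succeeds depends on the installed Psych version, so the sketch only records the outcome instead of assuming one; the rescue of ArgumentError is my assumption about how the rejected call fails.

```ruby
require "yaml"
require "date"

yaml = "released: 2021-12-25\n"

begin
  # Legacy positional form: accepted by older Psych versions only.
  YAML.safe_load(yaml, [Date])
  positional_accepted = true
rescue ArgumentError
  positional_accepted = false
end

# Keyword form: the one to migrate to.
release = YAML.safe_load(yaml, permitted_classes: [Date])
puts "positional form accepted: #{positional_accepted}"
puts release["released"].class
```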

The change away from positional arguments had been noted in the Psych module documentation for some years already. However, the software itself never issued deprecation warnings.

I believe the changes in and of themselves are good, and they came from nothing but good intentions. However, the rollout of these incompatible changes was poorly handled.

Ideally, Psych should have printed warnings to standard error output (STDERR) to notify developers of the upcoming changes. Developers should have been given a heads-up warning and time to migrate. Instead, developers probably first noticed the changes when their programs stopped working.