From ActiveRecord to Diesel


Learning something new every day is exercise for your mind.
(source: Jeremy Bowcock via Wikimedia Commons)

As software developers we’re paid to be professional learners. People hire us (or should hire us) for what we are capable of learning, not for what we already know. To stay in shape and keep up with our peers we need to practice by learning something new every day. We need to get daily exercise for our minds.

For me the best way to do this is to make simple tasks harder. Many times each day I need to copy files, parse XML, generate JSON, or send an HTTP request. The boring daily chores of a modern knowledge worker. To make my life more interesting and to exercise my mind, I sometimes choose new tools or programming languages that are harder to use, not easier, to accomplish these small tasks.

Let’s take an example. Recently I needed to run a simple SQL query on a Postgres database and produce a one-off report. I could have done this in 5 minutes using Ruby and ActiveRecord. Instead, I decided to use Rust and Diesel - a language and a tool I hadn’t used before. Instead of 5 minutes it took several hours, but I learned something new. I've written up the steps I took here today. Get your mind’s exercise for today and read on to learn how to execute a SQL statement using Rust.

TL/DR: I posted the code from this article if you want to skip the explanation and just try it.

Running a SQL Query Using Ruby and ActiveRecord

My actual data set looked different, but suppose as an example I had a Postgres table called users:

create table users (
    id serial primary key,
    first_name varchar(40),
    last_name  varchar(40)
);

…and I needed to count the number of users with multiple last names. People from Spanish speaking countries, like my wife, often have two last names. Because I was in a rush and had to get an answer quickly, I turned to ActiveRecord. This simple Ruby script ran a SQL query and also a bit of Ruby post-processing to find the number of users with multiple last names:

user_groups = User.all.partition do |user|
  user.last_name.split.size > 1
end

puts "Found #{user_groups.first.size} users with more than one last name."
puts "Found #{user_groups.last.size} users with one last name."

Ruby and ActiveRecord worked brilliantly. I was done. Neither the language nor the tools got in my way and I had the quick answer I needed. After just five minutes I moved on to more important work.

Time to Exercise My Mind

I had a quick answer, but didn’t learn anything. I realized that if I spent all of my time getting my work done as quickly as possible and never learned anything new, I’d fall behind in the long run.

Because this task was so simple, I decided to try to write the same report using Rust instead of Ruby. But how? Rust is a much more complex, low level language than Ruby is. How in the world could I learn about its type system, its borrow checker, its compiler/build tools, its debugging tools and, of course, its language syntax without spending months and months reading books?

The answer was just to try it. And what better way to try a new language than to work on a simple task like this. It was time for my daily exercise.

First, I installed Rust, poked around the The Rust Programming Language book a bit, and read the excellent Diesel tutorial. Rust has great documentation and getting up to speed with this extremely complex language is just a matter of time, focus and attention. Then I rewrote that simple Ruby/ActiveRecord report script above using Rust and Diesel. Here’s how I did it.

A New Rust Project

To get started, I created a new Rust project using Cargo:

$ cargo new last_names --bin
     Created binary (application) `last_names` project
$ cd last_names
$ cargo build
   Compiling last_names v0.1.0 (file:///Users/pat/path/to/last_names)
    Finished dev [unoptimized + debuginfo] target(s) in 7.68 secs
$ target/debug/last_names
Hello, world!

As you can see, Cargo got me started quickly, and I was able to print “Hello World” with just a few commands.

Cargo placed the source code in a Rust file called src/main.rs, which looks like this:

As a comparison, I show the equivalent Ruby code below in pink. Pretty simple.

From Gemfile to Cargo.toml

I knew my script needed to connect to Postgres and execute a SQL statement, so I added Diesel as a dependency to my Cargo.toml file:

Cargo is a dependency manager similar to Ruby’s Bundler. The Cargo.toml file lists the Rust libraries, or “crates,” that my script will use. It plays the same role as the Gemfile does for Ruby projects.

Then I installed Diesel by running “cargo build” again:

$ cargo build
    Updating registry `https://github.com/rust-lang/crates.io-index`
 Downloading diesel v1.3.0
 Downloading pq-sys v0.4.5

etc…

  Compiling diesel_derives v1.3.0
   Compiling diesel v1.3.0
   Compiling last_names v0.1.0 (file:///Users/pat/path/to/last_names)
    Finished dev [unoptimized + debuginfo] target(s) in 47.30 secs

Connecting to Postgres

Ok now I’m ready to start working on my report code, back in main.rs. The first thing I need to do is open a connection to Postgres. After that, I can figure out how to execute my report.

Here’s how I did it - again I show the Rust code in grey and the equivalent Ruby code below in pink:

The Rust code is fairly simple and easy to understand: It just passes a Postgres connection string to PgConnection::establish. One interesting and important detail here is the call to expect. PgConnection::establish returns a ConnectionResult value, which is derived from the Result enum:

As The Rust Programming Language explains, expect checks whether the result value contains Ok or Err. If it contains Err, expect prints out the given error message.

Assuming my Postgres server is up and running and contains a database called “names,” I can now compile, run my Rust script and connect to Postgres:

$ cargo build
   Compiling last_names v0.1.0 (file:///Users/pat/path/to/last_names)
    Finished dev [unoptimized + debuginfo] target(s) in 0.71 secs
$ target/debug/last_names
Connected!

Stepping back for a moment, the Ruby and Rust versions of this code are actually very similar. However, the Rust version is a bit more complex because I have to explicitly handle the possibility that PgConnection::establish might return an error. In Ruby I can ignore this possibility, although my script would crash if ActiveRecord::Base.establish_connection ran into a problem and raised an exception.

Inspecting My Postgres Column Types

Using Ruby, ActiveRecord automatically inspects my database table and creates methods on my model class for each column. But in Rust, I have to type in code that declares each Postgres table and column I plan to use. (Actually, Diesel can also inspect my database columns automatically using the infer_schema! macro, but as a beginner I find it easier to type this information in explicitly.)

Fortunately, Diesel provides a command line tool that makes this easy:

$ diesel print-schema --database-url=postgres://pat@patshaughnessy.net:5432/names --whitelist users
table! {
    users (id) {
        id -> Int4,
        first_name -> Nullable<Varchar>,
        last_name -> Nullable<Varchar>,
    }
}

The diesel tool printed out a series of Rust macros that declare the name of my Postgres table, and the names and types of each of the table’s columns. Notice that I provided two command line parameters: the Postgres connection string and my database table name.

The Diesel macros resemble the original SQL definition of my Postgres table:

Diesel used Int4 for the primary key column, and Nullable<Varchar> for each of the Postgres varchar string columns. Nullable must mean that my Postgres SQL schema allows null values for first_name and last_name. In other words, it means that I didn’t specify NOT NULL for these columns in my CREATE TABLE command.

Creating a Model

Next, using these macros as a guide, I wrote a Rust struct to serve as my database model. An instance of this struct was going to represent each row in my Postgres table:

Just as Diesel mapped each Postgres column type to a Rust macro with the corresponding type, I had to choose a Rust type for each Postgres column in my new struct. For the id column, I picked Rust’s i32 integer type. This made sense because the Int4 implied a 4 byte or 32 bit integer value. And for each of the string columns, I chose the Rust String type.

Finally, I pasted all of this into my main.rs source code file, as follows:

Reading this over, the Ruby version is much more concise. In Rust I specified the name and type of each column, while in Ruby ActiveRecord figured all of that out for me. (Although as I mentioned above, I could have asked Diesel to detect the column types automatically also.)

Trying to Execute a SQL Statement in Rust

Now after all of this setup, I’m finally ready to execute my report. Here’s the Rust code I wrote inside the main() function, following the call to PgConnection::establish:

It took me a while to figure out the proper syntax, especially the bits that use generic type notation, like Vec<User> and Vec<&str>. Also the .collect::<Vec<&str>>() code seemed a bit odd to me as well, because it includes a type inside a function name. The reason for this complex notation is that the collect function converts the iterator returned by split into an array, but the Rust compiler needs to know what type of values will be in that array. Rust normally uses type inference to figure this out, but in this case there’s no way for it to know. Therefore, I needed to specify the Vec<&str> type manually, telling Rust I wanted an array of string references.

But in the end I was surprised how similar the Rust and Ruby versions of this code are! Both load all the user records into memory in a similar way, although the Rust version has to check for errors explicitly. And both Rust and Ruby use partition with a closure that checks whether each last name string contains multiple words or not, by calling split.

I was very excited to try this out, so I ran the Rust compiler:

$ cargo build
   Compiling last_names v0.1.0 (file:///Users/pat/path/to/last_names)

error[E0277]: the trait bound `*const str:
            diesel::deserialize::FromSql<diesel::sql_types::Nullable<diesel::sql_types::Text>, _>`
            is not satisfied

  --> src/main.rs:30:25
   |
30 |     let results = users.load::<User>(&connection).expect(
   |                         ^^^^ the trait
   |                              `diesel::deserialize::FromSql<diesel::sql_types::Nullable
   |                              <diesel::sql_types::Text>, _>` is not implemented for `*const str`
   |
   = help: the following implementations were found:
             <*const str as diesel::deserialize::FromSql<diesel::sql_types::Text, DB>>
             <*const [u8] as diesel::deserialize::FromSql<diesel::sql_types::Binary, DB>>
   = note: required because of the requirements on the impl of `diesel::deserialize::FromSql<diesel::sql_types::Nullable<diesel::sql_types::Text>, _>` for `std::string::String`
   = note: required because of the requirements on the impl of `diesel::Queryable<diesel::sql_types::Nullable<diesel::sql_types::Text>, _>` for `std::string::String`
   = note: required because of the requirements on the impl of `diesel::Queryable<(diesel::sql_types::Integer, diesel::sql_types::Nullable<diesel::sql_types::Text>, diesel::sql_types::Nullable<diesel::sql_types::Text>), _>` for `(i32, std::string::String, std::string::String)`
   = note: required because of the requirements on the impl of `diesel::Queryable<(diesel::sql_types::Integer, diesel::sql_types::Nullable<diesel::sql_types::Text>, diesel::sql_types::Nullable<diesel::sql_types::Text>), _>` for `User`
   = note: required because of the requirements on the impl of `diesel::query_dsl::LoadQuery<_, User>` for `users::table`

error: aborting due to previous error

error: Could not compile `last_names`.

To learn more, run the command again with --verbose.

Ugh - what happened? I started to realize how difficult error messages from the Rust compiler can be to understand, especially for beginners like me.

And I certainly didn’t want to try the “—verbose” option the Rust compiler suggested. This message was already verbose enough!

Rust’s Type System Keeps Me Out of Trouble

After a while, I realized what the problem was. Reading the error message again more carefully, I saw the line:

diesel::sql_types::Nullable ...is not satisfied

Ah - the problem must have to do with the Nullable macro the Diesel utility generated for me. That indicated that my underlying Postgres column allowed null values. At first, I thought maybe I should add NOT NULL to my Postgres table and prevent users with null last names. That must be the answer.

But thinking about this some more, I realized that maybe having no last name is a valid state. That some people really don’t have last names… Bono, Lady Gaga, or Sting. Or maybe these are their last names and they have no first names? Either way it seemed to me that my report code should handle null values, and that it shouldn’t make assumptions about what’s in the data set.

I decided to adapt my Rust code to make this work, and to do that I needed a way of saving null values in my User struct. The problem, what the Rust compiler was trying to tell me, was that my User struct could not handle nulls as written. I needed to use an option type instead:

Like the Result enum we saw above, Option is an enum that can contain either Some or None… i.e. a null value:

To learn more about the Option type, read Chapter 6 from The Rust Programming Language.

After changing my User struct, I then had to rework the closure inside of partition. Here’s what I ended up with:

I wrote if let Some(ref name) to check whether or not the last name of each user struct was null. In other words, I “unwrapped” the last name. The ref keyword satisfies Rust’s borrow checker, which enforces Rust’s unique and innovative memory ownership model. Without ref, the Rust compiler would give me another error: “cannot move out of borrowed content.” To learn more about Rust’s ownership model, see Chapter 4 of The Rust Programming Language.

Finally, I compiled and ran my Rust code, and saw the results I expected:

$ target/debug/last_names
Connected!
Found 20 users with more than one last name.
Found 80 users with one last name.

As a test, I tried adding a user with no last name and got the expected results - the user with no last name was counted as if they had one:

$ psql names
psql (9.6.3)
Type "help" for help.
names=# INSERT INTO users (first_name, last_name) VALUES ('Bono', NULL);
INSERT 0 1
names=# \q

$ target/debug/last_names
Connected!
Found 20 users with more than one last name.
Found 81 users with one last name.

But when I reran my original Ruby code with a null last name, it crashed!

$ ruby main.rb
Connected.
main.rb:18:in `block in \': undefined method `split' for nil:NilClass (NoMethodError)

Rewriting my Ruby code in Rust revealed a bug I never knew I had. As shown above, I had to rework my Ruby code to check for a null last name inside the block, similar to what I wrote in Rust. Even though it required more typing and more work, Rust’s strict type system told me my code might crash at compile time, while Ruby didn't tell me anything was wrong until runtime.

Rust vs. Ruby

In the end I was able to rewrite my Ruby code in Rust and get the same results. This gist contains the Rust code I ended up with, if you’re interested in trying this yourself.

Writing the same algorithm in both languages and comparing, I noticed a few obvious differences. First, the runtime execution performance was far superior using Rust. I was able to load and process large sets of User records much, much faster using Rust. But development performance was better using Ruby. The speed at which I was able to write working code was faster using Ruby. It took minutes to write the Ruby version, but hours to write the Rust version. Of course, I’m a Rust novice. An experienced Rust developer who knew what they were doing could probably have written this in just a few minutes as well. Second, Ruby is concise, while Rust is verbose. It took much more typing to write Rust. But this extra thoroughness led me to find a bug in my Ruby code I never knew I had.

Which language is better? Neither. Ruby and Rust are different tools that are best used in different circumstances.

In the end what matters is that I learned something. One language isn’t better than another - but now I’m a better developer because I took the time to learn something new. By taking 5 hours instead of 5 minutes to accomplish a simple task, I gained valuable insight and knowledge. I gave my mind the exercise it was so hungry for. And thankfully no one noticed it took me 5 hours to run a report that should have taken 5 minutes.