Working with MongoDB's aggregation pipelines in Ruby
If you needed to create a report
with details of how many users signed up for your app each month,
it would be trivial to do using SQL.
All you need to do is write a group_by
query
and voila, there’s your report.
I’ve always missed SQL’s aggregation features in MongoDB. The standard way to achieve this was to use map-reduce jobs. When you do this in Ruby, it ends up being very inelegant, with strings containing JavaScript code sprinkled inside the Ruby.
Recently I found out that MongoDB ships with aggregation pipelines since version 2.2. This makes it much simpler to do such queries without using map-reduce.
Testing in the Mongo shell
Before writing any Ruby,
let’s play around with the feature in Mongo shell.
For the sake of simplicity,
let’s only consider the users collection to contain 3 fields -
id, email and a created_at
date.
A single document from the collection might look like this:
To get our report, we use aggregation pipelines like this:
Here stage1
and stage2
are stages of the pipeline
that transform the data to create the final result set.
Each stage performs some transformation on the collection
and passes the resulting documents to the next stage.
Let’s look at how we can build up the pipelines. The pipelines are simple hashes that describe the operations to be performed on the data.
First of all, we need to
extract the month and year from the signup date.
For this, we can use a transformation called $project
.
We’ll put this in a variable called project
that we can later use in place of stage1
above.
This transforms each document into the form shown below.
The year and month of the signup date
are extracted into the signup_year
and signup_month
fields.
This makes it easy for us to group the documents
by year and month in the next stage.
The next stage of the pipeline is the $group
stage.
This is similar to group by
in SQL.
The below code for the group stage tells MongoDB
to group the data by month and year.
The signups: { '$sum': 1 }
line
adds 1 for each occurence of the year/month combination.
Let’s now go back to the line where we used the aggregation pipeline
and replace stage1
and stage2
by the variables project
and group
.
The hashes we created tell MongoDB how to transform the data to convert it into the required form. This will produce the final result in this form:
Ruby and mongoid
Now let’s try doing the same in Ruby
using the mongoid
gem.
Assuming that we have a User
model
that talks to MongoDB,
we can write:
For the project
and group
variables,
you just need to copy the code we typed into the mongo shell.
Since Ruby’s syntax for hashes is so similar to JavaScript’s,
we don’t even have to change anything there.
In this way we could easily apply more transformations on the collection. For instance, if we needed to sort the results, we could do this:
Even though I’ve used Mongoid gem in this example,
it’s not actually necessary to use this feature.
We could actually accomplish this with the moped
gem,
which is the driver mongoid
uses to talk to the database.
Using moped
we would do:
Here, session[:users]
returns the same object as User.collection
from the previous example.
Letting MongoDB take care of this responsibility means that our Ruby programs have to deal with less data and fewer object allocations.
If you find yourself doing such operations by fetching the entire collection and then transforming it in Ruby, you will find aggregation pipelines a very useful tool. Take a look at the list of available stage operators that you can use.