Benito Serna Tips and tools for Ruby on Rails developers

Fetching the top n per group with window functions

August 22, 2022

Have you ever needed to get the most recent N posts for each user in rails, but didn’t know how to do it without using map?

Or maybe something similar like:

Sometimes could be ok to just fetch all elements and filter with ruby, but sometimes it is not possible. Also it can cause n+1 queries if your are not careful.

Here I want to show you how you can solve this problem using window functions.

An example problem

Imagine you have a User and a Post record and a User has many Posts, like this:

class User < ActiveRecord::Base
  has_many :posts
end

class Post < ActiveRecord::Base
  belongs_to :user
end

And you want to fetch a list of users with their last 3 posts, but for performance reasons you don’t want to fetch all posts for all users in the list, you just want to ask the database for the 3 posts per user that you want to use.

Building the SQL query

First we need to order the posts and produce a rank for each user.

SELECT posts.*, dense_rank() OVER (
  PARTITION BY posts.user_id
  ORDER BY posts.id DESC
) AS posts_rank
FROM posts

And then use that query as a subquery to now select just the posts with rank less than or equals to 3.

SELECT * FROM (
  SELECT posts.*, dense_rank() OVER (
    PARTITION BY posts.user_id
    ORDER BY posts.id DESC
  ) AS posts_rank
  FROM posts
) AS ranked_posts
WHERE posts_rank <= 3

This will return a list with just the last three posts for each user.

Scope with a fixed n

Now we need to pass this query to rails in a way that it could build Post objects from this information.

This is how I made it work.

class Post < ActiveRecord::Base
  scope :last_n_per_user, -> {
    select_sql = <<-SQL
      posts.*, dense_rank() OVER (
        PARTITION BY posts.user_id
        ORDER BY posts.id DESC
      ) AS posts_rank
    SQL

    ranked_posts = select(select_sql)
    from(ranked_posts, "posts").where("posts_rank <= 3")
  }
end

From the sql we need the part that selects the posts with the rank, to then use it in the rails select method.

Then we use from to use the ranked_posts relation as subquery and then use where to define the post_rank that we want to filter.

The post_rank is what defines the “n”. The number of posts per user that we want to fetch.

At the end the sql it builds is very similar to our original sql:

SELECT posts.*
FROM (
  SELECT posts.*, dense_rank() OVER (
    PARTITION BY posts.user_id
    ORDER BY posts.id DESC
  ) AS posts_rank
  FROM posts
) posts
WHERE posts_rank <= 3

Scope with a variable n

If you want to make the “n” variable, you can pass it as a parameter to the where method:

class Post < ActiveRecord::Base
  scope :last_n_per_user, ->(n) {
    select_sql = <<-SQL
      posts.*, dense_rank() OVER (
        PARTITION BY posts.user_id
        ORDER BY posts.id DESC
      ) AS posts_rank
    SQL

    ranked_posts = select(select_sql)
    from(ranked_posts, "posts").where("posts_rank <= ?", n)
  }
end

Put it in a has_many association

Now that you have build the scope, you can build a has_many association passing that scope, like this:

class User < ActiveRecord::Base
  has_many :posts
  has_many :last_posts, -> { last_n_per_user(3) },
    class_name: "Post"
end

And now you will be able to preload just the last_posts like this:

users = User.preload(:last_posts).limit(5)

users.each do |user|
  puts user.last_posts.map(&:id).inspect
end

Code to run the example

Here is the full code that you can use to run with the Active Record Playground Runner.

schema do
  create_table :users do |t|
    t.string :name
  end

  create_table :posts do |t|
    t.integer :user_id
    t.string :title
  end

  add_index :posts, :user_id
end

seeds do
  users = create_list(User, count: 10) do
    { name: FFaker::Name.name }
  end

  create_list_for_each_record(Post, records: users, count: 100) do |user|
    { user_id: user.id, title: FFaker::CheesyLingo.title }
  end
end

models do
  class User < ActiveRecord::Base
    has_many :posts
  end

  class Post < ActiveRecord::Base
    belongs_to :user
  end
end

example "SQL query" do
  sql = <<-SQL
    SELECT * FROM (
      SELECT posts.*, dense_rank() OVER (
        PARTITION BY posts.user_id
        ORDER BY posts.id DESC
      ) AS posts_rank
      FROM posts
    ) AS ranked_posts
    WHERE posts_rank <= 3
  SQL

  result = ActiveRecord::Base.connection.execute(sql)
  puts result.to_a.inspect
end

example "With a fixed n" do
  class Post < ActiveRecord::Base
    scope :last_n_per_user, -> {
      select_sql = <<-SQL
        posts.*, dense_rank() OVER (
          PARTITION BY posts.user_id
          ORDER BY posts.id DESC
        ) AS posts_rank
      SQL

      ranked_posts = select(select_sql)
      from(ranked_posts, "posts").where("posts_rank <= 3")
    }
  end

  posts = Post.last_n_per_user.preload(:user)
  pp posts.group_by(&:user).map { |user, posts| [user.name, posts.map(&:id)] }
end

example "With a variable n" do
  class Post < ActiveRecord::Base
    scope :last_n_per_user, ->(n) {
      select_sql = <<-SQL
        posts.*, dense_rank() OVER (
          PARTITION BY posts.user_id
          ORDER BY posts.id DESC
        ) AS posts_rank
      SQL

      ranked_posts = select(select_sql)
      from(ranked_posts, "posts").where("posts_rank <= ?", n)
    }
  end

  posts = Post.last_n_per_user(3).preload(:user)
  pp posts.group_by(&:user).map { |user, posts| [user.name, posts.map(&:id)] }
end

example "In a has many association" do
  class User < ActiveRecord::Base
    has_many :posts
    has_many :last_posts, -> { last_n_per_user(3) }, class_name: "Post"
  end

  class Post < ActiveRecord::Base
    scope :last_n_per_user, ->(n) {
      select_sql = <<-SQL
        posts.*, dense_rank() OVER (
          PARTITION BY posts.user_id
          ORDER BY posts.id DESC
        ) AS posts_rank
      SQL

      ranked_posts = select(select_sql)
      from(ranked_posts, "posts").where("posts_rank <= ?", n)
    }
  end

  users = User.preload(:last_posts).limit(5)
  pp users.map { |user| [user.name, user.last_posts.map(&:id)] }
end

You can find the code also on: github.com/bhserna/last_n_per_user_window_functions

Related articles

No more… “Why active record is ignoring my includes?”

Get for free the first part of the ebook Fix n+1 queries on Rails that will help you:

  • Explain what is an n+1 queries problem
  • Identify when ActiveRecord will execute a query
  • Solve the latest comment example
  • Detect n+1 queries by watching the logs
  • Learn the tools to detect n+1 queries
Get the first part of the ebook for free