Start Your SEO Right with Sitemaps on Rails

After crafting your website, the next step usually involves search engine optimization (SEO), and creating a sitemap is one of the tasks you will need to tackle. According to the protocol, sitemaps are UTF-8 encoded XML files that describe the structure of your site. They are quite simple, but creating them by hand is not an option for large sites. Therefore, it’s a smart move to automate sitemap generation.
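
To make the format concrete, here is a minimal sketch of what such a file contains, built by hand with nothing but the Ruby standard library. The build_sitemap and xml_escape helpers and the sample URLs are made up for illustration; this is not how SitemapGenerator works internally, only a picture of the kind of output it produces for you.

```ruby
require 'time' # adds Time#iso8601

# Characters that must be escaped inside XML text.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;'
}.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Builds a sitemap document from an array of { loc:, lastmod: } hashes.
# (Hypothetical helper, for illustration only.)
def build_sitemap(entries)
  urls = entries.map do |entry|
    lines = ['  <url>', "    <loc>#{xml_escape(entry[:loc])}</loc>"]
    lines << "    <lastmod>#{entry[:lastmod].utc.iso8601}</lastmod>" if entry[:lastmod]
    lines << '  </url>'
    lines.join("\n")
  end

  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    #{urls.join("\n")}
    </urlset>
  XML
end

SAMPLE_SITEMAP = build_sitemap([
  { loc: 'http://www.example.com/' },
  { loc: 'http://www.example.com/categories/1/posts?sort=new&page=2',
    lastmod: Time.utc(2017, 1, 15, 12, 0, 0) }
])

puts SAMPLE_SITEMAP
```

Even this tiny version has to deal with escaping and date formatting, which is a good reason to hand the job over to a gem.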

There are a number of solutions available for generating sitemaps in Rails, but I prefer a gem called sitemap_generator. It is actively maintained and has a number of cool features:

  • It is framework-agnostic, so you may use it without Rails
  • It is very flexible
  • It has its own configuration file and is not strictly bound to your app’s routes
  • It allows you to automatically upload sitemaps to third-party storage
  • It automatically pings search engines when a new sitemap is generated
  • It supports multiple sitemap files and various types of sitemaps (video, news, images, etc.)

In this article we will see SitemapGenerator in action by integrating it into a sample Rails app and discussing its main features. I will also explain how to export sitemaps to cloud storage so that everything works properly on platforms like Heroku.

The source code for this article is available on GitHub.

Creating a Sample Site

As usual, start off by creating a new Rails application:

$ rails new Sitemapper -T

I will be using Rails 5.0.1 but SitemapGenerator works with virtually any version.

We will need some sample models, routes, and controllers. Views can be omitted for this demo – it does not really matter what content the site actually has.

Suppose we are creating a blog that has posts and categories; one category can have many posts. Run the following commands to generate models and migrations:

$ rails g model Category title:string
$ rails g model Post category:belongs_to title:string body:text
$ rails db:migrate

Make sure that models have the proper associations set up:

models/category.rb

[...]
has_many :posts, dependent: :destroy
[...]

models/post.rb

[...]
belongs_to :category
[...]

Now let’s set up some routes. To make things a bit more interesting, I will make them nested:

config/routes.rb

[...]
resources :categories do
  resources :posts
end
[...]

Also, while we are here, add the root route:

config/routes.rb

[...]
root to: 'pages#index'
[...]

Now create the controllers. We don’t really need any actions inside, so they will be very simple:

categories_controller.rb

class CategoriesController < ApplicationController
end

posts_controller.rb

class PostsController < ApplicationController
end

pages_controller.rb

class PagesController < ApplicationController
end

Great! Before proceeding, however, let’s also take care of sample data inside our application.

Loading Sample Data

To see SitemapGenerator in action we will also require some sample data. I am going to use the Faker gem for this task:

Gemfile

[...]
group :development do
  gem 'faker'
end
[...]

Install it:

$ bundle install

Now modify the db/seeds.rb file:

db/seeds.rb

# Create five categories, each with five posts filled with random content
5.times do
  category = Category.create(title: Faker::Book.title)

  5.times do
    category.posts.create(title: Faker::Book.title, body: Faker::Lorem.sentence)
  end
end

We are creating five categories, each with five posts that have some random content. To run this script, use the following command:

$ rails db:seed

Nice! Preparations are done, so let’s proceed to the main part.

Integrating Sitemap Generator

Add the gem we’re using into the Gemfile:

Gemfile

[...]
gem 'sitemap_generator'
[...]

Install it:

$ bundle install

To create a sample config file with useful comments, run this command:

$ rake sitemap:install

Inside the config directory you will find a sitemap.rb file. The first thing to do here is specify the hostname of your site:

config/sitemap.rb

SitemapGenerator::Sitemap.default_host = "http://www.example.com"

Note that this gem also supports multiple host names.

The main instructions for SitemapGenerator should be placed inside the block passed to the SitemapGenerator::Sitemap.create method. For example, let’s add a link to our root path:

config/sitemap.rb

SitemapGenerator::Sitemap.create do
  add root_path
end

The add method accepts a number of options. For example, specify that the root page is updated daily:

config/sitemap.rb

add root_path, :changefreq => 'daily'

What about the posts and categories? Users add them dynamically, so we must query the database and generate links on the fly:

config/sitemap.rb

[...]
Category.find_each do |category|
  add category_posts_path(category), :changefreq => 'weekly', :lastmod => category.updated_at

  category.posts.each do |post|
    add category_post_path(category, post), :changefreq => 'yearly', :lastmod => post.updated_at
  end
end
[...]

Note that here I’ve also provided the :lastmod option to specify when the page was last updated (the default value is Time.now).
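
The sitemap protocol only allows a fixed set of values for changefreq: always, hourly, daily, weekly, monthly, yearly, and never. If you want to catch typos early, a small guard can check the options before they reach add. The sketch below is only an illustration; checked_sitemap_options is a hypothetical helper of my own and not part of the gem.

```ruby
# Values allowed by the sitemap protocol for <changefreq>.
VALID_CHANGEFREQS = %w[always hourly daily weekly monthly yearly never].freeze

# Hypothetical helper: validates changefreq and returns an options hash
# that could then be handed to `add`. lastmod is passed through untouched.
def checked_sitemap_options(changefreq:, lastmod: nil)
  freq = changefreq.to_s
  unless VALID_CHANGEFREQS.include?(freq)
    raise ArgumentError, "unknown changefreq: #{changefreq.inspect}"
  end

  options = { changefreq: freq }
  options[:lastmod] = lastmod if lastmod
  options
end

p checked_sitemap_options(changefreq: :weekly, lastmod: Time.utc(2017, 1, 15))
```

Inside config/sitemap.rb this could be used as add root_path, checked_sitemap_options(changefreq: :daily).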

Running Generator and Inspecting Sitemap Files

To generate a new sitemap (or update an existing one) run the following command:

$ rails sitemap:refresh

Note that if, for some reason, a sitemap fails to be generated, the old version won’t be removed. Another important thing to remember is that the script will automatically ping the Google and Bing search engines to notify them that a new version of the sitemap is available. Here is the sample output from the command above:

+ sitemap.xml.gz          1 sitemap /  251 Bytes
Sitemap stats: 62 links / 1 sitemap / 0m01s

Pinging with URL 'http://www.example.com/sitemap.xml.gz':
    Successful ping of Google
    Successful ping of Bing

If you need to ping additional engines, you may modify the SitemapGenerator::Sitemap.search_engines hash. You may also skip pinging search engines by running

$ rails sitemap:refresh:no_ping

Generated sitemaps will be placed inside the public directory with the .xml.gz extension. You may extract this file and browse it with any text editor. If for some reason you don’t want files to be compressed with GZip, set the SitemapGenerator::Sitemap.compress option to false.
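
If you would rather inspect the compressed file without extracting it by hand, Ruby’s bundled zlib library can read it directly. This is a minimal sketch; read_gzipped_sitemap is a helper name I made up, and the demo writes a tiny stand-in file into a temporary directory so that it runs outside of a Rails app.

```ruby
require 'zlib'
require 'tmpdir'

# Returns the decompressed XML stored in a gzipped sitemap file.
def read_gzipped_sitemap(path)
  Zlib::GzipReader.open(path) { |gz| gz.read }
end

# Demo: create a small gzipped file, then read it back.
DEMO_XML = '<?xml version="1.0" encoding="UTF-8"?><urlset></urlset>'

DEMO_RESULT = Dir.mktmpdir do |dir|
  path = File.join(dir, 'sitemap.xml.gz')
  Zlib::GzipWriter.open(path) { |gz| gz.write(DEMO_XML) }
  read_gzipped_sitemap(path)
end

puts DEMO_RESULT
```

In the sample app you would call puts read_gzipped_sitemap('public/sitemap.xml.gz') from the application root.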

Now that you have a sitemap in place, the public/robots.txt file should be modified to provide a link to it:

public/robots.txt

Sitemap: http://www.example.com/sitemap.xml.gz

SitemapGenerator may create an index file depending on how many links your sitemap has. By default (the :auto option), if there are more than 50,000 links, they will be separated into different files and links to them will be added into the index. You can control this behavior by changing the SitemapGenerator::Sitemap.create_index option. Other available options are true (always generate index) and false (never generate index).
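
The splitting rule is easy to picture with plain Ruby. The following sketch only models the idea described above; plan_sitemap_files, index_needed?, and the file names are illustrative assumptions and do not reflect the gem’s internals.

```ruby
MAX_LINKS_PER_FILE = 50_000 # per-file limit from the sitemap protocol

# Splits links into sitemap-sized groups and names a file for each group.
# (File names here are illustrative only.)
def plan_sitemap_files(links, limit: MAX_LINKS_PER_FILE)
  links.each_slice(limit).each_with_index.map do |chunk, i|
    { filename: "sitemap#{i + 1}.xml.gz", links: chunk }
  end
end

# Models the :auto rule: an index is only needed with more than one file.
def index_needed?(files)
  files.size > 1
end

files = plan_sitemap_files((1..120_001).map { |n| "/posts/#{n}" })
files.each { |file| puts "#{file[:filename]}: #{file[:links].size} links" }
puts "index needed: #{index_needed?(files)}"
```

With 120,001 links the plan contains three files, the last one holding the remaining 20,001 links, and an index is required to tie them together.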

If you wish to add a link directly into the index file, use the add_to_index method, which works much like the add method.

Multiple Locales

Now suppose our blog supports two languages: English and Russian. Set English as the default locale and also tweak the available_locales setting:

config/application.rb

[...]
config.i18n.default_locale = :en
config.i18n.available_locales = [:en, :ru]
[...]

Now scope the routes:

config/routes.rb

[...]
scope "(:locale)", locale: /#{I18n.available_locales.join("|")}/ do
  resources :categories do
    resources :posts
  end

  root to: 'pages#index'
end
[...]

It is probably a good idea to separate sitemaps for English and Russian locales into different files. This is totally possible, as SitemapGenerator supports groups:

config/sitemap.rb

[...]
{en: :english, ru: :russian}.each_pair do |locale, name|
  group(:sitemaps_path => "sitemaps/#{locale}/", :filename => name) do
    add root_path(locale: locale), :changefreq => 'daily'

    Category.find_each do |category|
      add category_posts_path(category, locale: locale), :changefreq => 'weekly', :lastmod => category.updated_at

      category.posts.each do |post|
        add category_post_path(category, post, locale: locale), :changefreq => 'yearly', :lastmod => post.updated_at
      end
    end
  end
end
[...]

The idea is very simple. We are creating a public/sitemaps directory that contains ru and en folders. Inside there are english.xml.gz and russian.xml.gz files. I will also instruct the script to always generate the index file:

config/sitemap.rb

[...]
SitemapGenerator::Sitemap.create_index = true
[...]

Deploying to Heroku

Our site is ready for deployment, but there is a problem: Heroku does not allow us to persist custom files. Therefore, we must export the generated sitemap to cloud storage. I will use Amazon S3 for this demo, so add a new gem into the Gemfile:

Gemfile

[...]
gem 'fog-aws'
[...]

Install it:

$ bundle install

Now we need to provide a special configuration for SitemapGenerator explaining where to export the files:

config/sitemap.rb

[...]
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(fog_provider: 'AWS',
                                                                    aws_access_key_id: 'KEY',
                                                                    aws_secret_access_key: 'SECRET',
                                                                    fog_directory: 'DIR',
                                                                    fog_region: 'REGION')

SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_host = "https://example.s3.amazonaws.com/"
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
[...]

SitemapGenerator::S3Adapter.new contains configuration for S3. To obtain a key pair, you need to log into aws.amazon.com and create an account with read/write permission to access the S3 service. Do not publicly expose this key pair! Also create an S3 bucket in a chosen region (default is us-east-1).
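
One common way to keep the key pair out of the repository is to read it from environment variables instead of writing it into config/sitemap.rb. Here is a minimal sketch of that idea. The s3_adapter_options helper and the variable names are assumptions of mine; use whatever names your platform provides.

```ruby
# Builds the adapter options from environment variables instead of literals.
# The variable names below are assumptions made for this sketch.
def s3_adapter_options(env = ENV)
  {
    fog_provider: 'AWS',
    # fetch raises KeyError when a required variable is missing,
    # so a misconfigured environment fails loudly.
    aws_access_key_id: env.fetch('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key: env.fetch('AWS_SECRET_ACCESS_KEY'),
    fog_directory: env.fetch('FOG_DIRECTORY'),
    # Falls back to the default region when none is given.
    fog_region: env.fetch('FOG_REGION', 'us-east-1')
  }
end

demo_env = {
  'AWS_ACCESS_KEY_ID' => 'KEY',
  'AWS_SECRET_ACCESS_KEY' => 'SECRET',
  'FOG_DIRECTORY' => 'DIR'
}
p s3_adapter_options(demo_env)[:fog_region]
```

The result can then be passed to the adapter, for example SitemapGenerator::S3Adapter.new(**s3_adapter_options). On Heroku such values are typically stored as config vars.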

Next, we are setting tmp/ for the public_path option – that’s the directory where the file will be initially created before being exported to S3.

sitemaps_host should contain a path to your S3 bucket.

sitemaps_path is a relative path inside your bucket.

Some more information about this configuration can be found on this page.

Another problem is that some platforms (Bing, for example) require sitemaps to be located under the same domain, so we need to take care of that as well. Let’s add a /sitemap route to our application that will simply perform a redirect to S3:

config/routes.rb

[...]
get '/sitemap', to: 'pages#sitemap'
[...]

The corresponding action:

pages_controller.rb

[...]
def sitemap
  redirect_to 'https://example.s3.amazonaws.com/sitemaps/sitemap.xml.gz'
end
[...]

As you remember, SitemapGenerator pings search engines by default, but it will give them a direct link to S3, which is not what we want. Use the ping_search_engines method to override this behavior:

config/sitemap.rb

[...]
SitemapGenerator::Sitemap.ping_search_engines('http://www.example.com/sitemap')
[...]

Note that from now on you need to generate the sitemap by running

$ rake sitemap:refresh:no_ping

because otherwise SitemapGenerator will ping search engines with both the direct link and http://www.example.com/sitemap.

Lastly, update the robots.txt with a new link:

public/robots.txt

Sitemap: http://www.example.com/sitemap

That’s it! Your site is now ready to be published to Heroku.

Conclusion

We’ve reached the end of this article! By now you should be familiar with SitemapGenerator’s key features and be able to integrate it into your own application. If you have any questions, don’t hesitate to post them in the comments. Also, browse the gem’s documentation, as it has a number of other features that we haven’t discussed.

Thanks for staying with me and see you soon!

Frequently Asked Questions (FAQs) about SEO and Sitemaps on Rails

How do I generate a sitemap for my Rails application?

Generating a sitemap for your Rails application involves several steps. First, you need to add the ‘sitemap_generator’ gem to your Gemfile and run the ‘bundle install’ command. Next, you need to create a configuration file for the sitemap generator. This file will specify the pages you want to include in your sitemap. You can then run the ‘rake sitemap:refresh’ command to generate the sitemap. The sitemap will be created in the public directory of your Rails application.

How can I automate the process of sitemap generation?

Automating the process of sitemap generation can save you a lot of time and effort. You can use the ‘whenever’ gem to schedule the ‘rake sitemap:refresh’ task to run at regular intervals. This will ensure that your sitemap is always up-to-date, even if you forget to manually refresh it.

How do I deploy my sitemap to Heroku?

Deploying your sitemap to Heroku involves a few extra steps. First, configure the sitemap generator to write files to a temporary directory such as tmp/, because Heroku does not persist files created at runtime. Next, add a storage gem such as ‘fog-aws’ to your Gemfile and configure an adapter that uploads the sitemap to an S3 bucket. Finally, run the sitemap refresh task whenever you deploy your application to Heroku so that the sitemap is regenerated and uploaded to S3.

How can I ensure that search engines find my sitemap?

To ensure that search engines find your sitemap, you need to add a reference to it in your robots.txt file. This file tells search engines where to find your sitemap. You can also submit your sitemap directly to search engines like Google and Bing through their webmaster tools.

What should I include in my sitemap?

Your sitemap should include all the pages on your website that you want search engines to index. This typically includes all your static pages, as well as dynamic pages like blog posts and product pages. You can also include images and videos in your sitemap to help search engines understand your content better.

How do I handle large sitemaps?

If your website has a lot of pages, you may need to split your sitemap into multiple files. The ‘sitemap_generator’ gem supports this out of the box: by default, once there are more than 50,000 links, they are split across several files and an index file linking to them is generated.

How do I handle changes to my website structure?

If you make changes to your website structure, you should regenerate your sitemap to reflect these changes. You can do this manually by running the ‘rake sitemap:refresh’ command, or you can automate the process using the ‘whenever’ gem.

How do I handle multilingual websites?

If your website is available in multiple languages, it is a good idea to create a separate sitemap for each language. You can do this with the gem’s groups: define one group per language in the configuration file, each with its own path and filename.

How do I handle pagination in my sitemap?

If your website uses pagination and you want the paginated pages indexed, include them in your sitemap. The ‘sitemap_generator’ gem has no dedicated pagination helper, so add each page’s URL with the add method in your configuration file, for example by looping over the page numbers.
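
Paginated URLs can be enumerated by hand before they are added to the sitemap. A minimal sketch, assuming a hypothetical page query parameter and ten items per page; paginated_paths is a made-up helper, not something the gem provides.

```ruby
# Lists the path of every page of a paginated listing.
# The `page` parameter and `per_page` default are illustrative assumptions.
def paginated_paths(base_path, total_items, per_page: 10)
  # Integer ceiling division; an empty listing still has one page.
  page_count = [(total_items + per_page - 1) / per_page, 1].max

  (1..page_count).map do |page|
    page == 1 ? base_path : "#{base_path}?page=#{page}"
  end
end

puts paginated_paths('/categories/1/posts', 25)
```

Inside the configuration file, each returned path could then be passed to add in a loop.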

How do I handle errors in my sitemap?

If there are errors in your sitemap, search engines may not be able to index your website properly. You can check for errors by validating your sitemap using a sitemap validation tool. If you find any errors, you should fix them and regenerate your sitemap.

Ilya Bodrov-Krukowski

Ilya Bodrov is a personal IT teacher, a senior engineer working at Campaigner LLC, an author and teaching assistant at SitePoint, and a lecturer at Moscow Aviation Institute. His primary programming languages are Ruby (with Rails) and JavaScript. He enjoys coding, teaching people, and learning new things. Ilya also has some Cisco and Microsoft certificates and worked as a tutor in an educational center for a couple of years. In his free time he tweets, writes posts for his website, participates in open source projects, goes in for sports, and plays music.
