Ryan Bigg

⟵ Posts

A tour of Twist

11 Aug 2021

When I’m writing my books, it’s more often than not that reviewers read and review these books on an application that I built myself called Twist. It looks like this:

Twist

In this post, I’d like to cover just one half of Twist: the backend half. There’s also a frontend half that’s just as interesting!

What’s most interesting about this backend application is that it is a Ruby application that is not a Rails application, and it is not a Sinatra application. While it borrows inspiration (and gems!) from the Hanami project, it is not a Hanami application. It’s a hybrid application, one that is built on top of some pretty great foundations from those Hanami gems, and also some gems from the dry-rb and rom-rb suites of gems. I’d like to spend this post covering a few key details here about Twist and its unique structure.

In this post I cover:

  • The application container object
  • Sub-system dependencies
  • Entities, Repositories and Relations from rom-rb
  • Receiving webhook requests from GitHub
  • Background jobs
  • Transactions
  • The GraphQL API

To cover what Twist does in a nutshell: Twist receives webhook notifications from GitHub after new commits are pushed to a book’s GitHub repository. Twist then reads a book’s Markdown or AsciiDoc files and converts those formats into HTML and stores it in a database. Later on, that HTML is served out through a GraphQL endpoint in the form of books, chapters, and elements. Readers can then read the book and leave notes on the book as they go. I use these notes to improve the quality of the books I write.

If you want to get an idea of what working on this app is like, I’ve live streamed about 10 hours of active development on this project. The first episode is on Youtube here.

The application container

The application starts off in a similar way to a Rails application. There’s a config.ru file that goes off and loads config/boot.rb. That sets up Bundler and Dotenv, configuring the gems and environment necessary for running the application. After that, we come to the application container.

An application container is a concept from the dry-system gem. It’s a space to define the components of your application and how they’re loaded. Twist does this with the following code:

require 'dry/system/container'
require 'dry/auto_inject'
require 'dry/system/loader/autoloading'
require 'zeitwerk'

module Twist
  class Container < Dry::System::Container
    config.root = __dir__

    config.component_dirs.loader = Dry::System::Loader::Autoloading
    config.component_dirs.add_to_load_path = false
    config.component_dirs.default_namespace = 'twist'

    config.component_dirs.add "lib" do |dir|
      dir.auto_register = true
      dir.default_namespace = 'twist'
    end
  end

  Import = Dry::AutoInject(Container)
end

loader = Zeitwerk::Loader.new
loader.inflector.inflect "graphql" => "GraphQL"
loader.inflector.inflect "cors" => "CORS"
loader.push_dir Twist::Container.config.root.join("lib").realpath
loader.setup

When this code defines a container for our application, it specifies component_dirs for that application. These are directories that contain code that is useful to our application. From here, twist specifies that anything in lib is something we want to register as components for this application. What’s great about this is that we’ve set this application up to be autoloaded, and so things are only ever loaded when we reference them. You get this feature with Rails, sure, but you also get it in dry-system too.

That autoloading is taken care of by the Zeitwerk gem. This gem will notice when you’re referencing a constant like Twist::Entities::Book, and if that constant hasn’t been loaded, it will attempt to load it from the lib/twist/entities/book.rb file. Throughout this application, all constants follow this naming pattern of where the path to the file matches the namespacing of the constant.

There are two special cases for Zeitwerk here where we want to inflect some words different to how Zeitwerk might try to. For GraphQL, Zeitwerk would look for a directory called graph_q_l by default. We can inform Zeitwerk should look for a directory called graphql by specifying that inflection rule near the bottom of this file.

When this container is finalized (with a call to finalize! in config/environment.rb), it will scan through the lib directory and register everything underneath it as keys within the application container. As a small example of the keys registered, here’s what happens when we ask the container for its keys:

Twist::Container.keys
=> ["oauth.client",
 "database",
 "authorization.book",
 "entities.book",
 "entities.book_note",
 "entities.branch",
 "entities.chapter",
 "entities.comment",
 "entities.commit",
 "entities.element",
 "entities.footnote",
 "entities.image",
 "entities.note",
 "entities.note_count",
 "entities.permission",
 "entities.reader",
 "entities.section",
 "entities.user",
 ...

These keys and their naming will be relevant in a minute.

System Dependencies

dry-system has a concept of system dependencies – parts of your application that you can opt-in to booting along with finalizing your application container. In Twist, we have a large core dependency in system/boot/core.rb:


Twist::Container.boot(:core, namespace: true) do
  use :persistence

  init do
    require 'babosa'
    require 'redcarpet'
    require 'nokogiri'
    require 'pygments'
    require 'sidekiq'

    require 'dry/monads'
    require 'dry/monads/do'
  end

end

This file is responsible for requiring the different Ruby libraries that this application will need during its runtime.

The other file in this directory is called persistence.rb, and it defines the persistence system dependency:

Twist::Container.boot(:persistence) do
  init do
    require "rom-repository"
    require "rom-changeset"
    require "sequel"
  end

  start do
    container = ROM.container(:sql, Sequel.connect(ENV['DATABASE_URL']), extensions: [:pg_json]) do |config|
      config.auto_registration(File.expand_path("lib/twist"))

      # config.gateways[:default].use_logger Logger.new(STDOUT)
    end
    register(:database, container)
  end
end

It is possible to configure these dependencies in such a way that core could be booted separately from persistence, but I haven’t found a reason to uncouple these yet.

In persistence, there’s require statements for the persistence libraries, and then when the application starts up, a new container for ROM is created here. The container for the application that we saw earlier contains configuration for the application. The ROM container contains configuration for the database itself. This new container automatically registers ROM-specific things from lib/twist, which we’ll look at now.

Entities, Repositories and Relations

In order to interact with this database, Twist uses rom-rb. Where ActiveRecord would tie everything together in the one class:

Active Record

These responsibilities are split out into several distinct classes within ROM-based applications.

  • Entities: Ruby classes that use ROM::Struct to define attributes.
  • Relations: Classes that interact with a single database table and where complicated query logic lives.
  • Repositories: Classes that serve as an adapter layer between relations and the application as a whole.

Entities

Here’s a quick example of an entity class:

module Twist
  module Entities
    class Book < ROM::Struct
      attribute :id, Types::Integer
      attribute :title, Types::String
      attribute :permalink, Types::String
      attribute :github_user, Types::String
      attribute :github_repo, Types::String

      def path
        git = Git.new(
          username: github_user,
          repo: github_repo,
        )
        git.local_path
      end
    end
  end
end

The attributes are defined right in the class, and so we don’t need to go skulking through db/schema.rb or the database itself or a console session to figure out what they are. We also get a type-safety of sorts here… if we attempt to assign a number to those string fields, this is what happens:

Dry::Struct::Error: [Twist::Entities::Book.new] 2 (Integer) has invalid type for :title violates constraints (type?(String, 2) failed)

Whereas a Rails app would just silently coerce the number into a string, ROM::Struct and its underlying Dry::Struct do not do that coercion and will warn you before you go ahead and do something you might later regret.

The path method here uses attributes from the entity to give us the path to a book on disk.

Note here that there is absolutely nothing to do with talking to a database in this class. Or validations. Or callbacks. The only code here is code that works with the attributes from the entity itself.

Repositories & Relations

Repositories are used to provide an interface layer between the relation classes that speak with the database and application-specific classes. Once data is returned from a repository, it is no longer possible to make database queries by calling additional methods on those returned objects.

In order to set this up to use the configuration from persistence.rb, Twist has a Repository class that imports the configuration:

module Twist
  class Repository < ROM::Repository::Root
    struct_namespace ::Twist::Entities

    include Import[container: "database"]
  end
end

The include statement here uses dry-auto_inject to find the database key’s configuration, and imports it here as container. All ROM repositories need to know the container that they’re operating in, so that they’re aware of which database to speak to.

A quick example of a repository would be the BranchRepo:

module Twist
  module Repositories
    class BranchRepo < Twist::Repository[:branches]
      commands :create, use: :timestamps, plugins_options: { timestamps: { timestamps: %i(created_at updated_at) } }

      def by_book(book_id)
        branches.where(book_id: book_id).to_a
      end

      def by_id(id)
        branches.by_pk(id).one
      end

      ...

ROM Repositories do not immediately offer us things such as create or update – we have to opt into them. That’s what’s happening here with the first line that specifies commands here.

The by_book and by_id methods here use another method called branches. This method is an instance of the Twist::Relations::Branches class and we can then call methods on that to query the database.

If we have particularly complicated query logic, we extract that logic out to relations. An example of this is the code from NoteRepo:

# rubocop:disable Metrics/AbcSize
def count(element_ids, state)
  counts = notes.counts_for_elements(element_ids, state)

  missing = element_ids.select { |id| counts.none? { |c| c.element_id == id } }
  counts += missing.map { |m| Entities::NoteCount.new(element_id: m, count: 0) }
  counts.map { |nc| [nc.element_id, nc.count] }.to_h
end
# rubocop:enable Metrics/AbcSize

The counts_for_elements method used here is defined within Twist::Relations::Notes:

module Twist
  module Relations
    class Notes < ROM::Relation[:sql]
      schema(:notes, infer: true)

      def counts_for_elements(element_ids, state)
        where(element_id: element_ids)
        .where(state: state)
        .select { [element_id, function(:count, :id).as(:count)] }
        .group(:element_id)
        .order(nil)
        .to_a
      end
    end
  end
end

What’s happening here is that we’ve got two spots of complicated logic: one where we need to query the database for the note counts for a range of elements, and another where we need to do some massaging of that data into Entities::NoteCount objects. By having the code split up into repositories and relations here, we’re able to have a clear “home” for the code that talks to the database (that’s the relation) and then code that massages that data belongs in the repository. In a typical Rails application, you would throw everything together into one super method in the model.

This count method on the NoteRepo will return a confusing looking hash:

Twist::Container["repositories.note_repo"].count([16], "open")
=> {16=>1}

This hash is not fit for human consumption, but is later sent through an API and then made fit for human consumption. The key is the element ID, and the values are the number of notes for that element.

How about another example, that is “fit for human consumption”?

Twist::Container["repositories.branch_repo"].by_book(1)
=> [
  #<Twist::Entities::Branch
    id=1
    name="master"
    default=true
    book_id=1
    created_at=2021-08-11 14:59:48.203662 +1000
    updated_at=2021-08-11 14:59:48.203662 +1000
  >
]

When we call the by_book method on the BranchRepo, that class reaches out to the branches relation and performs a where query, finding all branches for a particular book. When the data comes back from the database, it’s mapped into Twist::Entities::Branch objects. These entities are then very bare-bones:

[10] pry(main)> ls Twist::Container["repositories.branch_repo"].by_book(1).first
#<Dry::Core::Equalizer:0x00007fba268d3140>#methods: freeze  hash
Dry::Core::Equalizer::Methods#methods: ==  eql?
Dry::Struct#methods: []  __attributes__  __new__  attributes  deconstruct_keys  inspect  new  to_h  to_hash
ROM::Struct#methods: fetch
Twist::Entities::Branch#methods: default  id  name
Twist::Entities::Branch#methods: book_id  created_at  updated_at
instance variables: @attributes

Not counting methods from Object.methods, there are only 13 methods available on this object. A typical Active Record model will have around 300 different methods, but that varies based on the number of attributes it has too, as it adds in meta-programmed methods like title_will_change? from ActiveModel::Dirty and friends. I think 300 methods is a lot.

Receiving webhook requests

Twist’s primary purpose is to serve as a two-endpoint Rack app. The first endpoint – /books/:permalink/receive – is used by GitHub’s webhooks system. The second is /graphql.

GitHub notifies Twist through the “receive” endpoint whenever a book’s repository is updated. The route for this request is defined in the web part of Twist.

It’s here that it’s probably best to mention that Twist’s backend is split into two distinct parts. There’s the regular Ruby stuff, and the web stuff.

The “regular Ruby stuff” includes things like Twist::Entities::Book, a class that is used to represent data about a book within the application. The “regular” side of the application also includes the relations and repositories that we saw earlier. Let’s not also forget to mention that the “regular” directory also contains transactions, which are classes that perform one, and only one, transaction within the application. Things like creating a new note, or inviting a new reader to a book.

Over on the “web” side of the application (lib/twist/web), you’ll find the stuff you know and love from Rails: your router, your controllers, the actions.

The router is defined as a Hanami router:

module Twist
  module Web
    Router = Hanami::Router.new do
      if ENV['APP_ENV'] == "development"
        require 'sidekiq/web'
        mount Sidekiq::Web, at: '/sidekiq'
      end

      post '/graphql', to: Controllers::GraphQL::Run.new
      options '/graphql', to: Controllers::GraphQL::Run.new

      post '/books/:permalink/receive', to: Controllers::Books::Receive.new

      get '/oauth/authorize', to: Controllers::Oauth::Authorize.new
      get '/oauth/callback', to: Controllers::Oauth::Callback.new
    end
  end
end

As you can see from this code, routes do not route to one giant controller class with several actions inside it. Instead, each action is its own class. This concept makes it much easier to jump to the code for one particular action, without being confused about what code is relevant to which action inside a controller – like what might happen to you in a standard Rails application. An example of this is that the authorize and callback actions associated with the OAuth parts of this application are their own separate classes here.

We’re straying a little off track there, so let’s get back on track: the /books/:permalink/receive route. That route goes to the Controllers::Books::Receive action, which is defined in lib/twist/web/controllers/books/receive.rb. Note here that the path here matches the constant – this is something that Zeitwerk enforces so that it can automatically load our constant when we need it.

This controller starts out in an interesting way:

module Twist
  module Web
    module Controllers
      module Books
        class Receive < Hanami::Action
          include Twist::Import[
            find_book: "transactions.books.find",
            find_or_create_branch: "transactions.branches.find_or_create"
          ]

An include statement at the top of a controller that isn’t doing something like include ApplicationHelper? This certainly seems like some sort of black magic – at least that’s what I thought when I originally saw it.

What this is doing is including two helpful transactions from within our application’s container. This code will find the classes registered under the keys transactions.books.find and transactions.branches.find_or_create and it will load those classes at this point. Not only are these classes loaded, but new instances of these classes are initialized at this point too. And not only that, but these instances are then made available through the methods find_book and find_or_create_branch within this controller.

This is made possible by another dry-rb gem called dry-auto_inject.

Later on this controller, we can refer to these instances and call the call method on these transactions by writing code such as:

book = find_book.(permalink: req.params[:permalink])

And:

branch = find_or_create_branch.(book_id: book.id, ref: payload["ref"])

It’s worth noting that similar includes / imports can be found in those transaction classes too:

module Twist
  module Transactions
    module Books
      class Find
        include Twist::Import["repositories.book_repo"]

        def call(permalink:)
          book_repo.find_by_permalink(permalink)
        end
      end
    end
  end
end

The benefit here of using dry-auto_inject is that we can quickly include dependencies from across our application into other classes. dry-auto_inject works by re-defining the initialize method wherever it is used to something similar to this:

def initialize(find_book: Twist::Container['transactions.books.find'], find_or_create_branch: Twist::Container['transactions.branches.find_or_create'])
  @find_book = find_book
  @find_or_create_branch = find_or_create_branch
end

This means that we could change away from the default values here to something else entirely. A good example of where we might like to do that is within tests. You’ll see an example of this over in backend/spec/twist_web/controllers/books/receive_spec.rb.

describe Twist::Web::Controllers::Books::Receive do
  let(:find_or_create_branch) do
    ->(book_id:, ref:) { branch }
  end

  subject do
    described_class.new(
      find_or_create_branch: find_or_create_branch,
      ...
    )
  end

The test for this action immediately stubs out find_or_create_branch and will return a branch object for whatever’s passed in. Later on in the test, that variable is defined as:

let(:branch) { double(Twist::Entities::Branch, name: "master") }

This means that we don’t need to set up messy test data within factories – we can stub out our transactions to return simple doubles here instead. This means that the tests for this controller action are remarkably quick. In fact, by following this pattern throughout the whole application all of the application’s tests are remarkably quick! The slowest ones are ones that need to interact with a filesystem.

There are about 150 tests for this application so far and they take about 12 seconds to run.

Speaking of: when Twist receives a book, it enqueues that book for processing through one of its processor classes, either Twist::Processors::Markdown::BookWorker or its AsciiDoc equivalent. Let’s look at what happens then.

Background jobs

Receiving a webhook from GitHub takes a very short time. I really wish I could say the same for processing a book! However, there are some complexities here.

First, Twist needs to pull the latest changes from GitHub. By this point, Twist has only been notified that changes have happened – it doesn’t exactly know what’s changed. Twist runs a background job with Sidekiq that does a git pull on the book’s Git repository. From there, Twist will read each of the book’s chapters in a separate background job and convert the chapters into HTML. From there, it processes each individual HTML element using classes such as Twist::Processors::Markdown::ElementProcessor. I won’t go so much into this code, as it’s more Twist-specific than applicable to other codebases.

The complexities here involve downloading the code from GitHub, and uploading the images to S3. Each image is processed as another async job separate from the book and chapter async workers. The main issue here is that, for a few brief seconds, the words of a chapter can be available on Twist, but sometimes the images will not have finished processing.

In the end, what we end up with in the database is a new commit from GitHub tied to the book, and under that commit are the relevant chapters, and under those chapters their elements, such as section headings, paragraphs and images.

Transactions

Transactions are classes within Twist that perform a single operation / action / transaction. Let’s look at how a JWT is decoded within Twist.

module Twist
  module Transactions
    module Users
      class FindCurrentUser < Transaction
        include Twist::Import["repositories.user_repo"]

        def call(token)
          return Failure("no token specified") if !token || token.length == 0

          token = token.split.last
          return Failure("no token specified") unless token

          payload, _headers = JWT.decode token, ENV.fetch('AUTH_TOKEN_SECRET'), true, algorithm: 'HS256'
          user = user_repo.find_by_email(payload["email"])
          user ? Success(user) : Failure("no user found")
        end
      end
    end
  end
end

This class inherits from Twist::Transaction, and that class defines some common transaction things:

module Twist
  class Transaction
    def self.inherited(base)
      base.include Dry::Monads[:result]
      base.include Dry::Monads::Do.for(:call)
    end

    def permission_denied!
      Failure("You must be an author to do that.")
    end
  end
end

The dry-monads gem here provides Result objects, which can be used to short-circuit the execution of transactions to bail out early if things are going wrong. In the call method of this transaction, we have failure cases where the token might be missing or too short, or if a user couldn’t be found that matched a provided token. We have a single Success() case if the user could be found here.

We’re able to assert on the outcome of this transaction by calling .success? or .failure? on it, as indicated by this class’s tests:

require "spec_helper"

module Twist
  describe Transactions::Users::FindCurrentUser do
    include Twist::Import[
      "transactions.users.generate_jwt",
      create_user: "transactions.users.create"
    ]

    let(:user) do
      create_user.(
        email: "me@ryanbigg.com",
        name: "Ryan Bigg",
        password: "password",
        github_login: "radar",
      ).success
    end

    let(:token) { generate_jwt.(email: user.email).success }

    it "finds a user by a given JWT token" do
      result = subject.(token)
      expect(result).to be_success
      expect(result.success).to eq(user)
    end

    it "finds no user with a nil token" do
      result = subject.(nil)
      expect(result).to be_failure
    end

    it "finds no user with a blank token" do
      result = subject.("")
      expect(result).to be_failure
    end
  end
end

The benefit of using Success and Failure objects from dry-monads is that we can easily check for the success or failure of any transaction within the application.

GraphQL

As I mentioned at the start of this post, Twist is two distinct applications – a frontend application written in TypeScript + React and a backend application written in Ruby. To get the two to communicate nicely with each other I use GraphQL. GraphQL is great here as we can easily define the types of objects on the Ruby side of things, and have those same types available in the frontend.

Requests come in through the router and head to the Twist::Web::Controllers::GraphQL::Run action. This action sets up a Twist::Web::GraphQL::Runner to run the queries themselves. This second class exists so that I can test GraphQL queries separate from a request / response stack if I need to.

Ultimately, this code path leads us to twist/web/graphql/query_type.rb, which defines fields and how they’re resolved for queries:

module Twist
  module Web
    module GraphQL
      class QueryType < ::GraphQL::Schema::Object
        graphql_name "Query"
        description "The query root of this schema"

        field :books, [Types::Book], null: false, resolver: Resolvers::Books

The types are defined like this:

module Twist
  module Web
    module GraphQL
      module Types
        class Book < ::GraphQL::Schema::Object
          graphql_name "Book"
          description "A book"

          field :id, ID, null: false
          field :title, String, null: false
          field :blurb, String, null: false
          field :permalink, String, null: false

While the resolver above is defined like this:

module Twist
  module Web
    module GraphQL
      module Resolvers
        class Books < Resolver
          def resolve
            books = context[:book_repo].all
            books.select do |book|
              authorization = Authorization::Book.new(
                book: book,
                user: current_user,
                permission_repo: context[:permission_repo],
              )
              authorization.success?
            end
          end
        end
      end
    end
  end
end

It’s tempting to me to move most of this logic of the resolver out to a distinct Twist::Transaction class, but this code is only used within the resolver here. I think I would move it out to that transaction if I wanted to share it across the different parts of the application.

It’s important to note here that because we’re in GraphQL-land, we do as the GraphQL’ers do and read things like the book_repo and permission_repo from the context, rather than include-importing them. This then groups together all different context-related things into one variable, rather than having to chop and change between context and imported values.

When writing these GraphQL queries, I’ll typically start building out something in the frontend first. Once I’m happy with the structure, I’ll map that to a GraphQL query and then write a spec around it before writing the Ruby code to fulfill the query. Here’s the related spec for this books query:

require "requests_helper"

module Twist
  describe "Books", type: :request do
    let(:create_book) { Twist::Container["transactions.books.create"] }
    let!(:create_user) { Twist::Container["transactions.users.create"] }
    let!(:grant_permission) { Twist::Container["transactions.users.grant_permission"] }
    let!(:book) { create_book.(title: "Rails 4 in Action", default_branch: "master").success }

    let!(:user) do
      create_user.(
        email: "me@ryanbigg.com",
        password: "password",
        name: "Ryan Bigg",
      ).success
    end

    before do
      grant_permission.(user: user, book: book)
    end

    it "gets a list of books" do
      query = <<~QUERY
      {
        books {
          id
          permalink
          title
          defaultBranch {
            name
          }
        }
      }
      QUERY

      query!(query: query, user: user)
      expect(json_body).to eq({
        "data" => {
          "books" => [
            {
              "id" => book.id.to_s,
              "permalink" => book.permalink,
              "title" => book.title,
              "defaultBranch" => {
                "name" => "master"
              }
            }
          ]
        }
      })
    end
  end
end

It’s important to note here that the request specs in this application are the only place where we’re interacting with a database. I feel like this is necessary to ensure that the queries are working completely from end-to-end. If I was to stub something here, there’s no guarantee that the shape of what I’m stubbing matches exactly the shape of what would be returned by a database query.

So instead of hoping / praying that my stubbing is correct, I instead create real objects for books, users and permissions in this test. When the GraphQL query runs, it gets served by the real action and that queries a real (test) database and I can then assert that the data being returned here is what I intended it to be.

What’s really great here is that if I change the query on the frontend, I can just copy and paste it right into this spec without having to map it to something Ruby-ish. The same principle applies if I’m going the other way too. I really like that about GraphQL.

The final thing I’ll mention about GraphQL is that over in the Rakefile for this application I have a task setup to dump the GraphQL schema out:

require "graphql/rake_task"

GraphQL::RakeTask.new(
  schema_name: "Twist::Web::GraphQL::Schema",
  directory: "../graphql",
  dependencies: [:environment]
)

This dumps out a graphql/schema.graphql file as well as a graphql/schema.json file. These files are then read by some frontend tasks to automatically generate TypeScript types using the graphql-codegen library. This is then used to ensure that the types that I’ve defined in the GraphQL schema are exactly matching to the ones used over in the frontend. Without this, I reckon ensuring consistency between the backend and frontend applications would’ve been a major headache.

Summary

So there you have it, a grand tour of the Twist application with my favourite highlights. I really enjoy working within this application, moreso than any Rails application that I’ve ever touched. And sure, that might be because I was the one who wrote this app and I’m the only one who maintains it.

But, really, it’s more than that.

The clean layers of separation between things like the entities, repositories and relations make it really easy to reason about where code should go.

Transactions provide me a clear way of separating out application operations away from code that might’ve otherwise gone inside actions. I can then share this code between different parts of my application easily without worrying about having to untie it from a controller’s grasp.

And the GraphQL API provides a clear, distinct layer between my backend and frontend application parts, while still allowing an enforcement of the datatypes across the barrier.

It’s really a joy to work in.


If you like what you see in this post, you might also like to read my book Maintainable Rails which is just $5 at the moment. This book demonstrates how to bring the concepts shown in this post into a brand new Rails application, ultimately leading to an application that is easier to maintain in the long-term.