My Experience Migrating a Rails API to Crystal and AWS Lambda – Part I

The First Iteration on Rails

Hey folks!

Today I am going to tell you how was my first project as a Codeminer42 employee. Two colleagues and I were charged of building a microservice for a logistics company. It was a great experience to learn about new services, features, and even a new programming language.

You can read the part 2 of this post here.

Without further ado, let’s start!

Contextualizing

Before telling the whole thing, I would like to bring in some context. As I said, this was my first official project as a Codeminer42 employee. Our job was to build an API that receives a list of locations and returns the best route, going through all locations. This is widely known as the Travelling Salesman Problem (TSP).

The TSP describes a salesman who must travel between "N" cities. The order in which he does so is something he does not care about, as long as he visits each once during his trip, and finishes where he was at first. The salesman wants to keep both the travel costs, as well as the distance he travels as low as possible.

Early Analysis

Our first concern was processing time since the TSP is an NP-hard problem. In Theoretical Computer Science, there are two basic classes of problems: P and NP. P includes the ones that can be solved "efficiently", or in Computer Science terms: in polynomial time. NP (Non-Deterministic Polynomial time) includes the problems where the answer can be verified in polynomial time.

What is unknown is whether problems such as the TSP have an efficient algorithm for finding the solution. This is the P versus NP problem: It asks whether every problem whose solution can be quickly verified can also be quickly solved. You can find more information about these classes of problems here, and even win one million dollars if you solve it.

Translating to our problem, the more locations we have, the more time to process all possibilities and find an optimal solution. Just as an example, if we have ten cities, there will be 181.440 possibilities to analyze. From this, we conclude that a Trial and Error algorithm would not be the best path to follow.

Since I have a Computer Science background, I know that NP-hard problems can be solved using heuristic approaches. Heuristic algorithms use shortcuts to produce good quality solutions that may not be optimal but are admissible due to a limited amount of time.

Genetic Algorithms

We decided to use a class of heuristic algorithms called Genetic Algorithms (GA) since my colleagues and I worked with it before, and they are relatively simple and easy to implement.

A Genetic Algorithm is a search heuristic that is inspired by the Theory Of Natural Evolution. This algorithm tries to reflect the process of natural selection where the fittest individuals are selected from a population and used for reproduction to produce offspring of the next generation. This is an excellent algorithm to study more about optimization techniques, and you can find more information about it here.

We found a great example of a GA in Ruby to solve the TSP that helped us understand how we could start to model our algorithm, but in the end, we used it for studying purposes only. You can check it out here, but do not try to use it on your application.

Also, I will not give more details about Genetic Algorithms because this is getting big, but you can find a Ruby implementation of them here.

The Rails API

As I said before, we decided to use Ruby on Rails due to our knowledge and previous experience with the framework. Since our application was supposed to have only one endpoint, we would be able to get the API out the door very quickly.

On this endpoint, we received a JSON with an origin, a destination, and a list of locations with their respective latitude and longitude values.

{
  "origin": 0,
  "destination": 5,
  "waypoints": [
    {"id": 0, "lat": -29.83635, "lng": -51.17556},
    {"id": 1, "lat": -30.83635, "lng": -52.17556},
    {"id": 2, "lat": -31.83635, "lng": -53.17556},
    {"id": 3, "lat": -32.83635, "lng": -54.17556},
    {"id": 4, "lat": -33.83635, "lng": -55.17556},
    {"id": 0, "lat": -34.83635, "lng": -56.17556}
  ]
}

As a response, we gave another JSON with the route distance and the order of the places (provided by our GA).

{
  "origin": 0,
  "destination": 5,
  "route": [0, 1, 2, 3, 4, 5],
  "total_distance": 12345
}

Now, I will talk about some interesting concepts and libraries that we used to develop this API.

Struct

A Struct is a convenient way to bundle many attributes together using accessor methods, without having to write an explicit class. Using Structs, we saved a lot of time defining our classes, and this was crucial mainly because we had only one month to finish the project.

Route = Struct.new(:origin, :destination) do
  def full_sentence
    "We are going from #{origin} to #{destination}!"
  end
end

route = Route.new("Paris", "Versailles")
route.origin         # => "Paris"
route.full_sentence  # => "We are going from Paris to Versailles!"

Why did you decide to use it?

First of all: the data we were defining had a simple structure. The primary information was about the coordinates of each location and the calculated distance. Sure, we could have created a class for this, but maybe that would be overkill.

Besides, the data did not need to be persisted since each input would result in a different route calculation. Even with the same locations, the results could be different between requests due to the GA heuristic behavior; therefore, there was no need to define Active Record models and save data on a database.

Disclaimer: The use of Structs is not necessarily a good practice for a few reasons. First, structs are data classes with no behavior (or very simple behavior,) so they carry a baggage of methods and protocols that are more appropriate for representing data; second, the constructor is not strict and lets you create instances with missing parameters. If you are not dealing with data semantics, I recommend you explicitly write classes with initializers.

Google Distance Matrix API

Since our input was a JSON with only latitude and longitude, we needed to calculate the distance between the locations. We used a very cool (but very costly) Google service called Distance Matrix API.

This service takes a matrix of origins and destinations and returns the distance and the amount of time to move from each origin to each destination. Moreover, there are a lot of optional parameters and travel modes you can use to get personalized results. As an example, you can set your travel mode to "driving" or "walking" and even "bicycling."

Again, to save time, we used an open-source Ruby client to invoke the Distance Matrix API. It helped a lot with building up the parameters since we would need to do much parsing on our input to build the request appropriately.

Sidekiq

As the project progressed, we noticed that due to the heavy data processing, it would be impossible to keep the calculations synchronous and still free of timeout failures. To avoid this, we decided to use Sidekiq for asynchronous processing.

"Sidekiq is a full-featured background processing framework for Ruby. It aims to be simple to integrate with any modern Rails application and much higher performance than other existing solutions."

Using Sidekiq, we changed a little bit of the application behavior, adding one more endpoint.

Now, our first endpoint would receive the JSON input mentioned above and return a Sidekiq JID (Job ID). An additional endpoint would be used to retrieve the result of the asynchronous processing (or a message in case the process has not finished), using JID as the parameter.

Conclusion

Even with all the optimizations, the algorithm stilled slowly to bigger datasets (with 70 or 80 locations,) and we started to think about solutions to make it faster. The first one was a hybrid solution using Crystal, a compiled language with Ruby-like syntax and type inference!

This will be in Part II, and I will explain why we chose Crystal to build our hybrid and then our permanent solution using AWS Lambda to deploy to production.

We want to work with you. Check out our "What We Do" section!