WebSockets in the Ruby Ecosystem

What the heck is a “WebSocket”, exactly? Some of us have heard about the changes that are coming to Rails with regard to WebSockets (e.g. Action Cable in Rails 5) but it’s a bit difficult to pinpoint exactly what WebSockets are meant to do. What problem do they solve and how do they fit into the realm of HTTP, the web, and Ruby? That’s what we’ll cover in this article.

Why?

Let’s dial the time machine to the beginning of the web. Way back in the day, as we all know, websites consisted of static pages with a bunch of links between them. These pages are called “static” because nothing about them changes on a per-user basis. The server just serves up the same thing to every single user based on the path that the user requests. We quickly realized that this was all well and good if all we wanted the web to be was the equivalent of an easily available book, but we could actually do a lot more. So, with input elements and CGI (Common Gateway Interface, a way for external scripts to talk to the web server), dynamic elements crept into web pages.

Now, we could actually process input data and do something with it. As websites got busier, we realized that CGI was pretty terrible at scaling. Along came a slew of options such as FastCGI to remedy this problem. We came up with all sorts of frameworks to make writing back-ends a lot easier: Rails, Django, etc. All this progress happened, but at the end of the day, we were still serving up some HTML (through a variety of methods), the user was reading this mostly static HTML and then requesting some different HTML.

Then, developers realized the power of JavaScript and communication with the server through AJAX. No longer were pages just blobs of HTML. Instead, JavaScript was used to alter the content of these pages based on asynchronous communication with the server. Often, state changes that occurred on the server had to be reflected on the client. Taking a very simple example, maybe we want a notification to show up on the admin panel when the number of users on the server exceeds a certain limit. However, the methods available for pushing this sort of update to the client weren’t the best. One common solution was HTTP long polling. With it, the client (i.e. JavaScript code running in the browser) sends the HTTP server a request which the server keeps open (i.e. the server doesn’t send any data but doesn’t close the connection) until some kind of update is available. Once the update arrives, the server responds and the client immediately sends a new request to wait for the next one.
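To make the long-polling cycle concrete, here is a minimal sketch in Ruby. It is an in-process simulation rather than a real HTTP server: the LongPollEndpoint class and the poll_for_update method are hypothetical names invented for illustration, and a Queue stands in for "the server holds the request open until an update is available".

```ruby
require "timeout"

# Hypothetical in-process stand-in for a long-polling HTTP endpoint.
class LongPollEndpoint
  def initialize
    @updates = Queue.new
  end

  # Server-side code calls this when some state changes.
  def publish(update)
    @updates << update
  end

  # Models one request: hold it open until an update arrives, or give up
  # after hold_seconds and answer with nothing (nil).
  def handle_request(hold_seconds)
    Timeout.timeout(hold_seconds) { @updates.pop }
  rescue Timeout::Error
    nil
  end
end

# Client side: keep re-issuing requests until one comes back with data.
def poll_for_update(endpoint, hold_seconds:, max_requests:)
  max_requests.times do
    update = endpoint.handle_request(hold_seconds)
    return update if update
  end
  nil
end

# Usage: the "server" publishes a notification shortly after the client
# starts waiting, and the held-open request returns it.
endpoint = LongPollEndpoint.new
publisher = Thread.new do
  sleep 0.05
  endpoint.publish("user limit exceeded")
end
puts poll_for_update(endpoint, hold_seconds: 1, max_requests: 3)
publisher.join
```

Notice that the client has to do all of the asking: each call to handle_request is a full request that sits idle on the server until something happens or the timeout fires.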

You might be wondering: why do all this waiting if the client could just ask the server to tell it when an update comes along? Well, unfortunately, HTTP doesn’t really let us do that. HTTP wasn’t designed to be a server-driven protocol. The client sends the requests, the HTTP server answers them. Period. Long polling isn’t really a great solution either: every waiting client holds a connection open on the server, which makes scaling a headache, and connections get dropped when users switch between Wi-Fi and cellular. How do we solve this problem of letting the server talk to the client?

WebSockets

This is where WebSockets come into the picture. A bunch of very smart people got together and figured out a protocol that could be used for bi-directional communication between the client and server. They then wrote it up in an RFC (Request for Comments, the documents that more or less serve as the specifications for the various parts of the Internet), specifically RFC 6455. The document defines a protocol, layered on top of TCP, that lets the server and the client send messages to each other at any time over a single long-lived connection. In the browser, the client is given a specific API to talk to the server (known as the WebSocket API). The power of WebSockets becomes obvious when you realize that the client can now fire events in response to messages pushed by the server. Want to update your UI when a stock reaches a certain price? No problem, just have the server tell the client when that price limit is reached and write some event-driven code.

How does all of this stuff work alongside HTTP? Most of it works more or less independently of HTTP. However, the “handshake” has to be carried out over HTTP. If you’re familiar with TCP, you know that a handshake takes place as soon as you open a new connection. The handshake is meant to exchange some basic information and initiate the connection. For WebSockets, the handshake exists to let the server know that it should open up a WebSocket communication channel with the client. HTTP is used to convey this message: the client sends a special HTTP request with an “Upgrade: websocket” header and some fields specific to WebSockets, and a server that agrees answers with a “101 Switching Protocols” response. From then on, the same TCP connection carries WebSocket traffic instead of HTTP.
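One of those WebSocket-specific fields is Sec-WebSocket-Key. Per RFC 6455, the server proves it understood the request by appending a fixed GUID to the key, hashing the result with SHA-1, and returning the Base64 of that digest in a Sec-WebSocket-Accept header. The sketch below shows just that step in Ruby; the method names websocket_accept and handshake_response are made up for this example, and a real server would also validate the request headers before replying.

```ruby
require "digest/sha1"

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

# Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key.
def websocket_accept(client_key)
  digest = Digest::SHA1.digest(client_key + WEBSOCKET_GUID)
  [digest].pack("m0") # Base64 without a trailing newline
end

# Build the minimal HTTP response that completes the upgrade.
def handshake_response(client_key)
  [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    "Sec-WebSocket-Accept: #{websocket_accept(client_key)}",
    "",
    ""
  ].join("\r\n")
end

# Sample key taken from the example handshake in RFC 6455.
puts websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
# prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

In practice you would rarely write this yourself, since the WebSocket library or server handles it, but it shows how thin the HTTP part of the protocol really is.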

This means, though, that the HTTP server we use has to implement WebSockets, our web framework has to give us some way to work with WebSocket connections, and our JavaScript running in the browser (which serves as the client) has to be able to speak WebSocket.
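To give a feel for what “implementing WebSockets” involves on the server beyond the handshake, here is a rough Ruby sketch of the framing described in RFC 6455: messages travel in small binary frames, and frames sent by the browser are XOR-masked with a 4-byte key. The method names encode_text_frame and unmask_payload are invented for this illustration, and the sketch ignores fragmentation, control frames, and error handling.

```ruby
# Build one unmasked text frame, as a server would send to a browser.
def encode_text_frame(text)
  payload = text.encode("UTF-8").b
  length = payload.bytesize
  # First byte 0x81: FIN bit set (final fragment) plus opcode 0x1 (text).
  header =
    if length < 126
      [0x81, length].pack("CC")
    elsif length < 65_536
      [0x81, 126, length].pack("CCn")   # 16-bit big-endian length follows
    else
      [0x81, 127, length].pack("CCQ>")  # 64-bit big-endian length follows
    end
  header + payload
end

# Undo the XOR masking that clients apply to the payloads they send.
def unmask_payload(masked_payload, mask_key)
  key = mask_key.bytes
  masked_payload.bytes.each_with_index.map { |byte, i| byte ^ key[i % 4] }.pack("C*")
end

# "Hello" as a server-to-client frame: two header bytes, then the text.
p encode_text_frame("Hello").bytes
# prints [129, 5, 72, 101, 108, 108, 111]
```

The libraries discussed below take care of all of this, which is exactly why we want server and framework support rather than rolling our own.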

JavaScript and WebSockets

The last bit of the problem (JavaScript talking WebSocket) comes in the form of WebSocket implementations provided by browser vendors. Recent browser versions include WebSocket support, so the technology is a reasonable choice for many situations.

The WebSocket JavaScript API is well documented on MDN. Although we’re focusing mostly on the concepts behind WebSockets, it’s worthwhile to take a quick peek at the JavaScript API since most of the knowledge can be re-used on the back-end (remember: WebSockets are bi-directional, so we’d expect the API on both client and server to be very similar). We can create a WebSocket object:

var socket = new WebSocket("ws://domain.com/server", "protocol")

where “protocol” is an optional string naming the sub-protocol that the client would like to speak with the given server. Notice that the URL begins with ws://. This is the scheme for cleartext WebSockets; wss:// is used for WebSockets over SSL/TLS. We can send data too:

socket.send("whatever")

Most importantly, we can receive data from the server:

socket.onmessage = function(ev) {
  console.log(ev.data);
}

So, JavaScript support is pretty good. How about on the server side, where our Ruby runs?

Ruby and WebSockets

The relationship between real-time messaging and Ruby has been complicated. The Ruby community was taken for quite a spin when Node.js shot out of nowhere and grabbed a ton of mindshare. The reason this happened wasn’t that Node is particularly more pleasant for writing run-of-the-mill web applications. In fact, one could argue that a framework like Express is quite a chore to use in comparison to the more full-featured Rails. But Node made real-time and event-driven interactions easy. I think Socket.io played a huge role in this: it was finally easy to make the client and server communicate in a simple, hassle-free manner. For quite a bit of time, Rails was behind the curve when it came to real-time and it still is, to some extent. But we do have several options that make WebSockets possible in Rails.

Faye

Faye is somewhat of a standard solution when it comes to real-time and Rails. Although it uses something called the Bayeux protocol, recent versions of it also include a standard WebSocket server. In fact, we had a fairly recent series detailing how to put together a real-time application using Faye.

EventMachine

EventMachine is an event processing library for Ruby, and the em-websocket gem builds on it to support WebSocket-based communication. Its API is simple enough that we can put together real-time communication pretty quickly. EngineYard has a nice blog post on using WebSockets with EventMachine.
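As a rough idea of what that looks like, here is a small echo server sketched with em-websocket. It assumes the em-websocket gem is installed; the host, port, and the start_echo_server and reply_for names are arbitrary choices for this example and do not come from any particular tutorial.

```ruby
# Pure helper so the reply logic can be exercised without a network.
def reply_for(message)
  "echo: #{message}"
end

# Starts a blocking echo server. Host and port are illustrative defaults.
def start_echo_server(host: "0.0.0.0", port: 8080)
  # Third-party gem (gem install em-websocket); loaded lazily so this file
  # can still be read and loaded on a machine without it.
  require "em-websocket"

  EM.run do
    EM::WebSocket.run(host: host, port: port) do |ws|
      ws.onopen    { |_handshake| puts "client connected" }
      ws.onmessage { |message| ws.send(reply_for(message)) }
      ws.onclose   { puts "client disconnected" }
    end
  end
end

# start_echo_server  # uncomment to run; blocks until interrupted
```

Notice how closely the onopen, onmessage, and onclose callbacks mirror the event handlers on the browser's WebSocket object. A page could connect with new WebSocket("ws://localhost:8080") and see every message it sends come back prefixed with "echo: ".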

Outsourcing

This technique is a bit controversial. The idea is that even if you use Rails for most of your application, you don’t have to use it for the real-time messaging components of your application. For this, you could use basically anything (say, Node) and run it under a different subdomain. There are a number of reasons why I dislike this approach. First, context-switching between two different languages gets incredibly annoying, especially since the real-time and non-real-time portions of your application are probably intimately connected. Second, code duplication comes for free with this approach, since you may have to reimplement some of your models in Node in order to serve up the right kind of real-time data. There are a few situations in which this sort of thing makes sense, however. If you have a largely traditional web application with only one or two very simple real-time components, it might be reasonable to implement these in Node.

Action Cable

This is the new kid on the block who hasn’t quite moved in yet. Action Cable is one of the new features of Rails 5 and, basically, allows us to use WebSockets within Rails without any hassle. However, we don’t have a ton of information on what Action Cable will turn out to be when Rails 5 is put in our hands (Fall 2015). I definitely believe that Action Cable can serve to rejuvenate interest in Rails within the web development community. Rails has been eclipsed for a while by a ton of activity in the JavaScript/Node world, especially as real-time and SPAs have become the name of the game. Hopefully that will change.

Wrapping It Up

WebSockets are a pretty big deal because they alter the way we think about the client-server relationship when it comes to HTTP. In a sense, they blur the distinction between the client and the server, fixing the balance between the two. As more and more browsers implement the WebSockets standard, they’ll probably become crucial to how we develop web applications. Hopefully this article gave you an idea of how WebSockets came to be and how to use them within the Ruby ecosystem.