How Our Amazon ECS Autoscaling Works

Jon Sully

@jon-sully

Our new autoscaling service for Amazon ECS services and clusters is live! 🎉

But how does it work? It’s proprietary, of course... (prepares NDA)

Totally kidding. We’ve gotten several questions around the Judoscale/ECS integration, so we wanted to write a quick post that walks through some of the basics. Let’s start with the adapter.

Adapter and Queue Time

Running Judoscale requires the installation of an adapter into your application, and we offer several: Rails, Express, Django, Flask, and more. We’re excited to bring them to Amazon ECS! While the adapters do several background operations to track your various systems’ queue times, tracking queue time from incoming web requests requires a bit more configuration.

Our adapters passively and transparently read a special header present on incoming web requests: X-Request-Start. Amazon’s various routing systems don’t add this header for us the way some other platforms do. So we need to add it ourselves!

While there are several ways to accomplish the header addition, and many of them do work with our system, the method we recommend is the Sidecar Container pattern within the Task Definition of the web-server process. If your particular application manages incoming requests and routing in another style, feel free to configure the header there and we can help you determine if it’s working properly.

For a quick walkthrough of the Sidecar pattern, let’s start with a simple Task Definition that runs a single plain Rails server process. It would look something like this:

Diagram representation of an ECS task with a single process container

In this case, the Task Definition has a single container (the Rails Server process) and that container fields incoming requests on port 80 directly. The Sidecar pattern essentially injects another container (the ‘sidecar’) into the Task Definition which acts as a proxy / intermediary before the request makes it to the application. That looks more like this:

Diagram representation of an ECS task with a web process and an NGINX sidecar receiving and forwarding requests to the web process

The NGINX config at play here is extremely minimal:

server {
    listen 80;

    location / {
        proxy_set_header X-Request-Start "t=${msec}";

        proxy_pass http://localhost:3000;
    }
}

And, if this specific combination (listen on 80, pass to 3000) matches your needs, you’re welcome to use the publicly-available container we prepared here. If you prefer to build your own, you need only two lines of a Dockerfile to build the fully functional container (where nginx.conf is the above config):

FROM nginx:latest
COPY nginx.conf /etc/nginx/conf.d/default.conf
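
Wired into a Task Definition, the two-container setup might look roughly like this trimmed sketch. The container names and image references are placeholders of ours, and we’re assuming the `awsvpc` network mode, where both containers share `localhost` — which is what lets NGINX `proxy_pass` to port 3000:

```json
{
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "nginx-sidecar",
      "image": "<your-registry>/nginx-request-start:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 80 }]
    },
    {
      "name": "web",
      "image": "<your-registry>/rails-app:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 3000 }]
    }
  ]
}
```

Point your load balancer’s target group at port 80 (the sidecar), and the app container itself never needs to be exposed directly.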

Once your sidecar is running, the Judoscale adapter should start reading the request header and observing request queue times automatically.

AWS Permissions and ENV

The second piece of the how-it-works puzzle revolves around AWS platform-level setup. This consists of three steps that deal less with your application code and more with your infrastructure itself. All three steps are guided and automated by Judoscale, though you’re free to handle them with your own custom setup instead.

ECS Read Permissions

When you create a new Amazon ECS team in Judoscale, you’ll be greeted with this screen:

Screenshot of the Judoscale UI prompting the user to run a CloudFormation script to add ECS read permissions

This is our default automation step that offers to run our pre-fab CloudFormation template for you. That template is linked directly in the view and you can see it here.

The tl;dr is that this script simply creates ‘allow’ permission rules, specifically for Judoscale, for ecs:Describe*, ecs:List*, iam:ListAccountAliases, and account:ListRegions. These permissions allow us to populate the clusters list in the screen that follows:

Screenshot of the Judoscale UI prompting the user to select a Cluster from their ECS account to link into Judoscale

And, upon clicking “Link” for a cluster, the same permissions allow us to list out and prepare the services within that cluster:

Screenshot of the Judoscale UI prompting the user to select a Service from their ECS account to link into Judoscale

It’d be tough to autoscale an app we can’t read!
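
Expressed as a raw IAM policy statement, the read-side grant amounts to something like the following. This is a hand-written sketch built from the actions listed above, not the CloudFormation template itself:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:Describe*",
        "ecs:List*",
        "iam:ListAccountAliases",
        "account:ListRegions"
      ],
      "Resource": "*"
    }
  ]
}
```

Note that everything here is read-only — no mutation of your infrastructure is possible with this policy alone.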

Judoscale ENV

Once you select a Service to link and get your framework-specific Judoscale adapter installed, you’ll be given a unique ENV value. That’ll look something like this in the UI:

Screenshot of the Judoscale UI prompting the user to setup and install a unique ENV key and value

This ENV value needs to be present in any task/container running a Judoscale adapter, but Judoscale doesn’t dictate the means by which you implement that requirement. We’ve worked with teams that hard-code the value into their Task Definitions, teams that add the key-value pair into their Terraform setups, teams that use a Parameter/Secrets Manager structure, and plenty of others. We leave that up to you, but we’re here to help if you’re not sure which path to take! (Did you know we have open office hours?)

Once you have the Judoscale adapter running and the environment variable in place, both the “Finished and Deployed” button and the page background will be your feedback confirmation. Clicking “Finished and Deployed” will either give you an error message of “We haven’t quite seen data come in yet...” or it will confirm that we are seeing data come in from your Service. Similarly, the charts in the background will begin moving and changing in real-time to reflect your Service’s traffic and queue times once your setup is working properly.

Write Permissions

Alright, so at this point we’ve got our Secret Sauce (™️) mostly formed — we have the AWS read-permissions we need to know about your clusters/services, we have the Judoscale adapter running in your application’s runtime, and we’ve confirmed the adapter is successfully reporting queue times... now we just need to scale! Automatically, even 😉

Changing your service’s scale count requires the last piece of first-time setup: AWS write permissions for your Service(s).

The first time you click the “Autoscaling On” switch for a given service, you’ll be presented with the final modal: another automation to add write permissions:

Screenshot of the Judoscale UI prompting the user to grant service-level write-permissions on Amazon ECS

Once again that script is linked in the view and you can see the raw source here.

The tl;dr on this one is that we update the same role we created in the original “read permissions” script, adding an ‘allow’ for ecs:UpdateService only. That single permission is what authorizes our platform to change your service’s scale count, and we do that by simply setting the service’s “desired count” field. That’s it!
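
In effect, that permission authorizes the equivalent of this AWS CLI call — shown here purely as an illustration, with placeholder cluster and service names:

```shell
aws ecs update-service \
  --cluster my-cluster \
  --service my-web-service \
  --desired-count 4
```

ECS then handles the rest: launching or draining tasks until the running count matches the desired count.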

The Fully Working Machine

We’ve got all the permissions and data sorted out, and autoscaling is running smoothly at this point! So let’s get back to the primary question: how does it work?

Let’s take an illustrative approach... let’s say you’ve got a small ECS cluster — just a web service and a single worker service:

A diagram of an ECS Cluster with a single web service and worker service, each handling requests/jobs

But let’s not be too simple — we have multiple task instances running in each of those services, so our model might be better represented like so:

A diagram of an ECS Cluster with a single web service and worker service, but expanded to show the multiple task instances within each service

The first layer we’ll add here is the Judoscale adapter: its job is to transparently report queue time data from each individual instance in any given service to Judoscale:

The prior diagram but with indicators added to show a Judoscale Adapter running on each Task instance within a Service

As the data reaches Judoscale, it’s continuously scanned and monitored at a per-service level to ensure that the service is staying within its set scaling parameters. That data is also exposed in the real-time Judoscale UI so you can observe and understand your system health directly:

A screenshot of the Judoscale UI showing that real-time data is streaming in correctly

And as soon as one of your services begins reporting queue times that breach your chosen autoscaling settings, Judoscale automatically adjusts your scale count on AWS to compensate... quickly. We generally prompt AWS to upscale within 10-20 seconds of a detected slow-down! Let’s illustrate that like this:
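
As a purely illustrative sketch — the function, thresholds, and step sizes below are our invention for this post, not Judoscale’s actual algorithm — the core per-service decision looks something like this:

```python
def next_desired_count(
    queue_time_ms: float,
    current: int,
    upscale_threshold_ms: float = 100.0,
    downscale_threshold_ms: float = 10.0,
    min_count: int = 1,
    max_count: int = 10,
) -> int:
    """Pick a new scale count from the latest observed queue time.

    Hypothetical thresholds: breach the upper bound and we add an
    instance; sit comfortably under the lower bound and we remove one.
    """
    if queue_time_ms > upscale_threshold_ms:
        return min(current + 1, max_count)
    if queue_time_ms < downscale_threshold_ms:
        return max(current - 1, min_count)
    return current
```

The real system layers smoothing, cooldowns, and your configured schedule on top, but the essential shape is the same: queue time in, desired count out.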

A diagram similar to the priors but now showing Judoscale issuing scale directives (scale up!) to the services themselves, thus, autoscaling

And that’s essentially it. The fully operational autoscaling machine is simple at its core: receive lots of data, continuously scan it all in real time, find breaches and outliers, and adjust the scale for services that need more (or fewer) instances based on their reported queue times (the only truly reliable metric to scale on). Smooth and simple.


Thanks for tuning in to this little how-it-works! We’re really excited to GA this integration and we’re looking forward to supporting the teams that have been asking for it. Amazon ECS on Judoscale is a new frontier for fast, extremely-responsive infrastructure, and it’s ready right now. Scale on, friends!