Why You Should Migrate Your Heroku Postgres Database to AWS RDS

 
Migrating Heroku PostgreSQL addon database to Amazon RDS is represented by a van. Photo by Nubia Navarro (nubikini) from Pexels

The Heroku PostgreSQL addon is excellent for quickly setting up a new project. Once your web app matures, you should consider migrating to an alternative like Amazon RDS. In this blog post, I’ll describe the benefits and drawbacks of using AWS RDS instead of the default Heroku addon. I’ll also compare the pricing, available features, and performance characteristics, and explain why projects that care about EU GDPR compliance should avoid using the Heroku database.

I write this blog post in the context of the default Heroku public spaces. Private spaces are an enterprise feature, and the pricing starts from $1000/month.

Is there a performance overhead of using the RDS database on Heroku?

I’ve helped multiple projects migrate their databases from Heroku to AWS RDS as part of my Rails performance audits. We’ve never noticed a measurable networking overhead after the switch. The reason is that Heroku uses AWS as its infrastructure provider. It currently offers two regions for public spaces: the United States and Europe. Europe is located in the AWS Ireland region (eu-west-1), and the United States in AWS North Virginia (us-east-1). Because the underlying hardware and network are the same, there’s no performance penalty for using RDS with Heroku application dynos, provided that you create the RDS instance in the same AWS region as your app.

Memory usage and cache hit ratio: Heroku vs. RDS

Cache hit ratio is a critical metric that indicates how much data is read from the in-memory cache instead of a slow disk. You can check this value for your Heroku database by installing the heroku-pg-extras plugin and running:

heroku pg:cache-hit

Efficiently tuned databases should read over 99% of data from memory. A lower ratio means that the database is under-scaled or misconfigured, which significantly slows it down. I don’t have more proof than anecdotal evidence, but one of my clients had a cache hit ratio consistently below 90%. Even upgrading the Heroku database plan did not increase it. Despite the lack of heavyweight queries, RAM usage spiked to over 99% right after the warm-up, but the cache hit ratio did not improve.
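To make the metric concrete, here is a minimal Python sketch of how a table cache hit ratio can be derived from the heap_blks_hit and heap_blks_read counters that PostgreSQL exposes in the pg_statio_user_tables view. The function name and the counter values are made up for illustration; they are not taken from any real database.

```python
def cache_hit_ratio(blocks_hit: int, blocks_read: int) -> float:
    """Fraction of block requests served from PostgreSQL's buffer cache."""
    total = blocks_hit + blocks_read
    if total == 0:
        # No block activity recorded yet, so there is nothing to measure.
        return 0.0
    return blocks_hit / total


# Invented counters: one well-tuned database and one struggling database.
healthy = cache_hit_ratio(blocks_hit=9_950_000, blocks_read=50_000)
struggling = cache_hit_ratio(blocks_hit=8_900_000, blocks_read=1_100_000)

print(f"healthy: {healthy:.2%}")
print(f"struggling: {struggling:.2%}")
```

With these sample numbers, the first database clears the 99% bar and the second one does not.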

BTW, please don’t confuse memory usage with cache hit ratio! High cache hit ratio - GOOD, high RAM usage - BAD.

On Heroku, you can check the current memory usage by running:

heroku logs -t | grep heroku-postgres

You should see the PostgreSQL addon log output. Look for the following entries:

sample#memory-total=62560050kB sample#memory-free=11327524kB
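If you want a percentage instead of raw kilobytes, a small script can do the arithmetic. The sketch below parses the sample line shown above. The function name is my own, and it uses the same simplification as this post: everything not reported as free counts as used.

```python
import re


def memory_usage_percent(log_line: str) -> float:
    """Return the used-memory percentage from a Heroku Postgres metrics log line."""
    # Collect every "sample#<name>=<number>kB" pair found in the line.
    samples = dict(re.findall(r"sample#([\w-]+)=(\d+)kB", log_line))
    total = int(samples["memory-total"])
    free = int(samples["memory-free"])
    return 100 * (total - free) / total


line = "sample#memory-total=62560050kB sample#memory-free=11327524kB"
print(f"{memory_usage_percent(line):.1f}% of RAM in use")
```

For the sample entry, this works out to roughly 82% of RAM in use.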

After moving to AWS RDS, memory usage dropped by ~30%, and cache hits were finally consistently over 99%. One possible explanation for this phenomenon is the high (and hardcoded) value of the work_mem PostgreSQL setting on Heroku. For example, for a database with 8GB of RAM, the PGTune tool recommends setting work_mem to 5MB. The Heroku premium-2 plan has a hardcoded value of 64MB, meaning that each client connection can consume more memory, leaving less for caching. AWS RDS allows tweaking all the PostgreSQL settings, potentially letting you squeeze out better performance from the same hardware configuration.
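A rough back-of-the-envelope calculation shows why the work_mem value matters. The sketch below is a deliberately pessimistic upper bound, assuming that every connection runs one sort or hash operation at its full work_mem at the same time. In practice work_mem is a per-operation limit and connections rarely peak together, so treat the result as an illustration only. The function name is hypothetical; the 400 connections figure is the Premium 2 limit quoted in the pricing section of this post.

```python
def worst_case_work_mem_mb(connections: int, work_mem_mb: int) -> int:
    """Upper bound: every connection uses its full work_mem for one operation."""
    return connections * work_mem_mb


RAM_MB = 8 * 1024       # the 8GB database from the example above
MAX_CONNECTIONS = 400   # Premium 2 connection limit quoted in this post

for label, work_mem_mb in [("PGTune suggestion", 5), ("Heroku premium-2", 64)]:
    demand = worst_case_work_mem_mb(MAX_CONNECTIONS, work_mem_mb)
    print(f"{label}: {demand} MB worst case, {demand / RAM_MB:.0%} of RAM")
```

Even as a crude estimate, the 64MB setting could theoretically demand several times the available RAM, while the 5MB setting stays well within it.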

Maybe it was a one-time edge case scenario. Still, if your Heroku database cannot achieve a decent cache hit ratio and memory usage is consistently high, then a switch to RDS could be a viable solution.

Limited permissions

Another potential issue with running your application on Heroku PostgreSQL is that it does not grant superuser permissions. It means that a whole variety of tools and techniques won’t work on the Heroku addon.

You probably won’t be aware that your app needs these tools until it’s too late. Two examples of tools that I could not use because my clients had their databases on Heroku were pg_repack (removing table and index bloat without database downtime) and AWS Database Migration Service (seamless replication between database types). If your startup is continuously growing, you’ll probably need to start using more sophisticated tooling that requires broader access to PostgreSQL.

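By contrast, AWS RDS grants the master user the rds_superuser role. It is not an unrestricted PostgreSQL superuser, but it provides far broader privileges than the Heroku addon, including what is needed for tools like the ones mentioned above.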

Is RDS as secure as the Heroku addon?

PostgreSQL on RDS can be significantly more secure than what Heroku has to offer. All the databases provisioned in Heroku are publicly accessible from any IP address. All you need to connect to the Heroku database instance is a valid connection URL:

psql "postgres://heroku_user:PASSWORD@HOST:5432/database_name"

# psql (12.4, server 12.4 (Ubuntu 12.4-1.pgdg16.04+1))

SSL connection is enforced so you cannot connect with an SSL mode explicitly disabled:

psql "postgres://heroku_user:PASSWORD@HOST:5432/database_name?sslmode=disable"

# psql: error: could not connect to server: FATAL:

Unlike with RDS, you cannot use the SSL verify-full mode because Heroku does not offer a CA certificate for database instances outside of the Private Spaces. It means that any time you directly connect to the Heroku database using psql or heroku pg:psql commands, you’re susceptible to a Man in the Middle attack. Check out these two somewhat dated but apparently still exploitable sources for more details:

MitM-ing Heroku Postgres

postgres-mitm script

If you provision an RDS database, it also has to be publicly accessible to talk to the Heroku dynos. A significant difference is that you can provide a path to the database CA certificate and specify the verify-full mode, which effectively protects you from MITM attempts:

psql "postgres://rds_user:PASSWORD@HOST:5432/database_name?sslmode=verify-full&sslrootcert=config/amazon-rds-ca-cert.pem"

# psql (12.4 (Ubuntu 12.4-1.pgdg18.04+1), server 12.4)

You can check out this link for more info about different SSL modes in PostgreSQL.
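If your app reads the connection string from an environment variable, you may want to enforce these SSL parameters in code rather than by hand. Below is a minimal Python sketch, using only the standard library, that adds sslmode=verify-full and sslrootcert to an existing connection URL. The with_verify_full helper, the db.example.com host, and the credentials are hypothetical placeholders; only the two query parameters come from the psql example above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_verify_full(database_url: str, ca_cert_path: str) -> str:
    """Return the URL with sslmode=verify-full and sslrootcert set."""
    parts = urlsplit(database_url)
    query = dict(parse_qsl(parts.query))
    # Override any weaker sslmode that may already be present in the URL.
    query["sslmode"] = "verify-full"
    query["sslrootcert"] = ca_cert_path
    # Keep "/" unescaped so the certificate path stays readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


# Hypothetical host and credentials, for illustration only.
url = with_verify_full(
    "postgres://rds_user:secret@db.example.com:5432/database_name",
    "config/amazon-rds-ca-cert.pem",
)
print(url)
```

The helper overwrites an existing sslmode value, so a URL that was accidentally configured with a weaker mode is upgraded as well.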

Bonus: meet your neighbors

Things get interesting if you’re using a free Hobby-dev plan of the Heroku PostgreSQL plugin. You can run the following query to see user, process and database names of people sharing the same node:

select datname, usename, application_name from pg_stat_activity;

 -- dxxxxxxxxxxxxx | kxxxxxxxxxxxxx | PostgreSQL JDBC Driver
 -- dyyyyyyyyyyyyy | kyyyyyyyyyyyyy | /app/vendor/bundle/ruby/2.6.0/bin/rake
 -- dzzzzzzzzzzzzz | kzzzzzzzzzzzzz | sidekiq 6.1.1 app [0 of 5 busy]
 -- ...
On my test instance I've had ~250 "neighbors"


It’s not a security threat, but maybe it’s something you’d like to keep in mind if you’re using the free plan. It’s worth noting that AWS offers one year of a free RDS instance that is not shared with anyone.

Heroku PostgreSQL addon and GDPR

Disclaimer: I am not a lawyer, and this article does not constitute legal advice.

There’s an issue that many Heroku Postgres addon users might not be aware of. Even if your application is provisioned in the Heroku Europe region, its backups, Dataclips, and logs are still stored in the United States. Here’s an excerpt of my discussion with Heroku support about it:


Transcript:

Me:

Hi. I would like to know what is the physical location of the postgresql addon database backups. I have a Heroku app provisioned in EU. Does it mean that none my data is stored in the US?

Heroku Support:

When a database gets provisioned, the data associated with that database is stored within the region in which it’s created. However, a number of services that are ancillary to Heroku Postgres as well as the systems that manage the fleet of databases may not be located within the same region as the provisioned databases. Here are some:

Postgres Continuous Protection for disaster recovery stores the base backup and write-ahead logs in the same region that the database is located.

Application logs are routed to Logplex, which is hosted in the US. In addition to logs from your application, this includes System logs and Heroku Postgres logs from any database attached to your application.

Logging of Heroku Postgres queries and errors can be blocked by using the --block-logs flag when creating the database with heroku addons:create heroku-postgres:…

PG Backup snapshots are stored in the US.

Dataclips are stored in the US.

========================

The European Union’s GDPR regulations are clear that a European location is preferred for the personally identifiable data of your users. If you want to be compliant, then you must plan a move to a different database.

For some of my clients, the discovery that their users’ data is kept in the United States was an instant dealbreaker and a good enough reason to ditch the Heroku Postgres addon. It’s too bad that Heroku does not state data locations more prominently in their docs.

You can read my other blog post for more info about GDPR compliance for web apps.

Pricing

Pricing for both solutions is comparable, with a growing advantage for RDS on the more expensive plans. Let’s compare highly available database instances with 8GB and 16GB of RAM.

I chose the m5 instance type, which is the most suitable for general-purpose database workloads. Unlike the straightforward Heroku pricing page, RDS does not make it easy to calculate the total cost. Your best bet is to configure an instance in the console to display the final pricing. You can also use the Amazon RDS Instance Comparison tool, but remember to add 20% to cover the storage costs.

You don’t need to start with provisioned disk IOPS (input/output operations per second). General Purpose SSD comes with 3 IOPS/GB and burst ability, which should be enough for most cases.
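As a quick sanity check, here is a small Python sketch of the baseline IOPS you could expect for the two storage sizes compared below, based on the 3 IOPS/GB figure. The 100 IOPS floor and the 16,000 IOPS ceiling are assumptions about General Purpose (gp2) volume limits that do not come from this post, so verify them against the current AWS documentation.

```python
def gp2_baseline_iops(storage_gb: int) -> int:
    """Baseline IOPS at 3 IOPS/GB, clamped to an assumed 100..16,000 range."""
    return min(max(3 * storage_gb, 100), 16_000)


for size_gb in (256, 512):
    print(f"{size_gb}GB -> {gp2_baseline_iops(size_gb)} baseline IOPS")
```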

RDS prices are for the eu-west-1 Ireland region.

| | Heroku | RDS |
|---|---|---|
| Name | Premium 2 | db.m5.large |
| RAM | 8GB | 8GB |
| Storage | 256GB | 256GB |
| Max connections | 400 | Custom |
| Price/month | $350 | $318 |


| | Heroku | RDS |
|---|---|---|
| Name | Premium 3 | db.m5.xlarge |
| RAM | 16GB | 16GB |
| Storage | 512GB | 512GB |
| Max connections | 500 | Custom |
| Price/month | $750 | $637 |


It looks like we’ve got a winner! If your database has 16GB of RAM or more, switching to RDS will bring considerable savings.
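To put the difference in numbers, here is a short Python sketch that computes the monthly, yearly, and relative savings from the prices listed in the two comparisons above. Only the four prices come from this post; the function and variable names are my own.

```python
# (comparison label, Heroku price per month, RDS price per month) in USD
PLANS = [
    ("Premium 2 vs db.m5.large", 350, 318),
    ("Premium 3 vs db.m5.xlarge", 750, 637),
]


def savings(heroku_price: int, rds_price: int) -> tuple[int, int, float]:
    """Return (monthly savings, yearly savings, savings as a % of the Heroku price)."""
    monthly = heroku_price - rds_price
    return monthly, monthly * 12, 100 * monthly / heroku_price


for label, heroku_price, rds_price in PLANS:
    monthly, yearly, percent = savings(heroku_price, rds_price)
    print(f"{label}: ${monthly}/month, ${yearly}/year, {percent:.1f}% cheaper")
```

The relative savings grow from roughly 9% on the 8GB tier to roughly 15% on the 16GB tier.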

It is worth noting that Heroku enforces a per-plan connection limit that cannot be increased. On RDS, you can customize the maximum number of connections using the max_connections setting. The Heroku database instance could stop scaling just because of the connection limit, forcing you to apply convoluted solutions like PgBouncer. In RDS, you can tweak the maximum number of connections and other config variables to max out the performance and throughput of the currently provisioned hardware.
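To see how quickly an app can run into a fixed limit, here is a rough Python sketch that estimates how many connections a dyno formation might open. The formation below (dyno counts, processes per dyno, pool sizes) is entirely hypothetical; only the 400 connections limit of the Premium 2 plan comes from this post.

```python
from dataclasses import dataclass


@dataclass
class ProcessType:
    name: str
    dynos: int
    processes_per_dyno: int
    pool_size: int  # database connections each process may open

    def connections(self) -> int:
        return self.dynos * self.processes_per_dyno * self.pool_size


def total_connections(formation: list[ProcessType]) -> int:
    """Worst case: every process fills its whole connection pool."""
    return sum(process.connections() for process in formation)


PLAN_LIMIT = 400  # Premium 2 limit quoted in this post

# Hypothetical formation, for illustration only.
formation = [
    ProcessType("web", dynos=20, processes_per_dyno=4, pool_size=5),
    ProcessType("worker", dynos=4, processes_per_dyno=1, pool_size=25),
]

needed = total_connections(formation)
print(f"{needed} connections needed, plan allows {PLAN_LIMIT}")
```

In this made-up scenario, the app needs more connections than the plan allows even though the database hardware itself might have plenty of headroom.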

You can also cut the RDS costs by ~40% if you commit to a year or more upfront with Reserved Instances. In my experience, it is usually not possible to predict database spec requirements so far into the future, but your case could be different.

Monitoring and alerts

Heroku offers very limited ways to monitor database metrics. Console tools like:

heroku-pg-extras

pg-diagnose

give you point-in-time insights into database statistics. But since their output is text-only, it is cumbersome to integrate them into automated monitoring and alerting solutions.

BTW I’ve ported features offered by heroku-pg-extras to several programming languages. You can read my previous blogpost for more info about it.

By contrast, RDS shines when it comes to its monitoring and alerting toolkit. With a few clicks, you can build informative CloudWatch dashboards integrated with SNS alerts to the channel of your choice:


Abot for Slack AWS Cloudwatch dashboard. Web servers and PG stats at a glance.



Custom AWS SNS Slack alert


But wait, there’s more. RDS offers an optional Enhanced Monitoring feature. Honestly, so far, CloudWatch and pg-extras have been enough for me to resolve database issues, and I have not had a chance to dive deep into those metrics. Still, they look super smart and useful:


RDS enhanced monitoring metrics

Other features

Dataclips

The one feature I’m missing in RDS after moving away from Heroku is Dataclips. They are perfect for giving instant data insights for less technical members of your team.

There’s a whole range of UI tools for PostgreSQL that can somehow replicate the same feature set, but none is as handy as Heroku Dataclips themselves.

Backups

Both solutions offer automated backups and point-in-time recovery. Personally, I find the RDS backup mechanism to be more robust and easier to work with.

The biggest drawback of the backups solution that Heroku offers is that they are all part of the PostgreSQL plugin instance. It means that accidental or malicious removal of your database plugin would irreversibly remove your database and all its backups. You can read more about this potentially disastrous issue and how to prevent it by adding secondary backups in my other blogpost.

Do you need DevOps experience to use AWS RDS?

Things cannot get any easier than the Heroku PostgreSQL plug-and-play approach. RDS is also a fully managed platform but requires a slightly more involved setup.

Choosing the instance type, storage type, configuring the security groups, and tweaking PostgreSQL settings… It might seem a bit overwhelming if you’ve never worked with AWS before.

If you want to do the migration, but your team is missing the required AWS/DevOps skills, then we’ve got you covered. You can now order the Complete Guide to Migrating Heroku Database to RDS.


The guide describes all the steps required to safely move your Heroku PostgreSQL database to AWS RDS. Zero prior AWS experience is required to complete the guide. All the steps are described in a detailed way together with code snippets and AWS console screenshots.

Once you’re up and running, the most significant change when working with RDS instead of the Heroku database is that you can no longer use handy Heroku CLI commands.

Instead, you’ll need to get familiar with standard PostgreSQL CLI tools like psql, pg_restore, etc. DevOps or not, every developer should know how to use them, so taking the time to master them is an excellent investment.

Summary

If your application is past the proof-of-concept stage, I’d recommend switching from the Heroku addon to RDS. For the cost of a one-time setup, you’ll get a cheaper database that’s superior in security, compliance, and robustness.


