This article explains why you should benchmark before choosing Aurora PostgreSQL over PostgreSQL. We have often noticed that customers migrating from Oracle to PostgreSQL are unsure whether to pick PostgreSQL or Aurora PostgreSQL. Most leaders and key decision makers lean towards Aurora PostgreSQL because it is often marketed as 3 times faster than PostgreSQL. Is Aurora PostgreSQL really faster and cheaper than RDS PostgreSQL? I would ask leaders and experts to take the time to read this article for some interesting observations from a benchmark of AWS RDS PostgreSQL against AWS Aurora PostgreSQL, and to see how the right insights can reduce your bill.
Facing Performance issues on Aurora Postgres ?
Looking to Migrate to Aurora Postgres ?
General observations by some of our Customers using Aurora PostgreSQL
Some of our customers, perhaps including you, are curious about why CPU utilization is so high on their Aurora instances. It forced them to upgrade their Aurora instance types to resolve performance issues. Some customers have also seen Aurora IOPS become the main reason for heavy bills on Aurora PostgreSQL. Others were surprised to see wait events on Aurora PostgreSQL that do not appear anywhere in the PostgreSQL documentation.
For this reason, I have long wanted to publish a benchmark of RDS PostgreSQL and Aurora PostgreSQL and share some insights. RDS PostgreSQL is generally regarded as PostgreSQL, although it has some limitations compared to vanilla PostgreSQL. We will discuss those differences in my next article.
A high level view of the benchmark results
In a nutshell, my assumption is that the apparent magic of Aurora PostgreSQL comes from its huge IOPS allocation compared to RDS. This does not mean that the same workload performs the same number of IOPS on RDS PostgreSQL and Aurora PostgreSQL. There is always a huge difference, and you pay for it. You will understand this in detail by the end of this article. When RDS was given slightly higher IOPS limits, it outperformed Aurora Postgres every time. The benchmark also sheds light on Aurora's high CPU utilization, which helps explain why some of our customers had to upgrade their Aurora instances when performance issues pushed CPU usage to 100%.
Following are the results of the benchmark performed against RDS with tuned IOPS and Aurora PostgreSQL databases.
Before looking into how the benchmark was performed and seeing some interesting graphs, let me talk about some of my observations from the benchmark. Every detail mentioned in this long article is worth reading even if it takes some time.
Observations from the Benchmark
Aurora Postgres CPU usage could be much higher than the RDS Postgres CPU usage
One of the most common reasons I hear for choosing Aurora Postgres is its low replication latency, which allows near real-time reads on standby or reader instances. However, the CPU utilization graphs from the AWS monitoring dashboards make it very clear that RDS, with tuned IOPS limits and a lower price, stayed below 40% CPU utilization, whereas Aurora Postgres went above 60% for the same workload while delivering lower performance. For this reason, I would rather use the spare resources on the RDS instance to serve my read workload than spin up a synchronous standby for near real-time reads at the cost of performance.
Aurora IOPS and IO Queue depth could be much higher than RDS
While RDS had a very small IO queue depth, occasionally between 5 and 10, Aurora's IO queue depth was mostly between 15 and 25 (surprising!). The total IOPS reported on Aurora was several times higher than the total IOPS on RDS, which helped me understand why one blog about Aurora mentioned that 65% of the Aurora bill was IOPS. Unfortunately, such articles talk about tuning the Postgres database and never consider why total IOPS differs so much between Aurora and RDS. So customers end up spending time tuning the wrong places, perhaps assuming that an untuned database or application is responsible for the huge bills. Eventually, they may upgrade their Aurora PostgreSQL instances for more CPUs and pay more than they should.
AWS never claimed that Aurora PostgreSQL is PostgreSQL
An important fact to consider is that AWS never claimed that Aurora PostgreSQL is PostgreSQL. AWS only describes Aurora Postgres as a PostgreSQL-compatible database. The biggest difference is that PostgreSQL is open source database software with over 30 years of active development. The development, patches, bugs, discussions and ideas are all open to the world. Anybody is free to contribute, review and discuss. More details about PostgreSQL can be found on postgresql.org. At the same time, there may be some white papers explaining how AWS designed Aurora Postgres by modifying the PostgreSQL source code, eliminating checkpoints and other IO-generating background processes and shifting a lot of logic to the storage layer. This deviation from PostgreSQL may be the reason why Aurora PostgreSQL is still at version 13 while PostgreSQL 14 has already been released. By the way, Aurora PostgreSQL 13 (Aug 2021) arrived almost a year after PostgreSQL 13 (Sept 2020).
AWS Instances chosen for this benchmark
Following are the Server specifications chosen for this benchmark.
| | EC2 | RDS PostgreSQL 13.4 | Aurora PostgreSQL 13.4 |
|---|---|---|---|
| Instance Type | r5.xlarge | db.r5.xlarge | db.r5.xlarge |
| Region/AZ | us-east-1a | us-east-1a | us-east-1a |
| VPC | Same VPC | Same VPC | Same VPC |
| CPU(s) | 4 vCPUs | 4 vCPUs | 4 vCPUs |
| RAM | 32 GiB | 32 GiB | 32 GiB |
| Storage | 300 GB | Initially 300 GB and then upgraded to 1000 GB | Unlimited (Up to 65TB) |
| IOPS | 900 | Initially 900 and later increased to 3000 | Up to Instance limits |
| Network | Up to 10 Gigabit | 4,750 Mbps | 4,750 Mbps |
| EBS Encryption | Enabled | Enabled | Enabled |
| Has Standby ? | No | No | No |
What was used to perform the benchmark ?
pgbench was used to perform the data load and to run its TPC-B-like benchmark. pgbench is developed by the PostgreSQL community specifically for running benchmarks against PostgreSQL databases. When benchmarking, it is always important to repeat the runs and compare the TPS, so I ran 4 iterations each with 4, 8 and 12 clients. I kept all the PostgreSQL parameters at their defaults and ran the tests with the default parameter groups assigned by AWS.
As the EC2 Instance, RDS and Aurora PostgreSQL Instances are in the same network, we can run pgbench from the EC2 instance remotely for both initialization and the benchmark.
Eliminating performance burst through free credits, during benchmark
AWS allocates 5.4 million free IO credits to a newly created RDS instance. I put enough load on the RDS instance beforehand to ensure that the actual benchmark numbers would not be inflated by these free IO credits. My initial tests therefore used up all the free IO credits.
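To get a feel for how long those credits can mask the real IOPS limit, here is a rough sketch of the burst-duration arithmetic for a gp2 volume: credit balance divided by the gap between burst and baseline IOPS. The 3000 burst IOPS figure and the formula are assumptions taken from the general gp2 documentation, not from this benchmark, and the function name is only illustrative.

```shell
# burst_seconds CREDITS BURST_IOPS BASELINE_IOPS
# Seconds a volume can sustain BURST_IOPS before its credit bucket is empty.
# Assumes the workload drives the volume at the full burst rate the whole time.
burst_seconds() {
  if [ "$2" -le "$3" ]; then
    echo "baseline is not below burst rate, credits are never drained" >&2
    return 1
  fi
  echo $(( $1 / ($2 - $3) ))
}

# 5.4 million credits, 3000 burst IOPS (assumed), 900 baseline IOPS (300 GB volume)
burst_seconds 5400000 3000 900   # prints 2571, i.e. about 43 minutes
```

In other words, under these assumptions a fresh 300 GB volume could look like a 3000 IOPS volume for most of the first hour, which is why the credits need to be burned off before measuring.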
Initial Data Load - pgbench
To run the benchmark with pgbench, we must start with initialization. At this stage, pgbench creates 4 tables and loads data according to the specified scale factor. The scale factor used for the data load was 10000.
The commands used to perform the data load are as follows. They created 4 tables and loaded 146 GB of data.
# RDS (the database name is the trailing argument; in pgbench 13, -d turns on debug output)
$ pgbench -i -s 10000 -h rds_host -U postgres -p 5432 postgres
# Aurora
$ pgbench -i -s 10000 -h aurora_host -U postgres -p 5432 postgres
Benchmark - pgbench
The pgbench benchmark ran for an hour at each of the following concurrency levels against both the RDS and Aurora Postgres instances.
- 4 clients and 4 threads (-c 4 -j 4)
- 8 clients and 8 threads (-c 8 -j 8)
- 12 clients and 12 threads (-c 12 -j 12)
The command used to perform the TPS benchmark with 4 clients is as follows. The number 4 is replaced by 8 and 12 when benchmarking with 8 and 12 clients. The database name is passed as the trailing argument, because in pgbench 13 the -d switch turns on debug output.
$ pgbench -T 3600 -j 4 -c 4 -h <host> -U postgres -p 5432 postgres
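The full set of runs (4 iterations for each of 4, 8 and 12 clients) can be scripted along these lines. This is only a sketch: the run order, the log file names, the `HOST` placeholder and the `DRY_RUN` switch are my assumptions, not details from the benchmark. The database name is given as the trailing argument because, in pgbench 13, `-d` is the debug switch rather than a database option.

```shell
HOST="${HOST:-rds_host}"     # placeholder endpoint, replace with your own
DURATION=3600                # seconds per run, as in the article
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the commands, 0 = execute them

# pgbench_cmd CLIENTS : builds the command line for one run
pgbench_cmd() {
  echo "pgbench -T $DURATION -j $1 -c $1 -h $HOST -U postgres -p 5432 postgres"
}

# Loops over 4 iterations x 3 client counts (assumed order).
run_all() {
  local iteration clients cmd
  for iteration in 1 2 3 4; do
    for clients in 4 8 12; do
      cmd=$(pgbench_cmd "$clients")
      echo "# iteration $iteration, $clients clients: $cmd"
      if [ "$DRY_RUN" = "0" ]; then
        # Hypothetical log naming; keeps stdout and stderr of each run together.
        $cmd > "pgbench_c${clients}_run${iteration}.log" 2>&1
      fi
    done
  done
}

run_all
```

With the default `DRY_RUN=1` the script only lists the twelve commands, so you can review them before committing twelve hours of load to an instance.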
RDS PostgreSQL vs Aurora PostgreSQL with fewer IOPS on RDS PostgreSQL
When I initially ran the benchmark on RDS with 300 GB of storage (GP2 SSD), which provides 900 IOPS (3 IOPS per GB), the TPS (Transactions Per Second) was not great because the IOPS limit throttled the workload on RDS.
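The 3 IOPS per GB rule makes it easy to estimate what a given gp2 volume size buys you. A minimal sketch follows; the function name is illustrative, the 100 IOPS floor is an assumption from the general gp2 documentation, and upper caps or striping behaviour for large volumes are deliberately not modelled.

```shell
# gp2_baseline_iops SIZE_GB : baseline IOPS at 3 IOPS per GB (assumed floor of 100)
gp2_baseline_iops() {
  local iops=$(( $1 * 3 ))
  if [ "$iops" -lt 100 ]; then
    iops=100
  fi
  echo "$iops"
}

gp2_baseline_iops 300    # prints 900
gp2_baseline_iops 1000   # prints 3000
```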
Here are the benchmarking results for both RDS PostgreSQL 13.4 with untuned IOPS and Aurora PostgreSQL 13.4, with 8 clients for a period of 1 hour for all 4 iterations.
No doubt, Aurora PostgreSQL outperformed RDS in this test. However, this is where most people stop, without understanding why RDS did not perform well. Instead, they go ahead and migrate to Aurora PostgreSQL, because the results create the assumption that Aurora is always many times faster than RDS.
What to do when your RDS PostgreSQL performance is not as good as expected ?
To understand the performance degradation better, I took a closer look at the wait events on RDS to see where Postgres was losing time. The 3 major wait events observed in RDS Performance Insights are all directly related to IO, as seen below.
IO:DataFileRead - Waiting for a read from a relation data file.
IO:WALSync - Waiting for a WAL file to reach durable storage.
LWLock:WALWrite - Postgres is waiting for WAL buffers to be written to disk.
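If you do not have Performance Insights at hand, a similar picture can be approximated by sampling `pg_stat_activity` directly. The sketch below is not how the numbers in this article were collected; the function name and the host argument are placeholders, and a single snapshot is far less reliable than the continuous sampling that Performance Insights does.

```shell
# Counts active backends per wait event, most frequent first.
WAIT_EVENT_SQL="SELECT wait_event_type, wait_event, count(*) AS backends
FROM pg_stat_activity
WHERE state = 'active' AND wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY backends DESC;"

# top_wait_events HOST : HOST is a placeholder for your RDS or Aurora endpoint.
top_wait_events() {
  psql -h "$1" -U postgres -p 5432 -d postgres -c "$WAIT_EVENT_SQL"
}

# Example (run it repeatedly while the benchmark is active):
#   top_wait_events rds_host
```

Rows such as `IO | DataFileRead` or `LWLock | WALWrite` dominating the output would point to the same IO bottleneck described above.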
Following image is just a snippet of the top wait events from AWS Performance Insights.
Optimizing the IOPS on RDS
I noticed that roughly 8 hours of benchmarking on Aurora generated 849 million IO requests according to AWS Billing, which amounts to $169.88. That is when I decided to spend some of those dollars on upgrading my RDS storage for better IOPS instead. So I upgraded the RDS storage to 1000 GB, which costs $100 per month ($0.10 per GB-month x 1000) and provides approximately 3000 IOPS. With this change, RDS performed much better than Aurora, with lower resource utilization (CPU, IOPS, IO queue depth, etc.).
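The arithmetic behind this comparison is simple enough to script for your own bill. The sketch below only reuses the figures quoted above; the function names are illustrative, and the per-million rate is derived from those figures rather than taken from a price list.

```shell
# implied_rate_per_million COST_USD REQUESTS_IN_MILLIONS
# Price per million IO requests implied by a billed amount.
implied_rate_per_million() {
  awk -v cost="$1" -v millions="$2" 'BEGIN { printf "%.4f\n", cost / millions }'
}

# gp2_monthly_cost SIZE_GB PRICE_PER_GB_MONTH
gp2_monthly_cost() {
  awk -v gb="$1" -v price="$2" 'BEGIN { printf "%.2f\n", gb * price }'
}

implied_rate_per_million 169.88 849   # prints 0.2001 (USD per million IO requests)
gp2_monthly_cost 1000 0.10            # prints 100.00 (USD per month)
```

Put differently, about 8 hours of Aurora IO in this test cost more than a full month of the larger RDS volume.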
The following TPS was observed on RDS and Aurora once the benchmark had completed.
4 Clients
Consistent TPS rate
It is very important that the TPS rate stays consistent throughout the benchmark. If it fluctuates between high and low values, the performance cannot be considered consistent. I analyzed this using the Performance Insights graphs shown below. Note that the Performance Insights graphs are in UTC, whereas all the other graphs later in this article are in my local timezone, so please do not be confused by the differing timestamps.
The following is the RDS TPS rate, which stayed around the same value throughout the 4 iterations of the benchmark with 8 clients.
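Consistency can also be checked from the pgbench side, as an alternative to the Performance Insights graphs used here. The sketch below assumes pgbench was run with its `-P` progress option and summarizes the per-interval TPS as a mean and a coefficient of variation; the function name and the sample lines are made up for illustration.

```shell
# tps_consistency : reads pgbench progress lines on stdin and prints
# the mean TPS and the coefficient of variation (stddev / mean, in percent).
tps_consistency() {
  awk '
    /^progress:/ {
      # The TPS value is the field just before the "tps" token.
      for (i = 2; i <= NF; i++)
        if ($i ~ /^tps/) { v = $(i - 1) + 0; n++; sum += v; sumsq += v * v }
    }
    END {
      if (n == 0 || sum == 0) { print "no usable progress lines found"; exit 1 }
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0
      printf "mean_tps=%.1f cv_pct=%.1f\n", mean, 100 * sqrt(var) / mean
    }'
}

# Made-up sample in the shape of pgbench -P output:
tps_consistency <<'EOF'
progress: 10.0 s, 1000.0 tps, lat 4.000 ms stddev 1.000
progress: 20.0 s, 1100.0 tps, lat 3.600 ms stddev 1.000
progress: 30.0 s, 900.0 tps, lat 4.400 ms stddev 1.000
EOF
# prints: mean_tps=1000.0 cv_pct=8.2

# Real usage (pgbench writes progress lines to stderr, hence 2>&1):
#   pgbench -P 10 -T 3600 -j 8 -c 8 -h <host> -U postgres -p 5432 postgres 2>&1 | tps_consistency
```

A low percentage means the TPS stayed close to its average; what counts as acceptable is a judgment call for your workload.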
CPU Utilization on RDS vs Aurora
One of the important differences observed in this benchmark is CPU utilization. The average CPU utilization on RDS was around 30% and never went beyond 50%. On Aurora, however, the average CPU utilization was above 60% and the maximum went up to 90%.
RDS CPU Graph for all the 4 iterations of the benchmark with 8 clients.
Aurora CPU Graph for all the 4 iterations of the benchmark with 8 clients.
IOPS Utilization on RDS vs Aurora
Another huge difference observed in this benchmark is IOPS utilization. The TPS of Aurora was lower than that of RDS in this benchmark, yet Aurora shows far higher IOPS utilization than RDS. Please remember that these IOPS numbers are without a replica. Adding a replica could increase the IOPS further.
IOPS Utilization on RDS for all the 4 iterations of the benchmark
IOPS Utilization on Aurora for all the 4 iterations of the benchmark
IO Queue depth on RDS vs Aurora
Another mystery in this benchmark is the IO queue depth. Some blogs on Aurora claim that Aurora PostgreSQL is great for massively concurrent workloads and faster than PostgreSQL. Some customers may also assume that Aurora, unlike RDS, has almost unlimited IOPS. If that were the case, Aurora's IO queue depth should not be worse than that of RDS. See the following graphs showing the IO queue depth on RDS vs Aurora.
IO Queue depth on RDS for all the 4 iterations of the benchmark
IO Queue depth on Aurora for all the 4 iterations of the benchmark
Conclusion
When you do not get enough performance from RDS, find out where the performance is going bad. A good place to start is the wait events. Most problems on RDS are related to IOPS. Tune your IOPS and compare the performance of RDS and Aurora before switching to Aurora. As seen above, your bill for Aurora IOPS may be huge, and you may also need to upgrade your Aurora instances because of their unusually high CPU utilization compared with RDS.
I am volunteering to spend a few hours each month talking to customers who have deployed PostgreSQL on AWS or other cloud platforms and offering advice. If you are facing performance issues or huge bills and would like some advice, please send an email to talktoavi@migops.com and our team will schedule a call with me. For professional services around migrations to PostgreSQL, and for tuning and maintaining PostgreSQL databases in the cloud and on-premises, please contact us at sales@migops.com.
Basically, Aurora PostgreSQL has a background process called the Aurora storage daemon, which takes care of syncing data to storage and consumes CPU and memory. When I discussed this with the AWS team, they said it is part of the design.
excellent writeup sir.. clearly indicates that as with alot of AWS services, all that glitters is not always gold
This is really a pertinent comparison between Aurora Postgresql and RDS Postgresql
Thank you, great article.
Great Insights and comparison.
Great article! What are the reasons for higher CPU usage with Aurora? Is it possible that the compute used by the storage layer is the reason?