3

Comparing AMD EPYC Performance with Intel Xeon in GCP

 2 years ago
source link: https://www.percona.com/blog/comparing-amd-epyc-performance-with-intel-xeon-in-gcp/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client
Comparing AMD EPYC Performance with Intel Xeon in GCP

AMD EPYC Intel XeonRecently we were asked to check the performance of the new family of AMD EPYC processors when using MySQL in Google Cloud Virtual Machines. This was motivated by a user running MySQL in the N1 machines family and willing to upgrade to N2D generation considering the potential cost savings using the new AMD family. 

The idea behind the analysis is to do a side-by-side comparison of performance considering some factors: 

  • EPYC processors have demonstrated better performance in purely CPU-based operations according to published benchmarks. 
  • EPYC platform has lower costs compared to the Intel Xeon platform. 

The goal of this analysis is to check if cost reductions by upgrading from N1 to N2D are worth the change to avoid suffering from performance problems and eventually reduce the machine size from the current 64 cores based (N1 n1-highmem-64 – Intel Haswell) to either N2D 64 cores (n2d-highmem-64 – AMD Rome) or even to 48 cores (n2d-highmem-48 – AMD Rome), to provide some extra context we included N2 (the new generation of Intel machines) into the analysis. 

In order to do a purely CPU performance comparison we created 4 different VMs:

NAME: n1-64
MACHINE_TYPE: n1-highmem-64
Intel Haswell – Xeon 2.30GHz
*This VM corresponds to the same type as the type we use in Production.

NAME: n2-64
MACHINE_TYPE: n2-highmem-64
Intel Cascade Lake – Xeon 2.80GHz

NAME: n2d-48
MACHINE_TYPE: n2d-highmem-48
AMD Epyc Rome – 2.25Ghz

NAME: n2d-64
MACHINE_TYPE: n2d-highmem-64
AMD Epyc Rome – 2.25Ghz

For the analysis, we used MySQL Community Server 5.7.35-log and this is the basic configuration:

Shell
[mysqld]
datadir   = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock
log-error   = /var/lib/mysql/mysqld.err
pid-file = /var/run/mysqld/mysqld.pid
server_id                       = 100
log_bin
binlog_format                   = ROW
sync_binlog                     = 1000
expire_logs_days                = 2
skip_name_resolve
innodb_buffer_pool_size         = 350G
innodb_buffer_pool_instances    = 32
innodb_concurrency_tickets      = 5000
innodb_thread_concurrency       = 128
innodb_write_io_threads         = 16
innodb_read_io_threads          = 16
innodb_flush_log_at_trx_commit  = 1
innodb_flush_method             = O_DIRECT
innodb_log_file_size            = 8G
innodb_file_per_table           = 1
innodb_autoinc_lock_mode        = 2
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_load_at_startup  = 1
table_open_cache                = 5000
thread_cache_size               = 2000
query_cache_size                = 0
query_cache_type                = 0

In all cases, we placed a 1TB balanced persistent drive so we get enough IO performance for the tests. We wanted to normalize all the specs so we can focus on the CPU performance, so don’t pay too much attention to the chances for improving performance for IO operations and so. 

The analysis is based on sysbench oltp read-only workload with an in-memory dataset, the reason for this is that we want to generate traffic that can saturate CPU while not being affected by IO or Memory. 

The approach for the benchmark was also simple, we executed RO OLTP work for 16, 32, 64, 128, and 256 threads with a one-minute wait between runs. Scripts and results from tests can be found here

Let’s jump into the analysis, these are the number of Queries that instances are capable to run: 

MySQL Queries

The maximum amount of TPS by Instance Type by the number of threads:

Threads/Instance N1-64 N2-64 N2D-48 N2D-64

16

164k 230k 144k

32

265k 347k 252k 268k

64

415k 598k 345k

439k

128

398k 591k 335k

256 381k 554k 328k

Some observations: 

  • In all cases we reached the maximum TPS at 64 threads, this is somehow expected as we are not generating CPU context switches. 
  • Roughly we get a maximum of 598k tps in n2-highmem-64 and 444k tps in n2d-highmem-64 instance types which are the bigger ones. While this is expected Intel-based architecture outperforms AMD by a 35% 
  • Maximum tps seems to be reached with 64 threads, this is expected considering the number of CPU threads we can use in parallel. 
  • While n1-highmem-64 (Intel Xeon) and n2d-highmem-48 (AMD Epyc) seems to start suffering performance issues when the amount of threads exceeds the max number of cores the bigger instances running with 64 cores are capable to sustain the throughput a bit better, these instances start to be impacted when we reach 4x the amount of CPU cores. 

Let’s have a look at the CPU utilization on each node:

CPU utilization on each node

cpu-n264.png

cpu-n2d48.png

cpu_n2d64.png

Additional observations: 

  • n1-highmem-64 and n2d-highmem-48 are reaching 100% utilization at 64 threads running. 
  • With 64 threads running n2-highmem-64 reaches 100% utilization while n2d-highmem-64 is still below. Although Intel provides better throughput overall probably by having a faster CPU clock (2.8Ghz vs 2.25Ghz) 
  • For 128 and 256 threads all CPUs show similar utilization. 

For the sake of analysis this is the estimated costs of each of used machines (at the moment of writing the post):
n1-highmem-64 $2,035.49/month = $0.000785297/second
n2-highmem-64 $2,549.39/month = $0.000983561/second
n2d-highmem-48 $1,698.54/month = $0.000655301/second
n2d-highmem-64 $2,231.06/month = $0.000860748/second

Costs above will give us roughly at peaks of TPS:
n1-highmem-64 costs are $0.0000000019/trx
n2-highmem-64 costs are $0.0000000016/trx
n2d-highmem-48 costs are $0.0000000019/trx
n2d-highmem-64 costs are $0.0000000019/trx

Conclusions

While this is not a super exhaustive analysis of all implications of CPU performance for MySQL workload we get a very good understanding of cost vs performance analysis. 

  • n1 family, currently used in production, shows very similar performance to n2d family (AMD) when running with the same amount of cores. This changes a lot when we move into the n2 family (Intel) which outperforms all other instances. 
  • While the cut in costs for moving into n2d-highmem-48 will represent ~$4k/year the performance penalty is close to 20%.
  • Comparing the costs per trx at peaks of loads we can see that both n2-64 and n2d-64 are pretty much the same but n2-64 will give us 35% more throughput, this is definitely something to consider if we plan to squeeze the CPU power.   
  • If the consideration is to go with n2 generation then definitely the n2d-highmem-64 is a very good choice to balance performance and costs but n2-highmem-64 will give much better performance per dollar spent. 

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK