AES-NI SSL Performance
source link: https://calomel.org/aesni_ssl_performance.html
a study of AES-NI acceleration using LibreSSL, OpenSSL
The Advanced Encryption Standard Instruction Set (AES-NI) is an extension to the x86 architecture for microprocessors from Intel and AMD. The purpose of AES-NI is to speed up applications that encrypt and decrypt data with the Advanced Encryption Standard (AES), such as the AES-128 and AES-256 ciphers. AES-NI was designed to provide 4x to 8x speed improvements when using AES ciphers for bulk data encryption and decryption.
AES-accelerated CPUs can increase efficiency and performance when setting up an SSL terminator for your HTTP web cluster, a VPN link, an sshfs file system mount, or when moving bulk data over an SSH connection using scp or rsync.
The following table lists the results of a quick study of various ciphers used on desktop, laptop and mobile devices. The benchmarks focus on the ciphers available to TLS v1.2 and TLS v1.3 connections made by HTTP/2 and HTTPS clients. The ChaCha20 cipher is used as our baseline. ChaCha20 is a 256-bit stream cipher which is not AES accelerated and relies on raw CPU processing power. The other ciphers are 128-bit and 256-bit AES ciphers, which the CPU accelerates through AES-NI when AES-NI is enabled in the BIOS. The openssl speed tool from LibreSSL is used to test all ciphers on the various CPUs we have access to. All numbers are in megabytes per second (MB/s) per single CPU core; higher values are better.
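The AES columns assume that AES-NI is present and enabled in the BIOS, so it is worth confirming that the CPU actually reports it before benchmarking. The sketch below assumes an x86 Linux machine. On Linux, the CPU feature flags appear on the "flags" lines of /proc/cpuinfo, and AES-NI shows up as the word "aes". On FreeBSD, look for AESNI in the Features2 line of dmesg instead. The helper name cpuinfo_has_aes is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: does the CPU advertise AES-NI? (x86 Linux, /proc/cpuinfo)
# cpuinfo_has_aes is a hypothetical helper: it reads cpuinfo text on stdin
# and succeeds if any "flags" line contains the whole word "aes".
cpuinfo_has_aes() {
  grep '^flags' | grep -qw aes
}

if [ -r /proc/cpuinfo ]; then
  if cpuinfo_has_aes < /proc/cpuinfo; then
    echo "AES-NI: available"
  else
    # A BIOS that disables AES-NI also hides the CPUID bit, so the flag disappears.
    echo "AES-NI: not reported (unsupported CPU or disabled in the BIOS)"
  fi
fi
```

The whole-word match (-w) keeps related flags such as "vaes" from being mistaken for plain AES-NI.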
Cipher Performance per CPU core
AES Performance per CPU core for TLS v1.2 Ciphers (Higher is Better, Speeds in Megabytes per Second)

CPU                 ChaCha20  AES-128-GCM  AES-256-GCM  AES-128-CBC  AES-256-CBC  Total Score
AMD Ryzen 7 1800X        573         3006         2642         1513         1101         8835
Intel W-2125             565         2808         2426         1698         1235         8732
Intel i7-6700            585         2607         2251         1561         1131         8135
AMD EPYC 7402P           493         2478         2184         1244          907         7306
AMD EPYC 7551            355         2213         1962         1114          811         6455
Intel i5-6500            410         1729         1520         1078          783         5520
Intel i7-4750HQ          369         1556         1353          688          499         4465
AMD FX 8350              367         1453         1278          716          514         4328
AMD FX 8150              347         1441         1273          716          515         4292
Intel E5-2650 v4         404         1479         1286          652          468         4289
Intel i7-2700K           382         1353         1212          763          552         4262
Intel i7-3840QM          373         1279         1143          725          520         4040
Intel i5-2500K           358         1274         1140          728          522         4022
AMD FX 6100              326         1344         1186          671          481         4008
AMD A10-7850K            321         1303         1176          685          499         3984
AMD A8-7600 Kaveri       306         1246         1108          648          470         3778
Intel E5-2640 v3         303         1286         1126          585          419         3719
AMD Opteron 6380         293         1203         1063          589          423         3571
AMD Opteron 6378         282         1138          986          561          406         3373
AMD Opteron 6274         232         1054          926          524          376         3112
Intel Xeon E5645         262          817          717          727          524         3047
Intel Xeon E5-2630       247          962          864          541          394         3008
Intel i7-2635QM          151          989          881          564          404         2989
Intel Xeon L5630         225          701          610          626          450         2612
Intel E5-2603 v4         236          866          754          382          274         2512
AMD Opteron 2382         249          651          485          215          150         1750
Intel i7-950             401          256          218          358          257         1490
Intel Xeon X5550         287          205          175          305          219         1191
AMD Phenom 965           404           84           63          282          198         1031
Intel Core2 Q9300        231          126          133          221          161          872
AMD X4 610e              225           59           44          198          139          665
Intel Core2 Q6600        173          141           79          108           77          578
Intel P4 3GHz Will       109           26           23           55           43          256
Intel ATOM D525           98           51           43           28           20          240
Snapdragon S4 Pro        131           41            -            -            -          172
ARM Cortex A9             73           24            -            -            -           97

Testing Notes:
 - AES-NI acceleration enabled if supported by BIOS and CPU
 - Speeds in megabytes per second (MB/s) per real CPU core
 - 8192 byte blocks
 - Five(5) test runs, the average speed reported
 - Snapdragon and ARM Cortex values reported by Google Developers
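The Total Score column is the sum of the five per-cipher speeds, with the untested ("-") ARM entries counted as zero. Recomputing it is a quick way to catch transcription errors when you add your own CPU to the table. The minimal awk sketch below is only an illustration. Its comma-separated input layout and the total_score name are assumptions, not part of the original test procedure.

```bash
#!/usr/bin/env bash
# Sketch: recompute "Total Score" = ChaCha20 + AES-128-GCM + AES-256-GCM
#                                   + AES-128-CBC + AES-256-CBC   (MB/s)
# Hypothetical input format, one CPU per line:
#   "CPU name,chacha20,aes128gcm,aes256gcm,aes128cbc,aes256cbc"
# A "-" means the cipher was not measured and counts as 0.
total_score() {
  awk -F',' '{
    total = 0
    for (i = 2; i <= NF; i++)
      if ($i != "-") total += $i
    print $1 ": " total
  }'
}

printf '%s\n' \
  'AMD Ryzen 7 1800X,573,3006,2642,1513,1101' \
  'Intel i5-6500,410,1729,1520,1078,783' \
  'Snapdragon S4 Pro,131,41,-,-,-' | total_score
```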
How do I interpret the results?
Suppose we have a project with a 10 gigabit connection to the internet. 10 gigabits per second is 1,250 megabytes per second (10,000 megabits divided by 8 bits per byte). The web page designers expect the web server to encrypt and decrypt data concurrently, fast enough to saturate the 10 gigabit connection. Let us also assume that 100% of our clients use the AES-128-GCM based cipher, just to make it easier to compare numbers from the table above.
Ideally we would want a CPU which can process 1,250 MB/s of AES encrypted data per CPU core. Since we need to receive (decrypt) and send (encrypt) the data, we need at least two(2) CPU cores, each able to sustain 1,250 MB/s. In the test results above, the CPUs from the "AMD Opteron 6380" upward come close to or exceed this target. The "AMD Opteron 6380" itself can process 1,203 megabytes per second of AES data per CPU core, about 96% of the 1,250 MB/s goal. Note that the AMD Opteron 6380 is a 16 core CPU, which leaves plenty of other CPU cores for other work like network I/O, firewall rules or ZFS file system work.
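The per-core check can be sketched as a filter over the AES-128-GCM column. First convert the link speed to MB/s (10,000 Mbit/s / 8 = 1,250 MB/s), then keep the CPUs whose single-core result meets it. This follows the one-core-per-direction model from the paragraph above. The CPU list is a hand-copied subset of the table, and per_core_candidates is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch of the per-core check: one core per direction must sustain the full
# line rate on its own. Data: a subset of the AES-128-GCM column (MB/s/core).

link_gbps=10
target_mbs=$(( link_gbps * 1000 / 8 ))   # 10 Gbit/s -> 1250 MB/s

# Hypothetical helper: reads "CPU name,MB/s" lines on stdin and prints the
# CPUs whose per-core speed is at least $1 MB/s.
per_core_candidates() {
  awk -F',' -v need="$1" '$2 >= need { print $1 " (" $2 " MB/s)" }'
}

cpu_data='AMD Ryzen 7 1800X,3006
Intel i7-6700,2607
AMD EPYC 7551,2213
Intel E5-2640 v3,1286
AMD A8-7600 Kaveri,1246
AMD Opteron 6380,1203
Intel Xeon L5630,701'

echo "Required per core: ${target_mbs} MB/s"
printf '%s\n' "$cpu_data" | per_core_candidates "$target_mbs"
```

With a strict 1,250 MB/s cut-off, the AMD A8-7600 Kaveri (1,246 MB/s) and the AMD Opteron 6380 (1,203 MB/s) land just below the line. This is why adding the throughput of several cores together matters in practice.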
In the real world the situation would be more complicated. Clients connect with a variety of ciphers, and the system is not dedicated to just cipher processing. The cipher throughput of multiple CPU cores can also be added together to reach the desired speed. The "Intel Xeon L5630" has four cores, and each core can process 701 MB/s of AES data, for a total of around 2,804 MB/s. That is just enough speed for encrypting and decrypting data on a 10 gigabit link (2 x 1,250 = 2,500 MB/s) using AES-128-GCM.
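Adding cores together turns the sizing into a ceiling division. Divide the total MB/s needed for both directions by the per-core MB/s, then round up. This sketch makes the same simplifying assumptions as the example: all traffic uses one cipher, throughput scales linearly across physical cores, and nothing else runs on those cores. The name cores_needed is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: minimum number of cores whose combined cipher throughput covers a
# link in both directions, assuming linear scaling across physical cores.
#   $1 = link speed in Gbit/s
#   $2 = single-core cipher speed in MB/s (from the table)
#   $3 = directions to cover (default 2: encrypt outbound + decrypt inbound)
cores_needed() {
  local link_gbps=$1 per_core=$2 dirs=${3:-2}
  local need_mbs=$(( link_gbps * 1000 / 8 * dirs ))   # 10 Gbit/s -> 1250 MB/s per direction
  echo $(( (need_mbs + per_core - 1) / per_core ))    # integer ceiling division
}

echo "Intel Xeon L5630: $(cores_needed 10 701) cores"    # 4 x 701 = 2804 MB/s >= 2500
echo "AMD Opteron 6380: $(cores_needed 10 1203) cores"   # 2 x 1203 = 2406 MB/s < 2500
```

For the L5630 this gives 4 cores, matching the text. For the Opteron 6380 it gives 3, because two cores at 1,203 MB/s reach only 2,406 MB/s.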
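Note that AES-NI performance scales with real (physical) CPU cores, not hyper-threaded (HT) or virtual cores. An HT sibling shares the AES hardware of its physical core, so it adds little or no extra AES throughput.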
How can I test my own CPU?
Using the following commands, download and build LibreSSL. The build process builds the LibreSSL binaries and libraries statically in the local directory; no files are installed on the system. Once the build is done, run each of the cipher speed tests with a 30 second sleep in between, so the machine's load can drop back to zero(0). When you are done testing, delete the build directory and everything is cleaned up.
# Note: Ubuntu may need GCC and GNU Make to compile
# sudo apt install gcc make

cd /tmp
wget http://ftp.openbsd.org/pub/OpenBSD/LibreSSL/libressl-3.2.0.tar.gz
tar zxvf libressl-3.2.0.tar.gz
cd libressl-3.2.0
./configure && make && echo SUCCESS

./apps/openssl/openssl speed -elapsed -evp chacha
sleep 30
./apps/openssl/openssl speed -elapsed -evp aes-128-gcm
sleep 30
./apps/openssl/openssl speed -elapsed -evp aes-256-gcm
sleep 30
./apps/openssl/openssl speed -elapsed -evp aes-128-cbc
sleep 30
./apps/openssl/openssl speed -elapsed -evp aes-256-cbc
Cipher Speed Test Output Example
The LibreSSL (OpenSSL) cipher speed test will print out a few lines of output per test performed. The value we are interested in is on the last line under the label "8192 bytes". Our interests are focused on bulk data transfers and "8192 bytes" is the largest block test shown. The "8192 bytes" value is the amount of data the CPU can process using the cipher specified in thousands of bytes per second. Divide the value shown by one(1) thousand to get megabytes per second which is the same as our results in the table above.
# use dmesg and search for the cpu type. for example,
$ dmesg | grep CPU0
[    0.120426] smpboot: CPU0: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (fam: 06, model: 5e, stepping: 03)

# run the series of cipher speed tests, chacha is first...
$ ./apps/openssl/openssl speed -elapsed -evp chacha
You have chosen to measure elapsed time instead of user CPU time.
Doing chacha for 3s on 16 size blocks: 66892965 chacha's in 3.00s
Doing chacha for 3s on 64 size blocks: 25017290 chacha's in 3.00s
Doing chacha for 3s on 256 size blocks: 6502076 chacha's in 3.00s
Doing chacha for 3s on 1024 size blocks: 1692776 chacha's in 3.00s
Doing chacha for 3s on 8192 size blocks: 214511 chacha's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
chacha          356762.48k   533702.19k   554843.82k   577800.87k   585758.04k <----
... the result is 585758.04k / 1000 = 585 MB/s

$ ./apps/openssl/openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 134661060 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 79432576 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 28895019 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 7559486 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 954887 aes-128-gcm's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     718192.32k  1694561.62k  2465708.29k  2580304.55k  2607478.10k <----
... the result is 2607478.10k / 1000 = 2,607 MB/s

$ ./apps/openssl/openssl speed -elapsed -evp aes-256-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-gcm for 3s on 16 size blocks: 125601150 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 64 size blocks: 75507034 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 256 size blocks: 25591359 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 1024 size blocks: 6547497 aes-256-gcm's in 3.00s
Doing aes-256-gcm for 3s on 8192 size blocks: 824454 aes-256-gcm's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-gcm     669872.80k  1610816.73k  2183795.97k  2234878.98k  2251309.06k <----
... the result is 2251309.06k / 1000 = 2,251 MB/s

$ ./apps/openssl/openssl speed -elapsed -evp aes-128-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 250707357 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 71204109 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 18108237 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 4563775 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 571798 aes-128-cbc's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc    1337105.90k  1519020.99k  1545236.22k  1557768.53k  1561389.74k <----
... the result is 1561389.74k / 1000 = 1,561 MB/s

$ ./apps/openssl/openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 185732038 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 51745988 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 13073843 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 3280738 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 414517 aes-256-cbc's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     990570.87k  1103914.41k  1115634.60k  1119825.24k  1131907.75k <----
... the result is 1131907.75k / 1000 = 1,131 MB/s
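Reading the "8192 bytes" column by hand for every cipher gets tedious. The sketch below runs the same five tests, picks the 8192-byte value off each result line and divides it by 1,000, truncating like the examples above. It assumes the result-line layout shown above: the cipher name, then one value per block size. That makes the 8192-byte value the sixth field. OpenSSL 1.1.1 appends a 16384-byte column as a seventh field, which the sketch ignores. The names speed_8192_mbs and run_all are hypothetical, and the openssl path is the one from the build steps.

```bash
#!/usr/bin/env bash
# Sketch: extract the 8192-byte result from `openssl speed -evp <cipher>` output.
# Result line:  <cipher>  <16B>k  <64B>k  <256B>k  <1024B>k  <8192B>k  [<16384B>k]
# Values are in 1000s of bytes per second, so value / 1000 = MB/s.
speed_8192_mbs() {
  awk -v c="$1" '$1 == c {
    v = $6
    sub(/k$/, "", v)            # drop the trailing "k"
    printf "%d\n", v / 1000     # 1000s of bytes/s -> MB/s, truncated
  }'
}

OPENSSL=./apps/openssl/openssl   # path from the LibreSSL build steps above

run_all() {
  local c
  for c in chacha aes-128-gcm aes-256-gcm aes-128-cbc aes-256-cbc; do
    printf '%-12s %s MB/s\n' "$c" \
      "$("$OPENSSL" speed -elapsed -evp "$c" 2>/dev/null | speed_8192_mbs "$c")"
    sleep 30   # let the machine's load settle before the next cipher
  done
}

# Only start the (slow) benchmarks if the LibreSSL build is present.
if [ -x "$OPENSSL" ]; then
  run_all
fi
```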
Questions?
Is OpenSSL faster than LibreSSL?
Yes, both OpenSSL and BoringSSL are significantly faster than LibreSSL when using modern ciphers. LibreSSL is probably slower because of more locking, the lack of internal crypto devices and single threaded processes, all chosen with the idea of being more secure. The following output shows a performance comparison using the elapsed-time speed tests built into each library. The machine has a moderately powerful Intel i5-6500 CPU with AES-NI enabled in the BIOS. It runs FreeBSD 11 with LibreSSL v3.0.1 and OpenSSL v1.1.1a built from source. The results show that OpenSSL is between 2.3x and 6.7x faster than LibreSSL using ChaCha20, AES-128-GCM and AES-256-GCM. This performance difference is large enough that you would need multiple https servers running Nginx built with LibreSSL to equal the speed of one(1) Nginx server built with OpenSSL. Note that BoringSSL's CBC figures come from its AES-128-CBC-SHA1 and AES-256-CBC-SHA1 tests, which include HMAC-SHA1. They are therefore not directly comparable to the plain CBC figures of the other two libraries.

Tip: take a look at the Nginx server resource sizing guide for deploying Nginx on bare metal servers and the Nginx testing methodology. The guide shows graduated hardware configurations and how many requests per second, transactions per second and total throughput an https server could achieve.
AES Performance per CPU core for TLS v1.2 Ciphers (Higher is Better, Speeds in Megabytes per Second)

Intel i5-6500       ChaCha20  AES-128-GCM  AES-256-GCM  AES-128-CBC  AES-256-CBC  Total Score
OpenSSL v1.1.1a         2762         4900         3554         1067          780        13063
BoringSSL v2017_12      1760         4455         3370          460          402        10447
LibreSSL v3.0.1          410         1729         1520         1078          783         5520

### ############### Testing Results ################## ###

dmesg | grep -i CPU
CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (3192.14-MHz K8-class CPU)

cd /tmp
wget http://ftp.openbsd.org/pub/OpenBSD/LibreSSL/libressl-3.0.1.tar.gz
tar zxvf libressl-3.0.1.tar.gz
cd libressl-3.0.1
./configure && make && echo SUCCESS

./apps/openssl/openssl speed -elapsed -evp chacha
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
chacha          229894.55k   374728.51k   401326.42k   407606.34k   410545.95k
                                                                           ^^^

./apps/openssl/openssl speed -elapsed -evp aes-128-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     578578.66k  1037298.77k  1496023.55k  1667607.21k  1729668.50k
                                                                          ^^^^

./apps/openssl/openssl speed -elapsed -evp aes-256-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-gcm     514792.29k   953548.57k  1340996.10k  1478150.01k  1520833.77k
                                                                          ^^^^

./apps/openssl/openssl speed -elapsed -evp aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc    1070909.28k  1059120.83k  1084207.69k  1090894.01k  1078315.69k
                                                                          ^^^^

./apps/openssl/openssl speed -elapsed -evp aes-256-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     806110.46k   767273.81k   793146.46k   803538.08k   783499.41k
                                                                           ^^^

cd /tmp
wget https://www.openssl.org/source/openssl-1.1.1a.tar.gz
tar zxvf openssl-1.1.1a.tar.gz
cd openssl-1.1.1a
./config && make
cp /tmp/openssl-1.1.1a/libssl.so.1.1 /usr/local/lib/
cp /tmp/openssl-1.1.1a/libcrypto.so.1.1 /usr/local/lib/

./apps/openssl speed -elapsed -evp chacha20
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
chacha20        320078.35k   547365.25k  1287720.93k  2649847.21k  2762595.49k  2769084.88k
                                                                          ^^^^

./apps/openssl speed -elapsed -evp aes-128-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-gcm     453159.25k  1215246.40k  2437021.95k  3909602.78k  4900248.28k  4996923.22k
                                                                          ^^^^

./apps/openssl speed -elapsed -evp aes-256-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-gcm     397133.57k  1118061.03k  2050411.88k  3017616.18k  3554319.58k  3603072.56k
                                                                          ^^^^

./apps/openssl speed -elapsed -evp aes-128-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     812677.93k  1037389.63k  1066182.04k  1068901.72k  1067816.15k  1074969.69k
                                                                          ^^^^

./apps/openssl speed -elapsed -evp aes-256-cbc
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc     720262.90k   757488.79k   775043.00k   776824.49k   780029.74k   792199.17k

git clone https://boringssl.googlesource.com/boringssl
cmake -GNinja -DCMAKE_BUILD_TYPE=Release .. && ninja
cd build/tools
./bssl speed
...
Did 544000 AES-128-GCM (8192 bytes) seal operations in 1000170us (543907.5 ops/sec): 4455.7 MB/s
Did 412000 AES-256-GCM (8192 bytes) seal operations in 1001476us (411392.8 ops/sec): 3370.1 MB/s
Did 215000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000321us (214931.0 ops/sec): 1760.7 MB/s
...
Did 57000 AES-128-CBC-SHA1 (8192 bytes) seal operations in 1014216us (56201.0 ops/sec): 460.4 MB/s
Did 50000 AES-256-CBC-SHA1 (8192 bytes) seal operations in 1018187us (49106.9 ops/sec): 402.3 MB/s
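The "2.3x to 6.7x" range is the ratio of OpenSSL's 8192-byte result to LibreSSL's for each modern cipher. The small sketch below recomputes it from the Intel i5-6500 numbers above. The helper name speedup is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: ratio of two throughput figures (MB/s), rounded to one decimal.
# Inputs are the Intel i5-6500 8192-byte results for OpenSSL v1.1.1a and
# LibreSSL v3.0.1 from the comparison table.
speedup() {
  # $1 = faster library's MB/s, $2 = slower library's MB/s
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1fx\n", a / b }'
}

# Each entry is cipher:OpenSSL MB/s:LibreSSL MB/s
for row in ChaCha20:2762:410 AES-128-GCM:4900:1729 AES-256-GCM:3554:1520; do
  IFS=: read -r cipher ossl lssl <<< "$row"
  echo "$cipher: OpenSSL/LibreSSL = $(speedup "$ossl" "$lssl")"
done
```

The three ratios come out at 6.7x, 2.8x and 2.3x, which is where the quoted range comes from.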
How can I test OpenSSL with AES-NI on and off from the command line?
Using the "OPENSSL_ia32cap" environment variable you can force OpenSSL to disable AES-NI acceleration. The mask below also clears the PCLMULQDQ (carry-less multiply) flag, which OpenSSL's AES-GCM code uses. The following two tests show the results with AES-NI off and then back on. Notice that without AES-NI the aes-128-gcm cipher processed data at 212 MB/s. With AES-NI enabled, the same aes-128-gcm cipher jumped to 1,357 MB/s, roughly a six(6) times performance boost.

# cpu example type: AMD FX 6100
$ dmesg | grep -i cpu
[    0.277326] smpboot: CPU0: AMD FX(tm)-6100 Six-Core Processor (fam: 15, model: 01, stepping: 02)

OpenSSL AES-NI = OFF
$ OPENSSL_ia32cap="~0x200000200000000" openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 11810234 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 3458208 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 2269863 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 612727 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 77820 aes-128-gcm's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm      62987.91k    73775.10k   193694.98k   209144.15k   212500.48k
... the result is 212500.48k / 1000 = 212 MB/s

OpenSSL AES-NI = ON
$ openssl speed -elapsed -evp aes-128-gcm
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-gcm for 3s on 16 size blocks: 47814322 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 64 size blocks: 32192031 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 256 size blocks: 13198683 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 1024 size blocks: 3757898 aes-128-gcm's in 3.00s
Doing aes-128-gcm for 3s on 8192 size blocks: 497117 aes-128-gcm's in 3.00s
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-gcm     255009.72k   686763.33k  1126287.62k  1282695.85k  1357460.82k
... the result is 1357460.82k / 1000 = 1,357 MB/s
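The mask value is less magic than it looks. In OpenSSL's OPENSSL_ia32cap scheme, bits 32 to 63 of the first capability word mirror the ECX register of CPUID leaf 1. AES-NI (ECX bit 25) is therefore bit 57, and PCLMULQDQ (ECX bit 1) is bit 33. A leading "~" clears those bits from the capabilities OpenSSL detected. The sketch derives the mask and wraps the on/off comparison in a function. It assumes the openssl on your PATH is OpenSSL, as in the example above, since the variable is an OpenSSL mechanism. ia32cap_mask and compare_aesni are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: build the OPENSSL_ia32cap mask that hides AES-NI and PCLMULQDQ.
#   bit 57 = 32 + 25 -> CPUID(1).ECX bit 25, AES-NI
#   bit 33 = 32 + 1  -> CPUID(1).ECX bit 1,  PCLMULQDQ (used for GCM's GHASH)
# The "~" prefix asks OpenSSL to clear these bits instead of setting them.
ia32cap_mask() {
  printf '~0x%x\n' $(( (1 << 57) | (1 << 33) ))
}

# Run one cipher with AES-NI masked off, then with the detected defaults.
# Assumes `openssl` on PATH is OpenSSL; prints only the final result line.
compare_aesni() {
  local cipher=${1:-aes-128-gcm}
  echo "AES-NI = OFF"
  OPENSSL_ia32cap="$(ia32cap_mask)" openssl speed -elapsed -evp "$cipher" 2>/dev/null | tail -n 1
  echo "AES-NI = ON"
  openssl speed -elapsed -evp "$cipher" 2>/dev/null | tail -n 1
}

ia32cap_mask                   # prints ~0x200000200000000
# compare_aesni aes-128-gcm    # uncomment to run the two benchmarks
```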
How can I test a remote server cipher?
Use the openssl s_client tool to query a remote server. You can let the client and server negotiate their most preferred cipher, or you can specify the exact cipher name you want to use during the connection.

# Test calomel.org using the client/server negotiated cipher
echo -n | ./apps/openssl/openssl s_client -connect calomel.org:443

# Test calomel.org using the ChaCha cipher
echo -n | ./apps/openssl/openssl s_client -cipher ECDHE-ECDSA-CHACHA20-POLY1305 -connect calomel.org:443

# Test calomel.org using the AES-128-GCM cipher
echo -n | ./apps/openssl/openssl s_client -cipher ECDHE-ECDSA-AES128-GCM-SHA256 -connect calomel.org:443
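To check several cipher names in one go, the sketch below connects once per cipher. It then reads the negotiated cipher from the "New, ..., Cipher is ..." summary line that s_client prints after the handshake. Two assumptions are worth flagging. First, it forces -tls1_2: in OpenSSL 1.1.1, -cipher only restricts TLS 1.2 and older suites (TLS 1.3 suites are set with -ciphersuites), so a TLS 1.3 capable server could otherwise negotiate a TLS 1.3 suite instead. Second, negotiated_cipher and probe_ciphers are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: which of the given TLS 1.2 cipher names does a server accept?

# Read s_client output on stdin and print the cipher named on the
# "New, TLSv1.2, Cipher is <name>" line (prints nothing if there is none).
negotiated_cipher() {
  sed -n 's/.*Cipher is \(.*\)$/\1/p' | head -n 1
}

# $1 = host:port, remaining arguments = OpenSSL-style cipher names
probe_ciphers() {
  local target=$1 c got
  shift
  for c in "$@"; do
    # stdin from /dev/null makes s_client exit right after the handshake
    got=$(openssl s_client -tls1_2 -cipher "$c" -connect "$target" \
            </dev/null 2>/dev/null | negotiated_cipher)
    if [ "$got" = "$c" ]; then
      echo "$c: accepted"
    else
      echo "$c: not accepted"
    fi
  done
}

# Example (needs network access):
# probe_ciphers calomel.org:443 ECDHE-ECDSA-CHACHA20-POLY1305 ECDHE-ECDSA-AES128-GCM-SHA256
```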