Intel Publishes Blazing Fast AVX-512 Sorting Library, NumPy Switching To It For 10~17x Faster Sorts

Written by Michael Larabel in Intel on 15 February 2023 at 04:00 PM EST.

Intel recently published an open-source C++ header-only library for high-performance SIMD-based sorting, initially focused on providing a lightning-fast AVX-512 quicksort implementation. As of today that code has been merged into NumPy, where it is providing speed-ups of some 10~17x.

Toward the end of last year Intel quietly made x86-simd-sort available via their GitHub account. It's a C++ header-only library for high-performance SIMD sorting, though in its current form it is focused only on an AVX-512 quicksort implementation.

There hasn't been much coverage of the x86-simd-sort project, and the GitHub page itself doesn't do much to talk up the crazy fast performance potential of AVX-512 for sorting... But now, by way of the widely-used NumPy open-source project, the library has a prominent user that is achieving some staggering results.

Merged today into NumPy was PR 22315, which vectorizes quicksort for 16-bit and 64-bit data types using AVX-512. On an Intel Tiger Lake system this sped up 16-bit integer sorting by 17x and 64-bit float sorting by nearly 10x for random arrays, while sorts of 32-bit data types were 12~13x faster. The NumPy change was made by Intel engineer Raghuveer Devulapalli and leverages the x86-simd-sort code.
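For those wanting to see how their own system fares, below is a minimal timing sketch. It is not the benchmark from the pull request: the helper name `time_sort`, the array size, and the repeat count are all illustrative assumptions. It only times NumPy's `ndarray.sort` with `kind="quicksort"` on random arrays of the data types mentioned above, so comparing a NumPy build with and without the AVX-512 code path means running it under each build.

```python
import time

import numpy as np


def time_sort(dtype, n=1_000_000, repeats=5, seed=0):
    """Return the best wall-clock time in seconds to sort n random values.

    Hypothetical helper for illustration; not part of NumPy or x86-simd-sort.
    """
    rng = np.random.default_rng(seed)
    if np.issubdtype(dtype, np.integer):
        # Cover the full value range of the integer type.
        info = np.iinfo(dtype)
        data = rng.integers(info.min, info.max, size=n, dtype=dtype, endpoint=True)
    else:
        data = rng.random(n).astype(dtype)

    best = float("inf")
    for _ in range(repeats):
        work = data.copy()  # sort a fresh unsorted copy on every run
        start = time.perf_counter()
        work.sort(kind="quicksort")
        best = min(best, time.perf_counter() - start)

    # Sanity check: the last sorted copy must be in non-decreasing order.
    assert np.all(work[:-1] <= work[1:])
    return best


if __name__ == "__main__":
    for dtype in (np.int16, np.float32, np.float64):
        seconds = time_sort(dtype)
        print(f"{np.dtype(dtype).name:>8}: {seconds * 1e3:.2f} ms")
```

The absolute numbers will depend on the CPU, the NumPy version, and how NumPy was built, so they should not be expected to match the figures quoted above.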

Intel and AMD AVX-512 enabled processors

A speed-up worth celebrating... From multi-vendor support to more efficient AVX-512 implementations on newer processors to more robust software use, there is a lot to enjoy around AVX-512 these days.

A 10~17x speed-up for sorting with AVX-512 is pretty astonishing, especially when factoring in the better AVX-512 efficiency of recent generations of Intel CPUs. With the latest Xeon Scalable processors, the thermal and power impact of AVX-512 is no longer severe and no longer causes the significant CPU down-clocking it was panned for in the past; it is in rather good shape. See my recent Intel Xeon "Sapphire Rapids" AVX-512 benchmarks, which include the power efficiency. It's too bad though that the latest Intel Core client processors no longer offer AVX-512. Meanwhile, over on the AMD side, there is (finally) AVX-512 support with their Zen 4 processors, from the Ryzen 7000 series through the 4th Gen EPYC server processors.
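Given that AVX-512 availability varies by processor, here is a small sketch for checking a Linux x86 system. It assumes the kernel exposes CPU feature flags on a `flags` line in `/proc/cpuinfo` and looks only for `avx512f`, the AVX-512 Foundation flag. The function names are hypothetical, and which additional AVX-512 subsets a particular sorting code path requires is not covered here.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough; cores normally report the same set.
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text):
    """True if the AVX-512 Foundation flag (avx512f) is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is unavailable
    print("AVX-512 Foundation reported:", has_avx512(text))
```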

It will be interesting to see what other software projects decide to make use of x86-simd-sort for speedy AVX-512 sorting. It's another notable win for Advanced Vector Extensions 512, similar to how simdjson tapped AVX-512 last year for very fast JSON parsing: a workload one would not immediately think of as a great use-case for AVX-512.
