source link: https://www.phoronix.com/news/Intel-AVX-512-Quicksort-Numpy

Intel Publishes Blazing Fast AVX-512 Sorting Library, Numpy Switching To It For 10~17x Faster Sorts

Written by Michael Larabel in Intel on 15 February 2023 at 04:00 PM EST.

Intel recently published an open-source C++ header-only library for high-performance SIMD-based sorting, initially focused on providing a lightning-fast AVX-512 quicksort implementation. As of today that code has been merged into Numpy and is providing speed-ups of roughly 10~17x.

Toward the end of last year Intel quietly made x86-simd-sort available via their GitHub account. It's a C++ header-only library for high-performance SIMD sorting, though in its current form it is focused solely on an AVX-512 quicksort implementation.

There hasn't been much coverage of the x86-simd-sort project, and the GitHub page itself doesn't do much to talk up the crazy fast performance potential of AVX-512 for sorting... But now, by way of the widely-used Numpy open-source project, it is seeing prominent use and achieving some staggering results.

Merged today into Numpy was PR 22315 to vectorize the quicksort for 16-bit and 64-bit data types using AVX-512. On an Intel Tigerlake system this sped up 16-bit int sorting by 17x and 64-bit float sorting by nearly 10x for random arrays, while sorts of 32-bit data types were 12~13x faster. This Numpy change was made by Intel engineer Raghuveer Devulapalli and leverages the x86-simd-sort code.
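For readers who want to see how sorting performs on their own machine, here is a minimal timing sketch. It is not the benchmark used in the pull request: the helper name `time_sort`, the array size, and the repeat count are all assumptions chosen for illustration, and the numbers it prints depend entirely on the CPU and on whether the installed Numpy build includes the AVX-512 code path.

```python
import time

import numpy as np


def time_sort(dtype, n=1_000_000, repeats=5, seed=0):
    """Return the best wall-clock time in seconds to sort n random values.

    Hypothetical helper for illustration; not part of Numpy.
    """
    rng = np.random.default_rng(seed)
    if np.issubdtype(dtype, np.integer):
        info = np.iinfo(dtype)
        # Draw integers across the full range of the type.
        data = rng.integers(info.min, info.max, size=n, dtype=dtype, endpoint=True)
    else:
        data = rng.random(n).astype(dtype)

    best = float("inf")
    for _ in range(repeats):
        arr = data.copy()  # fresh unsorted copy for every run
        start = time.perf_counter()
        arr.sort(kind="quicksort")
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for dt in (np.int16, np.float32, np.float64):
        print(f"{np.dtype(dt).name:>8}: {time_sort(dt) * 1e3:.2f} ms")
```

Running the same script under a Numpy release from before and after the merge, on AVX-512 hardware, is one way to get a rough feel for the difference; a single run says nothing about which code path was taken.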

Intel and AMD AVX-512 enabled processors

A speed-up worth celebrating... From multi-vendor support to more efficient AVX-512 implementations on newer processors to more robust software use, there is a lot to enjoy around AVX-512 these days.

A 10~17x speed-up for sorting with AVX-512 is pretty astonishing, especially when factoring in the better AVX-512 efficiency of recent generations of Intel CPUs. With the latest Xeon Scalable processors, the thermal and power impact of AVX-512 is no longer severe, nor does it cause the significant CPU down-clocking it was panned for in the past; it is now in rather good shape. See my recent Intel Xeon "Sapphire Rapids" AVX-512 benchmarks, which include the power efficiency.

It's too bad though that the latest Intel Core client processors no longer offer AVX-512. Meanwhile, over on the AMD side, Zen 4 processors from the Ryzen 7000 series through the 4th Gen EPYC server processors (finally) have AVX-512 support.
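Since AVX-512 availability varies so much between processors, it can be handy to check what a given machine reports. The following is a small Linux-only sketch that looks for `avx512*` feature flags in `/proc/cpuinfo`; the function name `avx512_flags` is made up for this example, and other operating systems would need a different approach.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted avx512* feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(flags)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            found = avx512_flags(fh.read())
    except OSError:
        found = []  # not Linux, or the file could not be read
    print("AVX-512 flags:", ", ".join(found) or "none detected")
```

An empty result only means the kernel does not report the flags; it does not by itself tell you which code path a particular Numpy build will use.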

It will be interesting to see what other software projects decide to make use of x86-simd-sort for speedy AVX-512 sorting. It's another notable win for Advanced Vector Extensions 512, similar to how simdjson tapped AVX-512 last year for very fast JSON parsing, something one would not normally think of as a great use-case for AVX-512.
