4

Fast byte searching with SIMD without a ton of boilerplate?

 1 year ago
source link: https://lobste.rs/s/9q6rnm/fast_byte_searching_with_simd_without_ton
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Fast byte searching with SIMD without a ton of boilerplate?

TL;DR: I’m looking for a SIMD abstraction layer that will let me do some high-speed byte searching without having to worry about a dozen different instruction sets. Not having luck finding such.

I’m optimizing some JSON-parsing code, and by far the biggest bottleneck is string searching, specifically searching for the next occurrence of one of two or more characters. For example, when scanning a string I search for both double-quote and backslash. (Plus 00 if the JSON has not been validated.)

  • C has a function to do this, strcspn, but it’s slow.
  • A naive loop testing every char with several == comparisons is faster.
  • indexing each byte in a 256-item lookup table is still faster.
  • Unrolling the loop to look at 4 bytes in sequence is even faster (stole this from sa-json.)
  • “Homemade SIMD” by reading a uint32 or uint64 and doing bitwise tricks to detect any of those bytes is significantly slower, sadly.
  • [Goldilocks solution goes here]
  • The SIMDJSON library is super fast but has a whole bunch of complex code — it has to implement the core loops six or seven times for specific instruction sets, and then figure out at runtime which of those to call. I would rather avoid dragging all that in.

Yesterday I was experimenting with Apple’s “simd” library, which is based on Clang vector extensions, and it does pretty well, but I am not sure if this is portable to non-Apple systems or non-Clang compilers. (GCC has its own extensions, which Clang partially supports, but when I tried those Clang generated slow non-SIMD code.)


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK