Speedup int log10 branchless by falk-hueffner · Pull Request #88788 · rust-lang/... - JOYK Joy of Geek, Geek News, Link all geek

Conversation

Reviewers

the8472

Assignees

yaahc

Projects

None yet

Milestone

1.57.0

Linked issues

Successfully merging this pull request may close these issues.

None yet

11 participants

This is achieved with a branchless bit-twiddling implementation of the case x < 100_000, and using this as building block.

Benchmark on an Intel i7-8700K (Coffee Lake):

name                                   old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
num::int_log::u8_log10_predictable     165          169                     4    2.42%   x 0.98
num::int_log::u8_log10_random          438          423                   -15   -3.42%   x 1.04
num::int_log::u8_log10_random_small    438          423                   -15   -3.42%   x 1.04
num::int_log::u16_log10_predictable    633          417                  -216  -34.12%   x 1.52
num::int_log::u16_log10_random         908          471                  -437  -48.13%   x 1.93
num::int_log::u16_log10_random_small   945          471                  -474  -50.16%   x 2.01
num::int_log::u32_log10_predictable    1,496        1,340                -156  -10.43%   x 1.12
num::int_log::u32_log10_random         1,076        873                  -203  -18.87%   x 1.23
num::int_log::u32_log10_random_small   1,145        874                  -271  -23.67%   x 1.31
num::int_log::u64_log10_predictable    4,005        3,171                -834  -20.82%   x 1.26
num::int_log::u64_log10_random         1,247        1,021                -226  -18.12%   x 1.22
num::int_log::u64_log10_random_small   1,265        921                  -344  -27.19%   x 1.37
num::int_log::u128_log10_predictable   39,667       39,579                -88   -0.22%   x 1.00
num::int_log::u128_log10_random        6,456        6,696                 240    3.72%   x 0.96
num::int_log::u128_log10_random_small  4,108        3,903                -205   -4.99%   x 1.05

Benchmark on an M1 Mac Mini:

name                                   old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
num::int_log::u8_log10_predictable     143          130                   -13   -9.09%   x 1.10
num::int_log::u8_log10_random          375          325                   -50  -13.33%   x 1.15
num::int_log::u8_log10_random_small    376          325                   -51  -13.56%   x 1.16
num::int_log::u16_log10_predictable    500          322                  -178  -35.60%   x 1.55
num::int_log::u16_log10_random         794          405                  -389  -48.99%   x 1.96
num::int_log::u16_log10_random_small   1,035        405                  -630  -60.87%   x 2.56
num::int_log::u32_log10_predictable    1,144        894                  -250  -21.85%   x 1.28
num::int_log::u32_log10_random         832          786                   -46   -5.53%   x 1.06
num::int_log::u32_log10_random_small   832          787                   -45   -5.41%   x 1.06
num::int_log::u64_log10_predictable    2,681        2,057                -624  -23.27%   x 1.30
num::int_log::u64_log10_random         1,015        806                  -209  -20.59%   x 1.26
num::int_log::u64_log10_random_small   1,004        795                  -209  -20.82%   x 1.26
num::int_log::u128_log10_predictable   56,825       56,526               -299   -0.53%   x 1.01
num::int_log::u128_log10_random        9,056        8,861                -195   -2.15%   x 1.02
num::int_log::u128_log10_random_small  1,528        1,527                  -1   -0.07%   x 1.00

The 128 bit case remains ridiculously slow because llvm fails to optimize division by a constant 128-bit value to multiplications. This could be worked around but it seems preferable to fix this in llvm.

From u32 up, table lookup (like suggested here) is still faster, but requires a hardware leading_zeros to be viable, and might clog up the cache.

Speedup int log10 branchless by falk-hueffner · Pull Request #88788 · rust-lang/...

Conversation

Recommend

Make Result::as_mut const by woppopo · Pull Request #89977 · rust-lang/rust · Gi...

Digital Millennium Copyright Act

Config Connector overview | Config Connector Documentation

Faster Shipping Speeds for Cross-Border Orders

Website Content

The Apple Pencil and iPad Pro are on sale for their best prices to date - The Ve...

Why Does Your Brain Overthink Things?

1.15 kubelet 在 nodefs 容量富裕下循环 reclaim ephemeral-storage

Software Freedom Conservancy's DMCA Exemption Requests Granted

SEM Marketing

About Joyk