GitHub: Import small cold functions by tmiasko · Pull Request #82980 · rust-lang/rust
source link: https://github.com/rust-lang/rust/pull/82980
Conversation
Rust code is often written under the assumption that, for generic methods, the inline attribute is mostly unnecessary: in optimized builds using ThinLTO, a method will be code-generated in at least one CGU and available for import.
For example, the deref implementations for Box, Vec, and MutexGuard are not currently marked as inline, nor is the identity implementation of the From trait.
In PGO builds, when functions are determined to be cold, the default cold-import multiplier of zero blocks the import entirely, no matter how trivial the implementation.
Slightly increase the default multiplier from 0 to 0.1.
r? @ghost
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
Try build successful - checks-actions
Build commit: df0df90 (df0df906bd16e3d83f3f52a984101dc638c848e3)
Finished benchmarking try commit (df0df90): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.
Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.
@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf
The perf results look quite promising, with small improvements in instruction counts, wall-time, and max-rss. The change also reduced the size of librustc_driver.so from 102.5 MiB to 97.1 MiB.
The main premise behind the change is that with PGO we will have cold functions with negative inline cost, and we should continue inlining them. I don't know if this is something we would like to land in this form, but it could be interesting from a Rust PGO perspective.
changed the title from "Use small non-zero import-cold-multiplier" to "Import small cold functions"
I'll take a look at this next week. Thanks for the PR, @tmiasko!
Hm, the numbers look like an improvement overall. However, I'm wondering, in the PGO case, wouldn't the compiler have accurate information about the coldness of a function? In which case it should actually be beneficial not to import cold functions?
On the other hand, as you say, a multiplier of 0.1 would be a strong discouragement for importing a given function while still allowing a decision to be made based on other factors such as size.
I think we can just merge this. It's just a small tweak and does not seem to have a negative impact on LLVM times.
@bors r+
Commit 1aee808 has been approved by michaelwoerister
Test successful - checks-actions
Approved by: michaelwoerister
Pushing e423058 to master...
Inlining trivial functions often reduces caller code size, and inlining heuristics for cold functions try to avoid inlining otherwise, so yes, it should be beneficial even with completely accurate profile information. For example, here is an error handling code path using ?, taken from the rustc MIR interpreter. Before:
```
  → callq *0x20446f5(%rip)  # 3e1a678 <<rustc_middle::mir::interpret::error::InterpErrorInfo as core::convert::From<rustc_middle::mir::interpret::error::InterpError>>::from@@Base+0x1ba93a8>
    mov   %rax,%rdi
  → callq <T as core::convert::From<T>>::from
    mov   %rax,%rdi
1ce:→ callq <T as core::convert::From<T>>::from
    mov   %rax,%rdi
  → callq <T as core::convert::From<T>>::from
    mov   $0x1,%sil
  ↑ jmpq  115
```
and after:
```
  → callq *0x1f22ee0(%rip)  # 3a0ae68 <<rustc_middle::mir::interpret::error::InterpErrorInfo as core::convert::From<rustc_middle::mir::interpret::error::InterpError>>::from@@Base+0x1a80f38>
1c8:  mov   $0x1,%sil
  ↑ jmpq  116
```
@tmiasko @michaelwoerister Can you talk a bit more about the performance regressions seen here? Some of the regressions are somewhat severe, so it would be nice to talk through them a bit more. Why is the performance regression worth it?
The main change here is that we allow LLVM to import functions across module boundaries during ThinLTO even if a given function has been marked as cold. This allows LLVM to inline trivial functions (like the example in the comment above), leading to an overall reduction in code size. LLVM still retains the ability to not inline such functions based on its usual heuristics (in which case the effort put into importing the function was wasted). The change should lead to an overall quality improvement for all machine code produced via ThinLTO (i.e. currently all --release builds done via cargo).
Because this PR re-jiggers LLVM inlining decisions it can be expected to have a mixed influence on compile times. Since that effect seems to be positive overall (with the exception of two small synthetic test cases) I think this change is a net win.