6

Github Import small cold functions by tmiasko · Pull Request #82980 · rust-lang/...

 3 years ago
source link: https://github.com/rust-lang/rust/pull/82980
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Conversation

Copy link

Contributor

tmiasko commented 23 days ago

The Rust code is often written under an assumption that for generic
methods inline attribute is mostly unnecessary, since for optimized
builds using ThinLTO, a method will be code generated in at least one
CGU and available for import.

For example, deref implementations for Box, Vec, MutexGuard, and
MutexGuard are not currently marked as inline, neither is identity
implementation of From trait.

In PGO builds, when functions are determined to be cold, the default
multiplier of zero will stop the import, no matter how trivial the
implementation.

Increase slightly the default multiplier from 0 to 0.1.

r? @ghost

Copy link

Contributor

Author

tmiasko commented 23 days ago

Copy link

Collaborator

rust-timer commented 23 days ago

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Copy link

Contributor

bors commented 23 days ago

hourglass Trying commit 95b36e9 with merge df0df90...

Copy link

Contributor

bors commented 23 days ago

sunny Try build successful - checks-actions
Build commit: df0df90 (df0df906bd16e3d83f3f52a984101dc638c848e3)

Copy link

Collaborator

rust-timer commented 23 days ago

Copy link

Collaborator

rust-timer commented 23 days ago

Finished benchmarking try commit (df0df90): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

Copy link

Contributor

Author

tmiasko commented 23 days ago

edited

The perf result look quite promising, with small improvements in instruction counts, wall-time and max-rss. The change also reduced size of librustc_driver.so from 102.5 MiB to 97.1 MiB.

The main premise behind the change is that with PGO, we will have cold functions with negative inline cost, and we should continue inlining them. I don't know if this is something we would like to land in this form, but could be interesting from Rust PGO perspective, so

r? @michaelwoerister

tmiasko

force-pushed the

tmiasko:import-cold-multiplier

branch from 95b36e9 to 1aee808

22 days ago

tmiasko

changed the title Use small non-zero import-cold-multiplier

Import small cold functions

22 days ago

Copy link

Contributor

michaelwoerister commented 21 days ago

I'll take a look at this next week. Thanks for the PR, @tmiasko!

Copy link

Contributor

michaelwoerister commented 7 days ago

Hm, the numbers look like an improvement overall. However, I'm wondering, in the PGO case, wouldn't the compiler have accurate information about the coldness of a function? In which case it should actually be beneficial not to import cold functions?

On the other hand, as you say, a multiplier of 0.1 would be a strong discouragement for importing a given function while still allowing a decision to be made based on other factors such as size.

I think we can just merge this. It's just a small tweak and does not seem to a negative impact on LLVM times.

@bors r+

Copy link

Contributor

bors commented 7 days ago

pushpin Commit 1aee808 has been approved by michaelwoerister

Copy link

Contributor

bors commented 7 days ago

hourglass Testing commit 1aee808 with merge e423058...

Copy link

Contributor

bors commented 7 days ago

sunny Test successful - checks-actions
Approved by: michaelwoerister
Pushing e423058 to master...

bors

merged commit e423058 into rust-lang:master 7 days ago

11 checks passed

rustbot

added this to the 1.53.0 milestone

7 days ago

Copy link

Contributor

Author

tmiasko commented 7 days ago

Inlining trivial functions often reduces caller code size, and inlining heuristics for cold functions try to avoid inlining otherwise, so yes it should be beneficial even with completely accurate profile information. For example, an error handling code path using ? taken from rustc MIR interpreter, before:

             → callq  *0x20446f5(%rip)        # 3e1a678 <<rustc_middle::mir::interpret::error::InterpErrorInfo as core::convert::From<rustc_middle::mir::interpret::error::InterpError>>::from@@Base+0x1ba93a8>
               mov    %rax,%rdi
             → callq  <T as core::convert::From<T>>::from
               mov    %rax,%rdi
        1ce: → callq  <T as core::convert::From<T>>::from
               mov    %rax,%rdi
             → callq  <T as core::convert::From<T>>::from
               mov    $0x1,%sil
             ↑ jmpq   115     

and after:

             → callq  *0x1f22ee0(%rip)        # 3a0ae68 <<rustc_middle::mir::interpret::error::InterpErrorInfo as core::convert::From<rustc_middle::mir::interpret::error::InterpError>>::from@@Base+0x1a80f38>
        1c8:   mov    $0x1,%sil
             ↑ jmpq   116     

tmiasko

deleted the

tmiasko:import-cold-multiplier

branch

7 days ago

Copy link

Contributor

rylev commented yesterday

@tmiasko @michaelwoerister Can you talk a bit more about the performance regressions seen here? Some of the regressions are somewhat severe, so it would be nice to talk through them a bit more. Why is the performance regression worth it?

Copy link

Contributor

michaelwoerister commented yesterday

The main change here is that we allow LLVM to import functions across module boundaries during ThinLTO even if a given function has been marked as cold. This allows LLVM to inline trivial functions (like the example in the comment above), leading to an overall reduction in code size. LLVM still has the ability to not do inlining for such functions based on its usual heuristics (in which case the effort put into import the function was wasted). The change should lead to an overall quality improvement for all machine code being produced via ThinLTO (i.e. currently all --release done via cargo).

Because this PR re-jiggers LLVM inlining decisions it can be expected to have a mixed influence on compile times. Since that effect seems to be positive overall (with the exception of two small synthetic test cases) I think this change is a net win.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Reviewers

No reviews

Projects

None yet

Milestone

1.53.0

Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK