Remove assignments to ZST places instead of marking ZST return place as unused #83177

Contributor

rust-timer commented 18 days ago

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Contributor

bors commented 18 days ago

Trying commit 90562b4 with merge 7ac77ab...

Contributor

bors commented 17 days ago

Try build successful - checks-actions
Build commit: 7ac77ab (7ac77ab9f463f60282360fd96138f4c09eb263e8)

Collaborator

rust-timer commented 17 days ago

Finished benchmarking try commit (7ac77ab): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

Contributor

Author

erikdesjardins commented 17 days ago

Pushed a change to cache layouts, let's see if it gets better or worse.

Also moved it to a separate pass, since it's a bit different than the other opts in instcombine...let me know if you have a preference for where it should live.

Contributor

tmiasko commented 16 days ago (edited)

Looking at MIR diffs of some real-world projects, this implementation is definitely more effective at removing ZST assignments than the previous one was. However, none of the existing mir-opt tests demonstrate that, so if we want to land this, adding an extra one would be nice.

The perf results, both those here and the earlier ones, are quite hard to interpret. Unfortunately, the most significant impact of this change is on the size estimates. In a few benchmarks I looked at, the CGU partitioning was changed. This almost surely applies to rustc itself as well. In fact, I suspect that the -3.0% change in the ctfe-stress-4 benchmark from the earlier perf run was entirely because of this (code that is hot in those benchmarks is optimized differently, and CTFE evaluates unoptimized MIR).

The layout computation goes through the query system, so the result should already be cached, but we can of course try again:

Collaborator

rust-timer commented 16 days ago

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Contributor

bors commented 16 days ago

Trying commit b6d5b72 with merge 39cf6bc...

Collaborator

rust-timer commented 16 days ago

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Contributor

bors commented 16 days ago

Try build successful - checks-actions
Build commit: 39cf6bc (39cf6bc137798a38f205e17dc9994bdb2205ba41)

Collaborator

rust-timer commented 16 days ago

Finished benchmarking try commit (39cf6bc): comparison url.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

Collaborator

rust-timer commented 14 days ago

Finished benchmarking try commit (a985e90): comparison url.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

compiler/rustc_mir/src/transform/remove_zsts.rs

Outdated

match statement.kind {
    StatementKind::Assign(box (place, _)) => {
        let place_ty = place.ty(local_decls, tcx).ty;
        if let Ok(layout) = tcx.layout_of(param_env.and(place_ty)) {

oli-obk 14 days ago

Contributor

Maybe a fast path for known ZSTs (well, let's start with just ()) could reduce the number of query calls?
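
For context, a zero-sized type (ZST) is one whose values occupy no bytes, which is why assignments to such places can be dropped. The sketch below checks a few of the kinds of ZST mentioned in this thread using only the standard library; `MyZst` and `answer` are names invented for this example and do not come from the PR.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical user-defined unit struct: no fields, so zero-sized.
struct MyZst;

// A plain function. Its *function item* type (which rustc calls `FnDef`)
// is zero-sized, unlike a function pointer.
fn answer() -> u32 {
    42
}

fn main() {
    assert_eq!(size_of::<()>(), 0);
    assert_eq!(size_of_val(&MyZst), 0);
    // Empty arrays and tuples made only of ZSTs are zero-sized as well.
    assert_eq!(size_of::<[u64; 0]>(), 0);
    assert_eq!(size_of::<((), MyZst)>(), 0);
    // `answer` used as a value has the zero-sized function item type...
    assert_eq!(size_of_val(&answer), 0);
    // ...whereas a function pointer is pointer-sized.
    let ptr: fn() -> u32 = answer;
    assert_eq!(size_of_val(&ptr), size_of::<usize>());
    println!("all ZST checks passed");
}
```

Only `()` can be recognized from the type alone without further inspection; a struct like `MyZst` generally needs its layout computed, which is the cost being discussed here.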

Contributor

Author

erikdesjardins commented 14 days ago

Added a check to skip layout_of for types which can never be ZSTs.

When compiling std (or whatever gets built during a stage 1 build), the RemoveZsts pass now sees:

855924 total assignments

478400 assignments are skipped by the `maybe_zst` check
353160 assignments are skipped by the `layout_of` check
 24364 assignments are removed due to being of a ZST

I didn't add a fast path for known ZSTs because they make up <10% of the remaining layout_of calls. I can try that, or make the maybe_zst check more precise, if this isn't enough.
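
The two-stage filtering described above can be sketched roughly as follows. This is a toy model, not the code from the PR: `ToyTy`, `layout_size`, and `count_removed` are hypothetical stand-ins, and only the idea of a cheap `maybe_zst` pre-check in front of an expensive layout lookup is taken from the comment.

```rust
// Toy stand-in for a compiler's type representation (NOT rustc's real `TyKind`).
enum ToyTy {
    Int,                 // e.g. `u32`: can never be zero-sized
    Ref,                 // references and pointers: can never be zero-sized
    Unit,                // `()`: always zero-sized
    FnDef,               // function item type: always zero-sized
    Adt { size: usize }, // struct or enum: size is only known from its layout
}

/// Cheap check on the kind of type alone; `false` means "definitely not a ZST".
fn maybe_zst(ty: &ToyTy) -> bool {
    !matches!(ty, ToyTy::Int | ToyTy::Ref)
}

/// Stand-in for the expensive layout query; counts how often it is called.
fn layout_size(ty: &ToyTy, calls: &mut usize) -> usize {
    *calls += 1;
    match ty {
        ToyTy::Int => 4,
        ToyTy::Ref => 8,
        ToyTy::Unit | ToyTy::FnDef => 0,
        ToyTy::Adt { size } => *size,
    }
}

/// Returns (assignments that would be removed, layout lookups performed).
fn count_removed(assigned: &[ToyTy]) -> (usize, usize) {
    let mut calls = 0;
    let removed = assigned
        .iter()
        // The layout lookup only runs when the cheap check cannot rule the type out.
        .filter(|ty| maybe_zst(ty) && layout_size(ty, &mut calls) == 0)
        .count();
    (removed, calls)
}

fn main() {
    let tys = [
        ToyTy::Int,
        ToyTy::Ref,
        ToyTy::Unit,
        ToyTy::Adt { size: 16 },
        ToyTy::Adt { size: 0 },
        ToyTy::FnDef,
    ];
    let (removed, calls) = count_removed(&tys);
    println!("removed {} assignments using {} layout lookups", removed, calls);
}
```

In this model the `Int` and `Ref` entries never reach the layout lookup, which corresponds to the 478400 assignments skipped by the `maybe_zst` check in the numbers above.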

Contributor

oli-obk commented 13 days ago

> I didn't add a fast path for known ZSTs because they make up <10% of the remaining layout_of calls. I can try that, or make the maybe_zst check more precise, if this isn't enough.

so... 90% of zsts are aggregates or user defined? I would have thought a large portion is () or FnDefs

@bors try @rust-timer queue

Collaborator

rust-timer commented 13 days ago

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Contributor

bors commented 13 days ago

Trying commit 46fd49c with merge bd5d1b9...

Contributor

Author

erikdesjardins commented 13 days ago (edited)

> 90% of zsts are aggregates or user defined?

No, >90% of types we call layout_of on are not ZSTs.

What I assume you meant by "fast path" is

if ty == unit {
    // fast path
} else {
    // slow path
    if let Ok(layout) = layout_of(ty) && layout.is_zst() {
        // slow path success
    } else {
        // slow path failure
    }
}

In my test, the fast path could be hit at most 24k times, if every ZST is (). But the slow path would still be hit at least 353k times, because there are 353k assignments that aren't ZSTs, but aren't ruled out until we check the layout (i.e., we reach "slow path failure" at least 353k times). I don't expect a fast path that's hit <10% of the time (24k / [24k + 353k] ~ 6%) to significantly improve performance.

Unless you meant "add a fast path and remove the slow path entirely", i.e. the optimization only works for (), FnDef, etc., and not struct MyZst;, but I'd prefer not to do that if possible.
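
The arithmetic behind the "~6%" estimate can be reproduced from the counts reported earlier in the thread. The sketch below is only a worked calculation; the function name `best_case_fast_path_share` is made up for this example.

```rust
/// Upper bound on the share of layout lookups a fast path could avoid,
/// assuming (optimistically) that every removed ZST assignment would hit it.
fn best_case_fast_path_share(zst_removed: u64, non_zst_checked: u64) -> f64 {
    zst_removed as f64 / (zst_removed + non_zst_checked) as f64
}

fn main() {
    // Counts reported in this thread for a stage 1 build.
    let total = 855_924u64;
    let skipped_by_maybe_zst = 478_400u64;
    let rejected_by_layout_of = 353_160u64;
    let removed_as_zst = 24_364u64;

    // The three buckets account for every assignment the pass saw.
    assert_eq!(
        skipped_by_maybe_zst + rejected_by_layout_of + removed_as_zst,
        total
    );

    let share = best_case_fast_path_share(removed_as_zst, rejected_by_layout_of);
    println!("best-case fast-path share: {:.1}%", share * 100.0);
}
```

Even in that best case the slow path still runs for all 353160 non-ZST assignments, which is the point made in the comment above.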

Contributor

bors commented 13 days ago

Try build successful - checks-actions
Build commit: bd5d1b9 (bd5d1b96f0c64c9938feea831789e1b5bb2cd4a2)

Contributor

oli-obk commented 13 days ago

> Unless you meant "add a fast path and remove the slow path entirely", i.e. the optimization only works for (), FnDef, etc., and not struct MyZst;, but I'd prefer not to do that if possible.

I did not mean that. My brain just took a wrong turn somewhere. You're completely right.

Though... we could enable the optimization for FnDef and unit in debug builds and for everything in release builds, but let's look at perf before we resort to such schemes.

Collaborator

rust-timer commented 13 days ago

Finished benchmarking try commit (bd5d1b9): comparison url.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

Contributor

oli-obk commented 11 days ago

Perf looks very promising. While there's still some regression in servo, that is entirely in LLVM, so we may be optimizing more stuff now, no way to tell without runtime perf tests. Also the LLVM perf test shows a 60% reduction in static_mutability query calls (15k!!!) on the servo test.

@erikdesjardins this looks really good, all that is left is to add a mir-opt-level 3 check in the opt so it doesn't run by default. I think we should do that here and not immediately stabilize, even if I see no reason not to stabilize. The opt doesn't affect anything UB related and is trivial to review. So my proposal is to merge this PR quickly with a level 3 check, and then open a PR removing that check and pinging wg-mir-opt so that everyone can have their say
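
The gating proposed above amounts to an early return when the optimization level is too low. Here is a minimal sketch of that pattern, assuming a toy `Session` and `Statement` model; these types, the `remove_zsts` function, and the statement strings are hypothetical and are not rustc's actual API or MIR output.

```rust
/// Hypothetical stand-in for compiler session options.
struct Session {
    mir_opt_level: u8,
}

/// Toy statement: its printed form plus whether it assigns to a ZST place.
struct Statement {
    text: &'static str,
    assigns_zst: bool,
}

/// Toy version of the pass: does nothing unless the opt level is at least 3.
fn remove_zsts(sess: &Session, body: &mut Vec<Statement>) {
    if sess.mir_opt_level < 3 {
        return; // gated: the pass does not run by default
    }
    body.retain(|stmt| !stmt.assigns_zst);
}

fn sample_body() -> Vec<Statement> {
    vec![
        Statement { text: "_1 = const 5_u32", assigns_zst: false },
        Statement { text: "_2 = const ()", assigns_zst: true },
        Statement { text: "_0 = move _1", assigns_zst: false },
    ]
}

fn main() {
    for level in [1u8, 3] {
        let mut body = sample_body();
        remove_zsts(&Session { mir_opt_level: level }, &mut body);
        let kept: Vec<&str> = body.iter().map(|s| s.text).collect();
        println!("mir-opt-level {}: {:?}", level, kept);
    }
}
```

Removing the check in a follow-up PR, as suggested, would then be a one-line change that is easy to review on its own.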

Contributor

oli-obk commented 10 days ago

@bors r+

Contributor

bors commented 10 days ago

Commit 6960bc9 has been approved by oli-obk

Contributor

bors commented 10 days ago

Testing commit 6960bc9 with merge 79e5814...

Contributor

bors commented 10 days ago

Test successful - checks-actions
Approved by: oli-obk
Pushing 79e5814 to master...

Reviewers

oli-obk

Assignees

oli-obk

Milestone

1.53.0

7 participants
