Github Remove assignments to ZST places instead of marking ZST return place as u...
source link: https://github.com/rust-lang/rust/pull/83177
Remove assignments to ZST places instead of marking ZST return place as unused #83177
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
Try build successful - checks-actions
Build commit: 7ac77ab (7ac77ab9f463f60282360fd96138f4c09eb263e8)
Finished benchmarking try commit (7ac77ab): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.
Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.
@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf
Pushed a change to cache layouts, let's see if it gets better or worse.
Also moved it to a separate pass, since it's a bit different than the other opts in instcombine...let me know if you have a preference for where it should live.
Looking at MIR diffs of some real world projects, this implementation is definitely more effective at removing ZST assignments than the previous one was. That isn't demonstrated by any of the existing mir-opt tests, though, so if we want to land this, adding an extra one would be nice.
The perf results, both those here and the earlier ones, are quite hard to interpret. Unfortunately, the most significant impact of this change is the one on the size estimates. In a few benchmarks I looked at, the CGU partitioning was changed. This almost surely applies to rustc itself as well. In fact, I suspect that the -3.0% change in the ctfe-stress-4 benchmark from the earlier perf run was entirely because of this (code that is hot in those benchmarks is optimized differently, and CTFE evaluates unoptimized MIR).
The layout computation uses the query system, so the computation should be cached already, but we can of course try again:
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
Try build successful - checks-actions
Build commit: 39cf6bc (39cf6bc137798a38f205e17dc9994bdb2205ba41)
Finished benchmarking try commit (39cf6bc): comparison url.
Finished benchmarking try commit (a985e90): comparison url.
match statement.kind {
    StatementKind::Assign(box (place, _)) => {
        let place_ty = place.ty(local_decls, tcx).ty;
        if let Ok(layout) = tcx.layout_of(param_env.and(place_ty)) {
oli-obk 14 days ago
Contributor
Maybe a fast path for known ZSTs (well, let's start with just ()) could reduce the number of query calls?
Added a check to skip layout_of for types which can never be ZSTs.
When compiling std (or whatever gets built during a stage 1 build), the RemoveZsts pass now sees:
855924 total assignments
478400 assignments are skipped by the `maybe_zst` check
353160 assignments are skipped by the `layout_of` check
24364 assignments are removed due to being of a ZST
I didn't add a fast path for known ZSTs because they make up <10% of the remaining layout_of calls. I can try that, or make the maybe_zst check more precise, if this isn't enough.
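To illustrate the shape of the check described above, here is a minimal sketch of a cheap, conservative pre-filter in front of an expensive exact query. It is not the code from this PR: the `Ty` enum, `is_zst_by_layout`, and `should_remove_assignment` are hypothetical stand-ins, and only the name `maybe_zst` comes from the comment itself.

```rust
// Hypothetical, simplified stand-in for a compiler type; this is NOT rustc's `Ty`.
#[derive(Debug, Clone)]
enum Ty {
    Int,            // e.g. i32: never zero-sized
    Ref,            // references and pointers: never zero-sized
    Unit,           // ()
    FnDef,          // function item type, zero-sized
    Tuple(Vec<Ty>), // zero-sized only if every field is
    Adt(Vec<Ty>),   // struct, modeled here as a list of field types
}

/// Cheap, conservative filter: `false` means "definitely not a ZST",
/// `true` means "might be a ZST, ask the layout".
fn maybe_zst(ty: &Ty) -> bool {
    !matches!(ty, Ty::Int | Ty::Ref)
}

/// Stand-in for the expensive layout query (an assumption for this sketch).
fn is_zst_by_layout(ty: &Ty) -> bool {
    match ty {
        Ty::Int | Ty::Ref => false,
        Ty::Unit | Ty::FnDef => true,
        Ty::Tuple(fields) | Ty::Adt(fields) => fields.iter().all(is_zst_by_layout),
    }
}

/// Decide whether an assignment to a place of type `ty` can be removed,
/// counting how often the expensive path is taken.
fn should_remove_assignment(ty: &Ty, layout_calls: &mut usize) -> bool {
    if !maybe_zst(ty) {
        return false; // skipped by the cheap check, no layout query
    }
    *layout_calls += 1;
    is_zst_by_layout(ty)
}

fn main() {
    let tys = vec![
        Ty::Int,
        Ty::Ref,
        Ty::Unit,
        Ty::FnDef,
        Ty::Tuple(vec![Ty::Unit, Ty::Unit]),
        Ty::Adt(vec![Ty::Int]),
    ];
    let mut layout_calls = 0;
    let removed = tys
        .iter()
        .filter(|ty| should_remove_assignment(ty, &mut layout_calls))
        .count();
    println!("removed {removed} of {} assignments, {layout_calls} layout calls", tys.len());
}
```

The filter only needs to be sound in one direction: it may say "maybe" for a type that turns out not to be a ZST (as for the struct with an `Int` field here), but it must never rule out a real ZST.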
I didn't add a fast path for known ZSTs because they make up <10% of the remaining layout_of calls. I can try that, or make the maybe_zst check more precise, if this isn't enough.
so... 90% of ZSTs are aggregates or user defined? I would have thought a large portion is () or FnDefs
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
90% of zsts are aggregates or user defined?
No, >90% of types we call layout_of on are not ZSTs.
What I assume you meant by "fast path" is
if ty == unit {
    // fast path
} else {
    // slow path
    if let Ok(layout) = layout_of(ty) && layout.is_zst() {
        // slow path success
    } else {
        // slow path failure
    }
}
In my test, the fast path could be hit at most 24k times, if every ZST is (). But the slow path would still be hit at least 353k times, because there are 353k assignments that aren't ZSTs, but aren't ruled out until we check the layout (i.e., we reach "slow path failure" at least 353k times). I don't expect a fast path that's hit <10% of the time (24k / [24k + 353k] ~ 6%) to significantly improve performance.
Unless you meant "add a fast path and remove the slow path entirely", i.e. the optimization only works for (), FnDef, etc., and not struct MyZst;, but I'd prefer not to do that if possible.
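The estimate above can be checked directly from the reported counts. The following is only a worked version of that arithmetic; the constant and function names are made up for this sketch, while the numbers are the ones quoted for the stage 1 build.

```rust
// Assignment counts reported for the RemoveZsts pass during a stage 1 build.
const TOTAL: u64 = 855_924;
const SKIPPED_BY_MAYBE_ZST: u64 = 478_400;
const SKIPPED_BY_LAYOUT_OF: u64 = 353_160;
const REMOVED_AS_ZST: u64 = 24_364;

/// Assignments that reach layout_of: everything not ruled out by maybe_zst.
fn layout_of_calls() -> u64 {
    SKIPPED_BY_LAYOUT_OF + REMOVED_AS_ZST
}

/// Upper bound on the share of layout_of calls a ()-only fast path could
/// replace, reached only if every removed assignment were of type ().
fn max_fast_path_share() -> f64 {
    REMOVED_AS_ZST as f64 / layout_of_calls() as f64
}

fn main() {
    // The three buckets account for every assignment the pass saw.
    assert_eq!(SKIPPED_BY_MAYBE_ZST + layout_of_calls(), TOTAL);
    println!("layout_of calls: {}", layout_of_calls());
    println!("max fast-path share: {:.2}%", max_fast_path_share() * 100.0);
}
```

With these counts the fast path covers at most 24,364 of 377,524 layout_of calls, which is the roughly 6% quoted above.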
Try build successful - checks-actions
Build commit: bd5d1b9 (bd5d1b96f0c64c9938feea831789e1b5bb2cd4a2)
Unless you meant "add a fast path and remove the slow path entirely", i.e. the optimization only works for (), FnDef, etc., and not struct MyZst;, but I'd prefer not to do that if possible.
I did not mean that. My brain just took a wrong turn somewhere. You're completely right.
Though... we could enable the optimization for FnDef and unit in debug builds and for everything in release builds, but let's look at perf before we resort to such schemes.
Finished benchmarking try commit (bd5d1b9): comparison url.
Perf looks very promising. While there's still some regression in servo, that is entirely in LLVM, so we may be optimizing more stuff now; there's no way to tell without runtime perf tests. Also, the LLVM perf test shows a 60% reduction in static_mutability query calls (15k!!!) on the servo test.
@erikdesjardins this looks really good. All that is left is to add a mir-opt-level 3 check in the opt so it doesn't run by default. I think we should do that here and not immediately stabilize, even if I see no reason not to stabilize. The opt doesn't affect anything UB related and is trivial to review. So my proposal is to merge this PR quickly with a level 3 check, and then open a PR removing that check and pinging wg-mir-opt so that everyone can have their say.
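As a rough illustration of what such a level check looks like, here is a self-contained sketch. It does not use rustc's real types: `Session`, `Body`, `run_remove_zsts`, and the "zst:" statement tag are all hypothetical placeholders, and only the threshold of 3 is taken from the comment above.

```rust
/// Hypothetical stand-in for the compiler session options.
struct Session {
    mir_opt_level: usize,
}

/// Hypothetical minimal MIR body: statements as strings, where a "zst:"
/// prefix marks an assignment to a zero-sized place (placeholder convention).
struct Body {
    statements: Vec<&'static str>,
}

/// Level at which the pass is allowed to run, per the proposal above.
const MIN_MIR_OPT_LEVEL: usize = 3;

/// Runs the pass only at or above the required level.
/// Returns whether the pass actually ran.
fn run_remove_zsts(sess: &Session, body: &mut Body) -> bool {
    if sess.mir_opt_level < MIN_MIR_OPT_LEVEL {
        return false; // gated off: leave the body untouched
    }
    body.statements.retain(|s| !s.starts_with("zst:"));
    true
}

fn main() {
    let mut body = Body {
        statements: vec!["zst: _1 = ()", "_2 = const 1_i32"],
    };
    for level in [2, 3] {
        let ran = run_remove_zsts(&Session { mir_opt_level: level }, &mut body);
        println!("level {level}: ran = {ran}, statements = {:?}", body.statements);
    }
}
```

Gating this way keeps the pass available for testing at the higher level, and enabling it by default later only requires removing the early return.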
@bors r+
Commit 6960bc9 has been approved by oli-obk
Test successful - checks-actions
Approved by: oli-obk
Pushing 79e5814 to master...