Always const-evaluate the GCD in `slice::align_to_offsets` #111296

Conversation

Member

Use an inline const-block to force the compiler to calculate the GCD at compile time, even in debug mode. This shouldn't affect the behavior of the program at all, but it drastically cuts down on the number of instructions emitted with optimizations disabled.

With the current implementation, a single slice::align_to instantiation (specifically <[u8]>::align_to::<u128>()) generates 676 instructions (on x86-64). Forcing the GCD computation to be const cuts it down to 327 instructions, so just over 50% less. This is obviously not representative of actual runtime gains, but I still see it as a significant win as long as it doesn't degrade compile times.

Not having to worry about LLVM const-evaluating the GCD function also allows it to use the textbook recursive euclidean algorithm instead of a much more complicated iterative implementation with multiple unsafe-blocks.

Urgau reacted with thumbs up emojitherealprof reacted with heart emoji

Collaborator

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

rustbot

added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.

labels

May 6, 2023

Collaborator

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

This comment has been minimized.

rustbot

added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label

May 7, 2023

Contributor

Trying commit b5ee324 with merge 2a2eb58...

r=me with perf results OK

cc @scottmcm, I think you've worked on the GCD code and may be interested.

Contributor

Try build successful - checks-actions
Build commit: 2a2eb58 (2a2eb58ca22fcee5651c02f204da3818f000d4fc)

This comment has been minimized.

library/core/src/slice/mod.rs

		let gcd: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };
		let ts: usize = mem::size_of::<U>() / gcd;
		let us: usize = mem::size_of::<T>() / gcd;

Maybe push even a bit more into the const block?

Suggested change

	let gcd: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };
	let ts: usize = mem::size_of::<U>() / gcd;
	let us: usize = mem::size_of::<T>() / gcd;
	let (ts, us) = const {
	let gcd: usize = gcd(mem::size_of::<T>(), mem::size_of::<U>());
	(mem::size_of::<U>() / gcd, mem::size_of::<T>() / gcd)
	};

nagisa reacted with thumbs up emoji

Member

Author

Doing that results in more instructions being emitted in debug mode for some reason.

You can also compute ts * us in const too.

Member

Author

I'm not convinced we should move more operations into the const-block if it increases the instruction count though.

Member

Doing that results in more instructions being emitted in debug mode for some reason.

Caveat: small changes in instruction count in a debug build is a subpar proxy for estimating performance of such a build. LLVM can still select instruction sequences that execute in fewer cycles, but require more instructions in a debug mode, so long as it is able to match on IR sequences that trigger those instruction sequences. A prime example of that is e.g. dividing by a constant that produces a strength reduction that involves a multiplication and a shift. These two operations are still way faster to execute compared to a full division. Something similar may be happening here.

More importantly, though, this is all LLVM version dependent. It might do better in versions V and V+1, worse in versions V+2 and V+3 and even better in V+4... In this case I'd say, as long as the difference in machine instruction count isn’t like 10-20% or more, then we should strive to ensure that rustc does as much work as possible before handing it off to LLVM’s consideration.

EDIT: The actually correct thing would be to actually put the generated code through llvm-mca and have it compute the actual execution cost in cycles.

EDIT (part 2): Another thing that might be happening is that moving code into the const block enables LLVM to use FastIsel or somesuch mechanism which might have been not used before for whatever reason, which is just much faster in terms of generating code (even if the results aren't as good execution-time wise.)

Member

Author

If both T and U are ZSTs then doing the divisions in the const-block causes a compile time divide-by-zero error. align_to has an early return if at least one of the types is a ZST, so the divisor will never be zero at runtime, but it looks like the const-block still gets evaluated even though it's technically dead code.

nagisa reacted with thumbs up emoji

Got it, sounds like we have no other choice right now, then. We could look into moving the check for ts == 0 || us == 0 into the const block too and bail in some other reasonable way, but it probably something that warrants a separate PR anyhow.

Member

Nope, wasn't me. Maybe @nagisa ?

Great change, though! A really nice demonstration of a const block being worth it.

Member

Yeah, I “own” anything align_to-related. This is a great change and exactly what I was anticipating this to look like in the future. As @scottmcm pointed out you can still put more stuff into const and it should be possible to make the entire align_to_offsets a const fn (though it probably won't change much, since the argument to that function will always be non-const AFAIR…)

library/core/src/slice/mod.rs

		}
		a << k
		const fn gcd(a: usize, b: usize) -> usize {
		if b == 0 { a } else { gcd(b, a % b) }

Member

Doesn’t this need to check for a == 0 too? There's no guarantee that either of the types won’t be ZST (in isolation.) Or, add a comment.

Member

Author

If a == 0 then it returns gcd(b, 0 % b) which will then return b.

nagisa and cjgillot reacted with thumbs up emoji

Collaborator

Finished benchmarking commit (2a2eb58): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 653.81s -> 653.024s (-0.12%)

rustbot

removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label

May 7, 2023

Member

@bors r=nagisa,Mark-Simulacrum

Contributor

Commit b5ee324 has been approved by nagisa,Mark-Simulacrum

It is now in the queue for this repository.

bors

added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

May 8, 2023

Contributor

Testing commit b5ee324 with merge 90c02c1...

Contributor

Test successful - checks-actions
Approved by: nagisa,Mark-Simulacrum
Pushing 90c02c1 to master...

bors

added the merged-by-bors This PR was explicitly merged by bors label

May 9, 2023

bors

merged commit 90c02c1 into

rust-lang:master

May 9, 2023

12 checks passed

rustbot

added this to the 1.71.0 milestone

May 9, 2023

Collaborator

Finished benchmarking commit (90c02c1): comparison URL.

Overall result: improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean	range	count
Regressions (primary)	-	-	0
Regressions (secondary)	-	-	0
Improvements (primary)	-	-	0
Improvements (secondary)	-0.3%	[-0.3%, -0.2%]	3
All (primary)	-	-	0

Max RSS (memory usage)

Results

Cycles

Results

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 658.017s -> 656.527s (-0.23%)

Sp00ph

deleted the const_gcd branch

May 9, 2023 12:55

Reviewers

nagisa

nagisa left review comments

scottmcm

scottmcm left review comments

Labels

merged-by-bors This PR was explicitly merged by bors S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Milestone

1.71.0

Development

Successfully merging this pull request may close these issues.

None yet

7 participants

Always const-evaluate the GCD in `slice::align_to_offsets` by Sp00ph · Pull Requ...

Always const-evaluate the GCD in `slice::align_to_offsets` #111296

Conversation

This comment has been minimized.

This comment has been minimized.

Overall result: no relevant changes - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Binary size

Overall result: improvements - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Binary size

Recommend

Physicists Create Biggest-Ever Schrodinger's Cat - Slashdot

人工智能抢夺了饭碗怎么办？Meta高管：元宇宙能培训失业者再上岗

又有将有卖家落泪！短片《Don’t Hug Me I’m Scared》角色版权维权

Introduce `DynSend` and `DynSync` auto trait for parallel compiler by SparrowLii...

如何通过Meta目录工具，实现高效转化潜在客户？

在 Ubuntu Server 22.04 上安装 gcc-13

艺人口嗨，笑果遭殃

投融快讯 | 中昊芯英获数亿元Pre-B轮融资；鲲游光电完成新一轮数亿元融资；恒瑞源正完...

Implement `EphemeralKeyInterface` for `MockDb` · Issue #1191 · juspay/hyperswitc...

Aqua Security releases Real-Time CSPM to tackle multi-cloud security risks

About Joyk

Always const-evaluate the GCD in `slice::align_to_offsets` by Sp00ph · Pull Requ...

Always const-evaluate the GCD in slice::align_to_offsets #111296

Conversation

This comment has been minimized.

This comment has been minimized.

Overall result: no relevant changes - no action needed

Overall result: improvements - no action needed

Recommend

About Joyk

Always const-evaluate the GCD in `slice::align_to_offsets` #111296