1

Use 128 bits for TypeId hash by thomcc · Pull Request #109953 · rust-lang/rust ·...

 1 year ago
source link: https://github.com/rust-lang/rust/pull/109953
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Use 128 bits for TypeId hash #109953

Conversation

Member

@thomcc thomcc

commented

Apr 5, 2023

edited
scottmcm, eddyb, Nilstrieb, memoryruins, KisaragiEffective, zohnannor, and sofiaritz reacted with thumbs up emojicompiler-errors and alecmocatta reacted with heart emoji

Collaborator

r? @davidtwco

(rustbot has picked a reviewer for you, use r? to override)

rustbot

added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.

labels

Apr 5, 2023

Member

Author

thomcc

commented

Apr 5, 2023

edited

There are a couple test failures that hit locally. Just two, really: https://gist.github.com/thomcc/514a6b46f9636af96cf0c5f9fdd0f37f. Not sure about them. These were caused by tracing-subscriber depending on TypeId's size.

Other things I wasn't sure about:

  1. Why do I need to handle the intrinsic in rustc_codegen_llvm (I thought that it should be lowered already, since I did so in rustc_codegen_ssa). But if I don't handle there, it complains that I haven't lowered it yet, so I handled it in both places. This seems to have been my imaginination.
  2. If there's some way to avoid adding type_id128 as an intrinsic, and instead just change the return type of type_id. I couldn't see a clean way to make that work in the compiler with bootstrapping, but maybe I didn't try hard enough or just confused myself with it. It turns out I had confused myself.

Member

Author

Oh, it fails because tracing-subscriber relies on TypeId being u64. :|

Nilstrieb and Veykril reacted with laugh emoji

Member

Author

thomcc

commented

Apr 5, 2023

edited

I've added a hacky impl Hash for TypeId which should avoid upsetting tracing-subscriber. I don't really know if that's the approach we want to use long-term (t-libs question -- this might be more efficient in many hashers, honestly), but this probably will fix those test failures.

Member

Author

Going to do a perf run, since it was mentioned as a concern.

@bors try @rust-timer queue

This comment has been minimized.

rustbot

added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label

Apr 5, 2023

Contributor

hourglass Trying commit 50ae15e with merge 95d90d2...

Contributor

sunny Try build successful - checks-actions
Build commit: 95d90d2 (95d90d2f6d4495fee62f69cf14fe3ec6d009965e)

This comment has been minimized.

Collaborator

Finished benchmarking commit (95d90d2): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Cycles

Results

rustbot

removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label

Apr 5, 2023

thomcc

marked this pull request as ready for review

April 5, 2023 09:18

Collaborator

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

// (especially given the previous point about the lower 64 bits being

// high quality on their own).

//

// That said, this is a bit of a hack. (FIXME(thom): nominate this for

Member

Author

Don't land without me doing this (but it probably would be premature before the MCP completes).

ping: the MCP was completed, do you still want to nominate this for T-libs?

(IMO this impl can be changed after/asynchronously to this PR — as discussed here previously, this implementation is totally fine for the Hash trait)

Contributor

cc @alice-i-cecile maybe this should be benchmarked with Bevy?

Member

cc-ing @Badel2 @CAD97 @m-ou-se @digama0 @RalfJung @nikomatsakis @joshtriplett (based on comments you each had on #95845 regarding hashes - apologies if this PR isn't relevant, but I wanted to at least connect those dots)

Member

@davidtwco davidtwco

left a comment

Implementation LGTM

Contributor

Wow, so excited to see this finally being fixed!

Regarding hashing only the lower u64 in the Hash impl, that it is a common and perfectly valid way to truncate a cryptographically secure hash function, it is even in the standard for SHA-2. And even if the hash function used by TypeId is not cryptographically secure, the worst attack which is creating hash collisions will have the same cost as if we were hashing the full u128 since the output of Hash::finish is a u64 anyway.

The only problem is that it is possible to create a TypeId using transmute, and in that case creating a collision is free because we can modify the output of the first hash function. (transmute TypeId to u128, modify without touching lower u64, transmute back to TypeId). The impact of that problem is that you can make a hashmap O(n^2), but only in your own code, so I will let someone else decide if it is worth to fix it.

After taking a look at the offending code that breaks, here: https://github.com/tokio-rs/tracing/blob/3392be93cb9262a5da54c23e5772e79a8736d50e/tracing-subscriber/src/registry/extensions.rs#L27

I propose to just call write_u64 twice. This will avoid worrying about the hash collisions created using transmute, and it will not break tracing-subscriber, which will only use the input of the second call as the hash.

Also the last time I remember talking about this, the problem was that some crates are using transmute between TypeId and u64, how is that fixed here? Was that fixed in those crates at some point?

Thank you for moving this issue forward!

Member

Author

I propose to just call write_u64 twice. This will avoid worrying about the hash collisions created using transmute, and it will not break tracing-subscriber, which will only use the input of the second call as the hash.

This is what I did initially, after looking at the same code. That said, I find @eddyb's argument that it's fine to just use 64 bits pretty compelling.

the worst attack which is creating hash collisions will have the same cost as if we were hashing the full u128 since the output of Hash::finish is a u64 anyway

We're not really concerned about attacks, so much as accidental collisions.

the problem was that some crates are using transmute between TypeId and u64, how is that fixed here

It's not. Those crates are going to be broken no matter how we fix this issue (even the string pointer approach probably won't work right for them, although I suppose on 64 bit targets they might be less broken), the only way to keep them working is to decide this never changes, which seems unreasonable.

In the past we haven't really considered that reason to avoid a motivated fix; for example, plenty of code used to do type punning between std::net types and libc... although it's possible the breakage in this case could be too much to bear.

It's worth doing a crater run for this at least, if for no other reason than to give such code a heads up.

saethlin reacted with thumbs up emoji

pnkfelix

removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label

May 11, 2023

Member

We discussed this in the libs meeting today. We're happy to keep the hash implementation as it is with just using the low 64 bits of the TypeId. However we would like one changes to be made to the comment:

  • Compatibility with the ecosystem is not the reason we do this: crates should not be making assumptions on which methods of the Hash trait are called, and we reserve the right to change this in the future with no warning.
thomcc reacted with thumbs up emojiWaffleLapkin reacted with heart emoji

Amanieu

added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). I-libs-nominated Indicates that an issue has been nominated for discussion during a libs team meeting.

and removed I-libs-nominated Indicates that an issue has been nominated for discussion during a libs team meeting. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label).

labels

May 31, 2023

Member

Author

I ended up squashing the changes as part of the rebase, to avoid having to resolve conflicts in some of the over-complicated intermediate approaches I tried. The only material change in the update (besides the rebase) is a rewritten comment inside the Hash implementation.

Member

Author

@rustbot ready

rustbot

added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

labels

Jun 4, 2023

Member

Given the T-libs approval modulo comment in #109953 (comment) and the comment being fixed,

@bors r+ rollup

tada

thomcc reacted with rocket emoji

Contributor

pushpin Commit b512004 has been approved by WaffleLapkin

It is now in the queue for this repository.

bors

added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

Jun 8, 2023

bors

merged commit 8747c0e into

rust-lang:master

Jun 8, 2023

11 checks passed

rustbot

added this to the 1.72.0 milestone

Jun 8, 2023

rust-timer

added a commit to rust-lang-ci/rust that referenced this pull request

Jun 8, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

eddyb

eddyb left review comments

compiler-errors

compiler-errors left review comments

bjorn3

bjorn3 left review comments

scottmcm

scottmcm left review comments

davidtwco

davidtwco approved these changes

WaffleLapkin

WaffleLapkin approved these changes
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects

None yet

Milestone

1.72.0

Development

Successfully merging this pull request may close these issues.

None yet

20 participants

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK