debuginfo: Generalize C++-like encoding for enums. by michaelwoerister · Pull Re...
source link: https://github.com/rust-lang/rust/pull/98393
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Conversation
The updated encoding should be able to handle niche layouts where more than one variant has fields (as introduced in #94075).
The new encoding is more uniform as there is no structural difference between direct-tag, niche-tag, and no-tag layouts anymore. The only difference between those cases is that the "dataful" variant in a niche-tag enum will have a (start, end)
pair denoting the tag range instead of a single value.
The new encoding now also supports 128-bit tags, which occur in at least some standard library types. These tags are represented as u64
pairs so that debuggers (which don't always have support for 128-bit integers) can reliably deal with them. The downside is that this adds quite a bit of complexity to the encoding and especially to the corresponding NatVis.
The new encoding seems to increase the size of (x86_64-pc-windows-msvc) debuginfo by 10-15%. The size of binaries is not affected (release builds were built with -Cdebuginfo=2
, numbers are in kilobytes):
EXE | before | after | relative |
---|---|---|---|
cargo (debug) | 40453 | 40450 | +0% |
ripgrep (debug) | 10275 | 10273 | +0% |
cargo (release) | 16186 | 16185 | +0% |
ripgrep (release) | 4727 | 4726 | +0% |
PDB | before | after | relative |
---|---|---|---|
cargo (debug) | 236524 | 261412 | +11% |
ripgrep (debug) | 53140 | 59060 | +11% |
cargo (release) | 148516 | 169620 | +14% |
ripgrep (release) | 10676 | 11804 | +11% |
Given that the new encoding is more general, this is to be expected. Only platforms using C++-like debuginfo are affected -- which currently is only *-pc-windows-msvc
.
TODO
- Properly update documentation
- Add regression tests for new optimized enum layouts as introduced by Use niche-filling optimization even when multiple variants have data. #94075.
r? @wesleywiser
added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label
added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label
added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
labels
force-pushed the new-cpp-like-enum-debuginfo
branch
4 times, most recently
from
2214eed
to
64d9011
Compare
Collaborator
rust-timer commented on Jul 1
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
changed the title
[wip] debuginfo: Change C++-like encoding for enums.
debuginfo: Generalize C++-like encoding for enums.
Try build successful - checks-actions |
Collaborator
rust-timer commented on Jul 1
Collaborator
rust-timer commented on Jul 1
Finished benchmarking commit (c10a6d5): comparison url. Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results CyclesResults If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Footnotes
|
added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
and removed S-waiting-on-perf Status: Waiting on a perf run to be completed.
labels
/// int_type tag; |
||
/// |
||
/// enum VariantNames { |
||
/// <name-of-variant-0> = 0, // The numeric values are variant index, |
This comment is kinda misleading, because it implies that in order to enumerate all variants in an enum one could simply take all values of VariantNames
and obtain variant filed names by concatenating variant
and the variant index. However variant_fallback
throws a wrench into this plan.
So how is one supposed to enumerate enum variants? Should I simply look for all top-level members that begin with variant
?
The comment is accurate. The VariantNames
enum will contain the names of all variants. You can use it to enumerate them.
I meant that it's easy to assume that "variant index" here is the same as X
in variantX
. At least I did assume that upon first reading.
Yes, that's correct: both the X
in the variantX
field name and the numeric values of the VariantNames
enum correspond to the variant index - i.e. the zero-based index of the variant as listed in the enums source code definition.
/// 128-bit integers, so all values involved get split into two 64-bit fields. |
||
/// Instead of the `tag` field, we generate two fields `tag128_lo` and `tag128_hi`, |
||
/// Instead of `DISCR_EXACT`, we generate `DISCR128_EXACT_LO` and `DISCR128_EXACT_HI`, |
||
/// and so on. |
Other questions I have:
- Should I expect to deal with more than one variant using tag range? Or is it always just the
variant_fallback
? Could this change in the future? - Are
variantX.NAME
and the name of the type ofvariantX.value
supposed to stay in sync? Should one be preferred to the other when displaying variant name to the user? - Is tag's
int_type
guaranteed to be unsigned? - How is one supposed to know when to switch to 128-bit tag/constants? By checking whether
tag
ortag128...
is present?
To resolve these ambiguities, I think it would be helpful to include pseudo-code for decoding enums. I know this was geared towards declarative natvis, but not all debuggers use natvis. As a debugger author, I would appreciate that very much!
Should I expect to deal with more than one variant using tag range? Or is it always just the variant_fallback?
At the moment it's only variant_fallback
.
Could this change in the future?
I'm not aware of any concrete plans to change that -- but it could change in the future, yes.
Are variantX.NAME and the name of the type of variantX.value supposed to stay in sync? Should one be preferred to the other when displaying variant name to the user?
Yes, they'll stay in sync. You can use whatever is easier to implement.
Is tag's int_type guaranteed to be unsigned?
No, it can be signed. I'm not a quite clear about the situation for niche-tag enums, though. The compiler currently will only describe those as unsigned types in debuginfo -- but I don't know if that's actually correct in all cases.
How is one supposed to know when to switch to 128-bit tag/constants? By checking whether tag or tag128... is present?
Yes, if there is a tag128_*
field in the top-level union, it's 128-bits. Otherwise, there'll be a tag
field with some other integer type.
To resolve these ambiguities, I think it would be helpful to include pseudo-code for decoding enums. I know this was geared towards declarative natvis, but not all debuggers use natvis. As a debugger author, I would appreciate that very much!
I'll add something.
src/etc/natvis/intrinsic.natvis
Outdated Show resolved
src/etc/natvis/intrinsic.natvis
Outdated
<Intrinsic Name="in_range128" Expression="(lt128(begin_hi, begin_lo, end_hi, end_lo)) ? |
||
(lt_or_eq128(begin_hi, begin_lo, tag128_hi, tag128_lo) && lt_or_eq128(tag128_hi, tag128_lo, end_hi, end_lo)) : |
||
(lt_or_eq128(begin_hi, begin_lo, tag128_hi, tag128_lo) || lt_or_eq128(tag128_hi, tag128_lo, end_hi, end_lo))"> |
Is this part right? If begin > end
, don't we need to do tag > start || tag < end
?
It should be equivalent: (tag >= start) == (start <= tag)
. I switched from (tag >= start)
to (start <= tag)
so I don't have to create a gt_or_eq128
function.
Member
Author
michaelwoerister commented on Jul 8
So, I'm somewhat conflicted about The (only) reason it does exist is to make the (already very big) NatVis definition smaller. With
Without it, we have to check for every variant if it corresponds to a range, so we get:
And there are actually another three similar copies of these two blocks to account for 128-bit tags and field expansion. It's quite the monstrosity But I'm not sure that that would actually be a problem. On the one hand, I'm worried about NatVis evaluation engines not being built for visualizers of this size. On the other hand, I haven't seen any actual evidence that either Visual Studio or WinDbg have trouble interpreting these. So I'm not sure. Shall we make the encoding more regular and consistent by removing |
removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label
Member
Author
michaelwoerister commented on Jul 8
I've removed |
/// int_type tag; |
||
/// |
||
/// enum VariantNames { |
||
/// <name-of-variant-0> = 0, // The numeric values are variant index, |
compiler/rustc_codegen_llvm/src/debuginfo/metadata/enums/cpp_like.rs
Outdated Show resolved
added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label
Contributor
bors commented 8 days ago
Contributor
bors commented 8 days ago
Test failed - checks-actions |
added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
labels
This comment has been minimized.
Member
wesleywiser commented 7 days ago
@bors r+ |
Contributor
bors commented 7 days ago
added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
labels
Contributor
bors commented 7 days ago
Contributor
bors commented 7 days ago
Test failed - checks-actions |
added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
labels
This comment has been minimized.
Member
Author
michaelwoerister commented 5 days ago
Tests updated. @bors r=wesleywiser |
Contributor
bors commented 5 days ago
added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
labels
Contributor
bors commented 5 days ago
Contributor
bors commented 4 days ago
Test successful - checks-actions |
Collaborator
rust-timer commented 4 days ago
Finished benchmarking commit (4916e2b): comparison url. Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results CyclesResults If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. @rustbot label: -perf-regression Footnotes
|
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK