7

debuginfo: Generalize C++-like encoding for enums. by michaelwoerister · Pull Re...

 2 years ago
source link: https://github.com/rust-lang/rust/pull/98393
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Conversation

Member

@michaelwoerister michaelwoerister commented on Jun 22

edited

The updated encoding should be able to handle niche layouts where more than one variant has fields (as introduced in #94075).

The new encoding is more uniform as there is no structural difference between direct-tag, niche-tag, and no-tag layouts anymore. The only difference between those cases is that the "dataful" variant in a niche-tag enum will have a (start, end) pair denoting the tag range instead of a single value.

The new encoding now also supports 128-bit tags, which occur in at least some standard library types. These tags are represented as u64 pairs so that debuggers (which don't always have support for 128-bit integers) can reliably deal with them. The downside is that this adds quite a bit of complexity to the encoding and especially to the corresponding NatVis.

The new encoding seems to increase the size of (x86_64-pc-windows-msvc) debuginfo by 10-15%. The size of binaries is not affected (release builds were built with -Cdebuginfo=2, numbers are in kilobytes):

EXE before after  relative
cargo (debug) 40453 40450 +0%
ripgrep (debug) 10275 10273 +0%
cargo (release) 16186 16185 +0%
ripgrep (release) 4727 4726 +0%
PDB before after  relative
cargo (debug) 236524 261412 +11%
ripgrep (debug) 53140 59060 +11%
cargo (release) 148516 169620 +14%
ripgrep (release) 10676 11804 +11%

Given that the new encoding is more general, this is to be expected. Only platforms using C++-like debuginfo are affected -- which currently is only *-pc-windows-msvc.

TODO

r? @wesleywiser

All reactions

rustbot

added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label

on Jun 22

rust-highfive

added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label

on Jun 22

michaelwoerister

added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

on Jun 22

michaelwoerister

force-pushed the new-cpp-like-enum-debuginfo

branch 4 times, most recently from 2214eed to 64d9011 Compare

2 months ago

Collaborator

rust-timer commented on Jul 1

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

Contributor

bors commented on Jul 1

hourglass Trying commit 64d9011 with merge c10a6d5...

michaelwoerister

changed the title [wip] debuginfo: Change C++-like encoding for enums.

debuginfo: Generalize C++-like encoding for enums.

on Jul 1

Contributor

bors commented on Jul 1

sunny Try build successful - checks-actions
Build commit: c10a6d5 (c10a6d5629fa5fe05b77feda456310dc9e3778a5)

Collaborator

rust-timer commented on Jul 1

Collaborator

rust-timer commented on Jul 1

Finished benchmarking commit (c10a6d5): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Cycles

Results

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

rustbot

added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

and removed S-waiting-on-perf Status: Waiting on a perf run to be completed.

labels

on Jul 1

/// int_type tag;

///

/// enum VariantNames {

/// <name-of-variant-0> = 0, // The numeric values are variant index,

Contributor

@vadimcn vadimcn on Jul 7

This comment is kinda misleading, because it implies that in order to enumerate all variants in an enum one could simply take all values of VariantNames and obtain variant filed names by concatenating variant and the variant index. However variant_fallback throws a wrench into this plan.
So how is one supposed to enumerate enum variants? Should I simply look for all top-level members that begin with variant?

All reactions

The comment is accurate. The VariantNames enum will contain the names of all variants. You can use it to enumerate them.

All reactions

Contributor

@vadimcn vadimcn on Jul 8

I meant that it's easy to assume that "variant index" here is the same as X in variantX. At least I did assume that upon first reading.

All reactions

Yes, that's correct: both the X in the variantX field name and the numeric values of the VariantNames enum correspond to the variant index - i.e. the zero-based index of the variant as listed in the enums source code definition.

All reactions

/// 128-bit integers, so all values involved get split into two 64-bit fields.

/// Instead of the `tag` field, we generate two fields `tag128_lo` and `tag128_hi`,

/// Instead of `DISCR_EXACT`, we generate `DISCR128_EXACT_LO` and `DISCR128_EXACT_HI`,

/// and so on.

Contributor

@vadimcn vadimcn on Jul 7

Other questions I have:

  • Should I expect to deal with more than one variant using tag range? Or is it always just the variant_fallback? Could this change in the future?
  • Are variantX.NAMEand the name of the type of variantX.value supposed to stay in sync? Should one be preferred to the other when displaying variant name to the user?
  • Is tag's int_type guaranteed to be unsigned?
  • How is one supposed to know when to switch to 128-bit tag/constants? By checking whether tag or tag128... is present?

To resolve these ambiguities, I think it would be helpful to include pseudo-code for decoding enums. I know this was geared towards declarative natvis, but not all debuggers use natvis. As a debugger author, I would appreciate that very much!

All reactions

Should I expect to deal with more than one variant using tag range? Or is it always just the variant_fallback?

At the moment it's only variant_fallback.

Could this change in the future?

I'm not aware of any concrete plans to change that -- but it could change in the future, yes.

Are variantX.NAME and the name of the type of variantX.value supposed to stay in sync? Should one be preferred to the other when displaying variant name to the user?

Yes, they'll stay in sync. You can use whatever is easier to implement.

Is tag's int_type guaranteed to be unsigned?

No, it can be signed. I'm not a quite clear about the situation for niche-tag enums, though. The compiler currently will only describe those as unsigned types in debuginfo -- but I don't know if that's actually correct in all cases.

How is one supposed to know when to switch to 128-bit tag/constants? By checking whether tag or tag128... is present?

Yes, if there is a tag128_* field in the top-level union, it's 128-bits. Otherwise, there'll be a tag field with some other integer type.

To resolve these ambiguities, I think it would be helpful to include pseudo-code for decoding enums. I know this was geared towards declarative natvis, but not all debuggers use natvis. As a debugger author, I would appreciate that very much!

I'll add something.

All reactions

src/etc/natvis/intrinsic.natvis

Outdated Show resolved

<Intrinsic Name="in_range128" Expression="(lt128(begin_hi, begin_lo, end_hi, end_lo)) ?

(lt_or_eq128(begin_hi, begin_lo, tag128_hi, tag128_lo) && lt_or_eq128(tag128_hi, tag128_lo, end_hi, end_lo)) :

(lt_or_eq128(begin_hi, begin_lo, tag128_hi, tag128_lo) || lt_or_eq128(tag128_hi, tag128_lo, end_hi, end_lo))">

Is this part right? If begin > end, don't we need to do tag > start || tag < end?

All reactions

Member

Author

@michaelwoerister michaelwoerister on Jul 8

edited

It should be equivalent: (tag >= start) == (start <= tag). I switched from (tag >= start) to (start <= tag) so I don't have to create a gt_or_eq128 function.

All reactions

Member

Author

michaelwoerister commented on Jul 8

So, I'm somewhat conflicted about variant_fallback for niche-layout enums. As @vadimcn's comment make clear, it's something that makes it harder to reason about the encoding. And at the same time, it limits the encoding to there only being a single variant that corresponds to a range of tag values.

The (only) reason it does exist is to make the (already very big) NatVis definition smaller. With variant_fallback we have:

<DisplayString Condition="tag == variant0.DISCR_EXACT" Optional="true">{variant0.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant1.DISCR_EXACT" Optional="true">{variant1.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant2.DISCR_EXACT" Optional="true">{variant2.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant3.DISCR_EXACT" Optional="true">{variant3.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant4.DISCR_EXACT" Optional="true">{variant4.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant5.DISCR_EXACT" Optional="true">{variant5.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant6.DISCR_EXACT" Optional="true">{variant6.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant7.DISCR_EXACT" Optional="true">{variant7.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant8.DISCR_EXACT" Optional="true">{variant8.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant9.DISCR_EXACT" Optional="true">{variant9.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant10.DISCR_EXACT" Optional="true">{variant10.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant11.DISCR_EXACT" Optional="true">{variant11.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant12.DISCR_EXACT" Optional="true">{variant12.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant13.DISCR_EXACT" Optional="true">{variant13.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant14.DISCR_EXACT" Optional="true">{variant14.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant15.DISCR_EXACT" Optional="true">{variant15.NAME,en}</DisplayString>

<DisplayString Condition="in_range(tag, variant_fallback.DISCR_BEGIN, variant_fallback.DISCR_END)" 
               Optional="true">{variant_fallback.NAME,en}</DisplayString>

Without it, we have to check for every variant if it corresponds to a range, so we get:

<DisplayString Condition="tag == variant0.DISCR_EXACT" Optional="true">{variant0.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant1.DISCR_EXACT" Optional="true">{variant1.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant2.DISCR_EXACT" Optional="true">{variant2.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant3.DISCR_EXACT" Optional="true">{variant3.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant4.DISCR_EXACT" Optional="true">{variant4.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant5.DISCR_EXACT" Optional="true">{variant5.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant6.DISCR_EXACT" Optional="true">{variant6.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant7.DISCR_EXACT" Optional="true">{variant7.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant8.DISCR_EXACT" Optional="true">{variant8.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant9.DISCR_EXACT" Optional="true">{variant9.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant10.DISCR_EXACT" Optional="true">{variant10.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant11.DISCR_EXACT" Optional="true">{variant11.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant12.DISCR_EXACT" Optional="true">{variant12.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant13.DISCR_EXACT" Optional="true">{variant13.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant14.DISCR_EXACT" Optional="true">{variant14.NAME,en}</DisplayString>
<DisplayString Condition="tag == variant15.DISCR_EXACT" Optional="true">{variant15.NAME,en}</DisplayString>

<DisplayString Condition="in_range(tag, variant0.DISCR_BEGIN, variant0.DISCR_END)" Optional="true">{variant0.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant1.DISCR_BEGIN, variant1.DISCR_END)" Optional="true">{variant1.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant2.DISCR_BEGIN, variant2.DISCR_END)" Optional="true">{variant2.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant3.DISCR_BEGIN, variant3.DISCR_END)" Optional="true">{variant3.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant4.DISCR_BEGIN, variant4.DISCR_END)" Optional="true">{variant4.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant5.DISCR_BEGIN, variant5.DISCR_END)" Optional="true">{variant5.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant6.DISCR_BEGIN, variant6.DISCR_END)" Optional="true">{variant6.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant7.DISCR_BEGIN, variant7.DISCR_END)" Optional="true">{variant7.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant8.DISCR_BEGIN, variant8.DISCR_END)" Optional="true">{variant8.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant9.DISCR_BEGIN, variant9.DISCR_END)" Optional="true">{variant9.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant10.DISCR_BEGIN, variant10.DISCR_END)" Optional="true">{variant10.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant11.DISCR_BEGIN, variant11.DISCR_END)" Optional="true">{variant11.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant12.DISCR_BEGIN, variant12.DISCR_END)" Optional="true">{variant12.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant13.DISCR_BEGIN, variant13.DISCR_END)" Optional="true">{variant13.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant14.DISCR_BEGIN, variant14.DISCR_END)" Optional="true">{variant14.NAME,en}</DisplayString>
<DisplayString Condition="in_range(tag, variant15.DISCR_BEGIN, variant15.DISCR_END)" Optional="true">{variant15.NAME,en}</DisplayString>

And there are actually another three similar copies of these two blocks to account for 128-bit tags and field expansion. It's quite the monstrositysweat_smile

But I'm not sure that that would actually be a problem. On the one hand, I'm worried about NatVis evaluation engines not being built for visualizers of this size. On the other hand, I haven't seen any actual evidence that either Visual Studio or WinDbg have trouble interpreting these.

So I'm not sure. Shall we make the encoding more regular and consistent by removing variant_fallback? I'm starting to be in favor of that.

michaelwoerister

removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label

on Jul 8

Member

Author

michaelwoerister commented on Jul 8

I've removed variant_fallback now. The NatVis is more verbose, but the encoding is more regular.

/// int_type tag;

///

/// enum VariantNames {

/// <name-of-variant-0> = 0, // The numeric values are variant index,

Contributor

@vadimcn vadimcn on Jul 8

I meant that it's easy to assume that "variant index" here is the same as X in variantX. At least I did assume that upon first reading.

All reactions

bors

added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label

8 days ago

Contributor

bors commented 8 days ago

hourglass Testing commit ee634fb with merge 192ff19...

Contributor

bors commented 8 days ago

broken_heart Test failed - checks-actions

bors

added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

labels

8 days ago

This comment has been minimized.

Member

wesleywiser commented 7 days ago

@bors r+

Contributor

bors commented 7 days ago

pushpin Commit 6d030a5 has been approved by wesleywiser

It is now in the queue for this repository.

bors

added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

7 days ago

Contributor

bors commented 7 days ago

hourglass Testing commit 6d030a5 with merge 8f836d9...

Contributor

bors commented 7 days ago

broken_heart Test failed - checks-actions

bors

added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

labels

7 days ago

This comment has been minimized.

Member

Author

michaelwoerister commented 5 days ago

Tests updated.

@bors r=wesleywiser

Contributor

bors commented 5 days ago

pushpin Commit 03b93d0 has been approved by wesleywiser

It is now in the queue for this repository.

bors

added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

5 days ago

Contributor

bors commented 5 days ago

hourglass Testing commit 03b93d0 with merge 4916e2b...

Contributor

bors commented 4 days ago

sunny Test successful - checks-actions
Approved by: wesleywiser
Pushing 4916e2b to master...

bors

added the merged-by-bors This PR was explicitly merged by bors label

4 days ago

bors

merged commit 4916e2b into

rust-lang:master

4 days ago

11 checks passed

rustbot

added this to the 1.65.0 milestone

4 days ago

Collaborator

rust-timer commented 4 days ago

Finished benchmarking commit (4916e2b): comparison url.

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

Cycles

Results

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK