proc-macro: Stop wrapping `ident` matchers into groups by petrochenkov · Pull Request #92472 · rust-lang/rust
source link: https://github.com/rust-lang/rust/pull/92472
Conversation
`ident` is always a single token and can be treated in the same way as `tt`.
r? @Aaron1011
@bors try
Try build successful - checks-actions
Build commit: ef8c760 (`ef8c760c4c3d93b561a5b431502e92f4b4773aca`)
@craterbot check
Experiment `pr-92472` created and queued.
Automatically detected try build ef8c760
You can check out the queue and this experiment's details.
Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more
Experiment `pr-92472` is now running.
Experiment `pr-92472` is completed!
44 regressed and 16 fixed (203512 total)
Open the full report.
If you notice any spurious failure please add them to the blacklist!
@Aaron1011
See #91166 for some background for this PR.
I think groups are a way to preserve parsing priorities first of all, not to maintain the `macro_rules` matcher substitution stacks.
(From that point of view `ident` doesn't need to be grouped; as a single token it's not affected by parsing priorities.)
Otherwise we would wrap tokens substituted with `tt` too.
Moreover, even `ident` doesn't preserve the whole substitution stack, only the last substitution.
I think it would be useful to maintain these stacks in the compiler somehow, including for `tt`s, for which they are currently not tracked at all.
That would be useful for spans, because the interpolated token spans are useful for joining with neighboring spans, like getting a reasonable span for the `$a + b` expression, which currently happens if `$a` is an `ident`, but not if `$a` is a `tt`.
I just don't think groups are the right tool for that.
The main source of regressions here is `rhai`, which generates a definition and a use of a variable with different hygiene contexts (one is plain `Span::call_site()` and the other is inherited from a part of the input), but those two contexts happen to coincide if `ident` matchers are wrapped (the group is used as that "part of input").
So it's not about group parsing, just an accidental hygiene issue.
(I'll look at other regressions later.)
Besides `rhai` there are only 3 non-spurious root regressions:
- `ruly-4.1.0` and `displaydoc-lite-0.1.3`: the proc macros are internal, do manual parsing, and expect a fixed input, which is generated by `macro_rules` and happened to include groups at the time of writing. The macros panic if the input differs from what they expect.
- `MNTRA.compiler_project.6bd7adf91e1532d3ebac49efef132b4d456cc757`: some kind of hygiene context mismatch similar to `rhai`; I didn't figure out the precise details.
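For illustration, the brittle pattern behind this kind of breakage can be sketched in plain Rust. This is a hypothetical sketch using plain strings rather than real proc-macro types (`expects_old_shape`, `⟪`, and `⟫` are illustrative stand-ins; `⟪`/`⟫` mark the boundaries of a `Delimiter::None` group):

```rust
// Manual "parsing" that hard-codes the exact token shape the macro
// received at the time it was written, as described above.
fn expects_old_shape(tokens: &[&str]) -> String {
    match tokens {
        // Written against the pre-PR output, where `$ident` arrived wrapped
        // in a none-delimited group:
        ["⟪", name, "⟫", "+", "1"] => format!("increment {name}"),
        other => panic!("unexpected input: {other:?}"),
    }
}

fn main() {
    // Pre-PR tokenization of `$ident + 1`: the fixed pattern matches.
    assert_eq!(expects_old_shape(&["⟪", "i", "⟫", "+", "1"]), "increment i");
    // Post-PR tokenization: the group is gone, the pattern no longer
    // matches, and the macro panics.
    let post = std::panic::catch_unwind(|| expects_old_shape(&["i", "+", "1"]));
    assert!(post.is_err());
}
```

The point is that the panic is not caused by any change in what the tokens mean, only by a change in their surface shape that the macro over-fit to.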
So, I think the conclusion is that we should be able to make this change, if we want it.
FWIW, among those "16 fixed" there are some legitimate fixes for regressions caused by (not so) recent proc macro bugfixes that resulted in new groups being produced :)
triage: looks like this is ready for review
@rustbot label: -S-waiting-on-author +S-waiting-on-review
This is an observable language and library change that needs to go through some team.
I'm not sure which exactly, so I'll nominate it for both; please remove the nomination if you think it's not your responsibility.
Code matched by `macro_rules` matchers like `ty` or `expr` (using `TokenKind::Interpolated` internally in rustc) is converted to token streams when passed to proc macros.
For the majority of matchers, with the exception of `tt`, such token streams are wrapped into a `TokenTree::Group` with `Delimiter::None` to preserve parsing priorities (the typical example being `$expr * 3` where `$expr` is `1 + 2`, which is tokenized as `⟪ 1 + 2 ⟫ * 3` instead of `1 + 2 * 3`).
This PR removes this wrapping from the `ident` matcher, which is always a single token, so it's not affected by any parsing priorities and is very similar to `tt`.
- Before: `$ident + 1` => `⟪ i ⟫ + 1`
- After: `$ident + 1` => `i + 1`
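The wrapping rule described above can be sketched with a toy token model. This is a hypothetical illustration (the types `Tt`, `render`, and `substitute` are made up for this sketch, not rustc's real implementation; `⟪ … ⟫` again stands for a `Delimiter::None` group):

```rust
// Toy model of none-delimited group wrapping during transcription.
#[derive(Clone)]
enum Tt {
    Token(String),
    NoneGroup(Vec<Tt>), // `Delimiter::None` group, rendered as ⟪ … ⟫
}

fn render(tts: &[Tt]) -> String {
    tts.iter()
        .map(|tt| match tt {
            Tt::Token(s) => s.clone(),
            Tt::NoneGroup(inner) => format!("⟪ {} ⟫", render(inner)),
        })
        .collect::<Vec<_>>()
        .join(" ")
}

// Substitute a matched fragment into the transcribed stream. After this PR,
// single-token fragments (`ident`, like `tt`) are spliced in unwrapped;
// multi-token fragments (`expr`, `ty`, …) still get a none-delimited group.
fn substitute(mut fragment: Vec<Tt>) -> Tt {
    if fragment.len() == 1 {
        fragment.pop().unwrap()
    } else {
        Tt::NoneGroup(fragment)
    }
}

fn main() {
    let t = |s: &str| Tt::Token(s.to_string());
    // `$expr * 3` where `$expr` matched `1 + 2`: the group keeps priority.
    let e = substitute(vec![t("1"), t("+"), t("2")]);
    println!("{}", render(&[e, t("*"), t("3")])); // ⟪ 1 + 2 ⟫ * 3
    // `$ident + 1` where `$ident` matched `i`: no group after this PR.
    let i = substitute(vec![t("i")]);
    println!("{}", render(&[i, t("+"), t("1")])); // i + 1
}
```

The single-token check is what makes `ident` safe to unwrap: a lone token can never re-associate with its neighbors, so the group buys nothing.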
@rust-lang/lang, @rust-lang/libs-api:
@rfcbot fcp merge
See petrochenkov's summary of the change in #92472 (comment).
We discussed this change in the libs-api meeting on February 2 and I believe those present had no objections at the time.
Team member @dtolnay has proposed to merge this. The next step is review by the rest of the tagged team members:
Concerns:
- documentation resolved by #92472 (comment)
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
FCP for lang is about the vague question "what happens if identifiers are not one token in the future". In the libs-api meeting we felt it would be okay to wrap multiple-token identifiers in a `None` group even while continuing to have no `None` group around identifiers which are a single token, which will always exist.
Note that this will remove some `Span` information that proc macros could have been using: #92472 (comment). I don't know of any case where this is used in practice (and, as @petrochenkov said, groups are probably the wrong tool for making this information available); the most obvious one would be setting token spans to 'nicer' locations (e.g. a location within a `macro_rules` definition). However, I think everyone should be aware that this has a (small) chance of removing functionality that a proc macro is actually relying on.
@rustbot label +relnotes
This clearly needs a note in the release notes -- can someone knowledgeable (@dtolnay or @petrochenkov, perhaps?) draft one?
@rfcbot concern documentation
@petrochenkov are the current tokenization rules documented somewhere (e.g., the rust reference?). Some quick searching did not find them, if so.
I am happy to make this change but I'd like the current tokenization rules to be documented in the reference (or in some location linked to by the reference).
are the current tokenization rules documented somewhere (e.g., the rust reference?)
Probably not.
Not so long ago pretty-printing was often used for tokenization, so there were not many rules beyond "try as best as we can".
I'll try to document this in the reference.
The reference says, regarding `macro_rules` transcription (https://doc.rust-lang.org/reference/macros-by-example.html#transcribing):
"The `ident`, `lifetime`, and `tt` fragment types are an exception, and can be matched by literal tokens."
That implies to me that something like this behaviour has basically already happened for MBE, and thus it makes sense to do it for proc macros too.
(Please correct me if I'm understanding this wrong. I'm pretty new to looking at macro details.)
@rfcbot reviewed
@petrochenkov that would be awesome -- it doesn't have to be the world's best docs; if it has some "TODO" items that seems ok, but having some docs with the key concerns would be great.
@rfcbot reviewed
@rfcbot resolve documentation
rfcbot commented 10 days ago
This is now entering its final comment period, as per the review above.