Github Hygiene opt-out (escaping) for declarative macros 2.0 by alexreg · Pull R...

source link: https://github.com/rust-lang/rfcs/pull/2498

alexreg commented on Jul 13, 2018

edited

This feature introduces the ability to "opt-out" of the usual macro hygiene rules within definitions of declarative macros (macros 2.0), for designated occurrences of identifiers. In other words, the feature will enable one to annotate occurrences of identifiers with macro call-site hygiene rather than the default definition-site hygiene.

Rendered

CC @jseyfried @petrochenkov @nrc

Contributor

Centril left a comment

Nicely written :)

# Summary

[summary]: #summary

This feature introduces the ability to "opt-out" of the usual macro hygiene rules within definitions of [declarative macros][decl-macro], for designated identifiers or occurrences of identifiers. In other words, the feature will enable one to annotate occurrences of identifiers with macro call-site hygiene rather than the default definition-site hygiene.

Centril on Jul 13, 2018

Contributor

Could be good to mention here that "declarative macros" does not refer to macro_rules! (it is apparent if you click the link, but in the interest of not having to do so...)

alexreg on Jul 13, 2018

Author

Fair point. I originally had this, but somehow removed it.

# Motivation

[motivation]: #motivation

The use of [hygienic macros] in Rust is justified by much prior research and experience, and solves several common issues that programmers would otherwise encounter with macros due to the nature of syntactical substitution. The principal deficit of this approach is that it requires that names/identifiers of any items generated by a macro be *explicitly passed to* the macro as arguments. This both requires the logic for name selection to remain entirely external to the macro, and even if that is not a problem, the passing of all identifiers-to-export into a macro can quickly become unwieldy for macros that generate many identifiers.

Centril on Jul 13, 2018

Contributor

justified by much prior research and experience

A link would be good for curious readers :)

alexreg on Jul 13, 2018

Author

I think the "hygienic macros" link offers good justification, no?

Centril on Jul 13, 2018

Contributor

Truthfully I expected more papers and citations given "much prior research", but I suppose it's enough :)

alexreg on Jul 13, 2018

Author

Hah, okay, I'll add one or two!
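
The constraint described in the Motivation excerpt above (a name a macro generates is only usable by the caller if it was passed in explicitly) can be seen on stable Rust today. The sketch below uses `macro_rules!`, whose local variables are hygienic, rather than the nightly-only `macro` items the RFC targets; the macro and function names are hypothetical and chosen only for illustration.

```rust
// The `x` written inside this macro gets the macro's definition-site
// syntax context, so it is a different binding from any `x` at the call site.
macro_rules! define_hidden_x {
    () => {
        let x = 1;
        let _ = x; // silence the unused-variable lint
    };
}

// Here the identifier arrives as an argument, so it carries the
// caller's syntax context and is visible to the caller afterwards.
macro_rules! define_named {
    ($name:ident, $value:expr) => {
        let $name = $value;
    };
}

fn hidden_binding_does_not_shadow() -> i32 {
    let x = 0;
    define_hidden_x!();
    x // still the caller's `x`
}

fn passed_binding_shadows() -> i32 {
    let x = 0;
    let _ = x;
    define_named!(x, 2);
    x // now the binding introduced through the macro argument
}
```

Calling `hidden_binding_does_not_shadow()` yields `0` and `passed_binding_shadows()` yields `2`. The hygiene escape proposed in this RFC would let a macro author obtain the second behaviour without threading every name through the argument list.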

# Guide-level explanation

[guide-level-explanation]: #guide-level-explanation

Escaping of hygiene for identifiers within macros allows one to define identifiers with syntax contexts (**hygiene**) corresponding to the place the macro is invoked (the **call-site**) rather than the place it is defined (**definition-site**). It also enables one to use/reference existing identifiers from the call-site from within macro definitions, though this is not the true aim of the feature, but rather a side-effect, and will be discussed later.

Centril on Jul 13, 2018

Contributor

Could be more clear: "Place" => "location in the source code"

Note that for the purposes of this RFC, an **identifier** can roughly be considered to be a textual name (e.g. `foo_bar`) of any sort (for a variable, function, trait, etc.) or a lifetime (e.g. `'a`).

Centril on Jul 13, 2018

Contributor

So what is the relation of this RFC to #2151?

alexreg on Jul 13, 2018

Author

None. I might add a sentence to make that clear.

To escape an identifier in code, one simply prefixes an identifier with the [sigil] `#`. This changes the syntax context (hygiene) of the identifier from the usual definition-site to the call-site.

Centril on Jul 13, 2018

Contributor

The quote! macro uses #... Have you considered conflicts if and when quote is redefined as a 2.0 macro?

kennytm on Jul 13, 2018

Member

Bikeshed: I wonder if backslash ("escaping") can be valid

pub mod \foo {
    const \BAR: u32 = 123;
}

alexreg on Jul 13, 2018

Author

@Centril No, I'm not sure. I wonder why it doesn't use $? Grr. Maybe someone can clarify for me whether it would conflict.

alexreg on Jul 13, 2018

Author

Added an "unresolved question" about this, incidentally.

## Reference: Example D

[reference-example-d]: #reference-example-d

In [example B][guide-example-b], the situation is almost identical to [example C][reference-example-c], except that the name of the module is defined within the macro as `foo`, and hygiene-escaped, so that it has the call-site syntax context.

Centril on Jul 13, 2018

Contributor

Typo here? Should say "In [example D][guide-example-d]"?

alexreg changed the title from "Hygiene opt-out (escaping) for Declarative Macros 2.0" to "Hygiene opt-out (escaping) for declarative macros 2.0" on Jul 13, 2018

Contributor

Centril commented on Jul 13, 2018

cc @dtolnay on possible conflicts, due to using #, between quote! and decl macros...

Contributor

tikue commented on Jul 14, 2018

I think the RFC doesn't specify the semantics of nested macro calls:

Escaping of hygiene for identifiers within macros allows one to define identifiers with syntax contexts (hygiene) corresponding to the location in the source code from which the macro is invoked (the call-site) rather than the location it is defined (definition-site).

Given:

x!();

macro x() {
    y!();
}

macro y() {
    struct #Foo;
}

In the invocation of x!(), what is the call site of y!()?

When the macro is invoked (expanded), each token tree is transcribed according to the following rules, depending on its hygiene tag.

- *definition-site*: a normal mark is applied for the current expansion

- *call-site*: a transparent mark is applied for the current expansion and the syntax context for every identifier in the token tree is changed to the syntax context of the call site.

petrochenkov on Jul 14, 2018

Contributor

and the syntax context for every identifier in the token tree is changed to the syntax context of the call site

What is this part about?
When a macro is expanded, an identifier gets an opaque mark added by default (Span::def_site() in proc macro API) or a transparent mark if opt-out is in place (Span::call_site() in proc macro API); that's all that happens.

alexreg on Jul 14, 2018

Author

Ah, I was slightly confused about how your transparent mark worked. I'll clarify that.

alexreg on Jul 14, 2018

Author

Let me know if it's better now.

Contributor

petrochenkov commented on Jul 14, 2018

edited

@tikue
Call site of y!() is inside of x so the struct Foo cannot be accessed from outside of x.
So if you have a transparent macro you can always "contain" it with another macro and prevent further name leakage.

I agree that this presents a problem if you have several layers of macro helpers, this problem needs some other solution in addition to call-site hygiene.

Author

alexreg commented on Jul 14, 2018

In the invocation of x!(), what is the call site of y!()?

The definition site of x, by definition. :-)

Author

alexreg commented on Jul 14, 2018

edited

I agree that this presents a problem if you have several layers of macro helpers, this problem needs some other solution in addition to call-site hygiene.

Possibly a proc macro that can change the syntax context to that of a given identifier?

Speaking of this, this whole feature could be implemented as a proc macro with eager expansion, couldn't it?

When the macro is invoked (expanded), each token tree is transcribed according to the following rules, depending on its hygiene tag.

- *definition-site*: a normal mark is applied for the current expansion, which leaves the syntax context alone

- *call-site*: a transparent mark is applied for the current expansion, which changes the syntax context for every identifier in the token tree to that of the call site.

petrochenkov on Jul 14, 2018

Contributor

Syntax context of an identifier is a sequence of marks RootMark -> Mark1 -> Mark2.
Both "def-site" and "call-site" variants change it, the former to RootMark -> Mark1 -> Mark2 -> OpaqueMark, the latter to RootMark -> Mark1 -> Mark2 -> TransparentMark.
(All this is an implementation detail anyway.)

mark-i-m on Jul 15, 2018

Member

What exactly are Marks? What does the sequence of marks in this example mean?

petrochenkov on Jul 17, 2018

Contributor

What exactly are Marks?

Right now a mark is a combination of expansion ID and transparency :)

What does the sequence of marks in this example mean?

A syntactic context fully identifying what macros produced an identifier (or other token).

I'll write some docs after doing a number of refactorings in the compiler.

alexreg on Jul 18, 2018

Author

Yeah. And an expansion ID is a particular expansion (instance of an expansion) of a macro, as I understand. Furthermore, I believe a RootMark is constructed from a span or set of spans, though I'm not 100% clear on this. Perhaps @petrochenkov can clarify.
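
As a rough illustration of the description in this thread (a syntax context as a sequence of marks, each mark pairing an expansion ID with a transparency), here is a toy model. The type and method names (`SyntaxContext`, `with_mark`, `can_see`) are hypothetical and are not rustc's actual data structures, and the rule that only opaque marks separate names is a simplifying assumption of this sketch.

```rust
/// How a mark affects name resolution (toy model, not rustc's real types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Transparency {
    Opaque,      // default: definition-site hygiene
    Transparent, // opt-out: call-site hygiene
}

/// Per the thread: a mark combines an expansion ID with a transparency.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Mark {
    expansion_id: u32,
    transparency: Transparency,
}

/// A syntax context is a sequence of marks; the empty sequence stands in
/// for the root context here.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct SyntaxContext {
    marks: Vec<Mark>,
}

impl SyntaxContext {
    fn root() -> Self {
        Self::default()
    }

    /// Expanding a macro appends one mark to each identifier it produces.
    fn with_mark(&self, expansion_id: u32, transparency: Transparency) -> Self {
        let mut marks = self.marks.clone();
        marks.push(Mark { expansion_id, transparency });
        SyntaxContext { marks }
    }

    /// ASSUMPTION of this sketch: transparent marks are ignored when
    /// comparing contexts, so only opaque marks keep names apart.
    fn opaque_expansions(&self) -> Vec<u32> {
        self.marks
            .iter()
            .filter(|m| m.transparency == Transparency::Opaque)
            .map(|m| m.expansion_id)
            .collect()
    }

    /// Can a name used in `self` refer to a definition made in `definition`?
    fn can_see(&self, definition: &SyntaxContext) -> bool {
        self.opaque_expansions() == definition.opaque_expansions()
    }
}
```

Under these assumptions, an identifier marked `Opaque` by expansion 1 is invisible from the root, while one marked `Transparent` is visible. If the nested `x!()`/`y!()` case raised earlier is modelled as `root.with_mark(1, Opaque).with_mark(2, Transparent)`, the resulting `Foo` is visible inside `x` but not from the root, which matches the "containment" behaviour described for that case.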

Author

alexreg commented on Jul 14, 2018

Right, but I think I explain it correctly now at least. :-)

Member

mark-i-m commented on Jul 15, 2018

@alexreg Thanks for the RFC! I'm glad to see macros 2.0 starting to get RFC-ified :)

I had a couple of questions:

  • Is there any leaning towards making this an eRFC or do we have a reasonably good notion of what we want?
  • Regardless of whether this RFC is accepted, would it be possible to get some of this content in the rustc-guide? Specifically the explanations of different syntax contexts, marks, etc. were instructive to me.

Author

alexreg commented on Jul 15, 2018

Is there any leaning towards making this an eRFC or do we have a reasonably good notion of what we want?

I think this could work as an eRFC or RFC. I think we know what we want, and the feature is very well-motivated, but I'm starting to think this feature may be best implemented as a proc macro now. (And can thus be extensible to the case I mentioned above.)

Regardless of whether this RFC is accepted, would it be possible to get some of this content in the rustc-guide? Specifically the explanations of different syntax contexts, marks, etc. were instructive to me.

Yep, I think so. Let's wait until it's merged and I have a bit more experience with things, but I'd be glad to do that.

Author

alexreg commented on Jul 17, 2018

@petrochenkov Can this feature be implemented using a proc macro yet? I presume not. What more would it require though?

Contributor

petrochenkov commented on Jul 17, 2018

@alexreg
Probably yes, given enough effort?
Procedural macros can produce call-site spans on stable, can produce def-site spans on nightly, and can parse arbitrary text including Rust code with identifiers marked with #.
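
To make the "parse text with identifiers marked with `#`" part concrete, here is a minimal, std-only sketch of the classification step such a procedural macro would need. It is not a real proc macro: `Hygiene` and `classify_idents` are hypothetical names, keywords are treated like any other identifier, and the sigil is only recognised when it is immediately followed by the identifier. In an actual proc macro, the `CallSite` case would presumably map to `Ident::new(name, Span::call_site())`.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Hygiene {
    DefSite,  // unmarked identifier: default hygiene
    CallSite, // identifier written as `#name`: hygiene opt-out
}

/// Scans `src` and returns every identifier-like word together with the
/// hygiene it was annotated with.
fn classify_idents(src: &str) -> Vec<(String, Hygiene)> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    let mut escaped = false;

    while let Some(&c) = chars.peek() {
        if c == '#' {
            // Remember the sigil; it applies only to an identifier that
            // starts at the very next character.
            escaped = true;
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            let mut name = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    name.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            let hygiene = if escaped { Hygiene::CallSite } else { Hygiene::DefSite };
            out.push((name, hygiene));
            escaped = false;
        } else {
            // Whitespace or punctuation: a pending sigil no longer applies.
            escaped = false;
            chars.next();
        }
    }
    out
}
```

For the input `"struct #Foo;"` this yields `struct` tagged `DefSite` and `Foo` tagged `CallSite`.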

Author

alexreg commented on Jul 18, 2018

Probably yes, given enough effort?
Procedural macros can produce call-site spans on stable, can produce def-site spans on nightly, and can parse arbitrary text including Rust code with identifiers marked with #.

I think it's worth considering a) whether we should implement this RFC as a proc macro, b) the alternative of lift!(ident) and lift!(ident, ident2) (where in the second case it changed the syntax context of ident to that of ident2). Thoughts on both of these?

Escaping of hygiene for identifiers within macros allows one to define identifiers with syntax contexts (**hygiene**) corresponding to the location in the source code from which the macro is invoked (the **call-site**) rather than the location it is defined (**definition-site**). It also enables one to use/reference existing identifiers from the call-site from within macro definitions, though this is not the true aim of the feature, but rather a side-effect, and will be discussed later.

Note that for the purposes of this RFC, an **identifier** can roughly be considered to be a textual name (e.g. `foo_bar`) of any sort (for a variable, function, trait, etc.) or a lifetime (e.g. `'a`).

nrc on Jul 18, 2018

Member

Currently all lifetime parameters are unhygienic, not sure if we will fix that for macros 2.0 or not.

alexreg on Jul 18, 2018

Author

Yeah. Hopefully we will!

petrochenkov on Jul 18, 2018

Contributor

Lifetimes are already hygienic in `macro` macros and with `Span::def_site()` in proc macros.

## Meta-variables

[meta-variables]: #meta-variables

Hygiene escaping of meta-variables (i.e. `#$foo` and `$#foo`) does not have immediately obvious semantics or usefulness, so is explicitly disallowed for the present, and yields error messages.

nrc on Jul 18, 2018

Member

The obvious semantics to me is that the resulting identifier takes the name from the metavariable and the hygiene context from the call site.

alexreg on Jul 18, 2018

Author

Yes, I really meant that the former isn't obviously useful, while the latter isn't obviously useful either, nor does it have obvious semantics.

# Prior art

[prior-art]: #prior-art

Extended discussion on this subject was carried out in a [pull request][pr-47992] for this feature, which was closed due to the decision that an RFC such as this one be accepted first. [Alternatives][pr-47992-alternatives] were originally evaluated there, with discussion initiated by @jseyfried, and [continued][pr-47992-alternatives-eval] by @petrochenkov.

nrc on Jul 18, 2018

Member

I'd expect some discussion of how this works in other languages here. In particular, Scheme has a rich system for doing this sort of thing.

alexreg on Jul 18, 2018

Author

Hmm. I'd like to avoid learning Scheme properly for this... maybe I can dig up a decent explanation somewhere?

Member

nrc commented on Jul 18, 2018

Can this feature be implemented using a proc macro yet?

I had always imagined this feature being implemented as a proc macro, rather than having dedicated syntax. I imagined that we would provide some functions to proc macros for doing things like apply the hygiene context from a to name b to produce an identifier, as well as the manual manipulation of spans. I think we would also need something along the lines of a function or marker for saying 'don't apply the hygiene marking procedure to this identifier when we expand the proc macro'. Then a crate could provide a library of macros for use by devs to manipulate hygiene in common ways. (I think this may also require eager expansion? I forget the details exactly, but you need some way of saying expand this macro in the context of the use site, not the decl site.).

I think there is a problem with this approach which is what exactly is the use site? If you have nested macros, do you mean the root of the expansion or one level of expansion? If you pass a macro to a macro before expanding it, which use site do you use. What about interactions with eager expansion?

One solution is to always take the hygiene context from an identifier, so the user has to pass some identifier in. This can be done in conjunction with a 'concat idents' kind of macro too. This does make cases like your first example a bit weird. The alternative is that you pick a default and offer ways to access the other variations as needed.

So, the solution I would propose is that you change the RFC to cover the functions and other support necessary for ergonomic hygiene manipulation in proc macros, then create some library macros which use these to provide hygiene manipulation to decl macros and RFC any language changes required to make that work. I think this approach will lead to a more flexible and orthogonal system.

Author

alexreg commented on Jul 20, 2018

edited

@nrc

So, the solution I would propose is that you change the RFC to cover the functions and other support necessary for ergonomic hygiene manipulation in proc macros, then create some library macros which use these to provide hygiene manipulation to decl macros and RFC any language changes required to make that work. I think this approach will lead to a more flexible and orthogonal system.

Yes, that could work well. Having written this, I immediately started thinking an expanded proc macros system was the better way to go... I was just encouraged down this path initially. Do you think my lift! proc macro per above is something that could be integrated into the language? It seems like just about anyone writing hygienic macros would want to use it, really.

Also, maybe you could clarify if eager expansion is actually needed for this purpose? I was convinced it was needed for concat_idents in the past, but even there I forget the exact reason why.

If you're up for a chat on Discourse about what extensions we need to make to the proc macro system, that might be nice.

I think there should be some mention of computed unhygienic symbols (for example concatenating an input parameter with a constant string to create an output symbol) either as a possible extension or in alternatives.

nrc self-assigned this on Jul 27, 2018

Author

alexreg commented on Aug 5, 2018

@tmccombs I see that as somewhat orthogonal to the concern of this RFC, although hygiene control/opt-out can certainly help with using it.

@alexreg I don't think it is completely orthogonal. If there was a mechanism to generate computed unhygienic symbols, that mechanism could probably be used to achieve the same goals as this RFC.

Author

alexreg commented on Aug 6, 2018

@tmccombs There's no such thing as "unhygienic" symbols in the parse tree. The way we've discussed implementing the concat! macro for example is simply ignoring the hygiene info for the arguments, doing a string-like concatenation, and assigning the resulting ident token the hygiene of the call site.
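
The concatenation strategy described above can be sketched with a toy identifier type. This is only a model under stated assumptions: `SyntaxContextId`, `Ident`, and `concat_idents` are hypothetical names, and a syntax context is reduced to an opaque number.

```rust
/// Stand-in for a syntax context; the number is just an opaque handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SyntaxContextId(u32);

/// An identifier token: a textual name plus its hygiene information.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Ident {
    name: String,
    ctxt: SyntaxContextId,
}

/// Ignores the hygiene of every argument, concatenates the names as plain
/// strings, and gives the result the hygiene of the call site.
fn concat_idents(parts: &[Ident], call_site: SyntaxContextId) -> Ident {
    let name: String = parts.iter().map(|p| p.name.as_str()).collect();
    Ident { name, ctxt: call_site }
}
```

Concatenating `foo` (context 7) and `_bar` (context 9) at call site 1 produces `foo_bar` with context 1; the contexts of the inputs play no role in the result.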

Author

alexreg commented on Aug 6, 2018

@nrc

I had always imagined this feature being implemented as a proc macro, rather than having dedicated syntax. I imagined that we would provide some functions to proc macros for doing things like apply the hygiene context from a to name b to produce an identifier, as well as the manual manipulation of spans. I think we would also need something along the lines of a function or marker for saying 'don't apply the hygiene marking procedure to this identifier when we expand the proc macro'. Then a crate could provide a library of macros for use by devs to manipulate hygiene in common ways. (I think this may also require eager expansion? I forget the details exactly, but you need some way of saying expand this macro in the context of the use site, not the decl site.).

Do you have any pointers to info about the current support for hygiene retrieval and manipulation in proc macros? I can then rework this RFC accordingly.

Author

alexreg commented on Sep 7, 2018

edited

@dhardy I appreciate your thoughts on this. The issue with having scope designators is that they don't represent tokens or parts of the AST. Conceptually there's a mismatch. What I proposed instead (in discussion on Discord, though I'll eventually write it up into an RFC) is a macro like create_ident!("foo", bar) that would generate an ident foo with its syntax context taken from the token bar (which can of course be a metavar). The second parameter could furthermore be optional, defaulting to use the immediate parent (call-site) syntax context. That, or we could have a separate function that generates a dummy span for the immediate parent syntax context.

Most of us are now against the sigil approach, so no need to worry about that. As for banning uses of idents (as opposed to definition) to passed syntax contexts, I'm in favour of that, as I think others would be, since it's essentially trying to replace what macro parameters are already there for. That said, implementation would be a bit tricky, since all this is done at the token level, and definition vs. use is only distinguished at the AST level at earliest. The most straightforward way we could achieve this, as I see, would be to tag ident tokens when substituted from metavars, and raise an error for all other tokens with non-local syntax context unless they appear on the LHS of an assignment.

P.S. I don't see what you're suggesting with regards to function and module scope. This is irrelevant for macros, since the syntax contexts (hygiene info) are the same; only the spans differ.
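
A small model of the `create_ident!("foo", bar)` idea from this comment, with the optional second argument defaulting to the call-site context. Everything here is hypothetical: no such macro exists, and `create_ident`, `Ident`, and `SyntaxContextId` are illustrative stand-ins with the syntax context reduced to an opaque number.

```rust
/// Stand-in for a syntax context; the number is just an opaque handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SyntaxContextId(u32);

/// An identifier token: a textual name plus its hygiene information.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Ident {
    name: String,
    ctxt: SyntaxContextId,
}

/// Builds an identifier called `name`. Its syntax context is copied from
/// `context_from` when given, and falls back to `call_site` otherwise.
fn create_ident(
    name: &str,
    context_from: Option<&Ident>,
    call_site: SyntaxContextId,
) -> Ident {
    let ctxt = context_from.map(|ident| ident.ctxt).unwrap_or(call_site);
    Ident { name: name.to_string(), ctxt }
}
```

Passing `Some(&bar)` makes the new identifier resolve wherever `bar` would, which is how a metavariable supplied by the caller could act as the carrier of hygiene; passing `None` corresponds to the proposed default of the immediate parent (call-site) context.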

Member

mark-i-m commented on Mar 1, 2019

What's the status of this?

Author

alexreg commented on Mar 1, 2019

@mark-i-m The plan is to take some of the macros @nrc described in his blog post a while ago and integrate these into core, alongside macros like concat!. We then need macros in ident-position to make this actually useful in most cases, or at least some sort of equivalent (maybe restricting macros in ident-position to within macros). I think this debate should probably be reopened. Do you have any thoughts on it?

I'd also be curious to get @Centril's view on this, as the language design guru. ;-)

Member

mark-i-m commented on Mar 2, 2019

Do you have any thoughts on it?

Thanks for asking, but I'm not really knowledgeable about hygiene at all. I was mainly curious what the progress was on macros 2.0. While I would like to see it continue, it seems from the Rust2019 posts that there are other things that are priorities ATM (e.g. GATs, specialization).

Macros in the ident position does seem like an excellent feature for exploring how Rust will proceed in this area, though. Is there more documentation that can be found about these issues somewhere?

Author

alexreg commented on Mar 3, 2019

edited

Macros in the ident position does seem like an excellent feature for exploring how Rust will proceed in this area, though. Is there more documentation that can be found about these issues somewhere?

Not a lot. There was an attempt at an RFC (a while back, by @Manishearth if I remember), and some discussion on a GitHub issue, plus a series of posts by @nrc (including that one), but nothing more formal.

The nice thing about enhancements to the macro system is that they can largely proceed independently of the type system or trait solving, since there aren't too many interactions. The main thing macros need to be concerned with is other syntactical developments, like what WG-Grammar are doing, perhaps.

Member

Manishearth commented on Mar 3, 2019

(yeah, I posted an RFC, but this was back when RFCs were much smaller and also I was pretty new to it, so it's not a substantial RFC and it's not too relevant now)

Member

Manishearth commented on Mar 3, 2019

If you decide to go the "special proc macro" route you'll definitely need eager expansion. One consistent way to add it to the syntax is $let:

$let $myident = lift!(hello);
let $myident = 5;

IIRC there have been other proposals

Author

alexreg commented on Mar 3, 2019

@Manishearth Yeah, no worries, I realised it doesn't have the usual level of detail of recent RFCs, but worth referencing in any case.

I've talked about this before with a few people, and I don't think there's any inherent reason we need eager expansion for this... unless I misremember something. Is there?

Member

Manishearth commented on Mar 3, 2019

The reason is that supporting macros directly in ident positions is a bit of an annoying syntax minefield.

$let doesn't actually have to be eager expansion -- come to think of it in this case it shouldn't probably, but being able to bind macros to macro variables helps do this substitution well.

There are probably other solutions.

Author

alexreg commented on Mar 3, 2019

@Manishearth Yeah, this is what I meant by "inherent"... the syntax gets ugly, and intrudes on normal Rust code more than usual, but there's no technical obstacle, as far as I'm aware. :-)

I recall discussing macro bindings as well, but I think we discounted this on the basis that syntactical substitutions don't really work like bindings, and to even create a coherent system, it would require a significant expansion in complexity.

I believe one argument was to allow macros in ident-position (normal, lazy expansion), but only allow it within other macros. This is nice in a way, but I think the main argument against it was consistency (if ident-position is allowed inside macros, why not every position? -- though I forget what others are currently disallowed). This is where eager expansion (most recent attempt here) comes back again, and starts to look like the best solution, if you combine it with allowing invocations in every position/context.

CC @pierzchalski

Oh neat, identifier-position macros might be a use-case for the same 'eager expansion' macro idea from #2320 (I'm working on cleaning that up after discussions with @alexreg).

Assuming you've got your fancy hygiene-scope-adjusting, identifier-token-producing macro mk_ident!(), then you can avoid all the identifier-position parsing issues by writing something like:

eager! {
  x = mk_ident!();
  let #x = whatever;
}

Here, the line x = mk_ident!(); means "expand mk_ident!(), then bind the resulting tokens to x for the purposes of the next line", and the line let #x = whatever means "interpolate x into #x (I'm stealing the interpolation syntax from quote::quote!) then return that interpolated result".
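
The two steps described here (bind the expanded tokens to a name, then interpolate `#name` into the following line) can be modelled on plain strings. This is only a sketch: `interpolate` is a hypothetical helper, tokens are represented as whitespace-free strings, and the result of `mk_ident!()` is assumed to have been computed already and stored in the bindings map.

```rust
use std::collections::HashMap;

/// Replaces every `#name` token in `body` with the tokens bound to `name`.
/// Tokens without a matching binding are copied through unchanged.
fn interpolate(body: &[&str], bindings: &HashMap<&str, Vec<String>>) -> Vec<String> {
    let mut out = Vec::new();
    for tok in body {
        match tok.strip_prefix('#').and_then(|name| bindings.get(name)) {
            Some(bound) => out.extend(bound.iter().cloned()),
            None => out.push((*tok).to_string()),
        }
    }
    out
}
```

With `x` bound to the single token `my_ident`, interpolating `let #x = whatever ;` gives `let my_ident = whatever ;`. Because the binding is resolved before the `let` statement is parsed, the parser never has to deal with a macro invocation in identifier position.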

Author

alexreg commented on Mar 4, 2019

Yes, exactly like that. :-) (There will also be other token-producing macros, probably along the lines of https://www.ncameron.org/blog/untitledconcat_idents-and-macros-in-ident-position/, but I need to write up a short RFC for that.)

I just realised there are some ambiguities in the above syntax for eager!, however. Maybe something like the following would work better.

eager! {
    $x = mk_ident!(),
    let #x = whatever;
}

Or, if these forms of macros are still supported in 2018 (I forget):

eager!(x = mk_ident!(), ...) {
    let #x = whatever;
}

Contributor

nikomatsakis commented 12 days ago

@rfcbot fcp postpone

Hello everyone; we discussed this RFC in our backlog bonanza. The consensus was that we should postpone it, as we don't think we have the bandwidth to see to it right now. We do think that macros need some more work, though, and that this RFC in particular is looking at real problems (even if we're not sure whether it's the right solution or not).

We would like to encourage folks to discuss "macros 2.0" when the time comes for us to discuss our upcoming roadmap (one of the procedural changes we have in mind is to make it clearer when we'd be open to bigger proposals).

rfcbot commented 12 days ago

edited by nikomatsakis

Team member @nikomatsakis has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot commented 12 days ago

This is now entering its final comment period, as per the review above.

rfcbot commented 2 days ago

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC is now postponed.
