
RFC: Syntax for embedding cargo-script manifests by epage · Pull Request #3503 ·...

source link: https://github.com/rust-lang/rfcs/pull/3503

RFC: Syntax for embedding cargo-script manifests #3503

Conversation

Contributor

@epage epage

commented

Sep 26, 2023

edited

Rendered

This is for the T-lang side of #3502

Example:

#!/usr/bin/env cargo
```cargo
[dependencies]
clap = { version = "4.2", features = ["derive"] }
```

use clap::Parser;

#[derive(Parser, Debug)]
#[clap(version)]
struct Args {
    #[clap(short, long, help = "Path to config")]
    config: Option<std::path::PathBuf>,
}

fn main() {
    let args = Args::parse();
    println!("{:?}", args);
}
thomcc, alxpettit, joshtriplett, zayenz, and avsaase reacted with thumbs up emoji
PatchMixolydic, mcobzarenco, TimNN, Aloso, petrochenkov, Veykril, Miabread, bestouff, and liigo reacted with thumbs down emoji
runiq reacted with eyes emoji

epage

added the T-lang Relevant to the language subteam, which will review and decide on the RFC. label

Sep 26, 2023

Member

In PL/Rust (a Rust subset that works as a postgres procedural language handler) we use a somewhat hacky syntax like (see https://github.com/tcdi/plrust/blob/main/doc/src/dependencies.md for example source)

[dependencies]
rand = "0.8"

[code]
use rand::Rng;
Ok(Some(rand::thread_rng().gen()))

I greatly prefer the approach in this RFC (and will likely push PL/Rust transition to it if the RFC is accepted) but it's probably worth noting as prior art.

This comment was marked as resolved.

Member

If you think this RFC through a little further, you could generalize it to make rustc ignore all ``` delimited blocks: rustc would simply ignore these sections and leave them to other tools, similarly to how it already does for the shebang. And then you'd see how similar triple backtick blocks are to /* */ comments. From there it's only a small step to just adding `/* */` comments to the AST.

IMO it's bad style to add cargo specific extensions to Rust's syntax. Just say that ``` delimited blocks are ignored by rustc and that parser implementations are suggested to add them.

veber-alex and LucasOe reacted with thumbs up emoji
joshtriplett reacted with thumbs down emoji

Contributor

Author

In PL/Rust (a Rust subset that works as a postgres procedural language handler) we use a somewhat hacky syntax like (see https://github.com/tcdi/plrust/blob/main/doc/src/dependencies.md for example source)

Thanks!

I've added this as prior art and would love more interoperability (one of the stated motivations for the cargo script RFC)

thomcc reacted with thumbs up emoji

Comment on lines

306 to 316

### Alternative 7: Extended Shebang

````rust
#!/usr/bin/env cargo
# ```cargo
# [dependencies]
# foo = "1.2.3"
# ```

fn main() {}
````

Contributor

Author

@bstrie (moved here for context / easier to follow)

I would like to consider alternative 7, the extended shebang. I don't think the backticks and the redundant cargo specifier should be necessary, producing this:

#!/usr/bin/env cargo
# [dependencies]
# foo = "1.2.3"

fn main() {}

I think this looks quite good. It's fewer lines than the proposed syntax, and mirrors both the shebang syntax and the attribute syntax.

I understand that people might want this to be generalizable/extensible, but this could suffice for now and any discussion about how to generalize the "info" portion can be left for a future discussion. If people think that it's important to make it generalizable right now, then I'd be interested to hear some concrete use cases.

EDIT: although I suppose an unmentioned downside of this is that # [dependencies] might look a bit too much like an attribute.

Contributor

Author

As you mentioned, there is the syntax ambiguity and backticks let us "escape" this block of # lines.

Yes, the cargo specifier is redundant to the reader but each use of cargo is for a different purpose

  • One is for execution
  • One is for parsing

I've noted in the Future Possibilities that we could relax the requirement on having cargo in the infostring in the future. I would lean towards defaulting to cargo rather than parsing the shebang because shebang parsing is messy.

We also wanted to leave the door open (slightly) for adding additional frontmatter blocks, like if we decide to embed lockfiles.
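To make the "one is for parsing" point concrete, here is a minimal sketch of how a tool might pull the fenced block out of a script and read its infostring without ever looking at the shebang. The function name `split_frontmatter` and its return shape are hypothetical; this is not cargo's or rustc's implementation, and the shebang check is a simplification of the real rule for telling `#!` apart from an inner attribute.

```rust
/// Hypothetical helper: split a script into (infostring, manifest, remaining source).
/// Returns `None` when no fenced block follows the optional shebang line.
fn split_frontmatter(source: &str) -> Option<(&str, &str, &str)> {
    // Skip a leading shebang line. Assumption: `#![` means an inner attribute,
    // not a shebang; the real rule is more nuanced than this check.
    let body = if source.starts_with("#!") && !source.starts_with("#![") {
        match source.find('\n') {
            Some(newline) => &source[newline + 1..],
            None => "",
        }
    } else {
        source
    };

    // The block must open with three backticks followed by the infostring.
    let after_open = body.strip_prefix("```")?;
    let (info, after_info) = after_open.split_once('\n')?;

    // The block ends at the first line that consists only of three backticks.
    let mut offset = 0;
    for line in after_info.split_inclusive('\n') {
        if line.trim_end() == "```" {
            let manifest = &after_info[..offset];
            let rest = &after_info[offset + line.len()..];
            return Some((info.trim(), manifest, rest));
        }
        offset += line.len();
    }
    None
}

fn main() {
    let script = "#!/usr/bin/env cargo\n```cargo\n[dependencies]\nfoo = \"1.2.3\"\n```\n\nfn main() {}\n";
    match split_frontmatter(script) {
        Some((info, manifest, _rest)) => println!("infostring={info:?}\n{manifest}"),
        None => println!("no frontmatter found"),
    }
}
```

Because the infostring is returned separately, a caller could later accept other values (for example a lockfile block) or treat an empty infostring as `cargo`, which is the relaxation described as a future possibility.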

Contributor

Author

If you think this RFC through a little further, you could generalize it to make rustc ignore all ``` delimited blocks: rustc would simply ignore these sections and leave them to other tools, similarly to how it already does for the shebang. And then you'd see how similar triple backtick blocks are to /* */ comments. From there it's only a small step to just adding `/* */` comments to the AST.

Is this meant to positively suggest something or to point out a slippery slope? I'm not too sure of the intent.

IMO it's bad style to add cargo specific extensions to Rust's syntax. Just say that ``` delimited blocks are ignored by rustc and that parser implementations are suggested to add them.

I'm not seeing how this is different from the suggested future state. We are starting with it being locked to cargo initially as we work out the design / usage and can then remove that restriction, which is noted in Future Possibilities.

joshtriplett reacted with thumbs up emoji

[drawbacks]: #drawbacks

- A new concept for Rust syntax, adding to overall cognitive load

- Requires people escape markdown code fences with an extra backtick which they are likely not used to doing (or aware even exists)

Contributor

Does this refer to markdown inside multiline TOML strings? It's not really clear. Do any of Cargo's manifest fields even support markdown syntax?

Contributor

It may be referring to general usage outside of Rust source files, for example if I wanted to express this syntax here in this comment I would have to escape the backticks somehow (or indent by four spaces, but that doesn't allow syntax highlighting).

Contributor

Ooohh I see. In that case there's also the option of using the (older, I think?) syntax of prefixing the .rs snippet with four spaces instead of ````, though I'm not sure if it's as well-known as code fences, and it has the downside of not supporting language tags.

Contributor

Author

394387b adds some context. This is about sharing snippets over GitHub or Zulip. Since you are putting a code fence inside of a code fence, the outer one needs to use 4 backticks.
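The escaping rule can be sketched in a few lines: an outer Markdown fence is only closed by a fence at least as long as itself, so the wrapper needs one more backtick than the longest run inside the snippet. The helper names below are hypothetical, and the sketch assumes a CommonMark-style renderer; as noted elsewhere in this thread, not every tool that claims Markdown support behaves that way.

```rust
/// Length of the longest run of consecutive backticks anywhere in `text`.
fn longest_backtick_run(text: &str) -> usize {
    let mut longest = 0;
    let mut current = 0;
    for ch in text.chars() {
        if ch == '`' {
            current += 1;
            longest = longest.max(current);
        } else {
            current = 0;
        }
    }
    longest
}

/// Wrap `snippet` in a fence that no fence inside the snippet can close.
/// Conservative: counts backtick runs anywhere, not only at line starts.
fn fence_for_sharing(snippet: &str, lang: &str) -> String {
    // At least three backticks, and one more than the longest inner run.
    let ticks = "`".repeat(longest_backtick_run(snippet).max(2) + 1);
    format!("{0}{1}\n{2}\n{0}", ticks, lang, snippet.trim_end())
}

fn main() {
    let script = "#!/usr/bin/env cargo\n```cargo\n[dependencies]\nfoo = \"1.2.3\"\n```\n\nfn main() {}\n";
    // Prints the script wrapped in a four-backtick fence.
    println!("{}", fence_for_sharing(script, "rust"));
}
```

A snippet without any embedded fence still gets the usual three backticks, so the extra step only shows up for scripts that carry a manifest.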

- Parsers are available to make this work (e.g. `syn`)

Downsides

- The `cargo` macro would need to come from somewhere (`std`?) which means it is taking on `cargo`-specific knowledge

Contributor

For the macro approach, I don't think it would be necessary to embed any Cargo-specific knowledge in std. In every other approach the data here is stored in a glorified comment, which means we're fine if it gets thrown away as far as Rust is concerned. The macro here could simply expand to nothing, and trust that other tooling will parse the macro body as needed (which is easier than it sounds, since the macro body should just be treated as raw tokens rather than anything that needs parsing). Rather than calling it cargo, call it build! or meta! or something. Although I suppose the fact that it will still need to lex to Rust tokens might be limiting compared to a string or a comment, unless we want to make it magical.
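A rough sketch of the "expand to nothing" idea, using a locally defined `macro_rules!` macro in place of the hypothetical built-in `meta!` (no such macro exists in std; the name is taken from the comment above). The second macro, `count_tokens!`, is purely illustrative and shows that the body is seen as raw token trees rather than parsed TOML.

```rust
// Hypothetical stand-in for a built-in `meta!`: accept any tokens, emit nothing.
macro_rules! meta {
    ($($tokens:tt)*) => {};
}

// Illustrative only: count the top-level token trees passed in.
macro_rules! count_tokens {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tokens!($($tail)*) };
}

// The manifest is discarded by the compiler; external tooling would have to
// re-read the source text to recover it.
meta! {
    [dependencies]
    clap = { version = "4.2", features = ["derive"] }
}

// `[dependencies]`, `clap`, `=`, and the braced group are four token trees.
const TOP_LEVEL_TOKENS: usize = count_tokens!(
    [dependencies]
    clap = { version = "4.2" }
);

fn main() {
    println!("top-level token trees: {}", TOP_LEVEL_TOKENS);
}
```

The limitation mentioned above is visible here: the body has to lex as Rust tokens, so TOML that does not, such as a multi-character literal string in single quotes, would most likely be rejected before the macro is even expanded.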

Contributor

Author

I added mention of meta! in 42077b7. I don't bother exploring it due to the other problems with macros.

# Future possibilities

[future-possibilities]: #future-possibilities

- Treat `cargo` as the default infostring

What is the main reason for not including this in this proposal?

Contributor

Author

I clarified this a little in 842f722

Contributor

Author

Basically, we want to start with the absolute minimal approach and see what we feel needs to be relaxed from there (which is backwards compatible) rather than make a lot of assumptions and then regret them.

This comment was marked as resolved.

Contributor

Author

@est31

The suggested future state is to allow multiple specific specifiers and have them mean different things to cargo.

I had overlooked this comment before. I think I was unclear; that is not a suggested future state but an "in case we need it". I tried to clarify that in 97ca09a

Member

While cargo script is a novel feature, my comment was about giving syntactic meaning to these comments like we do for doc comments. That applies everywhere.

To the end user? Yes. But the "end user" view of a language is broken all the time. Say when todo!() is added, suddenly each place that uses unimplemented!() is legacy. The times someone has /*% */ comments in their code should be quite limited. There is no need for an edition boundary there, but even if there were, we have the 2024 edition around the corner.

PatchMixolydic reacted with confused emoji

- When discussing with a Rust crash course teacher, it was felt their students would have a hard time learning to write these manifests from scratch

- Unpredictable location (both the doc comment and the cargo code block within it)

- From talking to a teacher, users are more forgiving of not understanding the details for structured data in an unstructured format (doc comments / comments), but with something that looks meaningful they will want to understand it all, which requires dealing with all of the concepts

- The attribute approach requires explaining multiple "advanced" topics: One teacher doesn't get to teaching any attributes until the second level in his crash course series and two teachers have found it difficult to teach people raw strings

I don't fully buy into this objection and the one above: not everything has to be explained as a pre-requisite to it being used in a course.
Let's take println! as an example: the Rust book introduces println! in the first chapter but it doesn't provide any macro discussion beyond "println! is a macro, we'll talk about that later", where "later" is in chapter 19 (!).

I don't see what makes a cargo attribute any different here from println! or a #[derive(Debug)].

runiq, bestouff, and PatchMixolydic reacted with thumbs up emoji

Elaborating further: aren't we over-indexing on language newcomers here?

Aloso, TimNN, and runiq reacted with thumbs up emoji

Contributor

Author

Let's take println! as an example: the Rust book introduces println! in the first chapter but it doesn't provide any macro discussion beyond "println! is a macro, we'll talk about that later", where "later" is in chapter 19 (!).

While I'm speaking for myself and not that person, I feel there is a big difference between println! and

#![cargo(manifest = r#"
[package]
edition = "2018"
"#)]

With println! the name (mostly) makes sense and it has a weird ! after it that can be glossed over. That's less the case with attributes.

Elaborating further: aren't we over-indexing on language newcomers here?

I suspect we under-index on language newcomers.

That aside, one of the big use cases for this specific feature is helping users of all levels figure out how to write the code they need, including looking at written material (books, blogs, messages from coworkers, etc).

I'm not saying we should discount newcomers entirely, but

including looking at written material (books, blogs, messages from coworkers, etc).

mostly involves copy-pasting snippets. After looking at one or two examples they should become accustomed to the syntax and able to use it, regardless of them understanding the ins and outs of attributes.

Contributor

Yes, just teach this as a magic syntax at first. Later on people will understand what it means.

Contributor

Author

While cargo script is a novel feature, my comment was about giving syntactic meaning to these comments like we do for doc comments. That applies everywhere.

To the end user? Yes. But the "end user" view of a language is broken all the time. Say when todo!() is added, suddenly each place that uses unimplemented!() is legacy. The times someone has /*% */ comments in their code should be quite limited. There is no need for an edition boundary there, but even if there were, we have the 2024 edition around the corner.

I'm not referring to user perception but how we parse the syntax.

And yes, we have an edition right around the corner but it would be very limiting if a feature like this can only be used with edition="2024" or newer.

I think it's worth noting that the proposed syntax exactly reverses the meaning of fenced code blocks: these "turn code off" but in markdown they turn code on.

I also think it's worth leaving a fly-by comment at least to advocate for not using markdown fenced code blocks (I will try to respond; personal life means I may not, thus fly-by. I care but time).

It seems to me that the biggest motivation for this is perceived learnability, but the effect is that in all contexts that use markdown not taking the extra step of escaping means breaking the code. This is called out in the RFC but if I don't have a lot of time and I want to give someone a repro of something I want to just copy/paste it and not remember that there's this extra specific Rust step. Someone has to deal with that. I am almost sure that even Rust veterans reporting issues who happen to be in a hurry at the moment are going to forget. It's an annoying chore for the reporter or an annoying chore for the maintainer and it's often already hard enough to get people reporting/finding time to deal with things. Of course if this is ever extended that goes from annoying to potentially much more problematic: should it ever not only be used at the top of files, then it's time to play hunt the problems and, from that perspective, begins closing doors to any future extensions. The only advantage I can see for making this choice is learnability, and I'm skeptical that the effect is as large as it seems (for one thing Jira doesn't really use markdown as it is; the implicit assumption is that a new coder knows markdown).

The static site generator alternative of using --- is just as learnable: "this is like insert-blog-thing, we will learn about all the content later". But without the foot-guns.

I will add that I was involved in another discussion like this back on the un-indented strings RFC, was told in that discussion that fences could still be escaped, and had to look it up for this comment anyway. Not only for fences, but how to write ``` by itself in a sentence. Not to mention that markdown isn't exactly a standard beast so who knows if what I used on GitHub works elsewhere? I don't personally know of problematic implementations but given the lack of standards/conformance to spec, are we even sure there isn't some issue tracker somewhere where "overloading" markdown like this isn't possible because it doesn't respect the escaping? It seems to me that the unspoken assumption is "if x claims to be using markdown, then x will work in this specific way" and that's never really been true.

TLDR: it really feels messy when there's other non-overloaded syntactical constructs that would work, most of which are even in the RFC. Learners only learn it once, the rest of us deal with it forever

BrenBarn, Aloso, and CraftSpider reacted with thumbs up emoji

Contributor

Author

TLDR: it really feels messy when there's other non-overloaded syntactical constructs that would work, most of which are even in the RFC. Learners only learn it once, the rest of us deal with it forever

If advocating for another solution, please also address the concerns with that format.

Contributor

Author

it's obviously up to the cargo team to decide on the final format, though IMHO the doc-comment alternative would be much more beneficial and easier to support across the ecosystem since no parser changes are necessary to support files with an embedded manifest.

Sorry, forgot to call this out earlier but I think an important note for anyone reviewing this RFC is that this is not a cargo team decision but a language team decision. There are subjective aspects of this. There are aspects where people will prioritize things differently than others. I've geared things towards what I expect will work for the language team and will adjust as they direct otherwise. This does not mean that input isn't useful; it has already improved the RFC and can help provide more perspectives for my recommendation and for the language team in their decision. Let's make sure we recognize that multiple experienced, well reasoned people can come to different conclusions on this and it might not look like our ideal (myself included).

I would be happy with any other option. I'm not addressing specific alternatives because all other alternatives have two extremely useful properties:

  • There are no questions over what a markdown implementation does.
  • At 2 AM or whatever, when I'm in a hurry, etc, and I have a repro or something like a useful devops script/utility, ctrl+a ctrl+v works for me and also doesn't introduce problems for the maintainer because I forgot a step which is unnecessary in any other language I know (or reversed, if I'm the maintainer, etc).

I am not aware of any other language which knowingly decided that editing the program in order to paste it into issue trackers etc. would be necessary, because it chose to embed a subset of markdown, because (as the RFC currently reads to me) it decided that was the most learnable option. And indeed, here is what I would consider a likely problem with learnability anyway:

  • Student 1 figures out something neat or a bug, or whatever and wants to share.
  • Student 1 goes to whatever's using markdown to share with student 2, but makes the mistake of not fixing the markdown.
  • Student 2 copy/pastes and now the program doesn't work and they have to fix it.

I'm not the person to ask about readability. I'm actually blind and in addition I'm one of those weirdos who doesn't find C++ that bad so any opinion I have there is likely not great. For the record I favor --- because it's the same as a code block (e.g. can copy/paste manifests back and forth) but will embed in any text markup I'm aware of as long as it's inside a code block (which we must do for other code anyway). I'm not a huge personal fan of prefixes because I can't easily select the non-prefixed text with the mouse, but eh, the 99% matter so if that's what people like go for it. My preference for isn't what I consider important, it's my strong preference against.

As an example of a "fast" (e.g. ctrl+a ctrl+v) system where I'd expect this to break, I've seen Slack treat markdown in a copy/paste as formatting. I believe you have to toggle something on to get markdown (I forget; it's useful for me because I can type formatting without clicking around). Snippets exist and clearly are the right answer but me and my coworkers don't bother with them half the time.

One great thing about Rust is that while it has edge cases, most of them don't compile and most of the rest don't break the program; markdown fences, by contrast, silently break sharing with others.

Contributor

Author

@bestouff that is a slight variant of one of the options and I'd recommend talking to the downsides if proposing it.

Note that you left off the manifest being a string. That is dependent on whether the attribute parsing code can correctly handle TOML syntax (now and in the future) being embedded within it, which I've not tried to verify. In using strings, unless we try to shift people's style to single quotes, it will likely require a raw string literal.
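As an illustration of why a raw string literal would likely be needed, the sketch below writes the same manifest text twice, once as an ordinary string and once as a raw string. It uses plain constants rather than an attribute, since the `cargo` attribute discussed here is only a proposal; the constant names are made up for the example.

```rust
// Ordinary string literal: every TOML double quote and newline must be escaped.
const ESCAPED: &str =
    "[dependencies]\nclap = { version = \"4.2\", features = [\"derive\"] }\n";

// Raw string literal: the TOML can be pasted in unchanged.
const RAW: &str = r#"[dependencies]
clap = { version = "4.2", features = ["derive"] }
"#;

fn main() {
    // Both literals denote exactly the same text.
    assert_eq!(ESCAPED, RAW);
    print!("{}", RAW);
}
```

The raw form is the only one that survives copy-pasting a manifest back and forth, which is why the teaching cost of raw strings comes up in the discussion of the attribute alternative.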

Contributor

@epage Yes, sorry, I found it simpler to learn than Alternative 3 but I'm not sure it's rustc-parseable. I removed my comment but you were too quick!
For the record my proposal was:

#!/usr/bin/env cargo
#[cargo(
[dependencies]
clap = { version = "4.2", features = ["derive"] }
)]

use clap::Parser;

#[derive(Parser, Debug)]
#[clap(version)]
struct Args {
    #[clap(short, long, help = "Path to config")]
    config: Option<std::path::PathBuf>,
}

fn main() {
    let args = Args::parse();
    println!("{:?}", args);
}

Contributor

With my lang hat on, I don't see a reason we should RFC a feature that only allows cargo front matter, without specifying a path to generalizing it to other tooling. If we want to be conservative in what we stabilize, let's approach that in the stabilization rather than in the RFC.

At the language level we should acknowledge that not all projects get to use cargo, and the generalization here seems trivial to do in the RFC. Note that I'm fine with the RFC being conservative in other ways (only allowing one, right after the shebang, etc).

Taking off my lang hat now – using cargo sure makes it awkward to extend to embedding a second front matter with Cargo.lock. It'd be nice if the RFC addressed that. (Maybe just say "we can use cargo-lock for consistency with cargo"?)

- Users can edit/copy/paste the manifest without dealing with leading characters

Downsides

- Too general that people might abuse it

Contributor

As a general comment: I don't agree with this as a downside. I'm not even sure what you mean by "abuse" since that varies greatly by use case. Will people embed 1,200 source files in a single file? In general probably not, but if they do, they probably have a good reason.

For example, if I want to run a minimizer like creduce (which supports Rust) to reproduce a compiler issue, it requires embedding my reproducer in a single file. Some tooling assumes this because C/C++ compilation units can always be embedded in a single file, unlike Rust. Then the tool can take care of minimizing for me. Obviously I wouldn't do this for everyday software development, but in my mind it's a completely valid use case.

tmandry

added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label

Oct 7, 2023

Contributor

Author

With my lang hat on, I don't see a reason we should RFC a feature that only allows cargo front matter, without specifying a path to generalizing it to other tooling. If we want to be conservative in what we stabilize, let's approach that in the stabilization rather than in the RFC.

At the language level we should acknowledge that not all projects get to use cargo, and the generalization here seems trivial to do in the RFC. Note that I'm fine with the RFC being conservative in other ways (only allowing one, right after the shebang, etc).

Had considered loosening this up before any more official word from the lang team and realized there are syntax questions we don't really have an answer to (and our source of inspiration doesn't have good answers for). I expanded on this and also gave a suggested starting point for syntax if we decide to bring those decisions into this RFC.

Taking off my lang hat now – using cargo sure makes it awkward to extend to embedding a second front matter with Cargo.lock. It'd be nice if the RFC addressed that. (Maybe just say "we can use cargo-lock for consistency with cargo"?)

I figured what the string should be would be best left for #3502 (as noted here) which goes into more detail. Feel free to add your thoughts there!

If you feel that is a t-lang or a joint t-lang + t-cargo decision, we can talk about it!

So, I've been following along a bit and had a couple of comments.

I am also an instructor and a CS Ed researcher. I think that no matter what we do, we are going to increase the cognitive load for students unless it's extraordinarily explicit in its immediate interpretation. I don't think that code fences give you that.

The reason for that is 2-fold.

First, the code fence means that you have no distinguishing factor inline with the embedded toml. This means that students will basically have to switch modes when they are looking at different parts of the file. The only way this becomes even slightly viable is with syntax highlighting but even then, now you have to mix syntax highlighting for toml (which isn't a lot but does exist) and rust into one file adding to the complexity of ensuring you have a diverse enough color palette to maintain good contrast and instant recognition.

Second, code fences are not explicit in their intent. There's nothing about them that inherently says, oh, by the way, we have an embedded file here. I don't think anyone has really managed to do that phenomenally from what I've seen but I will grant that I haven't looked that hard for languages that embed other languages inside of them. JSX, Bash here docs, and Doc comments come to mind but that's all I've got currently.

My personal opinion is that if the goal is to ensure minimal increase in cognitive load, there are a few general approaches that I would suggest:

  1. Comments. Already discussed this extensively, my two cents is that comments already tell students and programmers that this isn't something that will be in the code. You can disregard it unless you're trying to do something with it at the moment. I will add that a danger I see with comments is that sometimes people see comments as vestigial and ignorable while an embedded cargo.toml is anything but that.

  2. If we're already adding support for the bash shebang, why not make it possible to embed files beneath it as well. Something like

#!/blah
#= cargo.toml
#+ stuff
#+ goes 
#+ here

But a bit more thought out and considered in the extension to the syntax.

  3. Allow specifying dependencies on the use statements instead and have that be collected into a file format that can be used by cargo or other build tools.

  4. Extend the extern keyword to allow either externing an embedded file for users to link to or consume somehow or to simply allow for users to again specify dependencies by saying extern crate lib with name and version and features and stuff.

All of these feel to me like they address the idea of remaining explicit and not locking people into cargo.
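For the `#=` / `#+` prefixed-line idea above, a tool would only need a small line-oriented decoder. The sketch below is one possible reading of that suggestion, not a specified format: the function name, the rule that the header ends at the first ordinary line, and the handling of a single space after the prefix are all assumptions.

```rust
/// Hypothetical decoder: `#= name` starts an embedded file, `#+ text` appends
/// a line to the most recently started file. Returns (name, contents) pairs.
fn decode_embedded_files(source: &str) -> Vec<(String, String)> {
    let mut files: Vec<(String, String)> = Vec::new();
    for line in source.lines() {
        if let Some(name) = line.strip_prefix("#=") {
            files.push((name.trim().to_string(), String::new()));
        } else if let Some(content) = line.strip_prefix("#+") {
            // Lines before any `#=` header are ignored in this sketch.
            if let Some((_, body)) = files.last_mut() {
                // Drop the single separating space, keep any further indentation.
                body.push_str(content.strip_prefix(' ').unwrap_or(content));
                body.push('\n');
            }
        } else if !line.starts_with("#!") {
            // Assumption: the first ordinary line ends the header region.
            break;
        }
    }
    files
}

fn main() {
    let script = "#!/blah\n#= cargo.toml\n#+ [dependencies]\n#+ foo = \"1.2.3\"\n\nfn main() {}\n";
    for (name, contents) in decode_embedded_files(script) {
        println!("--- {name} ---\n{contents}");
    }
}
```

The trade-off raised elsewhere in the thread applies here: every manifest line carries a prefix, so copying a manifest in or out of the script means adding or stripping those prefixes.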

Final thoughts, even if we do use code fences or whatever, many students don't fully comprehend what a use or import or whatever your language adds to support including libraries and other code until much later. They just type it out because they know that if they don't have it, their code will break. I don't think it's wrong to simply provide a well documented template file to students at the beginning. Make sure it covers the common libraries they will be using. Then, through the semester/quarter you take opportunities to come back to it and refine their understanding of why the things in the template are there until they can begin adding dependencies on their own.

As it is, as an instructor, I prefer to autograde students' programs and I would never in a million years use their dependency list. That's a quick way for a clever student to decide that they want to access the container that the autograder is running in and then try to either manipulate it, access the answers when they shouldn't using something I'm not familiar with, or do something more malicious or mischievous. It's the same reason playground doesn't let you use your own cargo.toml file. It's just not worth the risk. Especially when the machines I'm using are either mine, my university's, or a free service's. All of which would probably not take kindly to a student mucking around inside of their stuff or not take kindly to what may appear to be a spike of usage from me.

weihanglo and Aloso reacted with thumbs up emoji


Reviewers

Turbo87

Turbo87 left review comments

pksunkara

pksunkara left review comments

bstrie

bstrie left review comments

jplatte

jplatte left review comments

bestouff

bestouff left review comments

Emilgardis

Emilgardis left review comments

tmandry

tmandry left review comments

programmerjake

programmerjake left review comments

est31

est31 left review comments

Aloso

Aloso left review comments

LukeMathWalker

LukeMathWalker left review comments
Assignees

No one assigned

Labels
I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. T-lang Relevant to the language subteam, which will review and decide on the RFC.
Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

None yet

17 participants
