Github add const-ub RFC by RalfJung · Pull Request #3016 · rust-lang/rfcs · GitH...
source link: https://github.com/rust-lang/rfcs/pull/3016
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
This RFC was drafted by me with lots of input from @oli-obk.
Define UB during const evaluation to lead to an unspecified result for the affected CTFE query, but not otherwise infect the compilation process.
Does this RFC tries to make unsafe
things not-actually-unsafe in const context?
It is obvious that if unsafe
code does something wrong, undefined behaviour happens during execution. So isn't it obvious that if const unsafe
code does something wrong, undefined behaviour happens during compilation? unsafe
is supposed to turn off checks and turn on freedom. Otherwise it's not a "real" unsafe.
Another extreme alternative would be to say that UB during CTFE may have arbitrary effects in the host compiler, including host-level UB.
Maybe this way should be the main way instead of an alternative?
unsafe
is supposed to turn off checks and turn on freedom. Otherwise it's not a "real" unsafe.
unsafe
doesn't and never did "turn off checks and turn on freedom". It grants access to additional language constructs that can cause UB or break memory safety if mis-used. That's it.
(All the invariants for the non-unsafe
language constructs still must be upheld.)
More specifically, it grants access to constructs which would have to be checked by MIRI as part of their compile-time execution for UB to be detected by the compiler.
The "freedom" turned on by unsafe
is the freedom to shoot yourself in the foot. ;)
But it's still just as wrong to cause UB inside unsafe
than anywhere else. It's just (supposed to be) impossible outside unsafe
.
Does this RFC tries to make unsafe things not-actually-unsafe in const context?
It depends on what you mean by "not-actually-unsafe". unsafe
is still the only way to cause these "you caused UB" errors during CTFE. The RFC also changes nothing about the fact that we do not make stability guarantees for incorrect unsafe code, including during CTFE. It just says that, for now (until a follow-up RFC), incorrect unsafe
may not go completely crazy during CTFE.
Maybe this way should be the main way instead of an alternative?
Why would that be a good idea?
+1 to this.
Defining UB to fail constant evaluation is a good idea, imo.
A few questions though:
- What about constant promotion. Would a failure in constant evaluation be downgraded to runtime whenever it would be semantically permitted? It can be noted that code may never be evaluated at runtime, so making it a compile time error may be incorrect.
- Should this be guaranteed to extend to intrinsics? Intrinsics aren't stable, and vary by compiler. In C++ it is unspecified whether any UB in the standard library is diagnosed. Perhaps saying the same about the rust standard library would be beneficial
- Perhaps increasing the rules for dereferencing raw pointers to be closer to C++'s requirements (either needs to be a pointer constant expression, and not a null or passed the end pointer, or point to an "object" for which the lifetime started since the evaluation of the constant expression). This may make the remainder of the requirements trivial. This may, however, be undermined by arbitrary pointer casts? Are those allowed in
const
presently?
What about constant promotion. Would a failure in constant evaluation be downgraded to runtime whenever it would be semantically permitted? It can be noted that code may never be evaluated at runtime, so making it a compile time error may be incorrect.
Promotion is completely independent. It should not be possible to cause UB in promotion, as the analysis that allows promotion essentially just allows basic math and some field accesses.
Should this be guaranteed to extend to intrinsics? Intrinsics aren't stable, and vary by compiler. In C++ it is unspecified whether any UB in the standard library is diagnosed. Perhaps saying the same about the rust standard library would be beneficial
The RFC states that any UB in intrinsic invocation must be caught, so exact_div
will error if you pass 0
as the second argument or if a remainder remains. Since we must implement something in the interpreter for all inputs to the intrinsic that is being emulated, we would have to add additional code just to not error. It is easier to just error.
Perhaps increasing the rules for dereferencing raw pointers to be closer to C++'s requirements (either needs to be a pointer constant expression, and not a null or passed the end pointer, or point to an "object" for which the lifetime started since the evaluation of the constant expression). This may make the remainder of the requirements trivial.
Restricting dereferences beyond "must be in a live allocation" is actually harder than not doing so. I don't think Rust has any notion of an "object".
This may, however, be undermined by arbitrary pointer casts? Are those allowed in const presently?
Even if we disallowed pointer casts, you could always use transmute, so this is impossible to protect against afaict.
Did I understand correctly that for following program:
const x: String = unsafe {std::hint::unreachable_unchecked()};
fn main() {
printn!("{}", x);
}
RFC states that
- Current rustc version must reject this program.
- Any compilter is allowed to reject this program.
- Some future rustc version (or alternative implementation) however can accept this program (and say that is's behavior is never defined) and miscomplite it pretty awfully.
- But anyway the compiler itself can not perform arbitrarily bad things like RCE or running
rm -rf /
on the host?
For the specific example (unreachable_unchecked
) the RFC states that
- Incorrect use of compiler intrinsics (e.g., reaching an
unreachable
or violating the assumptions ofexact_div
).
will always be an error, and all compilers must emit an error
But there are cases of UB that we do not reliably detect:
- Dereferencing unaligned pointers.
- Violating Rust's aliasing rules.
- Producing an invalid value (but not using it in one of the ways defined above).
- Anything else from https://doc.rust-lang.org/reference/behavior-considered-undefined.html
A compiler may choose to emit an error, warning or nothing at all if it encounters this kind of UB. This means on 64 bit systems (not on 16 or 32 bit, due to transmute size checks) you are free to do
const x: String = unsafe { std::mem::transmute((1, 2, 3, 4, 5, 6)) }; fn main() { print!("{}", x); }
which actually compiles on stable and will die horribly.
Since we must implement something in the interpreter for all inputs to the intrinsic that is being emulated, we would have to add additional code just to not error. It is easier to just error.
I'm not saying that it should not be allowed to error, rather it should generally be unspecified. It may be entirely unreasonable to do anything other than error. However, there are some cases where diagnosing errors for some intrinsics may be difficult itself.
Even if we disallowed pointer casts, you could always use transmute
The issue with pointer casts and transmute involving pointers actually goes with usize, and was something I was made aware of when I looked to C++'s std-proposals list to see if it was viable to lift some reinterpert_cast
restrictions. It may not be possible to implement constant conversion to integer values, as the actual runtime bit pattern may not be known at compile time (for example, transmute, if permitted).
Promotion is completely independent
And I presume that if Promotion and required Constant evaluation are evaluated together, it would still be permissible (and in fact, required) that failed promotion downgrades to runtime.
Producing an invalid value (but not using it in one of the ways defined above).
I would assume that producing an invalid value should always be an error, and, absent pointer transmutes, is probably quite easy to diagnose.
Another question:
Would it be permitted to detect UB in any standard library function, again left as unspecified whether or not it does so?
However, there are some cases where diagnosing errors for some intrinsics may be difficult itself.
we have not encountered this yet, so we could also just wait until it comes up with a new intrinsic
And I presume that if Promotion and required Constant evaluation are evaluated together, it would still be permissible (and in fact, required) that failed promotion downgrades to runtime.
yes, this is what happens with &(1/0)
which has a compile-time warning (not an error) and a runtime panic before we ever encounter the constant that was never successfully computed.
I would assume that producing an invalid value should always be an error, and, absent pointer transmutes, is probably quite easy to diagnose.
it's very expensive to do that check, we could, but it would slow down a lot of totally fine code. We do check final values of constants though, just not the intermediate states. So you could have a bool with the bit pattern 3
, and that's fine, until you try to branch on it, then you get an error. But if you just convert it to an integer, there's no error.
Would it be permitted to detect UB in any standard library function, again left as unspecified whether or not it does so?
Depends on the method you're suggesting. Ideally you'd just create a new intrinsic and invoke that. It could be as simple as invoking std::intrinsics::assume(your_condition_that_must_be_true);
Since we must implement something in the interpreter for all inputs to the intrinsic that is being emulated, we would have to add additional code just to not error. It is easier to just error.
I'm not saying that it should not be allowed to error, rather it should generally be unspecified. It may be entirely unreasonable to do anything other than error. However, there are some cases where diagnosing errors for some intrinsics may be difficult itself.
@chorman0773 @oli-obk
Some code examples what things compiler intrinsics can do should help to provide insight into the RFC. I guess one can group them into categories? The reference does not explain what compiler intrinsics do.
Promotion is completely independent
And I presume that if Promotion and required Constant evaluation are evaluated together, it would still be permissible (and in fact, required) that failed promotion downgrades to runtime.
I dont understand what you mean with downgrading. It is not in the promotion RFC.
"only const fn that were explicitly marked with the #[rustc_promotable] attribute are subject to promotion"
"Those functions must be manually reviewed to never raise CTFE errors."
Producing an invalid value (but not using it in one of the ways defined above).
I would assume that producing an invalid value should always be an error, and, absent pointer transmutes, is probably quite easy to diagnose.
Every check of things you dont use is a performance loss, so you dont want to do it "in release builds".
so we could also just wait until it comes up with a new intrinsic
Keeping in mind that the intrinsics provided by the compiler are not specified as part of the stable language (with the possible exception of the limited ones marked rust_const_stable
), ::__lccc::builtins::cxx::__builtin_launder
is potentially an issue. It's not an intrinsic in the rust sense, per se, it's not obtained using extern"rust-intrinsic"
(the intrinsics that can be live under ::__lccc::builtins::rust
), but it's implemented purely by the compiler, and has equivalent semantics to the C++17 standard library function std::launder
. It's allowed during constant evaluation if it's parameter is a constant expression, and I believe contains no union subobjects. If it's possible to create arbitrary pointers, enforcement of it's undefined behaviour (reachability restrictions in particular) could prove difficult. I'd have to look into the equivalent code in clang, and see how other interacting rules could alter how I perform the appropriate checks (previously I did not have to consider this, as in C++, it's unspecified whether the undefined behaviour of std::launder
is diagnosed).
I dont understand what you mean with downgrading.
What I mean here by promotion is I want to convert constant expressions (which is a property of the expression, not of when it is evaluated) to evaluate at compile time, similar to what a C++ compiler can do. If attempting to perform constant evaluation of a non-madatory constant expression fails, "downgrade" here means that it just abandons constant evaluation and leaves it to runtime.
Keeping in mind that the intrinsics provided by the compiler are not specified as part of the stable language (with the possible exception of the limited ones marked rust_const_stable)
Intrinsics are never stable. The rustc_const_stable
only specifies that the standard library may use this intrinsic in const fn that are stable.
I don't think Rust's memory is typed, and thus something like C++'s std::launder
does not need to exist. I can't really come up with an example where the intrinsic itself would do something that is UB without us being able to detect it. It could always produce a result that if used will cause UB, but that's not UB happening in the intrinsic.
I don't think Rust's memory is typed, and thus something like C++'s std::launder does not need to exist.
As the name of the intrinsic in question may imply, it's an intrinsic provided by lccc
(a work in progress implementation of rust designed primarily as a competitor for rustc), for the purposes of supporting C++ code (which lccc can also compile). When I say the intrinsic matches the semantics of std::launder
I mean exactly. lccc does operate on typed memory (this is permissible, provided observable-behaviour remains intact as-if it operates on untyped memory). The specific issue is that I don't know how the reachability restriction of std::launder
(which is passed onto the intrinsic because the reachability problem is actually something I am very much interested in optimizing) will interact with other features of rust, either current, in discussion, or future (particularily things like rust-lang/unsafe-code-guidelines#2).
What I would propose is the following:
Instead of specifying that all intrinsics are diagnosed, the following should be required instead:
- It is unspecified if undefined behaviour caused by an intrinsic is diagnosed
- Implementations shall diagnose undefined behaviour caused by or in calls to any of the following standard library functions
core::hint::unreachable_unchecked()
core::mem::transmute()
core::mem::zeored()
- and any other unspecified functions the implementation chooses.
a work in progress implementation of rust designed primarily as a competitor for rustc
Oh awesome! I didn't know about that.
Won't your const evaluator need to support checking whether a pointer points to a valid object of the appropriate type anyway?
In that case the intrinsic could just query that information. Basically you'd need to tag the const evaluator's virtual memory with the original type when the object was placed in that memory and drag that tag along with any memory moving operations.
Implementations shall diagnose undefined behaviour caused by or in calls to any of the following standard library functions
Ah interesting, we stop focusing on intrinsics, which are not part of the language anyway and only address the stable standard library API and language features. I wonder if we can specify this in a way that's better than listing specific things that are caught, but would not be averse to just having an explicit list that we grow.
One way to specify this in a general way could be to merge the rules for operations/branching and "intrinsics" by including the latter in "operations". Just like a + undef
is not ok, even if an undef
value by itself being moved around but not inspected is ok, intrinsics are simply treated as operations stating that they must not make decisions on an undef value or a value that is invalid for its type (e.g. a bool
with value 3
or a null/OOB reference).
As a concrete example: I guess under this definition it would be ok for you to ignore overflow in <*const T>::offset
, even if that is UB, but you would not be allowed to ignore undef
input. Not sure if we can come up with a wording that doesn't also require us to validate the input and result of a transmute
What about constant promotion. Would a failure in constant evaluation be downgraded to runtime whenever it would be semantically permitted? It can be noted that code may never be evaluated at runtime, so making it a compile time error may be incorrect.
An RFC for this is under development.
Should this be guaranteed to extend to intrinsics? Intrinsics aren't stable, and vary by compiler. In C++ it is unspecified whether any UB in the standard library is diagnosed. Perhaps saying the same about the rust standard library would be beneficial
Good point -- I wasn't fully considering other implementations with a different set of intrinsics when writing the RFC.
I think intrinsics should likewise either error or do the "obvious" sensible thing (and if there is nothing "obvious" they must error). I am not sure how to make that wording more precise though. Do you think this makes sense?
As the name of the intrinsic in question may imply, it's an intrinsic provided by lccc (a work in progress implementation of rust designed primarily as a competitor for rustc), for the purposes of supporting C++ code (which lccc can also compile). When I say the intrinsic matches the semantics of std::launder I mean exactly. lccc does operate on typed memory (this is permissible, provided observable-behaviour remains intact as-if it operates on untyped memory).
If your memory behaves "as-if" it is untyped, then there cannot be any UB caused by "type mismatches" (as those cannot occur in untyped memory), so this should not be a concern relevant for this RFC.
Implementations shall diagnose undefined behaviour caused by or in calls to any of the following standard library functions
I don't think we want to go over all stdlib functions here and classify them like this. Plus, detecting UB even for the ones you stated would be very expensive (in the way rustc is currently architected), so this is not a good option IMO.
The RFC deliberately says nothing about library UB, it is only about language UB. The issue is that intrinsics make this line a bit harder to draw that I'd like.
will always be an error, and all compilers must emit an error
Well, the RFC also says that this is not a stable guarantee. So it will always be an error until another RFC changes the rules.
then there cannot be any UB caused by "type mismatches"
I should rephrase: to the extent necessary to emulate the observable behaviour of the rust abstract machine, it operates as-if memory is untyped (which, pending answers to UCG #2, #84, and #236, merely requires that strict-aliasing is not applied to code compiled from rust). This is distinguishable via the launder builtin, which notably is absent from rust, and present only because it's a C++ intrinsic that needs to communicate with the optimizer.
so this should not be a concern relevant for this RFC.
It was primarily used as an expository argument against blanket requiring intrinsic UB to be diagnosed. Any compiler could come up with some wild intrinsic that has hard-to-diagnose UB. In this case, it's existence is serving a compelling interest, implementing the std::launder function from C++ in a way that generally preserves the optimizations it serves to inhibit.
Plus, detecting UB even for the ones you stated would be very expensive (in the way rustc is currently architected), so this is not a good option IMO
I believe the three I identified would already be diagnosed under the intrinsic rules (transmute is an intrinsic, zeroed calls init, an intrinsic, and unreachable_unchecked is an intrinsic, and rather easy to diagnose, imo). Also, production/copies/moves of trivially-invalid values should be some of the easiest to diagnose, as I presume rustc knows about the bitwise validity invariants of every type.
I think intrinsics should likewise either error or do the "obvious" sensible thing (and if there is nothing "obvious" they must error)
I stated, I think it should be unspecified whether intrinsics are diagnosed, though maybe promoting it to requiring the implementation to document the intrinsics where it might not diagnose. I'm sure any implementation would choose to diagnose as much as is possible and reasonable to implement. If this is a reasonable direction, I'd propose the following wording:
An implementation may support the constant evaluation of an implementation-defined set of intrinsics. For each such intrinsic which can cause undefined behaviour, the implementation shall document any which might not diagnose such undefined behaviour.
The RFC deliberately says nothing about library UB, it is only about language UB. The issue is that intrinsics make this line a bit harder to draw that I'd like
I'd argue that at the least, it should say that it is unspecified whether library UB is diagnosed. That way, implementations can do so.
I believe the three I identified would already be diagnosed under the intrinsic rules (transmute is an intrinsic, zeroed calls init, an intrinsic, and unreachable_unchecked is an intrinsic, and rather easy to diagnose, imo). Also, production/copies/moves of trivially-invalid values should be some of the easiest to diagnose, as I presume rustc knows about the bitwise validity invariants of every type.
mem::zeroed
/mem::uninitialized
UB arises from "UB due to invalid values". Detecting that is fairly non-trivial -- but we do have the required machinery because Miri needs it. However, it is also very expensive, so I quite explicitly do not want to mandate that (as the RFC says).
Having a special case just for mem::zeroed
/mem::uninitialized
would be possible, but I see no good reason to special-case just this. For example, MaybeUninit::zeroed().assume_init()
is implemented without any intrinsic -- it would be strange to require detecting UB from mem::zeroed()
but not from this.
EDIT: Actually, zeroed()
just calls MaybeUninit::zeroed().assume_init()
. So there is no intrinsic involved in either of them. And FWIW, mem::uninitialized()
is also implemented without an intrinsic. Of course other implementations can choose to provide intrinsics for this, but it is by no means necessary. (These implementations do not rely on anything rustc-specific.)
It was primarily used as an expository argument against blanket requiring intrinsic UB to be diagnosed. Any compiler could come up with some wild intrinsic that has hard-to-diagnose UB.
The question is, is it easy to implement this intrinsic just ignoring UB? I assume this can be treated like we treat e.g. aliasing violations, where checking memory accesses for UB is actually highly non-trivial due to aliasing constraints, but there is "obvious" behavior to fall back to if this UB cannot be detected: just perform the access ignoring the aliasing rules.
I'd argue that at the least, it should say that it is unspecified whether library UB is diagnosed. That way, implementations can do so.
That's fair. OTOH, there is a tension here between documenting precisely what rustc does, and documenting what should apply to all implementations.
I have basically rewritten the RFC based on the earlier draft that did not mandate any UB checking. There's also a section explaining what rustc
currently does; I hope I made it clear enough that this is basically just an example of what an implementation can choose to do, but not a stable guarantee.
@rust-lang/lang please let me know what you think. :)
@rfcbot cancel
Canceling the FCP because the RFC has been rewritten.
Copy link
rfcbot commented 24 days ago
@nikomatsakis proposal cancelled.
@rfcbot fcp merge
Not everyone has read this yet (including me, ha!) but I'm going to propose FCP anyway based on the summary of what is in it, and because this has been extensively discussed.
Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:
No concerns currently listed.
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
Folks in @rust-lang/lang can check their boxes.
I'm assuming that if we hit const ub then rustc will still produce the same binary outputs bit for bit? It would be unfortunate if we couldn't get reproducable build outputs. (I.e. same file hashes)
In rustc
, we will. I don't think the RFC mandates that, though.
Copy link
rfcbot commented 16 days ago
This is now entering its final comment period, as per the review above.
Copy link
rfcbot commented 6 days ago
The final comment period, with a disposition to merge, as per the review above, is now complete.
As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.
The RFC will be merged soon.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK