Add `core::mem::offset_of!` RFC by thomcc · Pull Request #3308 · rust-lang/rfcs...
source link: https://github.com/rust-lang/rfcs/pull/3308
I started this a jillion years ago, and have been meaning to clean it up and submit it for a long time. The intention is to remove most (ideally all) need for hardcoding offsets, computing offsets manually, or using `{memoffset,bytemuck}::offset_of!`.
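Below is a minimal sketch of the kind of code this is meant to enable. It assumes a toolchain where the macro is available as `std::mem::offset_of!` (it was later stabilized, in Rust 1.77). `Header` is a hypothetical type. Because it is `#[repr(C)]`, its offsets follow C layout rules, so the expected values can be checked by hand.

```rust
use std::mem::{offset_of, size_of};
use std::ops::Range;

/// Hypothetical wire header. `#[repr(C)]` makes field order and padding
/// follow C rules, so the offsets noted below are guaranteed.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    tag: u8,    // offset 0, followed by 3 bytes of padding
    len: u32,   // offset 4 (u32 is 4-byte aligned)
    flags: u16, // offset 8, followed by 2 bytes of tail padding
}

/// Offset of `len`, asked of the compiler instead of hardcoded as `4`.
fn len_offset() -> usize {
    offset_of!(Header, len)
}

/// Byte range occupied by `flags` inside a `Header`.
fn flags_range() -> Range<usize> {
    let start = offset_of!(Header, flags);
    start..start + size_of::<u16>()
}
```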
CC @JakobDegen, @saethlin (since they provided some amount of input on this during RustConf)
Member
Author
thomcc commented 10 days ago
Oh right, CC @rust-lang/wg-const-eval since this RFC allows using this macro during const-eval.
added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label
I am personally very much in favor of something like this. I do wonder if the lang team should be roped into this? From looking at And maybe that's all okay. I don't know. I don't think I mind having this as a macro. But it might be something the lang team wants to evaluate too. (And they might just say, "nah libs-api can deal with this." But they might not.)
Contributor
Lokathor commented 9 days ago
As a user, I'd prefer if the docs could say "this info isn't some goofy library calculation, this info is directly from the compiler's data structures, guaranteed accurate, guaranteed no UB, guaranteed no runtime cost." Which is to say, yeah, "language verb" is probably a better framing.
added the T-lang Relevant to the language subteam, which will review and decide on the RFC. label
afetisov commented 9 days ago
One important thing is missing from the reference: the layout of For example, the reference explicitly states that the layout of I think the RFC should have some motivation for allowing Being a macro, One possibility which isn't considered in the RFC is to use special constants instead of an expression macro. Now, defining those constants manually is mentioned, and has many problems, but what if they are defined by a compiler macro? For example, there could be some
For better ergonomics, the constants may be accessible via some expression macro
This implementation is also very hard to provide in a library crate, which provides a stronger argument for including it in the
I've always wanted this feature, but I'm wondering what the rationale would be for not supporting
Yeah, I didn't know what to tag this as. I'm certainly not opposed to roping in t-lang, and added the label.
The intention is that's how this would be implemented (that's what the bit at the end of the reference is intended to mean), but "no runtime cost" is kind of difficult to define in a way that still works in implementations that don't make as firm of a distinction between compile time and runtime. Anyway, this felt likely to bog down the implementation under bikeshedding as to what this means, and in practice "evaluates to a constant expression" should be sufficient. I also don't want to forbid a pure-library implementation, although I believe implementing it as a builtin macro is likely to be better for several reasons (error messages, compile-time performance, reliability, ...).
Hm, largely because it prevents a pure library implementation. That said, I don't really care about that, and perhaps with enough cleverness it would still be possible.
Member
Author
thomcc commented 9 days ago
@afetisov (your comments are long enough to get a separate reply)
The use cases for this are largely the same as the use cases for using More generally, it is a goal of this RFC to remove the reasons people have to implement
In several ways it is already fixed at compile time (size and alignment are, which limits the size of the overall structure). In other ways, it is fixed at runtime. Allowing use in const evaluation prevents an implementation from doing things like randomizing field offsets at startup, unless it also performs modifications to adjust My suspicion is that many implementations which can randomize field offsets at startup can recompute constant values (because they're likely interpreted), but it's plausible that there's some middle-ground. Either way, this is more or less what I'm getting at by the first item under "Drawbacks" -- previously this information was only available at runtime, and this allows accessing it at compile time.
This kind of optimization (and more aggressive ones, like SROA) would still be allowed in exactly the same cases they are currently allowed. If a pointer to an instance of a structure does not escape, the offsets of its field do not matter. This is true regardless of the
Are you sure? From my experience with the code in Perhaps someone familiar with compiler internals cares to weigh in?
I'll add it to the RFC as an alternative, but I am not a fan of macros that expand to items not present in the source (you'll note that none of the existing builtin
I don't think there's any reason a derive macro would have more information here than is available to a builtin macro. They're essentially the same thing, and have essentially the same limitations.
I don't think this is true -- if I'll try to update drawbacks/alternatives/motivation with this feedback.
Yes, the layouts are fixed. However, the as-if rule applies anyway. If the compiler can prove that the user can't tell the difference either way, it's always free to do whatever it wants as long as the behavior of the program is correct.
afetisov commented 9 days ago
The pointer may not be computed explicitly, but perhaps there is still some roundabout way to get it via unsafe code. Trying to bound the behaviour of unsafe code hits Rice's theorem fast, and I can't imagine what would a reasonable definition of UB for Unless the reference explicitly guarantees that a type is always represented in the same way in memory, I would feel very uneasy about exposing
If the layout stability within a single artifact is not guaranteed, then all such calculations are borderline UB (their results can not be used for any unsafe operations in any nontrivial case).
Thank you, but that doesn't carry much weight unless it passes an RFC and is specified in the Reference. For all I know, that's just the current implementation restriction.
I would like to see specific use cases. I can't imagine how field offsets would help with (de)serialization of types with unspecified layouts, you certainly can't just transmute a slice of incoming data. Nor can I imagine how layout-unstable types could possibly be used in FFI, except as opaque types with no access to inner fields.
Well, that depends on the implementation and power of
Contributor
Lokathor commented 9 days ago
I would like to just put in the standard reminder that the Reference is best-effort documentation of how the compiler/language currently works; it's not actually an authority. If the reference and the compiler conflict about a point, it's just as likely that the reference will be updated as the compiler will be.
JakobDegen commented 9 days ago
Fortunately, the language doesn't have to worry about this. People writing compiler optimizations will be responsible for proving that the results of the optimization are not observable.
I don't really know how to respond to this besides saying that this is what it means for a type to have a layout. If the layout is not fixed, then it is not a property of the type, but a property of some other thing
Contributor
Lokathor commented 9 days ago
I got one: with the GPU you often have to send structured data, and the way you do it is basically sending a bunch of bytes along with a description of each field within each struct. Details vary by API so I don't want to get into it too heavily, but you'd tell the API something like "field 0 is an [f32;2] at offset 0, field 1 is an [f32;3] at offset 8" or whatever, and then the GPU would read your buffer that way.
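Here is a minimal sketch of that use case, assuming `offset_of!` is available as `std::mem::offset_of!`. `Vertex`, `Format`, `VertexAttribute`, and `vertex_layout` are hypothetical names. Real APIs (Vulkan, OpenGL, wgpu, and so on) each define their own descriptor types, but the shape is similar: a stride plus a list of (location, format, offset) entries.

```rust
use std::mem::{offset_of, size_of};

/// One vertex as uploaded to the GPU; `#[repr(C)]` fixes the CPU-side layout.
#[allow(dead_code)]
#[repr(C)]
struct Vertex {
    position: [f32; 2], // bytes 0..8
    color: [f32; 3],    // bytes 8..20
}

/// Hypothetical attribute formats; each graphics API has its own enum.
#[derive(Debug, PartialEq)]
enum Format {
    Float32x2,
    Float32x3,
}

/// Hypothetical descriptor: "attribute `location` is `format` at `offset`".
#[allow(dead_code)]
struct VertexAttribute {
    location: u32,
    format: Format,
    offset: usize,
}

/// Returns the per-vertex stride and the attribute list for `Vertex`.
fn vertex_layout() -> (usize, [VertexAttribute; 2]) {
    let attributes = [
        VertexAttribute { location: 0, format: Format::Float32x2, offset: offset_of!(Vertex, position) },
        VertexAttribute { location: 1, format: Format::Float32x3, offset: offset_of!(Vertex, color) },
    ];
    (size_of::<Vertex>(), attributes)
}
```

Because the offsets come from the compiler, reordering or resizing a field in `Vertex` updates the descriptors automatically instead of silently desynchronizing them.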
afetisov commented 9 days ago
I guess my question can be rephrased as "is it guaranteed that
Member
Kixiron commented 9 days ago
For something to have an unspecified layout it by definition must have a layout
CryZe commented 9 days ago
Well of course it has a layout. But the question is if there might be situations in the future where the compiler might want to optimize the layout based on the particular case where the variable is used, just like how a struct might never be on the stack or heap at all and instead be represented entirely in registers. Similar optimizations could be possible where the struct is sorted in a particular way in one function and differently in another. Stabilizing this macro for
This macro is already writable in stable Rust, with two library implementations of varying QoL. A compiler that wishes to reorder structs (in the way described above) must already ensure that such reordering does not break the current library implementations. (This RFC, as written, allows a library-only implementation, even if that might not be the preferred route.)
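For reference, here is a stripped-down sketch of the stable-Rust technique such library implementations rely on (roughly what `memoffset` does). It takes the address of a field inside an uninitialized value with `addr_of!` and subtracts the base address. `lib_offset_of!` and `Pair` are hypothetical names. Unlike the real crates, this version has no guard against types that implement `Deref`, where field access could silently go through `deref` and create a reference to uninitialized memory.

```rust
/// Hypothetical library-style offset macro (single field only).
macro_rules! lib_offset_of {
    ($ty:ty, $field:ident) => {{
        let uninit = ::std::mem::MaybeUninit::<$ty>::uninit();
        let base: *const $ty = uninit.as_ptr();
        // SAFETY: `addr_of!` computes the field's address without reading the
        // uninitialized memory and without creating a reference to it.
        let field = unsafe { ::std::ptr::addr_of!((*base).$field) };
        // Both pointers point into `uninit`, so the difference is the offset.
        (field as usize) - (base as usize)
    }};
}

#[allow(dead_code)]
#[repr(C)]
struct Pair {
    a: u8,
    b: u32, // placed at offset 4 by repr(C) alignment rules
}
```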
CryZe commented 9 days ago
You certainly can write the macro in stable Rust, but you still need unsafe to actually use that offset. So it is still possible that using this offset on a repr(Rust) type is unsound. However, it may indeed be too late to allow such an optimization, since people are probably already relying on this enough that such a change would break them.
Contributor
Diggsey commented 9 days ago
I've had a crate which wraps the concept of a field offset in a safe API which has existed for 6 years
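As an illustration of the idea (not that crate's actual API), a typed field offset can confine the unsafe part to its constructor and expose safe projections afterwards. `FieldOffset`, `new_unchecked`, `apply`, `apply_mut`, `Point`, and `point_y` are all hypothetical names. `offset_of!` is what makes the constructor's contract easy to satisfy.

```rust
use std::marker::PhantomData;
use std::mem::offset_of;

/// Hypothetical typed field offset: a byte offset known to locate a `U` inside a `T`.
struct FieldOffset<T, U> {
    offset: usize,
    // Invariant in both `T` and `U`, so variance cannot be used to write a
    // shorter-lived value through `apply_mut`.
    _marker: PhantomData<fn(T, U) -> (T, U)>,
}

impl<T, U> FieldOffset<T, U> {
    /// # Safety
    /// `offset` must be the offset of a field of type `U` within `T`.
    unsafe fn new_unchecked(offset: usize) -> Self {
        FieldOffset { offset, _marker: PhantomData }
    }

    /// Safe projection `&T -> &U`; soundness rests on the constructor's contract.
    fn apply<'a>(&self, x: &'a T) -> &'a U {
        // SAFETY: by `new_unchecked`'s contract, `self.offset` locates a `U` inside `*x`.
        unsafe { &*(x as *const T).cast::<u8>().add(self.offset).cast::<U>() }
    }

    /// Safe projection `&mut T -> &mut U`.
    fn apply_mut<'a>(&self, x: &'a mut T) -> &'a mut U {
        // SAFETY: as in `apply`; the exclusive borrow of `*x` covers the field.
        unsafe { &mut *(x as *mut T).cast::<u8>().add(self.offset).cast::<U>() }
    }
}

#[allow(dead_code)]
struct Point {
    x: f64,
    y: f64,
}

/// Builds the offset of `Point::y`.
fn point_y() -> FieldOffset<Point, f64> {
    // SAFETY: `offset_of!` gives the offset of `y`, which has type `f64`.
    unsafe { FieldOffset::new_unchecked(offset_of!(Point, y)) }
}
```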
Member
RalfJung commented 8 days ago
This assumes that everything we need to know about struct layout can be described by size, alignment, and field offsets. I agree that should be the case, but if that is an intended use case of the RFC, it should be stated explicitly.
Member
programmerjake commented 8 days ago
since the offset only depends (or can be made to only depend) on the other
Member
programmerjake commented 8 days ago
perhaps
Member
Author
thomcc commented 8 days ago
Something like That said, I suspect
That would not work -- @thomcc I feel like it makes sense to support asking for offsets of all the sized fields on an unsized type -- if we want to commit to those always being statically known.
Member
programmerjake commented 8 days ago
ah, yeah.
Note that while this example shows a combination that supports array indexing, it's unclear if this is actually desirable for Rust.
## `memoffset::span_of!` Functionality
Isn't the span easily computable using `offset_of!` and `size_of` and maybe `align_of` (if we include padding), or would this be something more subtle?
It is if you have a `type_of` or so to compute the type of the field.
Ah, I hadn't considered the need for that. Maybe it's a reasonable future addition.
It turns out you can do it, it's just hacky and non-obvious. Here's a version implemented using just `memoffset::offset_of!`: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3f8a716592b64b551364105a3087b955. This is not really robust but you get the idea.
(Somewhat inspired by some code @danielhenrymantilla shared for a C-style sizeof that supports passing in a value rather than a type)
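Here is a sketch of one way to do this with the proposed macro, assuming `std::mem::offset_of!` is available. A generic helper infers the field's type from a raw pointer to it, which stands in for the missing `type_of`. `span_of!`, `size_of_pointee`, and `Packet` are hypothetical names. The returned range excludes any padding after the field.

```rust
use std::mem::size_of;

/// Infers `T` from a raw pointer and returns `size_of::<T>()`.
fn size_of_pointee<T>(_: *const T) -> usize {
    size_of::<T>()
}

/// Hypothetical `span_of!`: the byte range a single field occupies.
macro_rules! span_of {
    ($ty:ty, $field:ident) => {{
        let uninit = ::std::mem::MaybeUninit::<$ty>::uninit();
        // SAFETY: `addr_of!` only computes the field's address; nothing is read
        // and no reference to uninitialized memory is created.
        let field_ptr = unsafe { ::std::ptr::addr_of!((*uninit.as_ptr()).$field) };
        let start: usize = ::std::mem::offset_of!($ty, $field);
        let range: ::std::ops::Range<usize> = start..start + size_of_pointee(field_ptr);
        range
    }};
}

#[allow(dead_code)]
#[repr(C)]
struct Packet {
    kind: u16,        // bytes 0..2
    payload: [u8; 6], // bytes 2..8
    crc: u32,         // bytes 8..12
}
```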
That is, instead of `offset_of!(Foo, quank.zoop.2.quank[4])`, you'll have to compute the offsets of each step manually, and sum them.
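Under that restriction, a nested offset is a sum of single-field offsets, and an array element adds `i * size_of::<Elem>()` as a final step. The sketch below uses hypothetical `Outer`/`Inner` types, marked `#[repr(C)]` so that the expected numbers are fixed.

```rust
use std::mem::{offset_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct Inner {
    flag: u8,
    values: [u16; 4], // offset 2 within Inner
}

#[allow(dead_code)]
#[repr(C)]
struct Outer {
    id: u32,
    inner: Inner, // offset 4 within Outer
}

/// `Outer.inner.values`, computed one step at a time and summed.
const VALUES_OFFSET: usize = offset_of!(Outer, inner) + offset_of!(Inner, values);

/// `Outer.inner.values[i]`: the array step adds `i * size_of::<u16>()`.
const fn value_offset(i: usize) -> usize {
    VALUES_OFFSET + i * size_of::<u16>()
}
```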
4. Finally, types other than tuples, structs, and unions are currently
What about arrays? It's pretty easy to compute, but it seems like it would make sense to allow something like `offset_of!([T; N], 2)` as a shorthand for 2 times the size of `T` rounded up to the alignment of `T`.
Size is always a multiple of alignment, so this is just `2 * size_of::<T>()`. I suppose we could allow it for consistency, but I don't really see a good motivation tbh
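The arithmetic from that reply can be written as a hypothetical helper. Because `size_of::<T>()` is always a multiple of `align_of::<T>()`, arrays have no padding between elements. Element `i` therefore starts at `i * size_of::<T>()`, with no extra rounding step.

```rust
use std::mem::size_of;

/// Hypothetical helper: byte offset of element `i` within a `[T; N]`.
const fn array_elem_offset<T>(i: usize) -> usize {
    i * size_of::<T>()
}
```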
Member
Author
thomcc 6 days ago
This isn't a field so I'm really not a fan of allowing this. I kind of hint at it in the nested fields part of the future work, though. I think without nested field access, it's not worth the inconsistency that it introduces.
1. This exposes layout information at compile time which is otherwise not exposed until runtime. This can cause compatibility hazards similar to `mem::size_of` or `mem::align_of`, but plausibly greater as it provides even more information.
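The hazard becomes concrete once offsets are available at compile time: code can make whether the crate builds at all depend on them. Here is a sketch with a hypothetical `WireHeader`. With `#[repr(C)]` these assertions are guaranteed to hold. The same assertions on a default-repr type would instead tie compilation to whatever layout the current compiler happens to choose.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical on-the-wire header with a C-compatible layout.
#[allow(dead_code)]
#[repr(C)]
struct WireHeader {
    magic: u32,   // bytes 0..4
    version: u16, // bytes 4..6
    flags: u16,   // bytes 6..8
    length: u64,  // bytes 8..16
}

// Compile-time layout checks: if an edit moves a field, these fail during
// compilation rather than at runtime.
const _: () = assert!(offset_of!(WireHeader, version) == 4);
const _: () = assert!(offset_of!(WireHeader, flags) == 6);
const _: () = assert!(offset_of!(WireHeader, length) == 8);
const _: () = assert!(size_of::<WireHeader>() == 16);
```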
Note that with rust-lang/rust#96240 (`const_ptr_offset_from`) being merged for 1.65, this will already be available in a const context; it was the only feature gate left for `memoffset::offset_of!()` to be able to be const (as long as the type doesn't have `UnsafeCell` somewhere; that's `const_refs_to_cell`).
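Here is a sketch of that technique in a const initializer, written out by hand for one field of a hypothetical `Pair` (which contains no `UnsafeCell`, per the caveat above). The piece that needed `const_ptr_offset_from` is the call to `offset_from`.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of;

#[allow(dead_code)]
#[repr(C)]
struct Pair {
    a: u8,
    b: u32,
}

/// Offset of `Pair::b`, computed entirely during const evaluation.
const PAIR_B_OFFSET: usize = {
    let uninit = MaybeUninit::<Pair>::uninit();
    let base: *const Pair = uninit.as_ptr();
    // SAFETY: `addr_of!` neither reads memory nor creates a reference, and both
    // pointers are derived from the same allocation, as `offset_from` requires.
    unsafe {
        let field = addr_of!((*base).b);
        field.cast::<u8>().offset_from(base.cast::<u8>()) as usize
    }
};
```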
afetisov commented 7 days ago
A couple extra objections against exposing offsets for
TL;DR: exposing offsets will effectively ossify the layouts of types, thus it must not happen on types with not fully defined layouts. Extending
@afetisov none of those things apply any less to
This is already the case.
This is not really the case in practice; the compiler does not gratuitously change layouts just because it can. That's not to say that this doesn't happen at all of course, but executing the same build twice won't just randomly change things. (and in any case, there is still a requirement for reproducible builds in general)
This is already the case today with
ojeda commented 7 days ago
The offsets can still change from one build to the next (as you mention in the second point), so they shouldn't be relied upon by end users. But that is fine, because there are use cases where one needs to know the offsets but the particular layout does not need to be predictable (in fact, sometimes it is best to not have a predictable layout at all, not just for debugging, and a flag like
Even if
No, UB is different (and worse). If anything, you could compare it with using compile-time random numbers, or proc macros that depend on external events with unpredictable timing or race conditions, etc., and those are all fine in safe Rust (that is not to say reproducible builds should be broken without good reason, of course).
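One use case where the offset matters but the layout need not be predictable is the `container_of` pattern familiar from C code such as the Linux kernel: recovering a pointer to a struct from a pointer to one of its fields. Here is a sketch with hypothetical `Link`, `Task`, and `task_from_link`. It works for a default-repr type because the offset is queried from the compiler rather than assumed.

```rust
use std::mem::offset_of;

/// Hypothetical intrusive-list link embedded in a larger struct.
#[allow(dead_code)]
struct Link {
    next: *const Link,
}

/// Default (Rust) repr: the compiler is free to choose the field order.
#[allow(dead_code)]
struct Task {
    id: u32,
    link: Link,
    name: &'static str,
}

/// Hypothetical `container_of`-style helper for `Task::link`.
///
/// # Safety
/// `link` must point to the `link` field of a live `Task`, and must be derived
/// from a pointer to that whole `Task` so its provenance covers the container.
unsafe fn task_from_link(link: *const Link) -> *const Task {
    // Step back by the field's offset, whatever layout was chosen for `Task`.
    unsafe { link.cast::<u8>().sub(offset_of!(Task, link)).cast::<Task>() }
}
```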
afetisov commented 6 days ago
Maybe. But it doesn't matter since it's already a stable API. That doesn't mean that constraining the layouts further is a good idea. At the very least it's a kind of change which requires a separate RFC, and shouldn't be smuggled with an ostensible library-level change. It's also quite different, because without having I find it much more likely that people will write code which depends on specific field ordering. On the other hand, I have yet to see an explicit example in code where it would be warranted to deal with offsets in
This is about as convincing as "I can already read the contents of uninitialized memory, watch:
None of that is relevant. The last thing I would want is for the build to crash or for the program to misbehave just because the compiler was updated. Rust goes to great lengths to ensure that toolchain upgrades are as safe and seamless as possible, and that change would be directly undermining those guarantees. The current RFC is way too cavalier with the analysis of drawbacks and alternatives for types which are not Also, I don't think anyone has an issue with adding
In fact it does not even "rely on" implementation-specific behavior, it merely exposes implementation-specific behavior in a convenient way. It is possible to write code using
I don't think that allowing it to be used on
Member
programmerjake commented 6 days ago
imho an advantage of allowing
CryZe commented 6 days ago
That just sounds to me like Repr<"C"> or so should be a trait.
Member
programmerjake commented 6 days ago
but why limit it to
Member
Author
thomcc commented 6 days ago
While it's a bit different, I've mentioned this in prior art and rationale/alternatives. I agree that field projection / ptr-to-member libraries are a good example of stuff you can build on top of
Member
Kixiron commented 5 days ago
Yah, another strong argument for allowing any repr is for enabling users to implement something like
Member
programmerjake commented 5 days ago
that's not necessarily sound, because a
Notice how transmuting swaps the field values? That's because they have different layout.
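The following is a minimal illustration of the general effect, not the commenter's original example. Two types with the same fields can place those fields at different offsets, so transmuting between them moves values from one field to another. `Ab`, `Ba`, and `reinterpret` are hypothetical names, and `#[repr(C)]` is used so that the outcome is deterministic.

```rust
use std::mem::transmute;

/// Same fields as `Ba`, declared in the opposite order.
#[allow(dead_code)]
#[repr(C)]
struct Ab {
    a: u32, // offset 0
    b: u32, // offset 4
}

#[allow(dead_code)]
#[repr(C)]
struct Ba {
    b: u32, // offset 0
    a: u32, // offset 4
}

/// Reinterprets the bytes of an `Ab` as a `Ba`.
fn reinterpret(x: Ab) -> Ba {
    // SAFETY: both are 8-byte repr(C) structs of two `u32`s with no padding,
    // and every bit pattern is a valid `u32`.
    unsafe { transmute::<Ab, Ba>(x) }
}
```

After the call, `Ba::a` holds what was stored in `Ab::b` and vice versa, because the two types put `a` at different offsets.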
comex commented 5 days ago
Does
Actually, I have no idea what your example is trying to show, that's totally expected behavior and has nothing to do with uninit projection.
Member
programmerjake commented 2 days ago
ah, yeah, i misunderstood what you meant.