Add `core::mem::offset_of!` RFC #3308

Conversation

Member

thomcc commented 10 days ago •

edited

I started this a jillion years ago, and have been meaning to clean it up and submit it for a long time. The intention is to remove most (ideally all) need for hardcoding offsets, computing offsets manually, or using {memoffset,bytemuck}::offset_of!.

CC @JakobDegen, @saethlin (since they provided some amount of input on this during RustConf)

Rendered

ibraheemdev, DrMeepster, rodrimati1992, taiki-e, JakobDegen, Nilstrieb, evanrichter, xFrednet, saethlin, BurntSushi, and 16 more reacted with thumbs up emojiibraheemdev, saethlin, BurntSushi, mikeleany, mejrs, wesleywiser, Darksecond, TennyZhuang, riking, GrayJack, and tmccombs reacted with heart emoji All reactions

Member

Author

thomcc commented 10 days ago

Oh right, CC @rust-lang/wg-const-eval since this RFC allow using this macro during const-eval.

thomcc

added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label

10 days ago

Member

BurntSushi commented 9 days ago •

edited

I am personally very much in favor of something like this.

I do wonder if the lang team should be roped into this? From looking at bytemuck's implementation, it seems clear to me that this is something that can be implemented in library code. But, invoking it with a type as an "argument" is also something that feels outside of libs-api and more in lang territory. That is, the way this macro is used, it looks like a new language verb that's defined as a macro in order to get around adding a new language verb. (I don't mean that as any sort of accusation of subterfuge!)

And maybe that's all okay. I don't know. I don't think I mind having this as a macro. But that it might be something the lang team wants to evaluate too. (And they might just say, "nah libs-api can deal with this." But they might not.)

Contributor

Lokathor commented 9 days ago

As a user, I'd prefer if the docs could say "this info isn't some goofy library calculation, this info is directly from the compiler's data structures, guaranteed accurate, guaranteed no UB, guaranteed no runtime cost."

Which is to say, yeah, "language verb" is probably a better framing.

ChayimFriedman2, Darksecond, riking, madsmtm, tmccombs, and izik1 reacted with thumbs up emoji All reactions

thomcc

added the T-lang Relevant to the language subteam, which will review and decide on the RFC. label

9 days ago

afetisov commented 9 days ago •

edited

One important thing is missing from the reference: the layout of #[repr(rust)] data is unspecified, but is it even true that the layout is fixed? I can totally imagine a compiler optimization that e.g. uses different layout for different local variables of the same type, provided they do not "escape" in some sense (e.g. a type T is private and no explicit pointers to T are created). Without such guarantees any attempts to manipulate individual fields via offsets are meaningless (at least unless the offset is used for accessing the same "instantiation" of T, but that's generally impossible to enforce).

For example, the reference explicitly states that the layout of #[repr(rust)] types can vary between different compilations. This much more tame than the possibility above (vary with each use of type), but may also lead to subtle errors.

I think the RFC should have some motivation for allowing offset_of! on non-#[repr(C)] types. "memoffset does it" isn't sufficient justification, since memoffset has no way to enforce the layout requirements on the supplied types.

Being a macro, offset_of! also can't distinguish between structs and enums, or between fields and arbitrary garbage. I'm afraid this will lead to some very poor post-monomorphization errors. If offset_of! calls are embedded in other macros, the errors can be super cryptic.

One possibility which isn't considered in the RFC is to use special constants instead of an expression macro. Now, defining those constants manually is mentioned, and has many problems, but what if they are defined by a compiler macro? For example, there could be some #[derive(FieldOffsets)] derive macro which would provide the constants with the values substituted by the compiler, e.g.

// before expansion
#[derive(FieldOffsets)
struct Foo {
    first: u32,
    pub second: Vec<Bar>,
    pub(crate) third: (u8, bool, usize),
}

//after expansion
impl Foo {
    const FOO_FIRST_OFFSET: usize = 0;
    pub const FOO_SECOND_OFFSET: usize = 8; // for example
    pub(crate) const FOO_THIRD_OFFSET: usize = 32; // for example
}

For better ergonomics, the constants may be accessible via some expression macro offset_of!, which would expand to the relevant constant name.

A derive macro trivially sidesteps the issues of const evaluation and lack of runtime overhead (since the values of constants are inlined by the compiler).
It easily deals with unsupported types of data (enums etc), and can more easily support them in the future (since it has full information about the type definition).
It automatically mirrors the privacy rules for type and fields.
It can transparently handle any other requirements on the type, e.g a specific #[repr] (just check for desired attributes).
It could even provide more granular control to the representation of the type: if the author doesn't provide the #[derive(FieldOffsets)], then the consumer code cannot rely on the specific offsets.
This information can also be more granularly supported for specific fields via derive macro attributes (like #[offset_of(ignore)] or #[offset_of(pub(crate))] pub field: T.

This implementation is also very hard to provide in a library crate, which provides a stronger argument for including it into the core.

vi reacted with thumbs up emoji All reactions

I've always wanted this feature, but I'm wondering what the rational would be for not supporting ?Sized fields from the beginning.

Member

Author

thomcc commented 9 days ago •

edited

@BurntSushi:

I do wonder if the lang team should be roped into this

Yeah, I didn't know what to tag this as. I'm certainly not opposed to roping in t-lang, and added the label.

@Lokathor:

this info isn't some goofy library calculation, this info is directly from the compiler's data structures, guaranteed accurate, guaranteed no UB, guaranteed no runtime cost

The intention is that's how this would be implemented (that's what the bit at the end of the reference is intended to mean), but "no runtime cost" is kind of difficult to define in a way that still works in implementations that don't make as firm of a distinction between compile time and runtime.

Anyway, this felt likely to bog down the implementation under bikeshedding as to what this means, and in practice "evaluates to a constant expression" should be sufficient.

I also don't want to forbid a pure-library implementation, although I believe implementing it as a builtin macro is likely to be better for several reasons (error messages, compile-time performance, reliability, ...).

@mikeleany:

I've always wanted this feature, but I'm wondering what the rational would be for not supporting ?Sized fields from the beginning.

Hm, largely because it prevents a pure library implementation. That said, I don't really care about that, and perhaps with enough cleverness it would still be possible.

Member

Author

thomcc commented 9 days ago

@afetisov (your comments are long enough to get a separate reply)

I think the RFC should have some motivation for allowing offset_of! on non-#[repr(C)] types. "memoffset does it" isn't sufficient justification, since memoffset has no way to enforce the layout requirements on the supplied types.

The use cases for this are largely the same as the use cases for using offset_of! on repr(C) types -- {de,}serialization, debugging support code, FFI, and data structures -- in all of these it may be undesirable or impractical to force all types to use repr(C).

More generally, it is a goal of this RFC to remove the reasons people have to implement offset_of! manually (either explicitly, or often in an ad-hoc manner). If core::mem::offset_of! lacks functionality that you could implement yourself, then this adds more motivation to not use it.

One important thing is missing from the reference: the layout of #[repr(rust)] data is unspecified, but is it even true that the layout is fixed?

In several ways it is already fixed at compile time (size and alignment are, which limits the size of the overall structure). In other ways, it is fixed at runtime. Allowing use in const evaluation prevents an implementation from doing things like randomizing field offsets at startup, unless it also performs modifications to adjust offset_of! values (and calculations derived from them).

My suspicion is that many implementations which can randomize field offsets at startup can recompute constant values (because they're likely interpreted), but it's plausible that there's some middle-ground.

Either way, this is more or less what I'm getting at by the first item under "Drawbacks" -- previously this information was only available at runtime, and this allows accessing it at compile time.

I can totally imagine a compiler optimization that e.g. uses different layout for different local variables of the same type, provided they do not "escape" in some sense (e.g. a type T is private and no explicit pointers to T are created)

This kind of optimization (and more aggressive ones, like SROA) would still be allowed in exactly the same cases they are currently allowed. If a pointer to an instance of a structure does not escape, the offsets of its field do not matter. This is true regardless of the #[repr] of the type, and is not hindered or changed by offset_of! at all, as that information can already be computed using address arithmetic on pointers to the structure and field.

Being a macro, offset_of! also can't distinguish between structs and enums, or between fields and arbitrary garbage. I'm afraid this will lead to some very poor post-monomorphization errors

Are you sure? From my experience with the code in rustc_builtin_macros and from conversations with compiler contributors (which I had when preparing this RFC), none of this should be a problem.

Perhaps someone familiar with compiler internals cares to weigh in?

One possibility which isn't considered in the RFC is to use special constants instead of an expression macro

I'll add it to the RFC as an alternative, but I am not a fan of macros that expand to items not present in the source (you'll note that none of the existing builtin #[derive()]s do this); I find it results in poor tooling experience, and is generally confusing. Additionally:

<a derive macro> can more easily support them in the future (since it has full information about the type definition).

I don't think there's any reason a derive macro would have more information here than is available to a builtin macro. They're essentially the same thing, and have essentially the same limitations.

This implementation is also very hard to provide in a library crate, which provides a stronger argument for including it into the core.

I don't think this is true -- if core::mem::offset_of! is added, I think it would be quite straightforward to implement a #[derive(FieldOffsets)] equivalent to what you describe as a procedural macro library... What would be difficult about it?

I'll try to update drawbacks/alternatives/motivation with this feedback.

madsmtm reacted with thumbs up emoji All reactions

is it even true that the layout is fixed? I can totally imagine a compiler optimization that e.g. uses different layout for different local variables of the same type, provided they do not "escape" in some sense

Yes, the layouts are fixed. However, the as-if rule applies anyway. If the compiler can prove that the user can't tell the difference either way, it's always free to do whatever it wants as long as the behavior of the program is correct.

Being a macro, offset_of! also can't distinguish between structs and enums, or between fields and arbitrary garbage. I'm afraid this will lead to some very poor post-monomorphization errors. If offset_of! calls are embedded in other macros, the errors can be super cryptic.

offset_of! is less a macro and more a compiler built-in. The compiler certainly can distinguish between structs and enums perfectly well, it does this already when it allows you to write things like MyStruct { foo: 5 }. This macro would use much of the same logic. There also means that we would reject things like fn foo<T>() { offset_of!(T, field); } and so there is no possibility of post-mono errors.

thomcc, Nilstrieb, and Kixiron reacted with heart emoji All reactions

afetisov commented 9 days ago

If a pointer to an instance of a structure does not escape, the offsets of its field do not matter.

The pointer may not be computed explicitly, but perhaps there is still some roundabout way to get it via unsafe code. Trying to bound the behaviour of unsafe code hits Rice's theorem fast, and I can't imagine what would a reasonable definition of UB for offset_of! look like. "It is UB to access fields via this offset without explicitly using *mut T at some point"? That doesn't sound sensible.

Unless the reference explicitly guarantees that a type is always represented in the same way in memory, I would feel very uneasy about exposing offset_of! for unspecified layouts. It's also a semver footgun, but that's already mentioned in RFC.

that information can already be computed using address arithmetic on pointers to the structure and field.

If the layout stability within a single artifact is not guaranteed, then all such calculations are borderline UB (their results can not be used for any unsafe operations in any nontrivial case).

Yes, the layouts are fixed.

Thank you, but that doesn't carry much weight unless it passes an RFC and is specified in the Reference. For all I know, that's just the current implementation restriction.

The use cases for this are largely the same as the use cases for using offset_of! on repr(C) types -- {de,}serialization, debugging support code, FFI, and data structures -- in all of these it may be undesirable or impractical to force all types to use repr(C).

I would like to see specific use cases. I can't imagine how field offsets would help with (de)serialization of types with unspecified layouts, you certainly can't just transmute a slice of incoming data. Nor can I imagine how layout-unstable types could possibly be used in FFI, except as opaque types with no access to inner fields.

What would be difficult about it?

Well, that depends on the implementation and power of offset_of!. For example, it may use something which make the call UB in some cases, or it may be unusable in const contexts at the moment. But I guess being a builtin macro could grant it the necessary special powers. I don't know how complex a builtin macro can be.

Contributor

Lokathor commented 9 days ago

I would like to just put in the standard reminder that the Reference is a best effort documentation of how the compiler/language currently works, it's not actually an authority. If the reference and the compiler conflict about a point, it's just as likely that the reference will be updated as the compiler will be.

saethlin, madsmtm, and avl reacted with thumbs up emojiafetisov reacted with confused emoji All reactions

JakobDegen commented 9 days ago •

edited

The pointer may not be computed explicitly, but perhaps there is still some roundabout way to get it via unsafe code. Trying to bound the behaviour of unsafe code hits Rice's theorem fast

Fortunately, the language doesn't have to worry about this. People writing compiler optimizations will be responsible for proving that the results of the optimization are not observable.

Thank you, but that doesn't carry much weight unless it passes an RFC and is specified in the Reference.

I don't really know how to respond to this besides saying that this is what it means for a type to have a layout. If the layout is not fixed, then it is not a property of the type, but a property of some other thing

Contributor

Lokathor commented 9 days ago

Nor can I imagine how layout-unstable types could possibly be used in FFI, except as opaque types with no access to inner fields.

I got one: With the GPU you often have to send structured data and the way you do it is basically sending a bunch of bytes along with a description of what the fields of each field within each struct is. Details vary by API so I don't want to get into it too heavily, but you'd tell the API something like "the 0th field is an [f32;2] at offset 0, the 1th field is an [f32;3] at offset 8" or whatever, and then the GPU would read your buffer that way.

thomcc, afetisov, ChayimFriedman2, riking, madsmtm, and avl reacted with thumbs up emoji All reactions

afetisov commented 9 days ago

I don't really know how to respond to this besides saying that this is what it means for a type to have a layout.

I guess my question can be rephrased as "is it guaranteed that #[repr(rust)] types have a layout?" The reference says that no guarantees on the default layout are provided, and no information besides size and alignment are explicitly exposed. "No guarantees on layout" may well be interpreted as "no specific layout at all".

CryZe reacted with thumbs up emojirodrimati1992, Kixiron, and Nilstrieb reacted with confused emoji All reactions

Member

Kixiron commented 9 days ago

For something to have an unspecified layout it by definition must have a layout

CryZe commented 9 days ago •

edited

For something to have an unspecified layout it by definition must have a layout

Well of course it has a layout. But the question is if there might be situations in the future where the compiler might want to optimize the layout based on the particular case where the variable is used, just like how a struct might never be on the stack or heap at all and instead be represented entirely in registers. Similar optimizations could be possible where the struct is sorted in a particular way in one function and differently in another. Stabilizing this macro for repr(Rust) would prohibit such optimizations.

This macro is already writable with stable Rust, with 2 library implementations of varying QoL. A compiler that wishes to reorder structs (in a way as described above) must already ensure that such reordering does not impact the current library implementations. (This RFC, as written, allows a library-only implementation, even if that might not be the preferred route).

CryZe commented 9 days ago

You certainly can write the macro in stable Rust, but you still need unsafe to actually use that offset. So there's still a possibility that this offset is actually unsound to use then on a repr(Rust) type. However it's possibly indeed too late to allow such an optimization as people are probably relying on it enough already to break that.

Contributor

Diggsey commented 9 days ago

You certainly can write the macro in stable Rust, but you still need unsafe to actually use that offset. So there's still a possibility that this offset is actually unsound to use then on a repr(Rust) type. However it's possibly indeed too late to allow such an optimization as people are probably relying on it enough already to break that.

I've had a crate which wraps the concept of a field offset in a safe API which has existed for 6 years

https://docs.rs/field-offset/latest/field_offset/index.html

Member

RalfJung commented 8 days ago

For serialization, consider formats that describe what offset is used for each field of the record in a payload, etc.

This assumes that everything we need to know about struct layout can be described by size, alignment, and field offsets.

I agree that should be the case, but if that is an intended usecase of the RFC, it should be stated explicitly.

thomcc reacted with thumbs up emoji All reactions

Member

programmerjake commented 8 days ago

The unsized field may not have a fixed layout, but it often can have a fixed offset, thus offset_of! makes sense to have.
e.g.:

Often but not always, e.g. in (i32, dyn Send), the offset of the 2nd field is not fixed.

since the offset only depends (or can be made to only depend) on the other Sized fields and the alignment of the ?Sized field, it should be fine to use offset_of! on any field with type T: ?Sized + KnownAlignment where the alignment is known at monomorphization time. All slices and str would implement KnownAlignment because their alignment is the alignment of the element type, which is Sized. dyn types would have to be dyn MyTrait + KnownAlignment to be usable with offset_of!.

Member

programmerjake commented 8 days ago

perhaps Aligned is a better name than KnownAlignment. for backward compatibility, Sized: Aligned and T: ?Sized implies T: ?Sized + ?Aligned

Member

Author

thomcc commented 8 days ago

Something like KnownAlignment/Aligned is overkill -- align_of_val has the exact same issue here (not really needed for slices and types with trailing slices) and we get by without something like that. If we want to add that, it would be a totally separate thing. Anyway, I think the consistent fix for that would be an offset_of_val!, analogous to size_of_val or align_of_val.

That said, I suspect offset_of_val! is niche enough that it should be deferred to future work (I'll add it to that section when I update the RFC, hopefully later today).

Member

RalfJung commented 8 days ago •

edited

dyn types would have to be dyn MyTrait + KnownAlignment to be usable with offset_of!.

That would not work -- dyn MyTrait + KnownAlignment implies that the dynamically chosen type has known alignment, but we still don't know that alignment from the type alone. (Dynamically we already know the alignment, via align_of_val.)

@thomcc I feel like it makes sense to support asking for offsets of all the sized fields on an unsized type -- if we want to commit to those always being statically known.

Member

programmerjake commented 8 days ago

dyn types would have to be dyn MyTrait + KnownAlignment to be usable with offset_of!.

That would not work -- dyn MyTrait + KnownAlignment implies that the dynamically chosen type has known alignment, but we still don't know that alignment from the type alone. (Dynamically we already know the alignment, via align_of_val.)

ah, yeah. dyn Trait + Aligned<ALIGNMENT = N> would work though, for const generic N (assuming const generic stuff is implemented enough yet).

text/0000-offset_of.md

		Note that while this example shows a combination that supports array indexing,
		it's unclear if this is actually desirable for Rust.

		## `memoffset::span_of!` Functionality

Isn't the span easily computable using offset_of! and size_of and maybe align_of (if we include padding) or would this be something more subtle?

All reactions

It is if you have a type_of or so to compute the type of the field.

All reactions

Ah, I hadn't considered the need for that. Maybe it's a reasonable future addition.

All reactions

Member

Author

thomcc 8 days ago •

edited

It turns out you can do it, it's just hacky and non-obvious. Here's a version implemented using just memoffset::offset_of! https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3f8a716592b64b551364105a3087b955. This is not really robust but you get the idea.

(Somewhat inspired by some code @danielhenrymantilla shared for a C-style sizeof that supports passing in a value rather than a type)

All reactions

text/0000-offset_of.md

		That is, instead of `offset_of!(Foo, quank.zoop.2.quank[4])`, you'll have to
		compute the offsets of each step manually, and sum them.

		4. Finally, types other than tuples, structs, and unions are currently

What about arrays? It's pretty easy to compute, but it seems like it would make sense to allow something like offset_of!([T; N], 2) as a shorthand for 2 times the size of T rounded up to the alignment of T.

teor2345 reacted with thumbs up emoji All reactions

Size is always a multiple of alignment, so this is just 2 * size_of::<T>(). I suppose we could allow it for consistency, but I don't really see a good motivation tbh

All reactions

Member

Author

thomcc 6 days ago

This isn't a field so I'm really not a fan of allowing this. I kind of hint at it in the nested fields part of the future work, though. I think without nested field access, it's not worth the inconsistency that it introduces.

All reactions

text/0000-offset_of.md

		1. This exposes layout information at compile time which is otherwise not
		exposed until runtime. This can cause compatibility hazards similar to
		`mem::size_of` or `mem::align_of`, but plausibly greater as it provides even
		more information.

Note that with rust-lang/rust#96240 (const_ptr_offset_from) being merged for 1.65, this will already be available in a const context - it was the only feature gate left for memoffset::offset_of!() to be able to be const (as long as the type doesn't have UnsafeCell somewhere, that's const_refs_to_cell).

All reactions

afetisov commented 7 days ago •

edited

A couple extra objections against exposing offsets for #[repr(Rust)] types (and any other types with underdefined layout).

Although the layout of #[repr(Rust)] types is undefined, exposing offsets means that the layout is now an observable behaviour. This means that it would de facto become fixed and relied upon by end-users, and could no longer be changed in the future. This runs counter to the goal of no ABI stability for default layout, since it would prevent further layout optimizations.
In particular, currently the layout is not guaranteed to be stable even between different compiler runs of the same compiler on the same codebase. But that means that exposing offsets for default layout make possible horrible nonreproducible bugs: the runtime behaviour may branch on the values of offsets, thus successively compiled binaries will behave in entirely different and non-reproducible way. Paired with optimizations, that's as close as it gets to true Undefined Behaviour, which just shouldn't be possible in safe Rust.
As a consequence of the above, if offset_of! is const, then the crate may fail to compile on different compiler invocations (it's easy to write either compile-time assertions or types which fail to be well-formed based on the values of these constants). Nondeterministic compilation errors are unprecedented in current Rust and most if not all programming languages. They also make any good diagnostics impossible.
Let's say the compiler randomizes the layouts based on some randomness, which can be provided as explicit seed during the compilation. Although an obscure hack, this will eliminate the nondeterministic errors above. However, this would make the compiler behaviour unreproducible by any tool other than the specific compiler version itself. For example, IDEs need to be able to do constant evaluation, for their diagnostics and refactorings. However, since the layout algorithm is an internal, potentially very complex compiler detail which may change between versions, it cannot be reproduced in the IDE const evaluation engine. This will make any non-rustc tools fundamentally and unrepairably broken on codebases which do complex offset_of! tricks. Since those tricks can be wrapped in crates and easily distributed, this will likely make them broken on any sufficiently complex real-world codebase.

TL;DR: exposing offsets will effectively ossify the layouts of types, thus it must not happen on types with not fully defined layouts. Extending offset_of! to new types and layouts must be preceded by an RFC which fully stabilizes the corresponding layout.

@afetisov none of those things don't apply just as much to size_of. Is your claim that providing that was a mistake too?

exposing offsets means that the layout is now an observable behaviour

This is already the case. memoffset and such crates already exist today. You can cast pointers to integers to observe the offset of various fields.

But that means that exposing offsets for default layout make possible horrible nonreproducible bugs

This is not really the case in practice; the compiler does not gratuitously change layouts just because it can. That's not to say that this doesn't happen at all of course, but executing the same build twice won't just randomly change things. (and in any case, there is still a requirement for reproducible builds in general)

as a consequence of the above, if offset_of! is const, then the crate may fail to compile on different compiler invocations

This is already the case today with size_of. There are many crates (including lots of rustc_* crates) that const_assert!(size_of::<SomeType>() == 24) or whatever. We have been willing to break those in the past, and there is no reason that that would change.

MatrixDev and avl reacted with thumbs up emoji All reactions

ojeda commented 7 days ago •

edited

Although the layout of #[repr(Rust)] types is undefined, exposing offsets means that the layout is now an observable behaviour. This means that it would de facto become fixed and relied upon by end-users, and could no longer be changed in the future. This runs counter to the goal of no ABI stability for default layout, since it would prevent further layout optimizations.

The offsets can still change from one build to the next (as you mention in the second point), so they shouldn't be relied upon by end users.

But that is fine, because there are use cases where one needs to know the offsets but the particular layout does not need to be predictable (in fact, sometimes it is best to not have a predictable layout at all, not just for debugging, and a flag like -Zrandomize-layout could accomplish that).

But that means that exposing offsets for default layout make possible horrible nonreproducible bugs: the runtime behaviour may branch on the values of offsets, thus successively compiled binaries will behave in entirely different and non-reproducible way.

Even if -Zrandomize-layout happened to be the default, it would probably be fine and other flags could provide more control if needed (e.g. -Zlayout-seed).

Paired with optimizations, that's as close as it gets to true Undefined Behaviour, which just shouldn't be possible in safe Rust.

No, UB is different (and worse). If anything, you could compare it with using compile-time random numbers, or proc macros that depend on external events with unpredictable timing or race conditions, etc., and those are all fine in safe Rust (that is not to say reproducible builds should be broken without good reason, of course).

MatrixDev and avl reacted with thumbs up emoji All reactions

afetisov commented 6 days ago

none of those things don't apply just as much to size_of. Is your claim that providing that was a mistake too?

Maybe. But it doesn't matter since it's already a stable API. That doesn't mean that constraining the layouts further is a good idea. At the very least it's a kind of change which requires a separate RFC, and shouldn't be smuggled with an ostensible library-level change.

It's also quite different, because without having size_of on #[repr(Rust)] types it is impossible to work with arrays of such types, so there is a very strong reason to expose that information. I also find it hard to imagine where someone would compare sizes of different types, other than if they want to do a transmutation between them. But transmutation of #[repr(Rust)] types is already undefined behaviour and likely invalid.

I find it much more likely that people will write code which depends on specific field ordering.

On the other hand, I have yet to see an explicit example in code where it would be warranted to deal with offsets in #[repr(Rust)] types. The only example provided by @Lokathor was passing structs with field offsets to the GPU, which is probably valid, but also it seems like there is nothing preventing the use of #[repr(C)] structs for that use case. What is the case where using #[repr(C)] is impossible?

This is already the case. memoffset and such crates already exist today. You can cast pointers to integers to observe the offset of various fields.

This is about as convincing as "I can already read the contents of uninitialized memory, watch: MaybeUninit::uninit().assume_init()". memoffset goes beyond what is promised by the language, and using the results for anything other than debug logging is likely UB. Rust has a higher standard of quality in its standard library than "this API kinda sorta works sometimes if you don't breath on it the wrong way".

This is not really the case in practice; the compiler does not gratuitously change layouts just because it can. That's not to say that this doesn't happen at all of course, but executing the same build twice won't just randomly change things. (and in any case, there is still a requirement for reproducible builds in general)

None of that is relevant. The last thing I would want is for the build to crash or for the program to misbehave just because the compiler was updated. Rust goes to great lengths to ensure that toolchain upgrades are as safe and seamless as possible, and that change would be directly undermining those guarantees.

The current RFC is way too cavalier with the analysis of drawbacks and alternatives for types which are not #[repr(C)]. At the very least it needs a strong motivation for allowing such types, and a comprehensive analysis of all consequences, including sporadic build and runtime errors, of that behaviour. If the layouts produced by compiler are expected to be stable except for a number of known cases, then that policy should be explicitly specified, so that it can be meaningfully analyzed and not smuggled with other changes.

Also, I don't think anyone has an issue with adding offset_of! purely for types with fully defined layout. This RFC could also be scoped down to the unambiguous case, such as #[repr(C)] or #[repr(simd)], so that at least it gets the benefit of proper compiler support, without relying on possible UB and missed optimizations as memoffset does.

Member

RalfJung commented 6 days ago •

edited

memoffset does not rely on UB, it is fully well-defined (on sufficiently recent rustc). Please don't spread FUD. :)

In fact it does not even "rely on" implementation-specific behavior, it merely exposes implementation-specific behavior in a convenient way. It is possible to write code using memoffset that is correct for any implementation, and in fact, that is its main motivation. Of course people can also screw up their pointer arithmetic, but memoffset makes it harder to screw up by providing a convenient API.

Lokathor, Nilstrieb, SabrinaJewson, rodrimati1992, MatrixDev, coolreader18, RustyYato, thomcc, avl, and fee1-dead reacted with heart emoji All reactions

I don't think that allowing it to be used on repr(Rust) makes it harder to change the layout.
As we all know, the layout of repr(Rust) is unspecified. Using offset_of on it doesn't change the fact. I get back an offset and can use this on my pointers to offset right into the fields - and I'm guaranteed that it will work no matter what kind of layout the compiler chooses. Even better, there might (I am not aware of any) be cases where someone has to actually rely on the layout since they have no other choice - offset_of would help here by allowing them to take offsets that won't break when the layout is changed.
People can (and, for all I know, probably will) add assertions or other funky things depending on the offset, but here we have the same case as size_of, and if you do that it's your fault if it breaks, and I don't think that's a consideration we should have.
While I don't see a huge advantage of allowing it for repr(Rust) since I've personally never had a use for it, I also don't see a disadvantage. Even allowing it just for the sake of consistency sounds good to me.

Member

programmerjake commented 6 days ago

imho an advantage of allowing offset_of! for repr(Rust) is to allow you to create a DIY member pointer library that works on all structs, not just repr(C) structs:

#[derive(Copy, Clone)]
#[repr(transparent)]
pub struct FieldPtr<S, F>(usize, PhantomData<(fn(&S) -> &F, fn(&mut S) -> &mut F)>);

impl<S, F> FieldPtr<S, F> {
    pub const unsafe fn from_offset(offset: usize, _get: fn(&S) -> &F, _get_mut: fn(&mut S) -> &mut F) -> Self {
        Self(offset, PhantomData)
    }
    pub const fn get(self, s: &S) -> &F {
        unsafe { &*((s as *const S as *const u8).add(self.0) as *const F) }
    }
    pub const fn get_mut(self, s: &mut S) -> &mut F {
        unsafe { &mut *((s as *mut S as *mut u8).add(self.0) as *mut F) }
    }
}

macro_rules! field {
    ($S:ty, $f:tt) => {
        {
            let offset = offset_of!($S, $f);
            let get = |s: &$S| (s.$f);
            let get_mut = |s: &mut $S| (s.$f);
            unsafe { FieldPtr::from_offset(offset, get, get_mut) }
        }
    };
}

CryZe commented 6 days ago

member pointer library that works on all structs, not just repr(C) structs:

That just sounds to me like Repr<"C"> or so should be a trait.

Member

programmerjake commented 6 days ago

member pointer library that works on all structs, not just repr(C) structs:

That just sounds to me like Repr<"C"> or so should be a trait.

but why limit it to repr(C)? field pointers should work for any struct layout...

Member

Author

thomcc commented 6 days ago

Prior art for @programmerjake's example: docs.rs/field-offset

While it's a bit different, I've mentioned this in prior art and rationale/alternatives. I agree that field projection / ptr-to-member libraries are a good example of stuff you can build on top of offset_of! if it has #[repr(Rust)] support.

Member

Kixiron commented 5 days ago

Yah, another strong argument for allowing any repr is for enabling users to implement something like project_uninit!() that allows projecting a MaybeUninit<T> to its fields

Member

programmerjake commented 5 days ago

Yah, another strong argument for allowing any repr is for enabling users to implement something like project_uninit!() that allows projecting a MaybeUninit<T> to its fields

that's not necessarily sound, because a MaybeUninit of a struct can have a different layout than a struct with MaybeUninit fields:
https://rust.godbolt.org/z/c9sz4Y73x
with -Z randomize-layout, it outputs:

[/app/example.rs:21] (v.a.assume_init(), v.b.assume_init()) = (
    1,
    2,
)
[/app/example.rs:22] (v2.a, v2.b) = (
    2,
    1,
)

Notice how transmuting swaps the field values? That's because they have different layout.

use std::mem::MaybeUninit;

#[derive(Copy, Clone)]
pub struct MyStruct {
    pub a: u32,
    pub b: u32,
}

#[derive(Copy, Clone)]
pub struct MaybeUninitMyStruct {
    pub a: MaybeUninit<u32>,
    pub b: MaybeUninit<u32>,
}

pub fn main() {
    let v = MaybeUninitMyStruct {
        a: MaybeUninit::new(1),
        b: MaybeUninit::new(2),
    };
    let v2: MyStruct = unsafe { std::mem::transmute(v) };
    unsafe { dbg!((v.a.assume_init(), v.b.assume_init())) };
    dbg!((v2.a, v2.b));
}

rodrimati1992 reacted with confused emoji All reactions

comex commented 5 days ago

Yah, another strong argument for allowing any repr is for enabling users to implement something like project_uninit!() that allows projecting a MaybeUninit<T> to its fields

that's not necessarily sound, because a MaybeUninit of a struct can have a different layout than a struct with MaybeUninit fields: https://rust.godbolt.org/z/c9sz4Y73x with -Z randomize-layout, it outputs:

Does project_uninit! actually reinterpret the struct as a struct of MaybeUninit fields? It doesn't look like it does. It looks like it just obtains pointers to one or more fields and converts those to references to MaybeUninit. If it did what you're saying, it wouldn't need offset_of.

Member

Kixiron commented 2 days ago •

edited

that's not necessarily sound, because a MaybeUninit of a struct can have a different layout than a struct with MaybeUninit fields

~~As the one who wrote -Z randomize-layout, this is a bug. MaybeUninit is transparent and should have an identical layout to T.~~

Actually, I have no idea what your example is trying to show, that's totally expected behavior and has nothing to do with uninit projection.
In case you misunderstood me, this projection function would take an &mut MaybeUninit<T> and return an &mut MaybeUninit<FieldOfT>

Member

programmerjake commented 2 days ago

In case you misunderstood me, this projection function would take an &mut MaybeUninit<T> and return an &mut MaybeUninit<FieldOfT>

ah, yeah, i misunderstood what you meant.

Kixiron reacted with thumbs up emoji All reactions

Assignees

No one assigned

Labels

T-lang Relevant to the language subteam, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

None yet

19 participants

Add `core::mem::offset_of!` RFC by thomcc · Pull Request #3308 · rust-lang/rfcs...

Add core::mem::offset_of! RFC #3308

Conversation

thomcc commented 10 days ago •

thomcc commented 10 days ago

BurntSushi commented 9 days ago •

Lokathor commented 9 days ago

afetisov commented 9 days ago •

thomcc commented 9 days ago •

thomcc commented 9 days ago

afetisov commented 9 days ago

Lokathor commented 9 days ago

JakobDegen commented 9 days ago •

Lokathor commented 9 days ago

afetisov commented 9 days ago

Kixiron commented 9 days ago

CryZe commented 9 days ago •

CryZe commented 9 days ago

Diggsey commented 9 days ago

RalfJung commented 8 days ago

programmerjake commented 8 days ago

programmerjake commented 8 days ago

thomcc commented 8 days ago

RalfJung commented 8 days ago •

programmerjake commented 8 days ago

thomcc 8 days ago •

afetisov commented 7 days ago •

ojeda commented 7 days ago •

afetisov commented 6 days ago

RalfJung commented 6 days ago •

programmerjake commented 6 days ago

CryZe commented 6 days ago

programmerjake commented 6 days ago

thomcc commented 6 days ago

Kixiron commented 5 days ago

programmerjake commented 5 days ago

comex commented 5 days ago

Kixiron commented 2 days ago •

programmerjake commented 2 days ago

Recommend

About Joyk

Add `core::mem::offset_of!` RFC #3308