Github Add language support for C-compatible bit-fields by mahkoh · Pull Request...
source link: https://github.com/rust-lang/rfcs/pull/3064
Add language support for C-compatible bit-fields #3064
Conversation
This RFC adds support for bit-fields in `repr(C)` structs by
- introducing a new attribute `bitfield(N)` that can be applied to integer fields,
- allowing such annotated fields to be unnamed.
Title changed from "Add language support for C-compatible bitfields" to "Add language support for C-compatible bit-fields"
Contributor
Author
mahkoh commented 4 days ago
This is a rewrite of #1449 with concerns addressed.
In particular:
- The overflow behavior is now fully specified.
- The bit-field width must now be a literal.
- Bit-fields are now restricted to `repr(C)` structs and their layout is defined in terms of the corresponding layout used by Clang.
On top of the features in the previous RFC, this RFC adds syntax for unnamed bit-fields. This is necessary because such fields have different effects on the layout than named bit-fields.
Other questions from the other RFC:
- Why not use some kind of code generation?
  Any code generation needs to have access to the types of all fields in the struct. These types can usually only be determined during compilation. See the last motivating example in the RFC.
- Why not a more general approach that allows us to specify arbitrary struct layout?
  The problem in the first place is that we do not know the layout until compilation.
- Is this really needed? I thought C bitfields were an esoteric feature nobody uses.
  There are over 400 bit-fields in the Linux kernel user-space API.
A good future extension would be allowing bitfield widths and/or types to be generic parameters
Member
Kixiron commented 3 days ago
What value do the bits in an unnamed field hold? Will they have an undefined value (and hence be UB to read in any way, including casting/transmuting from the struct as a whole), a guaranteed value (0, I'd guess?), or simply an unspecified value (allowing the field to be read without undefined behavior, but still not allowing the actual value to be relied on)?
Contributor
Lokathor commented 3 days ago
I would hope they have simple "unspecified value". AKA "they can be whatever, no niche there". This lets bitfield types be transmuted to and from plain integer types with far less worry.
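To make the "unspecified value, no niche" reading concrete, here is a minimal hand-written sketch of a bit-field container in today's Rust. The type `Header`, its bit positions, and its method names are hypothetical and chosen only for illustration; nothing here is taken from the RFC. Because the spare bits carry no invariant, every `u16` bit pattern is a valid `Header`, so converting to and from the plain integer needs no `unsafe`.

```rust
/// Hypothetical stand-in for a struct with two bit-fields and spare bits.
/// Assumed layout: bits 0..3 = kind, bits 3..8 = spare, bits 8..16 = length.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct Header(u16);

impl Header {
    const KIND_MASK: u16 = 0b111;
    const LENGTH_SHIFT: u32 = 8;

    /// Accepts every bit pattern: the spare bits are allowed to hold anything.
    pub fn from_bits(bits: u16) -> Self {
        Header(bits)
    }

    /// Returns the raw storage, spare bits included.
    pub fn to_bits(self) -> u16 {
        self.0
    }

    /// Reads the 3-bit `kind` field.
    pub fn kind(self) -> u16 {
        self.0 & Self::KIND_MASK
    }

    /// Reads the 8-bit `length` field.
    pub fn length(self) -> u16 {
        self.0 >> Self::LENGTH_SHIFT
    }
}

fn main() {
    // The spare bits (3..8) are deliberately set to arbitrary values.
    let h = Header::from_bits(0x2AFD);
    println!("kind = {}, length = {}, raw = {:#06x}", h.kind(), h.length(), h.to_bits());
}
```

If the spare bits were instead defined as a niche or as always zero, `from_bits` would have to reject or mask some inputs, which is the extra worry the comment above wants to avoid.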
Member
Kixiron commented 3 days ago
I most definitely agree; I was just bringing it up so it gets added to the RFC.
Outdated
A field named `_` is called an *unnamed* field. All other fields are called
*named* fields. Each named field annotated with `bitfield(N)` occupies `N` bits
of storage.
of storage. Unnamed fields do not occupy any storage.
Comment on lines
193 to 195
Kixiron 3 days ago
Member
This doesn't really clarify anything, the term "storage" is really ambiguous and the wording of "unnamed fields do not occupy any storage" implies that unnamed fields don't take up any space whatsoever, which isn't correct
mahkoh 3 days ago •
Author
Contributor
implies that unnamed fields don't take up any space whatsoever, which isn't correct
That is the intended reading. Unnamed fields only affect the layout but are not themselves members of the struct. See also the equivalent notation for unnamed fields using attributes in the RFC.
In C++, unnamed fields are described as "not being members" of their surrounding struct. In C the language is not as simple but this difference is hopefully irrelevant in practice.
mahkoh 3 days ago •
Author
Contributor
I do not know what you mean by "the space exists". Space between fields in structs "exists" but the behavior of that space is not defined in great detail as far as I can tell.
Lokathor 3 days ago
Contributor
Right, that's the problem. You have to say what's going on with that. Even if C doesn't, in Rust we care about those sorts of things.
mahkoh 3 days ago
Author
Contributor
This issue seems to be the most relevant: rust-lang/unsafe-code-guidelines#156
Lokathor 3 days ago
Contributor
You don't have to fix all of uninit memory for rust, you just have to say something about the spare bits in this one case.
It can be "the bits are unspecified" (my pick), or "the bits are niche", or "the bits are always zero", or whatever, but the RFC shouldn't leave this unanswered.
mahkoh 3 days ago •
Author
Contributor
If you get such language merged into the language reference for padding bits in general, then I will refer to it. Otherwise I do not see why this RFC should document the state of padding bits of
#[repr(C)] struct X { a: u8, #[bitfield(8)] _: u8, b: u16, }
when the state of padding bits of
#[repr(C)] struct X { a: u8, b: u16, }
is not documented.
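For reference, the padding in the second struct can be observed in current Rust. The sketch below is only an illustration: the helper `offset_of_b` is a hypothetical name, and the concrete numbers assume a target where `u16` is 2-byte aligned, which holds on common platforms. It shows where the padding byte sits and makes no claim about what that byte contains.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

#[repr(C)]
struct X {
    a: u8,
    b: u16,
}

/// Byte offset of `b` inside `X`, computed from field addresses.
fn offset_of_b() -> usize {
    let x = X { a: 0, b: 0 };
    let base = addr_of!(x) as usize;
    addr_of!(x.b) as usize - base
}

fn main() {
    // With 2-byte alignment for u16, `b` cannot start at offset 1,
    // so one padding byte is left between `a` and `b`.
    println!(
        "size = {}, align = {}, offset of b = {}",
        size_of::<X>(),
        align_of::<X>(),
        offset_of_b()
    );
}
```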
Kixiron 3 days ago
Member
Yah, this RFC is in a position in which it can entirely bypass the issues of previous code with uninit memory since it can simply declare what the state of "padding bits" should be, which should almost definitely be unspecified
Contributor
Lokathor commented 3 days ago
Future Extension:
- bitfields that are enums with an int repr
- bitfields that are repr(transparent) over an int type
```rust
let X { a, b, d } = x;
```
Just like in C, you cannot take the address of a bit-field:
joshtriplett 3 days ago
Member
We could, potentially, allow taking the address of a bitfield if and only if it's aligned to a byte boundary. But I don't think we should. So, this seems fine.
(resp. 0-extended) to the size of the type of the field. (1-extended means
that the new bits will have the value of the most significant bit. In
particular, bit-fields with signed types with width 1 can only store the
values `0` and `-1`.)
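The extension rule quoted above can be sketched with plain shifts. The helper names `sign_extend` and `zero_extend` are hypothetical, and the sketch assumes a 32-bit field type; it illustrates the arithmetic only, not how a compiler would implement bit-field reads.

```rust
/// Interprets the low `width` bits of `raw` as a signed value (1-extension):
/// the new high bits take the value of the field's most significant bit.
fn sign_extend(raw: u32, width: u32) -> i32 {
    assert!(width >= 1 && width <= 32);
    let shift = 32 - width;
    // Move the field's top bit to bit 31, then shift back arithmetically.
    ((raw << shift) as i32) >> shift
}

/// Interprets the low `width` bits of `raw` as an unsigned value (0-extension).
fn zero_extend(raw: u32, width: u32) -> u32 {
    assert!(width >= 1 && width <= 32);
    let shift = 32 - width;
    // Logical shifts fill the new high bits with zeros.
    (raw << shift) >> shift
}

fn main() {
    // A signed field of width 1 can only hold 0 and -1.
    println!("{} {}", sign_extend(0, 1), sign_extend(1, 1));
    // The same 3-bit pattern read as signed and as unsigned.
    println!("{} {}", sign_extend(0b101, 3), zero_extend(0b101, 3));
}
```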
Outdated
A field named `_` is called an *unnamed* field. All other fields are called
*named* fields. Each named field annotated with `bitfield(N)` occupies `N` bits
of storage. Unnamed fields do not occupy any storage.
joshtriplett 3 days ago
Member
I agree that this needs clarification.
Kixiron 3 days ago
Member
I'm wary of the "in every way like padding bits" phrasing because it brings to mind undefined padding bytes (a la this comment)
mahkoh 3 days ago
Author
Contributor
I've rewritten the reference section to make it clear that `_` bit-fields are not fields.
Outdated
If the width of the bit-field is `0`, then the name of the field must be `_`.
A field named `_` is called an *unnamed* field. All other fields are called
joshtriplett 3 days ago •
Member
Please note that we already have a concept of unnamed fields of struct or union type, which have different semantics. For clarity, I'd suggest:
Or, for that matter, perhaps directly call it a padding bit-field.
Outdated
struct are possible unless all accesses are immutable. Therefore, unlike in C,
the compiler can implement access to bit-fields with any kind of load/write
instruction, even if such an instruction overlaps the memory locations of other
fields in the struct.
joshtriplett 3 days ago •
Member
I'd prefer not to leave the behavior quite this unspecified. I would prefer to have it explicitly specified that a store to a bitfield is permitted to be combined with other stores to bitfields or other fields in a single operation, but that the operation will not be performed via a wider write than a single machine word that affects fields other than those being written, unless the entire structure is actually being written.
People do use bitfields for flags within larger structures, where other parts of the structure might be concurrently accessed while under a lock, and they assume that a write to one of those bitfields will not overwrite a lock-protected value. We should be precise enough that people know how they can avoid that.
mahkoh 3 days ago •
Author
Contributor
People do use bitfields for flags within larger structures, where other parts of the structure might be concurrently accessed while under a lock
You're right, I didn't take into account that other fields might have interior mutability. I'll replace this by the usual language:
The memory locations of two different fields within a struct do not overlap if
- at least one of them is not a bit-field or
- they are separated, in declaration order, by at least one field
- that is not a bit-field or
- that is a bit-field with width 0.
"Memory location" is the language used by C to determine when two unsynchronized concurrent accesses are well-defined.
joshtriplett 3 days ago •
Member
That sounds reasonable to me, assuming that it includes associated text explaining what can and can't be done in the same memory operation.
mahkoh 3 days ago
Author
Contributor
I've chosen a somewhat different formulation to avoid pulling in the C memory model:
When a bit-field field is accessed, the abstract machine may also access
adjacent bit-field fields but not fields that are separated from the field by a
StructBodyElement that is not a bit-field field. (Note: This paragraph restricts
the kinds of loads and stores the compiler can perform when accessing a
bit-field. This paragraph does not need to be specially advertised to users as
the inability to take references to bit-field fields makes it impossible to
access adjacent bit-field fields in otherwise sound code.)
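Why a bit-field access may touch its neighbours can be seen in a hand-written read-modify-write. In this sketch, two hypothetical bit-fields `a` (3 bits) and `b` (5 bits) are assumed to share one `u8` storage unit; the layout and helper names are illustrative only. Truncating the stored value with a mask is also an assumption of the sketch, not the RFC's overflow rule.

```rust
/// Assumed layout of one storage unit: bits 0..3 = a, bits 3..8 = b.
const A_MASK: u8 = 0b0000_0111;
const B_SHIFT: u32 = 3;

/// Writes `value` into field `a`. The whole unit is loaded and stored,
/// so the bits of `b` are read and written back unchanged.
fn set_a(unit: u8, value: u8) -> u8 {
    (unit & !A_MASK) | (value & A_MASK)
}

/// Reads field `a`.
fn get_a(unit: u8) -> u8 {
    unit & A_MASK
}

/// Reads field `b`.
fn get_b(unit: u8) -> u8 {
    unit >> B_SHIFT
}

fn main() {
    let unit: u8 = 0b1011_0010; // b = 22, a = 2
    let updated = set_a(unit, 5);
    println!("a = {}, b = {}", get_a(updated), get_b(updated));
}
```

A store like `set_a` never needs to touch a byte that holds only non-bit-field data, which matches the restriction in the quoted paragraph.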
- When the assignment happens, the compiler writes the bits on the rhs to
the correct positions in the struct.
This is a more structured solution but would need to be fleshed out.
joshtriplett 3 days ago
Member
I agree both that this might be a good long-term solution (and bit-sized types would be useful for other purposes), and that this should not block the implementation of bitfields.
If, in the future, we add bit-sized types, we can specify how to put them in a `repr(C)` struct and have them behave like a bitfield. That should not stop us from providing the most straightforward implementation of bitfields without introducing bit-sized types.
# Unresolved questions
[unresolved-questions]: #unresolved-questions
- On Windows, Clang and GCC produce different layouts for packed structs:
joshtriplett 3 days ago
Member
We should specify, explicitly, that the behavior on MSVC targets must match that of MSVC.
(We don't, today, need to provide a means of controlling the `ms_struct` flag. We may need to eventually, though.)
mahkoh 3 days ago
Author
Contributor
@retep998 Please give your opinion on which language, if any, should be added to this RFC and which language should be included in the reference.
Outdated
Nevertheless, such fields appear like regular fields in the rustdoc output if
they have the `pub` visibility.
If the width of the bit-field is `0`, then the name of the field must be `_`.
joshtriplett 3 days ago
Member
Please specify the behavior of `bitfield(0)` in this RFC. In general, I don't have any objection to defining this feature in terms of C-compatible bitfield types. However, I'd like to have a self-contained explanation for `bitfield(0)`.
mahkoh 3 days ago •
Author
Contributor
I do not think the behavior of zero-width bit-fields is specified enough to make this worth it. I think the most that can be said is that a zero-width bit-field causes the storage unit allocated for the next bit-field to be aligned at least to the next 8-bit boundary (the next byte). But this is probably not what you had in mind.
Outdated
struct.
Nevertheless, such fields appear like regular fields in the rustdoc output if
they have the `pub` visibility.
joshtriplett 3 days ago •
Member
Why don't we just prohibit the use of `pub` (or any visibility) on `_` fields?
mahkoh 3 days ago
Author
Contributor
I want `_` to appear in rustdoc if I write a sys-wrapper for a C API that uses unnamed bit-fields. On the other hand, sometimes such wrappers are supposed to have private fields. How would rustdoc decide if it should show the field or not?
With visibility modifiers this is not a problem and the author can also add doc comments if necessary. E.g. if the original C struct documents that field.
Member
joshtriplett commented 3 days ago
Future Extension:
- bitfields that are enums with an int repr
- bitfields that are repr(transparent) over an int type
I agree that both of those are useful future extensions, and that they shouldn't be in the initial version of the RFC.
I'd like to see them in the future possibilities section, though.
UnnamedStructElements are used when determining the layout of the struct but are
otherwise ignored. Since they are not fields, they cannot be accessed, do not
appear in the construction of a struct, etc.
Comment on lines
+180 to +182
Kixiron 2 days ago
Member
mahkoh 2 days ago
Author
Contributor
Assume that structs could have associated types:
struct X {
type MyType = i32;
field1: u8,
field2: u16,
}
Does the statement
However, the contents of AssociatedTypes are unspecified, allowing reading from and transmuting to/from structs with associated types without incurring undefined behavior.
make sense to you?
mahkoh 2 days ago
Author
Contributor
AssociatedType is not a type. It's a non-terminal in the Rust grammar. It's a part of the syntax, and syntax does not exist at runtime. As such, they don't have contents except in the syntax sense.
Lokathor 2 days ago
Contributor
If you agree they have no content, then I don't understand what you're getting at.
Lokathor 2 days ago
Contributor
Kixiron said what I was trying to say.
Essentially, if the spare bits are defined to be anything other than "initialized but unspecified" then I can make a more safe version of bitfields, by hand, today, than this RFC would allow for.
joshtriplett 7 hours ago •
Member
I think I see why this was initially confusing.
It'd be conceptually straightforward to interpret the unnamed fields as "padding", except for the `bitfield(0)` case. But then it's necessary to specify the behavior of those fields.
You're proposing, instead, that unnamed "fields" (including `bitfield(0)` fields) don't define padding themselves; rather, they define the positioning of bitfields, and the positioning of bitfields within a larger structure defines padding. And that padding behaves like any other padding.
I think this explanation could use further in-RFC clarification to make the approach more obvious, and to separate the concept of "unnamed 'fields' defining where fields are placed" from "padding defined by where fields are placed". But given some additional explanation in that area, I don't object to the idea of making this RFC entirely orthogonal to any definition of padding.
I do think that many practical uses of bitfields would benefit from some additional definition of padding. But I also know that defining the behavior of padding is a much more controversial topic, and I don't want to see bitfields held up behind that topic.
mahkoh 7 hours ago
Author
Contributor
You're proposing, instead, that unnamed "fields" (including `bitfield(0)` fields) don't define padding themselves; rather, they define the positioning of bitfields, and the positioning of bitfields within a larger structure defines padding.
No, I think that's too narrow. Unnamed "fields" also affect the positioning of regular fields and the overall alignment. They are layout modifiers like `repr(align)` except that they are written inside the struct body.
joshtriplett 6 hours ago
Member
And, reading further into the updated description, I think the new approach of not calling them "fields" provides the necessary clarity. (The distinction of "bit-field fields" is perhaps a little awkward, but nonetheless clear.)
But this proposal is flawed because, in C, both the width and the underlying
type influence the layout. The Zig proposal throws the underlying type away.
clarfonthey 2 days ago •
Contributor
This is an extremely small point, and one I didn't know about, yet it affects the proposal a lot. How does it affect it?
This to me is an important aspect that justifies using this method instead of Zig's.
Shouldn't just be a footnote.
clarfonthey 2 days ago
Contributor
Not really. It doesn't do that great a job of providing motivation beyond "this is how it affects C."
Why not just require additional alignment attributes instead of using different types for bitfields? I feel like these should be weighed as additional options.
Compatibility with C definitions could definitely be enough to justify it, but it should be explained better in the RFC rather than assuming every reader knows what it means and how it affects their code.
mahkoh 2 days ago
Author
Contributor
It doesn't do that great a job of providing motivation beyond "this is how it affects C."
Being compatible with C without having to write target-specific code is, in fact, the motivation of the RFC and this is made pretty clear. Please read the discussion of the original RFC where at least 50% of the over 90 comments were trying to argue that what we actually want is arbitrarily sized integers and custom struct layouts.
Contributor
clarfonthey commented 2 days ago
I'm extremely torn. On one hand, there needs to be a functional way to support bitfields on repr(C) structs with full compatibility with C, and on the other, this proposal is absolutely incompatible with a more rusty but less-immediately-implementable version like repr(bitpacked) that I mentioned in my generic integers RFC.
Feel like at least some comments would be warranted on why it's reasonable to have a dedicated syntax for bitfields in repr(C) structs without an easy path toward doing so for repr(rust) and repr(packed) in a similar way.
Contributor
Lokathor commented 2 days ago
Doing this for repr(C) doesn't affect the possibility for doing this with repr(rust) or repr(bitpacked) or anything else.
Contributor
clarfonthey commented 2 days ago
I disagree; if you just design features exactly for individual use cases you end up with far more features than necessary. It's a valid criticism to say that C-style bitfields are necessary and there doesn't need to be a Rusty counterpart, but if you think that there should be a Rusty version in the future then the feature should be compatible with it.
I don't think that having to distinguish between a 3-bit u8 and a 3-bit u16 is something that should be possible in a Rusty version, and as it stands, a Rusty version could potentially look much more different than this.
Contributor
Lokathor commented 2 days ago
a 3-bit u8 is almost surely 1 byte and a 3-bit u16 is almost surely 2 bytes, under any repr, so that would distinguish them plenty.
Regardless of that: It's already the case that repr(rust) and repr(C) do very different things in select cases (e.g., `MyStruct(u8,u16,u8)`), so having repr(rust) and repr(C) bitfields also be different doesn't seem like a stretch at all.
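The `MyStruct(u8,u16,u8)` case can be checked in current Rust. In this sketch the type names `CLayout` and `RustLayout` are hypothetical. The `repr(C)` size assumes a target where `u16` is 2-byte aligned. The default-representation size is whatever the compiler picks, since that layout is unspecified, so the sketch prints it rather than assuming a value.

```rust
use std::mem::size_of;

/// Fields stay in declaration order: u8, one padding byte, u16, u8, one padding byte.
#[repr(C)]
struct CLayout(u8, u16, u8);

/// Default representation: the compiler is free to reorder the fields.
struct RustLayout(u8, u16, u8);

fn main() {
    println!("repr(C):    {} bytes", size_of::<CLayout>());
    println!("repr(Rust): {} bytes", size_of::<RustLayout>());
}
```

On typical targets the default representation comes out smaller because the two `u8` fields can be placed next to each other, but nothing guarantees that.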
Contributor
clarfonthey commented 2 days ago
a 3-bit u8 is almost surely 1 byte and a 3-bit u16 is almost surely 2 bytes, under any repr, so that would distinguish them plenty.
Respectfully disagree. When you specify bitfields you're specifying bits, under the assumption that adjacent fields will be squashed together. What happens when you specify a 2-bit u8 and then a 4-bit u16? Does that mean one byte with a value limited to 2 bits and two bytes with a value limited to 4 bits? What about a collective two bytes with six bits squashed into one byte and the rest left alone? Or maybe one byte with two bits and one with four and both aligned to two bytes?
It's not at all trivial what it means to limit bits, expecting them to be squashed together, while still having them be in a different container. In a repr(Rust) variant I can imagine it being very reasonable to simply request bits and have them be arranged in the struct wherever they fit, including inside niche areas. In cases like this I can imagine bitfields being used to reduce total space used without explicitly caring where they're placed.
Contributor
Lokathor commented 2 days ago
Alright, fair.
But none of that really affects how a repr(C) bitfield struct would work. So this RFC doesn't actually block any of that sort of discussion eventually happening.
Member
joshtriplett commented 6 hours ago
I think it's reasonable to define a `#[bitfield(N)]` attribute for use with `repr(C)`. If, in the future, we define a `repr(Rust)` mechanism for bitfields, we can potentially reuse the attribute, while defining different semantics. Conversely, if in the future we define sized integer types, we can define how those types fit into structures and relate that to this RFC. Either way, I don't think the resulting total surface area will be substantially larger than if we started with the larger and more complex problem to begin with.
I do believe in thinking about potential future expansion and aiming to be compatible with that, but I don't think this RFC needs to expand to cover that entire potential future space. The proposal here seems like the most straightforward way to provide C-compatible bitfields, and I believe that the proposal is forward-compatible with other things we may want to do in the future.