
Github Add language support for C-compatible bit-fields by mahkoh · Pull Request...

 3 years ago
source link: https://github.com/rust-lang/rfcs/pull/3064

Add language support for C-compatible bit-fields #3064

Conversation

Contributor

mahkoh commented 4 days ago

edited

This RFC adds support for bit-fields in repr(C) structs by

  • introducing a new attribute bitfield(N) that can be applied to integer
    fields,
  • allowing such annotated fields to be unnamed.
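For context, here is a minimal sketch of what such fields have to be replaced with today: hand-written accessors over a storage integer. The names `Packed`, `a` and `b` are hypothetical, and the bit positions (first field in the low bits) are an assumption that matches common little-endian C ABIs; they are not taken from the RFC. Truncating oversized values in the setters is likewise an assumption of this sketch.

```rust
/// Hand-rolled stand-in for a C struct with two bit-fields, e.g.
/// `struct packed { unsigned char a : 3; unsigned char b : 5; };`
/// Assumption: `a` occupies the low 3 bits and `b` the high 5 bits.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Packed {
    storage: u8,
}

impl Packed {
    const A_WIDTH: u32 = 3;
    const A_MASK: u8 = (1 << Self::A_WIDTH) - 1; // 0b0000_0111

    pub fn a(self) -> u8 {
        self.storage & Self::A_MASK
    }

    pub fn b(self) -> u8 {
        self.storage >> Self::A_WIDTH
    }

    /// Stores the low 3 bits of `value`; higher bits are discarded.
    pub fn set_a(&mut self, value: u8) {
        self.storage = (self.storage & !Self::A_MASK) | (value & Self::A_MASK);
    }

    /// Stores the low 5 bits of `value`; higher bits are discarded.
    pub fn set_b(&mut self, value: u8) {
        self.storage = (self.storage & Self::A_MASK) | (value << Self::A_WIDTH);
    }
}

fn main() {
    let mut p = Packed::default();
    p.set_a(5);
    p.set_b(17);
    assert_eq!((p.a(), p.b()), (5, 17));
    p.set_a(0b1111); // only the low 3 bits (0b111) are kept
    assert_eq!((p.a(), p.b()), (7, 17));
}
```

Every setter is a read-modify-write of the shared storage byte, and the masks encode one particular target's layout by hand.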

Rendered

mahkoh changed the title from "Add language support for C-compatible bitfields" to "Add language support for C-compatible bit-fields" 4 days ago

Contributor

Author

mahkoh commented 4 days ago

This is a rewrite of #1449 with concerns addressed.

In particular:

  • The overflow behavior is now fully specified.
  • The bit-field width must now be a literal.
  • Bit-fields are now restricted to repr(C) structs and their layout is defined in terms of the corresponding layout used by Clang.

On top of the features in the previous RFC, this RFC adds syntax for unnamed bit-fields. This is necessary because such fields have different effects on the layout than named bit-fields.

Other questions from the other RFC:

  • Why not use some kind of code generation?

    Any code generation needs to have access to the types of all fields in the struct. These types can usually only be determined during compilation. See the last motivating example in the RFC.

  • Why not a more general approach that allows us to specify arbitrary struct layout?

    The problem in the first place is that we do not know the layout until during compilation.

  • Is this really needed? I thought C bitfields were an esoteric feature nobody uses.

    There are over 400 bit-fields in the Linux kernel user-space API.

A good future extension would be allowing bit-field widths and/or types to be generic parameters.

Member

Kixiron commented 3 days ago

What value do the bits in an unnamed field hold? Will they have an undefined value (and hence be UB to read in any way, including casting/transmuting from the struct as a whole), a guaranteed value (0, I'd guess?), or simply an unspecified value (allowing the field to be read without undefined behavior, but still not allowing the actual value to be relied on)?

Contributor

Lokathor commented 3 days ago

I would hope they have simple "unspecified value". AKA "they can be whatever, no niche there". This lets bitfield types be transmuted to and from plain integer types with far less worry.
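A minimal sketch of what "unspecified value, no niche" buys in practice, written with today's Rust. The type `Flags` and its 3-bit payload are hypothetical and only illustrate the idea; nothing here is the RFC's design.

```rust
/// Hypothetical 1-byte flags type: bits 0..=2 carry data, bits 3..=7 are
/// spare. The spare bits are treated as "initialized but unspecified":
/// any value is allowed there, and no accessor ever depends on them.
#[repr(transparent)]
#[derive(Clone, Copy, Debug)]
pub struct Flags(u8);

impl Flags {
    const USED: u8 = 0b0000_0111;

    /// Every `u8` is a valid `Flags`, so this conversion cannot fail.
    pub fn from_bits(bits: u8) -> Self {
        Flags(bits)
    }

    /// Returns the raw byte, including whatever the spare bits hold.
    pub fn to_bits(self) -> u8 {
        self.0
    }

    /// Reads the 3-bit payload, ignoring the spare bits.
    pub fn value(self) -> u8 {
        self.0 & Self::USED
    }
}

fn main() {
    let a = Flags::from_bits(0b0000_0101);
    let b = Flags::from_bits(0b1111_0101); // same payload, different spare bits
    assert_eq!(a.value(), b.value());
    assert_ne!(a.to_bits(), b.to_bits());
}
```

If the spare bits were a niche or had to be zero, `from_bits` would need to validate or clear them, and a plain transmute from an integer would no longer be sound for every input.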

Member

Kixiron commented 3 days ago

I most definitely agree, I was just bringing it up so it gets added to the RFC.

A field named `_` is called an *unnamed* field. All other fields are called
*named* fields. Each named field annotated with `bitfield(N)` occupies `N` bits
of storage. Unnamed fields do not occupy any storage.

Comment on lines 193 to 195

Kixiron 3 days ago

Member

This doesn't really clarify anything, the term "storage" is really ambiguous and the wording of "unnamed fields do not occupy any storage" implies that unnamed fields don't take up any space whatsoever, which isn't correct

mahkoh 3 days ago

edited

Author

Contributor

> implies that unnamed fields don't take up any space whatsoever, which isn't correct

That is the intended reading. Unnamed fields only affect the layout but are not themselves members of the struct. See also the equivalent notation for unnamed fields using attributes in the RFC.

In C++, unnamed fields are described as "not being members" of their surrounding struct. In C the language is not as simple but this difference is hopefully irrelevant in practice.

Kixiron 3 days ago

Member

The space still exists though, those bits don't just disappear right?

mahkoh 3 days ago

edited

Author

Contributor

I do not know what you mean by "the space exists". Space between fields in structs "exists" but the behavior of that space is not defined in great detail as far as I can tell.

Lokathor 3 days ago

Contributor

Right, that's the problem. You have to say what's going on with that. Even if C doesn't, in Rust we care about those sorts of things.

mahkoh 3 days ago

Author

Contributor

This RFC is not the place to resolve these issues.

mahkoh 3 days ago

Author

Contributor

This issue seems to be the most relevant: rust-lang/unsafe-code-guidelines#156

Lokathor 3 days ago

Contributor

You don't have to fix all of uninit memory for rust, you just have to say something about the spare bits in this one case.

It can be "the bits are unspecified" (my pick), or "the bits are niche", or "the bits are always zero", or whatever, but the RFC shouldn't leave this unanswered.

mahkoh 3 days ago

edited

Author

Contributor

If you get such language merged into the language reference for padding bits in general, then I will refer to it. Otherwise I do not see why this RFC should document the state of padding bits of

#[repr(C)]
struct X {
	a: u8,
	#[bitfield(8)] _: u8,
	b: u16,
}

when the state of padding bits of

#[repr(C)]
struct X {
	a: u8,
	b: u16,
}

is not documented.
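The padding in the second struct can be observed directly. The sketch below only measures size, alignment and one field offset; it assumes a target where `u16` has alignment 2, and the helper `offset_of_b` is a hypothetical name introduced for illustration.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct X {
    a: u8,
    b: u16,
}

/// Byte offset of `b` inside `X`, computed from field addresses.
fn offset_of_b() -> usize {
    let x = X { a: 0, b: 0 };
    let base = &x as *const X as usize;
    let field = &x.b as *const u16 as usize;
    field - base
}

fn main() {
    // Assumption: `u16` has alignment 2, as on mainstream platforms.
    assert_eq!(align_of::<X>(), 2);
    assert_eq!(size_of::<X>(), 4);
    // `a` sits at offset 0 and `b` at offset 2, so byte 1 is padding.
    assert_eq!(offset_of_b(), 2);
}
```

The byte at offset 1 is the padding whose state the comment above refers to.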

Kixiron 3 days ago

Member

Yah, this RFC is in a position in which it can entirely bypass the issues of previous code with uninit memory since it can simply declare what the state of "padding bits" should be, which should almost definitely be unspecified

Contributor

Lokathor commented 3 days ago

Future Extension:

  • bitfields that are enums with an int repr
  • bitfields that are repr(transparent) over an int type

```rust
let X { a, b, d } = x;
```

Just like in C, you cannot take the address of a bit-field:

joshtriplett 3 days ago

Member

We could, potentially, allow taking the address of a bitfield if and only if it's aligned to a byte boundary. But I don't think we should. So, this seems fine.

(resp. 0-extended) to the size of the type of the field. (1-extended means
that the new bits will have the value of the most significant bit. In
particular, bit-fields with signed types with width 1 can only store the
values `0` and `-1`.)

joshtriplett 3 days ago

Member

Thank you for specifying this behavior precisely.
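The extension rule quoted above can be sketched in a few lines. The functions `sign_extend` and `zero_extend` are hypothetical helpers that model reading a field of `width` bits out of an 8-bit storage value; they are an illustration, not code from the RFC.

```rust
/// Reads a `width`-bit field stored in the low bits of `raw` as a signed
/// value. The field's most significant bit is copied into the upper bits.
fn sign_extend(raw: u8, width: u32) -> i8 {
    assert!((1..=8).contains(&width));
    let shift = 8 - width;
    // Move the field to the top of the byte, then shift back arithmetically.
    ((raw << shift) as i8) >> shift
}

/// Unsigned counterpart: the upper bits are filled with zeros.
fn zero_extend(raw: u8, width: u32) -> u8 {
    assert!((1..=8).contains(&width));
    let shift = 8 - width;
    (raw << shift) >> shift
}

fn main() {
    // A signed bit-field of width 1 can only hold 0 and -1.
    assert_eq!(sign_extend(0b0, 1), 0);
    assert_eq!(sign_extend(0b1, 1), -1);
    // Width 3: 0b101 has its top bit set, so it reads as -3.
    assert_eq!(sign_extend(0b101, 3), -3);
    assert_eq!(zero_extend(0b101, 3), 5);
}
```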

A field named `_` is called an *unnamed* field. All other fields are called
*named* fields. Each named field annotated with `bitfield(N)` occupies `N` bits
of storage. Unnamed fields do not occupy any storage.

joshtriplett 3 days ago

Member

I agree that this needs clarification.

Suggested change
- of storage. Unnamed fields do not occupy any storage.
+ of storage. Unnamed bit-fields act like padding within the structure; they affect the positioning of other fields, but are not themselves accessible. They act in every way like padding bits.

Kixiron 3 days ago

Member

I'm wary of the "in every way like padding bits" phrasing because it brings to mind undefined padding bytes (a la this comment)

mahkoh 3 days ago

Author

Contributor

I've rewritten the reference section to make it clear that _ bit-fields are not fields.

If the width of the bit-field is `0`, then the name of the field must be `_`.

A field named `_` is called an *unnamed* field. All other fields are called

joshtriplett 3 days ago

edited

Member

Please note that we already have a concept of unnamed fields of struct or union type, which have a different semantic. For clarity, I'd suggest:

Suggested change
- A field named `_` is called an *unnamed* field. All other fields are called
+ A bit-field named `_` is called an *unnamed bit-field*. All other bit-fields are called

Or, for that matter, perhaps directly call it a padding bit-field.

struct are possible unless all accesses are immutable. Therefore, unlike in C,
the compiler can implement access to bit-fields with any kind of load/write
instruction, even if such an instruction overlaps the memory locations of other
fields in the struct.

joshtriplett 3 days ago

edited

Member

I'd prefer not to leave the behavior quite this unspecified. I would prefer to have it explicitly specified that a store to a bitfield is permitted to be combined with other stores to bitfields or other fields in a single operation, but that the operation will not be performed via a wider write than a single machine word that affects fields other than those being written, unless the entire structure is actually being written.

People do use bitfields for flags within larger structures, where other parts of the structure might be concurrently accessed while under a lock, and they assume that a write to one of those bitfields will not overwrite a lock-protected value. We should be precise enough that people know how they can avoid that.

mahkoh 3 days ago

edited

Author

Contributor

> People do use bitfields for flags within larger structures, where other parts of the structure might be concurrently accessed while under a lock

You're right, I didn't take into account that other fields might have interior mutability. I'll replace this by the usual language:

The memory locations of two different fields within a struct do not overlap if

  • at least one of them is not a bit-field or
  • they are separated, in declaration order, by at least one field
    • that is not a bit-field or
    • that is a bit-field with width 0.

"Memory location" is the language used by C to determine when two unsynchronized concurrent accesses are well-defined.
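The rule above can be turned into a small predicate. This is a sketch under a simplified model: `Member` and `may_share_location` are hypothetical names, and the function only answers whether the quoted rule fails to guarantee that two members are in distinct memory locations.

```rust
/// Simplified model of a struct member (hypothetical, for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Member {
    /// An ordinary field.
    Plain,
    /// A bit-field with the given width; width 0 acts as a separator.
    BitField(u32),
}

/// Returns `true` if members `i` and `j` might share a memory location under
/// the quoted rule: both are bit-fields of non-zero width, and nothing
/// between them in declaration order is a plain field or a zero-width
/// bit-field.
fn may_share_location(members: &[Member], i: usize, j: usize) -> bool {
    let (lo, hi) = if i <= j { (i, j) } else { (j, i) };
    let occupies_bits = |m: &Member| matches!(m, Member::BitField(w) if *w > 0);
    if lo == hi || !occupies_bits(&members[lo]) || !occupies_bits(&members[hi]) {
        return false;
    }
    members[lo + 1..hi].iter().all(|m| occupies_bits(m))
}

fn main() {
    use Member::{BitField, Plain};
    let members = [
        BitField(3),
        BitField(5),
        Plain,
        BitField(4),
        BitField(0),
        BitField(4),
    ];
    assert!(may_share_location(&members, 0, 1)); // adjacent bit-fields
    assert!(!may_share_location(&members, 1, 2)); // one is a plain field
    assert!(!may_share_location(&members, 1, 3)); // separated by a plain field
    assert!(!may_share_location(&members, 3, 5)); // separated by width 0
}
```

A `false` result means that unsynchronized concurrent accesses to the two members are covered by the rule; a `true` result means that they are not.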

joshtriplett 3 days ago

edited

Member

That sounds reasonable to me, assuming that it includes associated text explaining what can and can't be done in the same memory operation.

mahkoh 3 days ago

Author

Contributor

I've chosen a somewhat different formulation to avoid pulling in the C memory model:

When a bit-field field is accessed, the abstract machine may also access
adjacent bit-field fields but not fields that are separated from the field by a
StructBodyElement that is not a bit-field field. (Note: This paragraph restricts
the kinds of loads and stores the compiler can perform when accessing a
bit-field. This paragraph does not need to be specially advertised to users as
the inability to take references to bit-field fields makes it impossible to
access adjacent bit-field fields in otherwise sound code.)

- When the assignment happens, the compiler writes the bits on the rhs to
the correct positions in the struct.

This is a more structured solution but would need to be fleshed out.

joshtriplett 3 days ago

Member

I agree both that this might be a good long-term solution (and bit-sized types would be useful for other purposes), and that this should not block the implementation of bitfields.

If, in the future, we add bit-sized types, we can specify how to put them in a repr(C) struct and have them behave like a bitfield. That should not stop us from providing the most straightforward implementation of bitfields without introducing bit-sized types.

# Unresolved questions

[unresolved-questions]: #unresolved-questions

- On Windows, Clang and GCC produce different layouts for packed structs:

joshtriplett 3 days ago

Member

We should specify, explicitly, that the behavior on MSVC targets must match that of MSVC.

(We don't, today, need to provide a means of controlling the ms_struct flag. We may need to eventually, though.)

mahkoh 3 days ago

Author

Contributor

@retep998 Please give your opinion on which language, if any, should be added to this RFC and which language should be included in the reference.

Nevertheless, such fields appear like regular fields in the rustdoc output if
they have the `pub` visibility.

If the width of the bit-field is `0`, then the name of the field must be `_`.

joshtriplett 3 days ago

Member

Please specify the behavior of bitfield(0) in this RFC. In general, I don't have any objection to defining this feature in terms of C-compatible bitfield types. However, I'd like to have a self-contained explanation for bitfield(0).

mahkoh 3 days ago

edited

Author

Contributor

I do not think the behavior of zero-width bit-fields is specified enough to make this worth it. I think the most that can be said is that a zero-width bit-field causes the storage unit allocated for the next bit-field to be aligned at least to the next 8-bit boundary (the next byte). But this is probably not what you had in mind.

struct.

Nevertheless, such fields appear like regular fields in the rustdoc output if
they have the `pub` visibility.

joshtriplett 3 days ago

edited

Member

Why don't we just prohibit the use of pub (or any visibility) on _ fields?

mahkoh 3 days ago

Author

Contributor

I want _ to appear in rustdoc if I write a sys-wrapper for a C API that uses unnamed bit fields. On the other hand, sometimes such wrappers are supposed to have private fields. How would rustdoc decide if it should show the field or not?

With visibility modifiers this is not a problem and the author can also add doc comments if necessary. E.g. if the original C struct documents that field.

Member

joshtriplett commented 3 days ago

> Future Extension:
>
> * bitfields that are enums with an int repr
> * bitfields that are repr(transparent) over an int type

I agree that both of those are useful future extensions, and that they shouldn't be in the initial version of the RFC.

I'd like to see them in the future possibilities section, though.

UnnamedStructElements are used when determining the layout of the struct but are
otherwise ignored. Since they are not fields, they cannot be accessed, do not
appear in the construction of a struct, etc.

Comment on lines +180 to +182

Kixiron 2 days ago

Member

Suggested change
- UnnamedStructElements are used when determining the layout of the struct but are otherwise ignored. Since they are not fields, they cannot be accessed, do not appear in the construction of a struct, etc.
+ UnnamedStructElements are used when determining the layout of the struct but are otherwise ignored. Since they are not fields, they cannot be accessed, do not appear in the construction of a struct, etc. However, the contents of UnnamedStructElements are unspecified, allowing reading from and transmuting to/from bitfield structs without incurring undefined behavior.

mahkoh 2 days ago

Author

Contributor

Assume that structs could have associated types:

struct X {
	type MyType = i32;
	field1: u8,
	field2: u16,
}

Does the statement

> However, the contents of AssociatedTypes are unspecified, allowing reading from and transmuting to/from structs with associated types without incurring undefined behavior.

make sense to you?

Lokathor 2 days ago

Contributor

Of course not, because types don't hold values.

mahkoh 2 days ago

Author

Contributor

AssociatedType is not a type. It's a non-terminal in the Rust grammar. It's a part of the syntax, and syntax does not exist at runtime. As such, they don't have contents except in the syntax sense.

Lokathor 2 days ago

Contributor

If you agree they have no content, then I don't understand what you're getting at.

Lokathor 2 days ago

Contributor

Kixiron said what I was trying to say.

Essentially, if the spare bits are defined to be anything other than "initialized but unspecified" then I can make a more safe version of bitfields, by hand, today, than this RFC would allow for.

joshtriplett 7 hours ago

edited

Member

I think I see why this was initially confusing.

It'd be conceptually straightforward to interpret the unnamed fields as "padding", except for the bitfield(0) case. But then it's necessary to specify the behavior of those fields.

You're proposing, instead, that unnamed "fields" (including bitfield(0) fields) don't define padding themselves; rather, they define the positioning of bitfields, and the positioning of bitfields within a larger structure defines padding. And that padding behaves like any other padding.

I think this explanation could use further in-RFC clarification to make the approach more obvious, and to separate the concept of "unnamed 'fields' defining where fields are placed" from "padding defined by where fields are placed". But given some additional explanation in that area, I don't object to the idea of making this RFC entirely orthogonal to any definition of padding.

I do think that many practical uses of bitfields would benefit from some additional definition of padding. But I also know that defining the behavior of padding is a much more controversial topic, and I don't want to see bitfields held up behind that topic.

mahkoh 7 hours ago

Author

Contributor

> You're proposing, instead, that unnamed "fields" (including bitfield(0) fields) don't define padding themselves; rather, they define the positioning of bitfields, and the positioning of bitfields within a larger structure defines padding.

No, I think that's too narrow. Unnamed "fields" also affect the positioning of regular fields and the overall alignment. They are layout modifiers like repr(align), except that they are written inside the struct body.

joshtriplett 6 hours ago

Member

And, reading further into the updated description, I think the new approach of not calling them "fields" provides the necessary clarity. (The distinction of "bit-field fields" is perhaps a little awkward, but nonetheless clear.)

But this proposal is flawed because, in C, both the width and the underlying
type influence the layout. The Zig proposal throws the underlying type away.

clarfonthey 2 days ago

edited

Contributor

This looks like an extremely small point, but I didn't know about it and it seems to affect the proposal a lot. How does it affect the layout?

This to me is an important aspect that justifies using this method instead of Zig's.

Shouldn't just be a footnote.

mahkoh 2 days ago

Author

Contributor

I believe the first motivating example should make this clear.

clarfonthey 2 days ago

Contributor

Not really. It doesn't do that great a job of providing motivation beyond "this is how it affects C."

Why not just require additional alignment attributes instead of using different types for bitfields? I feel like these should be weighed as additional options.

Compatibility with C definitions could definitely be enough to justify it, but it should be explained better in the RFC rather than assuming every reader knows what it means and how it affects their code.

mahkoh 2 days ago

Author

Contributor

> It doesn't do that great a job of providing motivation beyond "this is how it affects C."

Being compatible with C without having to write target-specific code is, in fact, the motivation of the RFC and this is made pretty clear. Please read the discussion of the original RFC where at least 50% of the over 90 comments were trying to argue that what we actually want is arbitrarily sized integers and custom struct layouts.

Contributor

clarfonthey commented 2 days ago

I'm extremely torn. On one hand, there needs to be a functional way to support bitfields on repr(C) structs with full compatibility with C, and on the other, this proposal is absolutely incompatible with a more rusty but less-immediately-implementable version like repr(bitpacked) that I mentioned in my generic integers RFC.

Feel like at least some comments would be warranted on why it's reasonable to have a dedicated syntax for bitfields in repr(C) structs without an easy path toward doing so for repr(rust) and repr(packed) in a similar way.

Contributor

Lokathor commented 2 days ago

Doing this for repr(C) doesn't affect the possibility for doing this with repr(rust) or repr(bitpacked) or anything else.

Contributor

clarfonthey commented 2 days ago

I disagree; if you just design features exactly for individual use cases you end up with far more features than necessary. It's a valid criticism to say that C-style bitfields are necessary and there doesn't need to be a Rusty counterpart, but if you think that there should be a Rusty version in the future then the feature should be compatible with it.

I don't think that having to distinguish between a 3-bit u8 and a 3-bit u16 is something that should be possible in a Rusty version, and as it stands, a Rusty version could potentially look much more different than this.

Contributor

Lokathor commented 2 days ago

a 3-bit u8 is almost surely 1 byte and a 3-bit u16 is almost surely 2 bytes, under any repr, so that would distinguish them plenty.

Regardless of that: It's already the case that repr(rust) and repr(C) do very different things in select cases (eg, MyStruct(u8,u16,u8)), so having repr(rust) and repr(C) bitfields also be different doesn't seem like a stretch at all.
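The `MyStruct(u8,u16,u8)` case mentioned above can be checked with `size_of`. This is a sketch with hypothetical type names; it assumes `u16` has alignment 2 on the target, and it deliberately asserts only a lower bound for the default representation because that layout is unspecified.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct InOrder(u8, u16, u8);

#[allow(dead_code)]
struct Reorderable(u8, u16, u8); // default (Rust) representation

fn main() {
    // repr(C) keeps declaration order: u8, 1 padding byte, u16, u8, 1 padding
    // byte. Assumption: `u16` has alignment 2 on the target.
    assert_eq!(size_of::<InOrder>(), 6);
    // The default representation may reorder fields. Its layout is
    // unspecified, so only the sum of the field sizes is asserted as a bound.
    assert!(size_of::<Reorderable>() >= 4);
    println!(
        "repr(C): {} bytes, default repr: {} bytes",
        size_of::<InOrder>(),
        size_of::<Reorderable>()
    );
}
```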

Contributor

clarfonthey commented 2 days ago

> a 3-bit u8 is almost surely 1 byte and a 3-bit u16 is almost surely 2 bytes, under any repr, so that would distinguish them plenty.

Respectfully disagree. When you specify bitfields you're specifying bits, under the assumption that adjacent fields will be squashed together. What happens when you specify a 2-bit u8 and then a 4-bit u16? Does that mean one byte with a value limited to 2 bits and two bytes with a value limited to 4 bits? What about a collective two bytes with six bits squashed into one byte and the rest left alone? Or maybe one byte with two bits and one with four and both aligned to two bytes?

It's not at all trivial what it means to limit bits, expecting them to be squashed together, while still having them be in a different container. In a repr(Rust) variant I can imagine it being very reasonable to simply request bits and have them be arranged in the struct wherever they fit, including inside niche areas. In cases like this I can imagine bitfields being used to reduce total space used without explicitly caring where they're placed.

Contributor

Lokathor commented 2 days ago

Alright, fair.

But none of that really affects how a repr(C) bitfield struct would work. So this RFC doesn't actually block any of that sort of discussion eventually happening.

Member

joshtriplett commented 6 hours ago

I think it's reasonable to define a #[bitfield(N)] attribute for use with repr(C). If, in the future, we define a repr(Rust) mechanism for bitfields, we can potentially reuse the attribute, while defining different semantics. Conversely, if in the future we define sized integer types, we can define how those types fit into structures and relate that to this RFC. Either way, I don't think the resulting total surface area will be substantially larger than if we started with the larger and more complex problem to begin with.

I do believe in thinking about potential future expansion and aiming to be compatible with that, but I don't think this RFC needs to expand to cover that entire potential future space. The proposal here seems like the most straightforward way to provide C-compatible bitfields, and I believe that the proposal is forward-compatible with other things we may want to do in the future.

Reviewers

joshtriplett

Lokathor

clarfonthey

Kixiron


6 participants
