2

Github RFC: Custom DSTs by strega-nil · Pull Request #2594 · rust-lang/rfcs · Gi...

 3 years ago
source link: https://github.com/rust-lang/rfcs/pull/2594
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

New issue

RFC: Custom DSTs #2594

Closed

strega-nil

wants to merge 6 commits into

rust-lang:master

from

strega-nil:custom-dst

Closed

RFC: Custom DSTs #2594

strega-nil

wants to merge 6 commits into

rust-lang:master

from

strega-nil:custom-dst

Conversation

Copy link

strega-nil commented on Nov 11, 2018

edited by cramertj

This has been a long time coming - similar to #2580, but more general.

On the unresolved question about ?Sized -> DynamicallySized, and the need for ?DynamicallySized. I think this is actually a reasonable change which makes extern types act much nicer around generic functions that already exist, and fixes all the Rust code that has been written already. No longer is, for example, Box<extern-type> allowed. I've chosen to make this change.

Note that DynamicallySized is a new language trait, on the level of Sized and Copy.

Rendered

Copy link

Author

strega-nil commented on Nov 11, 2018

It's likely that this RFC should be merged with some amount of #2580, specifically the stuff for mentioning VTables.

Copy link

Contributor

Ixrec commented on Nov 11, 2018

#2255 also seems like a relevant link, since I'm not entirely sure whether those issues are settled. I think the current status of that discussion is:

  • introducing any new ?Trait bounds like ?Move or ?DynamicallySized, such that Trait is a new implicit bound on all existing generics, is backwards incompatible and therefore can never be done (not even with editions)
  • the only loophole would be traits that are "not really new" because they're already implied by Sized, such as DynamicallySized
  • the contention that ?Move was a brute force solution seems to have proven accurate now that we've discovered the alternative of Pin, so DynamicallySized appears to be the only plausible new implicit trait bound
  • the only alternative to DynamicallySized that we've been able to find is panicking at runtime

This seems like a crucial (albeit not obviously crucial) piece of the argument in favor of this RFC, though I've never seen it spelled out like this before, so it seems worth asking whether I got all of that right.


My only actual concern with the RFC is the same one that niko articulated last time: this doesn't appear to be a hard blocker or a game changer for anything, so do we really want to prioritize work on it in the near future? But we can always accept the RFC and not implement it for a while; this is clearly the best proposal so far and the design space seems pretty thoroughly explored at this point.

To double-check: The CStr and FAM use cases are not "hard blockers" because you can do all that stuff with unsafe shenanigans today, right? i.e. there isn't any C code that's impossible to soundly integrate with Rust because of this, and this is "just" (emphasis on the scare quotes) a really huge ergonomics win for those use cases?

Copy link

Contributor

glaebhoerl commented on Nov 11, 2018

Instead of declaring a type which would otherwise be Sized and having it become !Sized if you implement DynamicallySized for it, could we go the other way, and have the way to do things be to declare a type which is not-even-DynamicallySized and then implement DynamicallySized for it?

In the current language the only such types are extern types, so you'd start off with an extern type or a struct which has an extern type as the last field, and then implement DynamicallySized.

The problem there of course is not knowing what the alignment of that field should be, so maybe instead of relying entirely on extern types we ought to introduce something like a Flexible<T> (name subject to bikeshedding!) which has the alignment of T but is neither Sized nor DynamicallySizeds, so then you'd have Flexible<T> as the last field instead of [T; 0] as in the RFC.

Just in general it feels slightly nicer and less magical if the act of implementing a trait only causes a trait to become implemented, rather than another one to become deimplemented. (There would be some precedent for the latter w.r.t. Drop causing various things like pattern matching to become disabled if you implement it, though.)

Another question: I'm guessing the language implementation would implicitly call the provided size_of_val and align_of_val in various places? What are those places? Depending on what the places are, a possible drawback might be the ability to incur unexpected costs, e.g. if instead of just reading the size from a vtable it ends up traversing a 1MB string to find the end.

Copy link

Author

strega-nil commented on Nov 11, 2018

@glaebhoerl I would argue that an extern type solution would make using FAMs far more annoying. Also, I think this is the most ergonomic way of doing this, even if it's slightly odd. Sized, Move, and DynamicallySized are odd traits.

Places the language calls size_of_val/align_of_val - afaik, just in the intrinsics (which could probably be removed). Otherwise, only in the library, when dealing with allocators.

Copy link

Author

strega-nil commented on Nov 11, 2018

edited

@Ixrec technically you're right, but it's such an incredible ergonomics win, and it's been suggested so many times. It also is an incredible ergonomics win for Matrix libraries, imo, since they'll suddenly be able to use the Deref/Index traits.

I'd also argue it should be a blocker for extern type, since as it currently stands, they're fairly broken.

Copy link

Contributor

Ixrec commented on Nov 11, 2018

@Ixrec technically you're right, but it's such an incredible ergonomics win, and it's been suggested so many times. It also is an incredible ergonomics win for Matrix libraries, imo, since they'll suddenly be able to use the Deref/Index traits.

Yeah, I'm totally on board with actually doing this. Just wanted to be sure I understood it all correctly.

I'd also argue it should be a blocker for extern type, since as it currently stands, they're fairly broken.

Interestingly, the extern types RFC states "before this is stabilized there should be some trait bound or similar on them that prevents [size_of_val's and align_of_val's] use statically", but does not list this as an unresolved question. Fortunately, the tracking issue comments also seem to imply that nobody wants to stabilize it before we figure this out.

Copy link

Author

strega-nil commented on Nov 11, 2018

@Ixrec ah, good, I literally started writing this because someone was suggesting we panic on size_of_val.

Copy link

Contributor

Ixrec commented on Nov 11, 2018

Sorry, by "figure this out" I meant come to an explicit consensus on whether or not panicking was the least bad solution to that problem.

Now that I look closer, it seems like that discussion petered out after rust-lang/rust#43467 (comment) which explicitly suggests we make align_of_val work and let size_of_val panic. So it does seem like size_of_val panics are still on the table. I guess we should ping @RalfJung and @eddyb to comment further on that.

Copy link

Author

strega-nil commented on Nov 11, 2018

Since we have a reasonable type system solution in this RFC, I don't think panicking is a reasonable choice.

text/0000-custom-dst.md

Outdated

Show resolved

text/0000-custom-dst.md

Outdated

Show resolved

# Unresolved questions

[unresolved-questions]: #unresolved-questions

- Bikeshedding names.

Centril on Nov 11, 2018

Contributor

DynamicallySized -> DynSized;
Feels distinctly more rust-y to me since we have dyn.

eddyb on Nov 11, 2018

Member

And DynSized has been used in previous (non-RFC?) proposals and discussion, so more people are already familiar with it. Not the best argument, but I was surprised to see DynamicallySized.

strega-nil on Nov 11, 2018

edited

Author

I like longer names, probably the C++ programmer in me. This name's not all that important tho :P

Ixrec on Nov 11, 2018

edited

Contributor

I am very mildly in favor of the longer name if this is expected to be a trait bound that relatively few users will ever need to write directly (it is, right?)

strega-nil on Nov 11, 2018

Author

Yeah - it gets implied by ?Sized. The only time you'd need to write it is for T: ?DynamicallySized, in case you want to be generic wrt extern types.

gnzlbg on Dec 7, 2018

Contributor

I prefer longer names, but DynSized feels more consistent with the rest of Rust to me and that trumps my "longer name" preference.

text/0000-custom-dst.md

Outdated

Show resolved

- `extern type`s do not implement `DynamicallySized`, although in theory one

could choose to implement the trait for them

(that usecase is not supported by this RFC).

- `T: DynamicallySized` bounds imply a `T: ?Sized` bound.

Centril on Nov 11, 2018

Contributor

N.B. Type: Trait is not a bound (because Trait + 'static is a bound...), it is a constraint.

strega-nil on Nov 11, 2018

Author

What's the distinction?

Centril on Nov 11, 2018

Contributor

So a trait Foo { ... } has an implicit first parameter we call Self which is not present in Haskell; This means that when you say something like MyTrait, it has the kind bound = k1 -> constraint where k1 : type | lifetime ;. The operator : in this context can then be understood as something of kind type -> bound -> constraint. Notice that when you say Show Int => in Haskell, Show :: * -> Constraint (a "bound") but Show Int :: Constraint.

Something like this RFC would be really nice for the ndarray crate. It would greatly simplify handling of array views (ArrayView/ArrayViewMut) and would make it possible to do things like return n-D array views from Index and use the standard library's Cow with arrays.

However, ndarray supports arrays with a dynamic number of dimensions (axes), so we would need the Metadata (the shape and strides of a view) to support being dynamically sized (or sized with heap-allocated data). Would it be possible to extend this RFC to handle this use-case? Maybe we could change the Copy bound on Metadata to Clone, and then provide a special implementation of Copy for &T that actually clones the metadata?

Copy link

Member

eddyb commented on Nov 12, 2018

@jturner314 That's unworkable because it would break existing code generic over references and pointers. I think what you'd want to be able to use this feature is const generics and actually parametrize those types by the number of dimensions, e.g. ArrayView<A, 4>.

*/

type Metadata: 'static + Copy + Send + Sync + Eq + Ord + Unpin;

fn size_of_val(&self) -> usize;

kennytm on Nov 12, 2018

Member

What happens if these methods are implemented to return invalid values (UB?)

struct CWStr([u16; 0]);

unsafe impl DynamicallySized {
    type Metadata = ();
    fn size_of_val(&self) -> usize { 3 }
    fn align_of_val(&self) -> usize { 3 }
}

strega-nil on Nov 12, 2018

Author

UB, that's why it's an unsafe impl.

gnzlbg on Dec 7, 2018

Contributor

The text should spell out exactly why DynamicallySized is unsafe to implement and which guarantees implementations have to uphold.

cramertj on Jan 8, 2019

Member

@ubsan Can you add these additional docs? I'd find it really helpful.

text/0000-custom-dst.md

Outdated

Show resolved

text/0000-custom-dst.md

Outdated

Show resolved

Copy link

jturner314 commented on Nov 12, 2018

edited

I think what you'd want to be able to use this feature is const generics and actually parametrize those types by the number of dimensions, e.g. ArrayView<A, 4>.

ndarray supports arrays with the number of dimensions (axes) known at compile time (e.g. arrays with "const dimensionality" Ix3) and arrays with the number of dimensions known only at runtime (arrays with "dynamic dimensionality" IxDyn). For const-dimensionality arrays, the shape and strides are represented as [T; NDIM], while for dynamic-dimensionality arrays, the shape and strides are represented Vec<T>1. The "known only at runtime"/"dynamic dimensionality" case is what I'm asking about, since const generic [T; N] doesn't work for this case.

1We actually use a small-vector optimization for four axes or less, but that's irrelevant for this discussion.

That's unworkable because it would break existing code generic over references and pointers.

Yeah, that's what I was afraid of. Another possible approach that would enable this use-case is providing an owned equivalent of &[T] that could be placed on the stack, used in Metadata, etc. (This would be similar to C99's variable length arrays.) That feature is probably outside the scope of this RFC, though. [Edit: After further thought, I believe this wouldn't work well anyway because it would make &T itself potentially dynamically-sized.]

Out of curiosity, what's a case where the "use Clone bound on Metadata, and implement Copy for &T/*const T/etc. in terms of cloning the metadata" proposal would break things?

Copy link

Contributor

Ixrec commented on Nov 12, 2018

Out of curiosity, what's a case where the "use Clone bound on Metadata, and implement Copy for &T/*const T/etc. in terms of cloning the metadata" proposal would break things?

I guess the simplest case would be any code relying on the fact that copying a Copy type cannot panic.

Copy link

jturner314 commented on Nov 12, 2018

edited

I guess the simplest case would be any code relying on the fact that copying a Copy type cannot panic.

Good point. We could treats panics as aborts in this case, though. (We already do this kind of thing for panic inside drop when unwinding from another panic.)

I thought of another potentially breaking case – code could rely on the original and copy from Copy being equal, but .clone() doesn't necessarily guarantee that. We could say that ensuring clones of Metadata are equal is a safety requirement for implementing DynamicallySized, but if code relies on the copies being bitwise identical, that wouldn't allow for Vec in Metadata.

Another potentially breaking case is dealing with uninitialized memory. If the type of the uninitialized memory is Copy, it's usually considered safe to assign to it normally (without using ptr::write) because Copy requires that the type not implement Drop. With only a Clone bound that's not the case. Unlike the other issues, I don't see a backwards compatible workaround for this that would enable using Vec in Metadata. [Is normal assignment in the Copy case really safe, though? In order to perform the assignment without ptr::write, you have to create an &mut T. Is creating an &mut T always defined behavior for uninitialized T where T: Copy?]

Yeah, I'm coming to the conclusion that a Clone bound instead of Copy bound on Metadata is unworkable without breaking changes.

Copy link

Member

eddyb commented on Nov 12, 2018

@jturner314 Rust is designed with "moves and copies are literally bitwise copies always" as a core promise, as contrasted with C++. Unsafe code is free to rely on this as much as it wants to.
So yeah, it's a serious breaking change, and one that I don't see happening.

If it's opt-in, maybe it could happen. But I also feel like the dynamic number of dimensions is less common (or will become less common with const generics).

I understand, and I agree that it's probably a fairly uncommon use-case. It's just disappointing that this RFC is so close and yet not quite sufficient for all views in ndarray. Unfortunately, I don't see a good alternative that would support the dynamic-dimensions use-case. Thanks for answering my questions about this!

Copy link

jturner314 left a comment

I added a few questions as comments.

Additionally, I'm having trouble understanding the "2D Views of Planes" example. From what I can figure out, it seems to be trying to represent a 2D, column-major, contiguous, owned array type (PlaneBuf) and a 2D (non-contiguous?) view type (Plane). However, it seems to treat the "width" as the number of rows (i.e. column length) and the "height" as the number of columns (i.e. row length); this is the reverse of how they're usually defined. Also, it's unclear if Plane is supposed to be contiguous in memory or not.

I'd be happy to fix/improve the example, but I have a couple of questions:

  1. Is the data of a Plane assumed to be contiguous in memory? I guess 'no' since there's a stride field separate from the width and height.
  2. How should size_of_val be defined for non-contiguous Planes?

}

```

These functions return the size and alignment of the header of a type;

jturner314 on Nov 13, 2018

What is the "header of a type"? Is it the Metadata for that type? The Metadata plus any padding between the Metadata and the pointer?

strega-nil on Nov 14, 2018

Author

No, the header is the bit of the type that is unsized - it's what sizeof(FAM_type) returns in C; i.e.

struct fam {
  int x;
  char buf[];
};

_Static_assert(sizeof(fam) == sizeof(int), "");

jturner314 on Nov 17, 2018

Oh, that makes sense. Will you please define "header" in the RFC itself?

and for existing DSTs,

```rust

assert_eq!(size_of_header::<[T]>(), 0);

jturner314 on Nov 13, 2018

Wouldn't this be the size of one usize (representing the length of the slice), not 0?

strega-nil on Nov 14, 2018

Author

No, because size_of_header returns the minimum size of [T], not the size of the metadata.

```rust

assert_eq!(size_of_header::<[T]>(), 0);

assert_eq!(align_of_header::<[T]>(), align_of::<T>());

jturner314 on Nov 13, 2018

Why is the alignment equal to the alignment of T? Wouldn't it be the alignment of usize?

strega-nil on Nov 14, 2018

Author

Because this is the alignment of [T], not the alignment of &[T]

```rust

#[repr(C)]

struct CStr([c_char; 0]);

jturner314 on Nov 13, 2018

Could this be struct CStr([c_char])?

strega-nil on Nov 14, 2018

Author

No - you specifically unsize sized types. The definition here is just the "header"; similar to a C definition

struct CStr {
  char buffer[];
};

Copy link

Contributor

gnzlbg commented on Aug 22, 2019

edited

I have an API that semantically accepts a &[&str] but that I'd like to use both with &[&str] and &[String]. Right now I always allocate memory to create a Vec<&str> from a Vec<String> and be able to use this API with both. (EDIT: the alternative is to use AsRef and make my API generic..).

I wish I had a StridedSlice<&str> custom DST with (len, stride) metadata that I could use in this API instead to accept both &[String] (stride = size_of::<String>()) and &[&str] (stride = size_of::<&str>()).

What's the latest on this RFC in any case?

Copy link

Contributor

nikomatsakis commented on Oct 26, 2020

Some updates from the @rust-lang/lang team:

  • We are planning to have a design meeting on November 4th to discuss #2580 and to touch on some of the extensions to it, including this RFC. Feel free to ping me if you'd like more info about the meeting.
  • To that end, I've been preparing a hackmd that summarizes key points from RFC 2580 and this RFC along with other related work. I'd appreciate any comments on it for inaccuracies or other mistakes.

// - Unpin + Copy - all pointer types

// - Send + Sync - &T, &mut T

// - Eq + Ord - *const T, *mut T

type Metadata: 'static + Copy + Send + Sync + Eq + Ord + Unpin;

HeroicKatora on Jan 7

edited

When written this way, these bounds can never be relaxed later as doing so would be a breaking change. An alternative formulation would be to turn them into constraints.

It's probably not wise most beneficial to move all bounds. But can we be sure all future use cases for extern types guarantee 'static and Send, Sync, Unpin in particular?

bjorn3 on Jan 7

Contributor

An alternative formulation would be to turn them into constraints.

Both are handled almost identically in the compiler. In either case a user of Pointee is allowed to depend on it.

HeroicKatora on Jan 7

Hm, I didn't realize. Then what would be a proper way of introducing those constraints?

bjorn3 on Jan 7

Contributor

I don't think it is possible to add constraints to the implementer without allowing the user to depend on the constraints.

Kixunil on Jan 7

Is it realistic that these bounds can be relaxed?

  • 'static - even something as simple as creating Box<TypeWithNonStaticMetadata> and calling Box::leak would invalidate it
  • Copy - creating shared reference would become impossible (yeah, you didn't mention it, but for completeness...)
  • T: Sync implies &T: Send and the users can rely on it today. print_in_other_thread would stop compiling in this example if the bounds were relaxed
  • Relaxing Unpin would mean that it'd be unsound to move &mut T. Even if conceivable, it'd be a massive footgun.

HeroicKatora on Jan 7

edited

'static - even something as simple as creating Box and calling Box::leak would invalidate it

Who's to say that all custom DSTs can be boxed, or allocated?

T: Sync implies &T: Send and the users can rely on it today. print_in_other_thread would stop compiling in this example if the bounds were relaxed

As noted T: ?Sized will imply T: Contiguous and similarly could imply <T as Pointee>::Metadata: Sync etc. such that all soundness requirements of the existing impl<T: ?Sized + Sync> Send for &T are fulfilled. Then that print_in_other_thread would need to be relaxed further when it wants to support arbitrary metdata.

Relaxing Unpin would mean that it'd be unsound to move &mut T. Even if conceivable, it'd be a massive footgun.

Same as above, if Unpin is implied by T: ?Sized (or stronger) then the footgun is hardly massive.

Kixunil on Jan 8

Yeah, that could work, I think.

Then that print_in_other_thread would need to be relaxed further when it wants to support arbitrary metdata.

It's a bit sad that many generics will have to be modified to become more flexible but there's probably no better solution. At least I don't know of such. Perhaps an automated analysis tool would help here.

Copy link

CAD97 commented on Jan 11

The rendered link in the OP is broken. Here's a current one.

Copy link

Member

pnkfelix commented on Mar 30

edited

The lang team discussed this as part of following up on "Backlog Bonanza" on 2021-03-17.

Custom DSTs are not part of the roadmap for 2021.

We have already accepted #2580 (tracking issue rust-lang/rust#81513).

We want to see that work completed and gather experience with what it offers; after that has happened, we would consider Custom DSTs as a roadmap item.

@rfcbot postpone

Copy link

rfcbot commented on Mar 30

edited by BurntSushi

Team member @pnkfelix has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Copy link

rfcbot commented 16 days ago

bellThis is now entering its final comment period, as per the review above. bell

Copy link

rfcbot commented 6 days ago

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC is now postponed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK