11

Github RFC: Pointer metadata & VTable by SimonSapin · Pull Request #2580 · r...

 3 years ago
source link: https://github.com/rust-lang/rfcs/pull/2580
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Contributor

SimonSapin commented on Oct 28, 2018

Add generic APIs that allow manipulating the metadata of fat pointers:

  • Naming the metadata’s type (as an associated type)
  • Extracting metadata from a pointer
  • Reconstructing a pointer from a data pointer and metadata
  • Representing vtables, the metadata for trait objects, as a type with some limited API

This RFC does not propose a mechanism for defining custom dynamically-sized types, but tries to stay compatible with future proposals that do.

HTML view

Contributor

Author

SimonSapin commented on Oct 28, 2018

mikeyhew commented on Oct 28, 2018

edited

First of all, thanks for opening this rfc. It's the right way to fix the raw::TraitObject API, and is a big step toward custom DST.

My only criticism is I think the Vtable type should be generic over the trait object type, as mentioned in the alternatives section. Having different Metadata types for each trait object type would help catch errors at compile time. Also, it seems like &'static would preclude us from implementing trait object types like dyn Trait1 + Trait2 using multiple vtable pointers, and i think it would be nice to avoid making that decision in this RFC.

I like the name Pointee, I think it's an improvement over Referent because it's more clear what it means.

Contributor

gnzlbg commented on Oct 28, 2018

cc @ubsan

text/0000-ptr-meta.md

Outdated

Show resolved

text/0000-ptr-meta.md

Outdated

Show resolved

text/0000-ptr-meta.md

Outdated

Show resolved

text/0000-ptr-meta.md

Outdated

Show resolved

text/0000-ptr-meta.md

Outdated

Show resolved

text/0000-ptr-meta.md

Outdated

Show resolved

///

/// [dst]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#dynamically-sized-types-dsts

#[lang = "pointee"]

pub trait Pointee {

oli-obk on Oct 28, 2018

Contributor

so... I'm assuming the compiler implements

default impl<T: ?Sized> Pointee for T {
    type Metadata = &'static Vtable;
}
impl<T: Sized> Pointee for T {
    type Metadata = ();
}
impl Pointee for str {
    type Metadata = usize;
}
impl<T: Sized> Pointee for [T] {
    type Metadata = usize;
}

Which means theoretically we could make Vtable generic over T allowing the drop_in_place method to take a raw pointer with the correct pointee type?

SimonSapin on Oct 28, 2018

Author

Contributor

These impls would be accurate in current Rust, but what I had in mind instead was that the compiler would automatically generate impls, similar to what it does for the std::marker::Unsize trait. As far as the standard library is concerned these impls would be "magic", not based on specialization.

Regardless, yes, making VTable generic with a type parameter for the trait object type is possible.

text/0000-ptr-meta.md

Outdated

Show resolved

(Answer: they can use a different metadata type like `[&'static VTable; N]`.)

`VTable` could be made generic with a type parameter for the trait object type that it describes.

This would avoid forcing that the size, alignment, and destruction pointers

oli-obk on Oct 28, 2018

Contributor

How would that avoid forcing this? Can you elaborate?

SimonSapin on Oct 28, 2018

Author

Contributor

Without a type parameter, x.size() with x: &'static VTable necessarily executes the same code for any vtable. With a type parameter, x: &'static VTable<dyn Foo> and x: &'static VTable<dyn Bar> are different types and could execute different code. (For example, do table lookup with different offsets.) However, keeping the offset of size the same within all vtables might be desirable regardless of this API.

`VTable` could be made generic with a type parameter for the trait object type that it describes.

This would avoid forcing that the size, alignment, and destruction pointers

be in the same location (offset) for every vtables.

But keeping them in the same location is probaly desirable anyway to keep code size

scottmcm on Oct 28, 2018

Member

Missing the end of a sentence?

type Metadata;

}

/// Pointers to types implementing this trait alias are

scottmcm on Oct 28, 2018

Member

Missing the end of a sentence?

Contributor

Author

SimonSapin commented on Oct 28, 2018

@mikeyhew I’ve very open to adding a type parameter to VTable, I’ll just wait to get some more feedback on the RFC as-is before making significant changes.

As to supporting super-fat pointers with multiple vtable pointers, as mentioned in the alternatives section I believe this design doesn’t prevent it. Types that don’t exist yet and are added to the language in the future (possibly custom DSTs) can have a different metadata type. For dyn A + B that could be [&'static VTable; 2], for example. This proposal does however force dyn C with trait C: A + B {} to keep only having a single vtable pointer.

pub unsafe fn drop_in_place(&self, data: *mut ()) { ... }

}

```

Centril on Oct 28, 2018

Contributor

No drawbacks section...?

SimonSapin on Oct 28, 2018

Author

Contributor

I came up short trying to think of a reason not to do this at all (as opposed to doing it differently). Suggestions welcome.

and (hopefully) more compatible with future custom DSTs proposals,

this RFC resolves the question of what happens

if trait objects with super-fat pointers with multiple vtable pointers are ever added.

(Answer: they can use a different metadata type like `[&'static VTable; N]`.)

Centril on Oct 28, 2018

Contributor

Should then [&'static VTable; 1] for dyn SomeTrait be used to make that transition smoother and to fit better with const generics?

SimonSapin on Oct 28, 2018

Author

Contributor

This would make some sense if we were definitely gonna have super-fat pointers with multiple separate vtable pointers as fat pointer metadata. But if we don’t and end up with a different solution to upcasting, we’ll end up with a always-single-item arrays for no reason. This isn’t really the thread to get into that discussion, but my opinion is that super-fat pointer have a significant enough size cost that I’d much prefer a different solution.

Perhaps an alternative for this RFC, more neutral with respect super-fat pointers v.s. not, would be to have type Metadata = VTable<Self>; for trait objects. (See other comments about VTable’s possible type paramater.) With the pointer/reference indirection hidden away in private fields of the VTable type, this design would be compatible with having VTable<dyn A + B> contain two pointers in the future.

SimonSapin on Oct 28, 2018

Author

Contributor

Also, is there a use case for generic code that accepts any trait object with any number of vtable pointer but not other kinds of DSTs?

and (hopefully) more compatible with future custom DSTs proposals,

this RFC resolves the question of what happens

if trait objects with super-fat pointers with multiple vtable pointers are ever added.

(Answer: they can use a different metadata type like `[&'static VTable; N]`.)

Centril on Oct 28, 2018

Contributor

Are we doing the proposals in the right order? Shouldn't we focus on dealing with dyn A + B + C, upcasting, and such things first? Also, cc #2035.

SimonSapin on Oct 28, 2018

Author

Contributor

I believe the design proposed here is compatible enough with various options for multi-traits trait objects and upcasting such that there isn’t a strong dependency, and we don’t need to block this RFC on everything else being settled.

* The name of `Pointee`. [Internals thread #6663][6663] used `Referent`.

* The location of `VTable`. Is another module more appropriate than `std::ptr`?

Centril on Oct 28, 2018

Contributor

and should it be called Dictionary instead? ("type class dictionary")

kennytm on Oct 28, 2018

Member

Big -1 to calling it Dictionary since this typically means a key-value map (example: C#, Swift, Python).

Furthermore, here in Rust the VTable is implemented as an array of function pointers, not a HashMap (unlike e.g. Python where it is really implemented as a dict), so calling it Dictionary obscures the alleged complexity.

SimonSapin on Oct 28, 2018

Author

Contributor

Even if the implementation happened to use HashMap, I’d prefer VTable since it’s more descriptive of the role of this type. (As opposed to: dictionary of what?) I believe that vtable is a well-enough established term of art.

Contributor

Author

SimonSapin commented on Oct 28, 2018

edited

Regarding making VTable generic, if it looks like this:

struct VTable<Dyn> { … }

… then it could be used with any type as a parameter. What does VTable<u32> mean? If we want to restrict VTable’s parameter to only types where it makes sense (dyn SomeTrait, dyn SomeTrait + SomeAutoTrait, etc.), we’ll need a dedicated public trait:

struct VTable<Dyn> where Dyn: ?Sized + std::marker::DynTrait { … }

Do we want such a trait?

Contributor

Author

SimonSapin commented on Oct 28, 2018

edited

Hmm, maybe we could get away with this?

struct VTable<Dyn> where Dyn: ?Sized + Pointee<Metadata=Self> { … }

Contributor

Gankra commented on Oct 28, 2018

Is there any way to ensure minimal compiler time is wasted performing monomorphization on useless vtable type params? If not, I would rather the API just be less safe.

@gankro the word "useless" sounds a little strong there... the purpose it to make sure you can't use a vtable from a *const dyn Trait to make a *const dyn SomeOtherTrait, at least not without doing something obviously hacky like transmute.

Contributor

Gankra commented on Oct 29, 2018

You're supposing a situation where I somehow am writing code with two vtable types floating around, in which case I have two data pointers, and nothing can stop me from swapping the data pointers, producing the exact same effect.

rfcbot commented on Nov 5, 2020

edited by pnkfelix

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Contributor

nikomatsakis commented on Nov 5, 2020

@rfcbot concern remove-static-bound

As discussed in the lang meeting, the 'static bound on Metadata is not necessary and should be removed.

Member

kennytm commented on Nov 6, 2020

since T: 'static implies T::Metadata: 'static anyway by Rust's rules

But what about T: 'a? Like what should happen below?

use std::marker::PhantomData;
use std::hash::Hash;

struct Weird<'a> {
    x: PhantomData<&'a ()>,
}

pub trait Pointee {
    type Metadata: Copy + Send + Sync + Ord + Hash + Unpin /* + 'static */;
}

impl<'a> Pointee for Weird<'a> {
    type Metadata = PhantomData<&'a mut &'a ()>;
}

fn ok<'a, 'b: 'a>(
    mut a: *const Weird<'a>, 
    b: *const Weird<'b>,
) {
    // should the next line be rejected?
    a = b;
}

fn not_ok<'a, 'b: 'a>(
    mut a: (*const (), <Weird<'a> as Pointee>::Metadata), 
    b: (*const (), <Weird<'b> as Pointee>::Metadata),
) {
    // the next line will be E0623 lifetime mismatch due to the invariance
    a = b;
}

Member

joshtriplett commented on Nov 6, 2020

edited

@SimonSapin

* Is this about the field inside `DynMetadata`?

No, this is about the type of the data argument of from_raw_parts.

Contributor

nikomatsakis commented on Nov 6, 2020

@kennytm

But what about T: 'a? Like what should happen below?

T: 'a => T::Metadata: 'a in general, yes.

Contributor

nikomatsakis commented on Nov 6, 2020

edited

However, @kennytm, I am now reading your example more closely, and I see I didn't address the substance of it really. The point is that because *const T (and &T) are covariant with respect to T, that implies that the Metadata might change too as T is adjusted, but we don't know that the resulting T::Metadata will be valid. That's...a good point.

Contributor

nikomatsakis commented on Nov 6, 2020

edited

Now I'm wondering about other sorts of subtyping that we have in Rust, and whether they to can cause problems. Seems like yes. In particular, we have subtyping from fn types with bound lifetimes, though I often wish we didn't. So conceivably you could do something like

impl Pointee for Weird<fn(&u8)> {
    type Metadata = X;
}

impl<T> Pointee for Weird<T> {
    type Metadata = Y;
}

and that would get us into some awkward trouble using a similar construction, where you "upcast" from *const Weird<fn(&u8)> to (say) *const Weird<fn(&'static u8)> (which works).

Contributor

nikomatsakis commented on Nov 6, 2020

edited

So the TL;DR is that I don't think that a 'static bound actually solves the problem you're concerned about, but indeed it is a real problem that we would have to address in some way if we permitted user-provided impls.

@nikomatsakis as far as I understand fn(T) -> .. is contravariant with respect to T, and https://github.com/sunshowers/lifetime-variance-example/blob/master/src/lib.rs suggests that that is fine, what can materially go wrong with the playground example you linked?

Contributor

nikomatsakis commented on Nov 10, 2020

@guswynn the problem is that one could, if we permitted user-defined Pointee impls, conceivably change the Metadata types for those two types. In that case, if you had a *const Weird<fn(&u8)> and you upcast it to the supertype *const Weird<fn(&'static u8)> (for example), the metadata would have changed. That doesn't work because subtype and supertype have to have the same representation (including metadata).

@nikomatsakis does any set of dangerous impl's require full-specialization? as far as I can tell, only min_specialization wouldn't permit this overlap: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=74c83f9a0748c1043e479c0547243968 (actually, #![feature(specialization)] doesn't even work here either, and I'm not sure the status of specializable associated types, rust-lang/rust#31844 looks like it has some unresolved details)

Contributor

nikomatsakis commented on Nov 14, 2020

@guswynn this isn't really about specialization -- these two impls are not considered specializations of one another, they apply to distinct types. Currently we give a future-compatibility warning, but I expect that we will eventually accept that pattern so as not to break wasm-bindgen.

Member

scottmcm commented on Nov 16, 2020

why Hash and Ord? because of impls on *const T

How much does it matter that these impls care about the data portion? Is it even desirable? IIRC it's an anti-pattern to care about it on *const dyn Trait anyway since it might have different values in different places.

I guess we're stuck with it for Ord because of *const [T]?

Contributor

Author

SimonSapin commented on Nov 18, 2020

Yes it’s a problem that vtable pointers can have different addresses for the same vtable generated independently in separate compilation units, but it’s still a compatibility constraint that Hash and Ord are implemented for *const dyn Trait. At they moment they also hash/compare the vtable pointer address, even though that might give unexpected results. I’m not sure how compatible it would be to change that behavior to ignore vtable pointers entirely.

Wouldn't rustc be able to use a linkonce_odr comdat for vtables to merge them, just like C++ does for classes that don't have a specific owning .cpp file?

https://gcc.godbolt.org/z/M16voj

Contributor

bjorn3 commented on Nov 18, 2020

That doesn't work across dynamic libraries for example.

That doesn't work across dynamic libraries for example.

That's not entirely true, GCC seems to think it does:
https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gccint/WHOPR.html

The whole program mode assumptions are slightly more complex in C++, where inline functions in headers are put into COMDAT sections. COMDAT function and variables can be defined by multiple object files and their bodies are unified at link-time and dynamic link-time.

Contributor

Amanieu commented on Nov 18, 2020

The whole program mode assumptions are slightly more complex in C++, where inline functions in headers are put into COMDAT sections. COMDAT function and variables can be defined by multiple object files and their bodies are unified at link-time and dynamic link-time.

This is only true on ELF platforms. Windows and macOS don't use ELF and have much more limited run-time linking.

Contributor

Author

SimonSapin commented 18 days ago

I’ve pushed a few changes:

  • Remove the 'static bound as requested. This should resolve the concern filed by @nikomatsakis
  • Parameterize DynMetadata so it’s used like DynMetadata<dyn SomeTrait>. After recent discussions I have a mild preference for it but I left a unresolved question about changing it back. Mention DynMetadata<u64> as a type you can name but not obtain a meaningful value of.
  • Add an unresolved question about adding into_raw_parts, potentially instead of the metadata function. This was suggested in #2984 (comment)

I’m still in interested in opinions on unresolved questions, in particular the API design details.

Ping @Amanieu, @BurntSushi, @cramertj, @pnkfelix, and @withoutboats for checkboxes in #2580 (comment).


@joshtriplett

Is *const () the way we want to spell "untyped pointer" here? Is there any way we can make this forward-compatible with a potential future version of c_void that's in core rather than just std? (Note that if we end up going with one of the proposals that uses a typed pointer here, that's fine, but if we're going to use an untyped pointer, I'd rather not have multiple incompatible untyped pointer types.)

Sorry I misunderstood this earlier. That’s a good question.

c_void does exist in core these days, but at the language level it’s just another Sized type like () (except of size 1). What makes it special is only its use convention in libc and other FFI crates.

If the language had truly untyped pointers they would indeed be perfect for from_raw_parts, but it doesn’t and I’m not aware of a proposal to add them. I picked *const () as what felt like the closest thing that exist (more so than enum c_void { #[doc(hidden)] __variant1, #[doc(hidden)] __variant2 }) but I’m open to changing it.

Contributor

nikomatsakis commented 3 days ago

@rfcbot resolve remove-static-bound

rfcbot commented 3 days ago

bellThis is now entering its final comment period, as per the review above. bell


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK