11

Diving Deep: implied bounds and variance :3

 2 years ago
source link: https://lcnr.de/blog/diving-deep-implied-bounds-and-variance/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client
implied bounds and variance :3

Diving Deep: implied bounds and variance #25860

     while Rust is a great language and tends to be sound in most cases, its complex trait system together with its fairly organic growth results in some unsoundness and unintended behavior. fixing or even just understanding these can often be non-trivial, which is why I am looking through them to summarize the issues and discuss some potential fixes. any unsoundness discussed here is pretty much impossible to trigger unintentionally and were mostly found by explicitly searching for them

The first issue I am taking a look at is probably the most well known unsound issue of Rust.

Let’s start with a slight modification of the example by @arielb1:

// because of `&'a &'b ()` in the signature, `foo` has an implicit bound `'b: 'a`
//
// In other words, in `foo` we assume 'a to be shorter than 'b
fn foo<'a, 'b, T>(_dummy: &'a &'b (), value: &'b T) -> &'a T {
    value
}

fn bad<'c, T>(x: &'c T) -> &'static T {
    // instantiate `foo`
    let foo1: for<'a, 'b> fn(&'a &'b (), &'b T) -> &'a T = foo;

    // subtyping: instantiate 'b as 'c
    let foo2: for<'a> fn(&'a &'c (), &'c T) -> &'a T = foo1;

    // subtyping: function arguments are contravariant,
    // go from 'c to 'static, but only for one of the inputs
    let foo3: for<'a> fn(&'a &'static (), &'c T) -> &'a T = foo2;

    // subtyping: instantiate 'a as 'static
    let foo4: fn(&'static &'static (), &'c T) -> &'static T = foo3;

    // boom!
    foo4(&&(), x)
}

The function bad now allows us to arbitrarily transmute lifetimes. To show that this is unsound, we can use the following safe code which results in a segmentation fault:

fn main() {
    let mut outer: Option<&'static u32> = Some(&3);
    // `reference` only borrows from `outer` inside of the `match`,
    // but we extend its lifetime to 'static here
    let static_ref: &'static &'static u32 = match outer {
        Some(ref reference) => bad(reference),
        None => unreachable!(),
    };
    // the inner reference of `static_ref` is now `0`
    outer = None;
    // dereference the inner reference of `static_ref`
    println!("{}", static_ref);
}

In foo, we use the implicit 'b: 'a bound to shrink the lifetime of value. Inside of bad this implicit bound only remains for the &'a &'b () input of the function item. This means that foo1 to foo4 have weaker implicit bounds than foo.

It still doesn’t seem too clear to me what is at fault here, so let’s go through a bunch of possible culprits here.

Removing contravariance

Contravariance - the ability to use an fn(&'a ()) as an fn(&'static ()) - is sound by itself, but rarely useful and was assumed to close this soundness issue. See this comment for example.

This is not true however. Even without contravariance, covariance in the return type is enough to cause unsoundness:

// we again have an implicit `'b: 'a` bound, this time because of the return type
fn foo<'a, 'b, T>(_dummy: &'a (), value: &'b T) -> (&'a &'b (), &'a T) {
    (&&(), value)
}

fn bad<'c, T>(x: &'c T) -> &'static T {
    // instantiate `foo`
    let foo1: for<'a, 'b> fn(&'a (), &'b T) -> (&'a &'b (), &'a T) = foo;

    // instantiate 'a as 'static
    let foo2: for<'b> fn(&'static (), &'b T) -> (&'static &'b (), &'static T) = foo1;

    // function return types are covariant,
    // go from 'static to 'b
    let foo3: for<'b> fn(&'static (), &'b T) -> (&'b &'b (), &'static T) = foo2;

    // instantiate 'b as 'c
    let foo4: fn(&'static (), &'c T) -> (&'c &'c (), &'static T) = foo2;

    // boom!
    foo4(&(), x).1
}

Add where bounds to binders

Explored by @nikomatsakis in the same comment as above, another solution would be to add where-bounds to binders, so instead of being of the type for<'a, 'b> fn(&'a (), &'b T) -> (&'a &'b (), &'a T), the function foo has the type for<'a, 'b> where<'b: 'a, T: 'b> fn(&'a (), &'b T) -> (&'a &'b (), &'a T).

Let’s now attempt the above example except that the function item has a where<'b: 'a, T: 'b> bound which we can’t discard:

fn bad<'c, T>(x: &'c T) -> &'static T {
    // instantiate `foo` with the relevant `where` bound
    let foo1: for<'a, 'b> where<'b: 'a, T: 'b> fn(&'a &'b (), &'b T) -> &'a T = foo;

    // instantiate 'b as 'c, also substituting 'b in the `where` bound
    let foo2: for<'a> where<'c: 'a, T: 'c> fn(&'a &'c (), &'c T) -> &'a T = foo1;

    // function arguments are contravariant,
    // go from 'c to 'static, but only for one of the inputs
    //
    // depending on our implementation, this would either duplicate the `where` bound
    // or actually just error here, because we can't split 'c into 'c and 'static
    //
    // while I assume that "splitting" lifetimes in `where` bounds will error
    // let's assume that we can split 'c into two separate lifetimes
    let foo3: for<'a> where<'c: 'a, 'static: 'a, T: 'c> fn(&'a &'static (), &'c T) -> &'a T = foo2;

    // not necessary, but discard the `'static: 'a` bound, as it is trivially true and makes
    // this more difficult to read
    let foo4: for<'a> where<'c: 'a, T: 'c> fn(&'a &'static (), &'c T) -> &'a T = foo3;

    // try to instantiate 'a as 'static, this means that we have to prove
    // 'c: 'static which does not hold
    let foo5: fn(&'static &'static (), &'c T) -> &'static T = foo4;

    // all good, `foo5` is not a subtype of `foo4`
    foo5(&&(), x)
}

This shows that adding where bounds to binders would solve this soundness issue.

It does have a fairly high cost, as reading types like for<'a, 'b> where<'b: 'a, T: 'b> fn(&'a &'b (), &'b T) -> &'a T is pretty difficult and it also isn’t obvious how this interacts with lifetime elision. It might be possible to even keep these where bounds internal to the compiler, as mentioned here. I don’t personally like that idea, as I don’t see how this approach won’t break at function boundaries.

Implementing this without a noticeable performance impact or without missing some suble edge cases is definitely also a challenge.

Make lifetimes mentioned in implicit bounds early bound

You might have tried to explicitly write the 'b: 'a bound of foo while looking at this issue and have already noticed that it then doesn’t compile, therefore “fixing” this soundness issue.

// note the explicit `'b: 'a` bound
fn foo<'a, 'b, T>(_dummy: &'a &'b (), value: &'b T) -> &'a T
where
    'b: 'a
{
    value
}

fn bad<'c, T>(x: &'c T) -> &'static T {
    // try to instantiate `foo`
    //
    // this now errors with "mismatched types"
    //
    // expected fn pointer `for<'a, 'b> fn(&'a &'b (), &'b T) -> &'a T`
    // found fn pointer `fn(&&(), &T) -> &T`
    let foo1: for<'a, 'b> fn(&'a &'b (), &'b T) -> &'a T = foo;

    // ...
    todo!()
}

By adding an explicit bound mentioning both 'a and 'b, these two lifetimes are now considered early-bound. This means that you have to instantiate both of these lifetimes at the same time you also instantiate type and const parameters.

When instantiating 'a and 'b at the same time we need to instantiate them to two new lifetimes 'c and 'd with 'd: 'c, ending up with the type fn(&'c &'d (), &'d T) -> &'c T.

With this, we can’t ever go to a type fn(_, &'e T) -> &'f T without 'e: 'f as subtyping can only extend the lifetimes for the inputs or shrink the lifetimes of the output, which both maintain this bound.

This means that by changing lifetimes mentioned in implicit bounds to be early-bound we fix this bug. While I assume the breakage of this to be acceptable, we won’t know that for sure until we try.

Conclusion

With my current understanding, I personally would like to see some experimentation with forcing lifetimes mentioned in implicit bounds to be early-bound and only consider adding where bounds to binders if the breakage from this change is too great.

if you find any typos or errors in this post, please pm me on zulip, discord or twitter

back


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK