Diving Deep: implied bounds and variance :3
source link: https://lcnr.de/blog/diving-deep-implied-bounds-and-variance/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Diving Deep: implied bounds and variance #25860
while Rust is a great language and tends to be sound in most cases, its complex trait system together with its fairly organic growth results in some unsoundness and unintended behavior. fixing or even just understanding these can often be non-trivial, which is why I am looking through them to summarize the issues and discuss some potential fixes. any unsoundness discussed here is pretty much impossible to trigger unintentionally and were mostly found by explicitly searching for them
The first issue I am taking a look at is probably the most well known unsound issue of Rust.
Let’s start with a slight modification of the example by @arielb1:
// because of `&'a &'b ()` in the signature, `foo` has an implicit bound `'b: 'a`
//
// In other words, in `foo` we assume 'a to be shorter than 'b
fn foo<'a, 'b, T>(_dummy: &'a &'b (), value: &'b T) -> &'a T {
value
}
fn bad<'c, T>(x: &'c T) -> &'static T {
// instantiate `foo`
let foo1: for<'a, 'b> fn(&'a &'b (), &'b T) -> &'a T = foo;
// subtyping: instantiate 'b as 'c
let foo2: for<'a> fn(&'a &'c (), &'c T) -> &'a T = foo1;
// subtyping: function arguments are contravariant,
// go from 'c to 'static, but only for one of the inputs
let foo3: for<'a> fn(&'a &'static (), &'c T) -> &'a T = foo2;
// subtyping: instantiate 'a as 'static
let foo4: fn(&'static &'static (), &'c T) -> &'static T = foo3;
// boom!
foo4(&&(), x)
}
The function bad
now allows us to arbitrarily transmute lifetimes.
To show that this is unsound, we can use the following safe code which
results in a segmentation fault:
fn main() {
let mut outer: Option<&'static u32> = Some(&3);
// `reference` only borrows from `outer` inside of the `match`,
// but we extend its lifetime to 'static here
let static_ref: &'static &'static u32 = match outer {
Some(ref reference) => bad(reference),
None => unreachable!(),
};
// the inner reference of `static_ref` is now `0`
outer = None;
// dereference the inner reference of `static_ref`
println!("{}", static_ref);
}
In foo
, we use the implicit 'b: 'a
bound to shrink the lifetime of value
.
Inside of bad
this implicit bound only remains for the &'a &'b ()
input of the
function item. This means that foo1
to foo4
have weaker implicit bounds than foo
.
It still doesn’t seem too clear to me what is at fault here, so let’s go through a bunch of possible culprits here.
Removing contravariance
Contravariance - the ability to use an fn(&'a ())
as an fn(&'static ())
- is sound by
itself, but rarely useful and was assumed to close this soundness issue. See this comment for example.
This is not true however. Even without contravariance, covariance in the return type is enough to cause unsoundness:
// we again have an implicit `'b: 'a` bound, this time because of the return type
fn foo<'a, 'b, T>(_dummy: &'a (), value: &'b T) -> (&'a &'b (), &'a T) {
(&&(), value)
}
fn bad<'c, T>(x: &'c T) -> &'static T {
// instantiate `foo`
let foo1: for<'a, 'b> fn(&'a (), &'b T) -> (&'a &'b (), &'a T) = foo;
// instantiate 'a as 'static
let foo2: for<'b> fn(&'static (), &'b T) -> (&'static &'b (), &'static T) = foo1;
// function return types are covariant,
// go from 'static to 'b
let foo3: for<'b> fn(&'static (), &'b T) -> (&'b &'b (), &'static T) = foo2;
// instantiate 'b as 'c
let foo4: fn(&'static (), &'c T) -> (&'c &'c (), &'static T) = foo2;
// boom!
foo4(&(), x).1
}
Add where
bounds to binders
Explored by @nikomatsakis in the same comment as above, another solution would be
to add where
-bounds to binders, so instead of being of the type
for<'a, 'b> fn(&'a (), &'b T) -> (&'a &'b (), &'a T)
, the function
foo
has the type for<'a, 'b> where<'b: 'a, T: 'b> fn(&'a (), &'b T) -> (&'a &'b (), &'a T)
.
Let’s now attempt the above example except that the function item has a where<'b: 'a, T: 'b>
bound
which we can’t discard:
fn bad<'c, T>(x: &'c T) -> &'static T {
// instantiate `foo` with the relevant `where` bound
let foo1: for<'a, 'b> where<'b: 'a, T: 'b> fn(&'a &'b (), &'b T) -> &'a T = foo;
// instantiate 'b as 'c, also substituting 'b in the `where` bound
let foo2: for<'a> where<'c: 'a, T: 'c> fn(&'a &'c (), &'c T) -> &'a T = foo1;
// function arguments are contravariant,
// go from 'c to 'static, but only for one of the inputs
//
// depending on our implementation, this would either duplicate the `where` bound
// or actually just error here, because we can't split 'c into 'c and 'static
//
// while I assume that "splitting" lifetimes in `where` bounds will error
// let's assume that we can split 'c into two separate lifetimes
let foo3: for<'a> where<'c: 'a, 'static: 'a, T: 'c> fn(&'a &'static (), &'c T) -> &'a T = foo2;
// not necessary, but discard the `'static: 'a` bound, as it is trivially true and makes
// this more difficult to read
let foo4: for<'a> where<'c: 'a, T: 'c> fn(&'a &'static (), &'c T) -> &'a T = foo3;
// try to instantiate 'a as 'static, this means that we have to prove
// 'c: 'static which does not hold
let foo5: fn(&'static &'static (), &'c T) -> &'static T = foo4;
// all good, `foo5` is not a subtype of `foo4`
foo5(&&(), x)
}
This shows that adding where
bounds to binders would solve this soundness issue.
It does have a fairly high cost, as reading types like for<'a, 'b> where<'b: 'a, T: 'b> fn(&'a &'b (), &'b T) -> &'a T
is pretty difficult and it also isn’t obvious how this interacts with lifetime elision. It might be possible to even keep these where
bounds internal to the compiler, as mentioned here. I don’t personally like that idea,
as I don’t see how this approach won’t break at function boundaries.
Implementing this without a noticeable performance impact or without missing some suble edge cases is definitely also a challenge.
Make lifetimes mentioned in implicit bounds early bound
You might have tried to explicitly write the 'b: 'a
bound of foo
while looking at this issue and have already
noticed that it then doesn’t compile, therefore “fixing” this soundness issue.
// note the explicit `'b: 'a` bound
fn foo<'a, 'b, T>(_dummy: &'a &'b (), value: &'b T) -> &'a T
where
'b: 'a
{
value
}
fn bad<'c, T>(x: &'c T) -> &'static T {
// try to instantiate `foo`
//
// this now errors with "mismatched types"
//
// expected fn pointer `for<'a, 'b> fn(&'a &'b (), &'b T) -> &'a T`
// found fn pointer `fn(&&(), &T) -> &T`
let foo1: for<'a, 'b> fn(&'a &'b (), &'b T) -> &'a T = foo;
// ...
todo!()
}
By adding an explicit bound mentioning both 'a
and 'b
, these two lifetimes are now
considered early-bound. This means that you have to instantiate both of these lifetimes
at the same time you also instantiate type and const parameters.
When instantiating 'a
and 'b
at the same time we need to instantiate them to two
new lifetimes 'c
and 'd
with 'd: 'c
, ending up with the type fn(&'c &'d (), &'d T) -> &'c T
.
With this, we can’t ever go to a type fn(_, &'e T) -> &'f T
without 'e: 'f
as subtyping can only
extend the lifetimes for the inputs or shrink the lifetimes of the output, which both maintain
this bound.
This means that by changing lifetimes mentioned in implicit bounds to be early-bound we fix this bug. While I assume the breakage of this to be acceptable, we won’t know that for sure until we try.
Conclusion
With my current understanding, I personally would like to see some experimentation with forcing
lifetimes mentioned in implicit bounds to be early-bound and only consider adding where
bounds to binders
if the breakage from this change is too great.
if you find any typos or errors in this post, please pm me on zulip, discord or twitter
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK