5

Github RFC: I/O Safety by sunfishcode · Pull Request #3128 · rust-lang/rfcs · Gi...

 3 years ago
source link: https://github.com/rust-lang/rfcs/pull/3128
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Copy link

Member

sunfishcode commented 14 days ago

edited

Pre-RFC on IRLO

Raw OS handles such as RawFd and RawHandle have hazards similar to raw pointers; they may be bogus or may dangle, leading to broken encapsulation boundaries and code whose behavior is impossible to bound in general.

Introduce a concept of I/O safety, and introduce a new set of types and traits, led by OwnedFd and BorrowedFd, to support it.

[Edited to reflect changes to the RFC].

Rendered

Copy link

SOF3 commented 14 days ago

edited

Either way, creating an IO-safe random-access pointer to the mmap device file can mess up the memory safety of rust, right?

Copy link

Member

Author

sunfishcode commented 14 days ago

Either way, creating an IO-safe random-access pointer to the mmap device file can mess up the memory safety of rust, right?

A safe mmap wrapper requires preventing writes to the file (if you hand out &Ts or &mut Ts) and truncations (which can lead to SIGBUSs). This is difficult in general (and making it easier is out of scope here) but there are OS-specific ways to do it in some cases. Some of the ways work by making the file inaccessible to other processes, however they don't always prevent access from the same process. Any code in the process holding a bogus or dangling file descriptor can bypass encapsulation boundaries and operate on the file with the rights of the process. I/O safety prevents safe Rust from doing that.

Copy link

Member

Author

sunfishcode commented 14 days ago

Either way, creating an IO-safe random-access pointer to the mmap device file can mess up the memory safety of rust, right?

Ah, if you're talking about Linux's /proc/self/mem, yes, that can also break memory safety. Spawning a debugger process can too. These are independent concerns that don't cancel the usefulness of memory safety or I/O safety.

Copy link

the8472 commented 12 days ago

Any code in the process holding a bogus or dangling file descriptor can bypass encapsulation boundaries and operate on the file with the rights of the process. I/O safety prevents safe Rust from doing that.

File::open("/proc/self/fd/5")

Copy link

Member

RalfJung commented 12 days ago

File::open("/proc/self/fd/5")

I view that like /proc/self/mem: we have to basically ignore its existence to be able to say anything reasonable about what Rust programs can and cannot do. Ideally we'd be able to ask the Linux kernel to just turn off these files for our process... that's a different discussion.

Copy link

the8472 commented 12 days ago

Ideally we'd be able to ask the Linux kernel to just turn off these files for our process... that's a different discussion.

That could be done by passing a flag to openat2, at least for the fd magic symlinks, but not for mem. But that would break legit uses of the fd directory such as bash's process substitution feature.

Copy link

Contributor

workingjubilee commented 11 days ago

It's true that one can reasonably say that anything to do with File and OpenOptions is arguably unsafe because we can access things like the proc filesystem with it and thus edit a Rust program's runtime memory. But while marking those as unsafe in response might be humorous, the joke would be short lived, and I am not entirely sure the laughs would last much longer if we asked the OS to block a bunch of file descriptors. Most OS allow a programmer to invoke ambient authority early and often, and accordingly so does Rust. But that does not mean we should give up on this concern about I/O safety.

I believe the correct focus for unsafe semantics is on anything that projects into, and can be fully crystallized within, Rust logical control flow. File manipulation by necessity projects both ways: Rust extends requests to the OS, and the OS extends authority to Rust. With that, we can focus our concern on making sure the program does not violate Rust programmatic I/O safety.

That means it's not about whether the OS allows Rust to edit its own runtime memory, or even its binary code, but making sure that the authority to do so is obtained in a lawful manner obtained via File APIs. With an appropriate extension to the language, such as IoSafe or some other means, we could make sure sure that a Rust program can, following the explicit request of the programmer, soundly invalidate each and every last one of Rust's safety guarantees (and itself as a Rust program!). That's much better than unsoundly invalidating all of Rust's safety guarantees, I think.

Copy link

Member

Author

sunfishcode commented 9 days ago

Several folks have expressed interest in seeing what a less minimalist change to std might look like, both in comments above and in offline discussion with @joshtriplett.

I've now put together a prototype of such an API. The high-level summary is that it defines OwnedFd and BorrowedFd and related things. BorrowedFd looks like it'll be nicer than &OwnedFd because it can use repr(transparent) and participate in FFI directly. This means users can avoid touching Raw* values entirely.

The prototype repo, with an example that supports Unix and Windows, is here:

https://github.com/sunfishcode/io-experiment

Feedback welcome!

Copy link

Member

RalfJung commented 9 days ago

Yeah that looks great. :) Though I am a bit surprised by OptionFd... for this to be useful in FFI, I think the "sentry value" needs to be documented. But I am also not sure if it is truly needed -- since rust-lang/rust#74699 landed, we could probably arrange for Option<OwnedFd> to have the desired representation.

Copy link

Member

Author

sunfishcode commented 9 days ago

Option<OwnedFd> instead of OptionFd would be nice, if feasible. It looks like that would require special-casing it within rustc in places like this and this and possibly others. Does that sound like what you're thinking about?

Also, my prototype doesn't yet deal with this yet, but Windows has multiple representations for optional handles. I'm not a Windows expert, but if I understand correctly, we may still be able to have Option<Handle> with NULL as the sentry value, along with separate types for file handles with INVALID_HANDLE_VALUE as the sentry value.

Copy link

Member

RalfJung commented 9 days ago

edited

Option instead of OptionFd would be nice, if feasible. It looks like that would require special-casing it within rustc in places like this and this and possibly others. Does that sound like what you're thinking about?

I think it would just require setting the right rustc_layout_scalar_valid_range_start/rustc_layout_scalar_valid_range_end like rust-lang/rust#74699 did. No rustc changes should be necessary, only library changes (and documenting that we guarantee the appropriate ABI).

EDIT: Oh, you are pointing to the FFI-compat lints. Yeah those would also need to be adjusted, indeed. (This is the codification of "documenting that we guarantee the appropriate ABI".)

Copy link

Contributor

pickfire commented 8 days ago

Users could still do unsafe { File::from_raw_fd(5) } right? How can using IoSafe helps here? Isn't it the same?

Copy link

Member

RalfJung commented 8 days ago

@pickfire that's unsafe.
Consider this: users can also do unsafe { &*(42 as *const i32) }, and yet Rust has memory safety. The same works for I/O safety.

Copy link

FloLo100 commented 8 days ago

Users could still do unsafe { File::from_raw_fd(5) } right? How can using IoSafe helps here? Isn't it the same?

"Dereferencing" filedescriptors not obtained by functions implementing IoSafe would be considered UB aka "Dont do that".

Copy link

Member

Author

sunfishcode commented 7 days ago

I've now removed the OptionFd as @RalfJung suggested, and it works well using Option.

To support this, I added niche optimizations to the Windows types, so that Option<OwnedHandle> and Option<OwnedSocket> have the right representation for FFI in the general case, and added an OptionFileHandle to cover the INVALID_HANDLE_VALUE case.

Seeing how well repr(transparent) works out, and seeing how tricky IoSafe is turning out to be, I'm becoming much more optimistic about this approach in general.

Copy link

programmerjake commented 7 days ago

edited

To support this, I added niche optimizations to the Windows types, so that Option<OwnedHandle> and Option<OwnedSocket> have the right representation for FFI in the general case, and added an OptionFileHandle to cover the INVALID_HANDLE_VALUE case.

iirc Windows has two different invalid values it uses inconsistently for handles: NULL and INVALID_HANDLE_VALUE. I can't remember which one(s) are used for file, socket, and other handles we care about, this will require extra care to get right, at the minimum the types should be explicitly documented which values they have for niches since that could be a major foot-gun.

Copy link

Member

Author

sunfishcode commented 7 days ago

iirc Windows has two different invalid values it uses inconsistently for handles: NULL and INVALID_HANDLE_VALUE. Icr which one(s) are used for file, socket, and other handles we care about, this will require extra care to get right, at the minimum the types should be explicitly documented which values they have for niches since that could be a major foot-gun.

That's right; the blog post I linked to above has more info. io-experiment now deals with this, and documents which values are valid in each type, such as here for OptionFileHandle, the type where INVALID_HANDLE_VALUE is disallowed.

If possible, I think putting the IoSafe (or equivalent) traits in core is a good idea, allowing no_std crates to handle file descriptors or handles in an abstracted manner, or otherwise use IoSafe without needing to depend on std.

Copy link

Contributor

workingjubilee commented 7 days ago

iirc Windows has two different invalid values it uses inconsistently for handles: NULL and INVALID_HANDLE_VALUE. Icr which one(s) are used for file, socket, and other handles we care about, this will require extra care to get right, at the minimum the types should be explicitly documented which values they have for niches since that could be a major foot-gun.

"icr"?

"icr"?

I can't remember.

Copy link

Member

Author

sunfishcode commented 3 days ago

Feedback I've gotten so far on the new OwnedFd and BorrowedFd has been positive, and I've now ported a few codebases to use the new prototype, and it's working out well enough that I now think that this is the better design, so I've revised the RFC here to propose this new API instead of IoSafe. Of course, feedback is still welcome on either approach, and I've left a mention of IoSafe in the Alternatives section.

These two types are conceptual replacements for `RawFd`, and represent owned

and borrowed handle values. `OwnedFd` owns a file descriptor, including closing

it when it's dropped. `BorrowedFd`'s lifetime parameter ties it to the lifetime

of something that owns a file descriptor. These types enforce all of their I/O

RalfJung 3 days ago

edited

Member

I would phrase it as

BorrowedFd's lifetime parameter says for how long access to this file descriptor has been borrowed.

Just like with &'a T, this lifetime may be an under-approximation of the actual lifetime of the underlying object, and that's okay. We often borrow things for less than the full lifetime of an object. For example, the owner might want exclusive access back and then do more things with it.

Do you consider these two formulations to be equivalent? They sound non-equivalent to me, so I wonder what lead you to your choice.

sunfishcode 3 days ago

edited

Author

Member

You're right. I was just thinking that it's obvious that a user could always drop a BorrowedFd earlier than the end of the lifetime, but it's better to clarify. I've now updated it to use your proposed wording.

- A `as_filelike_view::<T>()` function returns a `View`, which contains a

temporary `ManuallyDrop` instance of T constructed from the contained

file descriptor, allowing users to "view" a raw file descriptor as a

`File`, `TcpStream`, and so on.

RalfJung 3 days ago

Member

Do you propose to add this to the standard library, or is this just mentioned as a possible use of these APIs?

sunfishcode 3 days ago

edited

Author

Member

I propose adding this to the standard library. Experience with it in unsafe-io and feedback in this RFC is that it's a relatively common thing to want to do with file descriptors, and it's a safe API that requires unsafe to implement.

That said, it isn't essential, and could also be provided by an extension trait in a crate.

safety invariants automatically.

it when it's dropped. `BorrowedFd`'s lifetime parameter says for how long

access to this file descriptor has been borrowed. These types enforce all of

their I/O safety invariants automatically.

RalfJung 3 days ago

Member

Thanks!

With this change, I think I am wondering why the lifetime parameter is called 'owned?

sunfishcode 3 days ago

Author

Member

I read 'owned as saying "assume there's an OwnedFd (or equivalent) that this BorrowedFd can't outlive". I'm not attached to it, and recognize that that may be specific to how I think about lifetimes. I'd be ok with 'a or another name here.

workingjubilee 2 days ago

edited

Contributor

Simply 'file or 'fd might be fine, because BorrowedFd's reference back to that OwnedFd is trying to represent the "inherent" lifetime of the file descriptor as a constraint.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK