7

Tracking Issue for pointer_bytes_offsets · Issue #96283 · rust-lang/rust · GitHu...

 11 months ago
source link: https://github.com/rust-lang/rust/issues/96283
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Tracking Issue for pointer_bytes_offsets #96283

2 of 4 tasks

Tracked by

#1568

Gankra opened this issue Apr 21, 2022 · 23 comments

· May be fixed by #116205

2 of 4 tasks

Tracked by

#1568

Tracking Issue for pointer_bytes_offsets #96283

Gankra opened this issue Apr 21, 2022 · 23 comments

· May be fixed by #116205

Comments

Contributor

Gankra

commented

Apr 21, 2022

edited by WaffleLapkin

Feature gates:

#![feature(pointer_byte_offsets)]

#![feature(const_pointer_byte_offsets)]

This is a tracking issue for the pointer_byte_offsets raw pointer conveniences like ptr.byte_add(offset)

Public API

impl<T: ?Sized> *const T {
    // feature gates `pointer_byte_offsets` and `const_pointer_byte_offsets`
    pub const unsafe fn byte_offset(self, count: isize) -> Self;
    pub const unsafe fn byte_add(self, count: usize) -> Self;
    pub const unsafe fn byte_sub(self, count: usize) -> Self;

    pub const fn wrapping_byte_offset(self, count: isize) -> Self;
    pub const fn wrapping_byte_add(self, count: usize) -> Self;
    pub const fn wrapping_byte_sub(self, count: usize) -> Self;

    pub const unsafe fn byte_offset_from<U: ?Sized>(self, origin: *const U) -> isize;
}

// ... and the same for` *mut T`

Steps / History

Unresolved Questions

  • Should these operations actually accomodate DSTs? This seems deeply semantically dubious/broken but idk, you can plausibly use them right if you are extremely careful and understand the implications of stacked borrows for slices/projections.
ajwerner, xSetech, goffrie, and marc0246 reacted with thumbs up emojiKolsky reacted with hooray emojiDCNick3 reacted with eyes emoji

Gankra

added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature.

labels

Apr 21, 2022

Contributor

Author

To clarify the open question's concerns a bit: if you make a ref-slice and then cast it to a raw-slice, offsetting the raw pointer will not change the slice's len, but you can't actually turn the offset slice back into a ref, because (at least under the strictest rules) the original ref-slice has provenance over only its exact range, and so derived pointers cannot access outside the range. Creating the offset ref-slice is trying to assert provenance over additional bytes you don't have permission to access.

Similarly, trait-object-refs will make you sad if you do that.

This can only be soundly used (meaningfully) if you from_raw_parts the raw-DST from a raw pointer with provenance over the whole "array" you want to stride through.

So like, in general you need to write something like (pseudocode of the general pattern):

let array = [...];

// Steal metadata from an example or make the metadata from whole cloth
let dst = &array[something];
let metadata = get_metadata(dst);

// Now build a raw DST from a pointer to the whole array
let raw_array_ptr = array.as_ptr();
let raw_dst = ptr::from_raw_parts(raw_array_ptr, metadata);

// Yay this actually can be strided through the whole array now
let next_dst = raw_dst.byte_add(offset_to_next_dst);

(CC @RalfJung since this is an interesting stacked borrows example, no need to reply if there's nothing more to say.)

RalfJung, HTGAzureX1212, and WaffleLapkin reacted with thumbs up emoji

Member

@Gankra actually, I think allowing these methods take pointers to DST unlocks a lot of abilities for custom DSTs. In particular byte_sub is required to implement Arc::from_raw (as it accepts a possibly fat pointer to T that needs to be offseted back to ArcInner<T>).

Member

I want to propose to stabilize these methods! (can someone open an fcp?)

Stabilization report

Implementation History

  • Initially these methods were implemented in Add convenience byte offset/check align functions to pointers #95643
  • The only unresolved question raised since then is whatever to allow calling these methods on pointers to unsized types. I propose to resolve this to "yes, we should actually", this will unlock a lot of options for working with unsized types (see expitience report for an example).

API Summary

impl *const T {
    pub const unsafe fn byte_offset(self, count: isize) -> Self;
    pub const unsafe fn byte_add(self, count: usize) -> Self;
    pub const unsafe fn byte_sub(self, count: usize) -> Self;

    pub const fn wrapping_byte_offset(self, count: isize) -> Self;
    pub const fn wrapping_byte_add(self, count: usize) -> Self;
    pub const fn wrapping_byte_sub(self, count: usize) -> Self;

    pub const unsafe fn byte_offset_from<U: ?Sized>(self, origin: *const U) -> isize;
}

// ... and the same for` *mut T`

Experience Report

#99113, #100819 and #100030 refactored some std code into using these methods simplifying the code.

These methods allow custom Arc implementations to work with unsized types just as well as std::arc::Arc, namely they allow to offset a pointer by <size of arc metadata> while keeping pointer metadata, which is required for example for Arc::from_raw.

This also allows to polyfil with_metadata_of by using wrapping_byte_offset, which is again very useful when working with DSTs.

Member

Pondering: does byte_offset_from really need its argument to be a *const T? Arguably it doesn't care what type you pass it, so it could be *const U instead (or *const impl Sized, I suppose).

That would allow things like ptr::addr_of!(x.b).byte_offset_from(ptr::addr_of!(x)) to compile without needing an arbitrary choice of cast.

EDIT: this was updated in #103489

Contributor

I have a question regarding the offset_from family of functions, stabilized or not. In all those cases where it is safe to use, why not use as integer casts, ignoring the fact that conversion from bytes to units of T might be required?

I just had the situation today, where I needed the offset between 2 bytes slices (one is a subslice of the other) and failed to see why I should use the unsafe offset_from. In either case, I have to be able to reason about why my code is correct, but not having to reason about why my code doesn't invoke UB seemed like a +1 for integer casts. Obviously this is just one example, and certainly not "all those cases".

But I'm wondering, what are safe examples in which I should use offset_from as opposed to integer casts?

Member

offset_from (and particularly sub_ptr) allows LLVM to optimize more based on the unsafety. If your code is not a place where you get a material advantage from those extra optimizations, then by all means use the safe phrasings. (sub_ptr is very important for slice::Iter::len, but there's very little code that's anywhere close to as perf-critical as that.)

Member

Another advantage of offset_from is that it can be used in const fn.

For some time we had safe variants of offset_from, maybe it is worth thinking about bringing them back.

fbq

pushed a commit to Rust-for-Linux/linux that referenced this issue

Aug 29, 2023

fbq

pushed a commit to Rust-for-Linux/linux that referenced this issue

Sep 15, 2023

What is the status of byte_offset_from() will it be stable anytime soon

WaffleLapkin

added the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a libs-api team meeting. label

Sep 18, 2023

Member

I'm nominating this for t-libs-api discussion. I propose to stabilize those APIs, see the stabilization report: #96283 (comment).

(it looks like I forgot to make sure t-libs-api sees the report, oops 💀)

Member

@rfcbot merge

WaffleLapkin and marc0246 reacted with heart emoji

rfcbot

commented

Sep 26, 2023

edited by BurntSushi

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot

added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised.

and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off.

labels

Sep 26, 2023

🔔 This is now entering its final comment period, as per the review above. 🔔

WaffleLapkin and marc0246 reacted with hooray emoji

m-ou-se

removed the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a libs-api team meeting. label

Oct 3, 2023

Contributor

Very late thought - has reversing the names byte_add -> add_bytes, byte_offset -> offset_bytes been considered?

I think that would make sorting in the docs nicer, and make it generally more discoverable. E.g. you know you want to add something - either you scan the docs and see all the adding things there, or type add in an IDE and stumble across the suggestions to add bytes.

As-is, you have to be aware that the byte_x methods exist, they don't even fit on the same page as offset and sub in the TOC

272406967-696c5e7b-4c2a-4a3d-a9f6-5aba83b11e37.png
scottmcm reacted with heart emoji

Member

Discoverability should get better with stabilization. We don't mention unstable things from the docs for stable things, but as part of stabilization we can add something like this to add:

This counts in terms of elements. If you want to move the pointer by a number of bytes instead, see byte_add.

to help people find it.

Member

I have to always try a bunch of permutations when calling the "bytes" methods, I can never remember where the byte part goes.

Contributor

The same argument could be made for wrapping_add, calling it add_wrapping instead. Seeing as everything else puts add to the right (think also the methods on integers) calling it add_bytes would be inconsistent, which would make me very sad.

Contributor

The same argument could be made for wrapping_add, calling it add_wrapping instead. Seeing as everything else puts add to the right (think also the methods on integers) calling it add_bytes would be inconsistent, which would make me very sad.

I think that saturating/wrapping/checked arithmetic are actually the exception and not the standard, almost everything else follows action->refinement naming. For ptr there are copy_from/copy_to/copy_from_nonoverlapping, offset/offset_from/offset_to, read/read_unaligned/read_volatile (and write), sub/sub_ptr (unstable and naming TBD), integers have div/div_euclid, add/add_assign/add_with_overflow, str has split/split_once/split_inclusive/split_terminator...

Not that that's any strong justification, but the names here still seem a bit backwards to me. offset_bytes, add_bytes, wrapping_offset_bytes and wrapping_add_bytes would fit into the patterns better (not e.g. offset_bytes_wrapping, though I wish that's the way wrapping and friends worked too)

But yeah... it's late 🙂

Contributor

We are in agreement. I would have liked the modifier_operation to be operation_modifier as well, as that would be more consistent with other parts of the library. Nevertheless this is now a historical artifact, and doesn't change the fact that add_bytes would be less consistent in the context of arithmetic/bitwise operations.

hYdos reacted with thumbs up emoji

rfcbot

added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting

and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised.

labels

Oct 6, 2023

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

Member

We don't need to restart an FCP clock for this but I'd still like to be sure the team has seen the important discussion that took place after the 3 currently checked boxes. Thank you for raising this @tgross35. @rust-lang/libs-api: please have a look at #96283 (comment) which is an insightful breakdown.

I think the current set of names (see top of this issue) continue to be my first choice for word order.

@rfcbot poll libs-api Proceed with current word order?

scottmcm and tgross35 reacted with heart emoji

rfcbot

commented

Oct 11, 2023

edited by BurntSushi

Team member @dtolnay has asked teams: T-libs-api, for consensus on:

Proceed with current word order?

apiraino

removed the to-announce Announce this issue on triage meeting label

Oct 12, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Assignees

No one assigned

Labels
C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects

None yet

Milestone

No milestone

12 participants

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK