6

library: Move `CStr` to libcore, and `CString` to liballoc by petrochenkov · Pul...

 2 years ago
source link: https://github.com/rust-lang/rust/pull/94079
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

New issue

library: Move CStr to libcore, and CString to liballoc #94079

Conversation

Copy link

Contributor

@petrochenkov petrochenkov commented on Feb 17

edited

Closes #46736

Interesting points:

  • Stability:
    • To make CStr(ing) from libcore/liballoc unusable without enabling features I had to make these structures unstable, and reexport them from libstd using stable type aliases instead of pub use reexports. (Because stability of use items is not checked.)
  • Relying on target ABI in libcore is ok:
  • trait CStrExt (UPDATE: used only in cfg(bootstrap) mode, otherwise lang items are used instead)
  • strlen

Otherwise it's just a code move + some minor hackery usual for liballoc in cfg(test) mode.

joshtriplett, GrayJack, and chrysn reacted with thumbs up emoji joshtriplett, hkratz, mati865, paolobarbolini, Kobzol, ayrtonm, memoryruins, GrayJack, and DianaNites reacted with hooray emoji memoryruins, stevemk14ebr, Animeshz, panaman67, mati865, Kobzol, GrayJack, and DianaNites reacted with heart emoji

Copy link

Collaborator

rust-highfive commented on Feb 17

r? @yaahc

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive

added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label

on Feb 17

petrochenkov

added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

on Feb 17

petrochenkov

added T-libs Relevant to the library team, which will review and decide on the PR/issue.

T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

labels

on Feb 17

This comment has been hidden.

This comment has been hidden.

Copy link

Member

joshtriplett commented on Feb 17

This is great! I'd love to see this move, to make it easier to interface with C FFI in contexts that don't necessarily want std.

Platform strlen implementations typically have extensive optimizations, and I don't think we'd want to miss out on those. It might make sense to ensure we have a weak strlen in compiler-builtins or similar. (cc @Amanieu)

This comment has been minimized.

Copy link

Contributor

Author

petrochenkov commented on Feb 18

Relying on target ABI in libcore is ok

A lot of attention in #46736 is put on whether depending on c_char in libcore is an acceptable thing or not.
I don't think the self-imposed restriction of not doing that is reasonable.

Obviously, libcore cannot depend on calling OS services, be it syscalls or functions or system libraries.
But why cannot it rely on the target's ABI (which includes sizes/alignments/signedness of basic C types)? It doesn't introduce any new dependencies to libcore.

Even without libcore the compiler cares about the target's ABI and keeps the relevant data in target specifications (inside rustc and LLVM).
It must know these things to generate layouts for repr(C) data structures, and calling conventions for generating extern "C" functions.
rustc's target specs, for example, have fields like c_int_width and c_enum_min_bits, even if we don't count the LLVM part.
So libcores build for different targets are already different, because the compiler uses that ABI knowledge when generating code.
If the compiler can do it, then why a higher layer like libcore cannot?

Recently core contains more of target-dependent stuff than back in 2017 when #46736 was created, because it's simply too useful to break that restriction sometimes.
A bunch of code in core::arch depends on the target's CPU architecture, and the va_list implementation in core::ffi requires relying on even more ABI details.

I personally don't see issues with putting that cfg_if that defines c_char into libcore, but perhaps people will feel better about this if the char signedness is moved to the compiler's target specs (compiler\rustc_target\src\spec\*.rs)?
Then the c_char definition will look roughly like

#[cfg(target_c_char_signed)]
type c_char = i8;
#[cfg(not(target_c_char_signed))]
type c_char = u8;
joshtriplett and stevemk14ebr reacted with thumbs up emoji

Copy link

Contributor

Amanieu commented on Feb 18

Platform strlen implementations typically have extensive optimizations, and I don't think we'd want to miss out on those. It might make sense to ensure we have a weak strlen in compiler-builtins or similar. (cc @Amanieu)

I am generally in favor of depending on more libc symbols in core (strlen, memchr, bzero, etc) while providing fallbacks in compiler-builtins for platforms that don't have that symbol.

joshtriplett reacted with thumbs up emoji

This comment has been hidden.

Copy link

Contributor

Author

petrochenkov commented on Feb 22

trait CStrExt

Two inherent methods on CStr (into_c_string and to_string_lossy) have signatures including types unavailable in libcore (CString, Cow and Box).

To keep backward compatibility I had to add an unstable trait CStrExt with these two stable methods, and add it to prelude.
That trait is only implemented for CStr.

strlen

CStr(ing) needs to calculate lengths of zero-terminated byte arrays.
While this functionality is trivial, its implementation can also be optimized by using CPU architecture specific techniques.
Such optimized implementations usually live in platform's libc, this is where the current libstd takes them from.

libcore doesn't have access to platform's libc, so it has to either

  • Use the naive trivial implementation of strlen, or
  • Provide the optimized version of strlen, similarly to how it does with memchr, or
  • Use some scheme with weak symbols as suggested above, with weak naive implementation being provided by libcore by default, but then overridable by any other linked libraries, including platform's libc.

petrochenkov

added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label).

I-libs-api-nominated Indicates that an issue has been nominated for discussion during a libs-api team meeting.

I-libs-nominated Indicates that an issue has been nominated for discussion during a libs team meeting.

and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

labels

on Feb 22

m-ou-se

removed the I-libs-nominated Indicates that an issue has been nominated for discussion during a libs team meeting. label

on Feb 23

Copy link

Member

joshtriplett commented on Feb 23

We discussed this in today's @rust-lang/libs-api team meeting. Many people are enthusiastic to see this work happen.

Regarding relying on target ABI, there were zero objections.

Regarding strlen, there was a consensus in favor of the general approach @Amanieu suggested, of going ahead and using platform C library functions in core as long as they're functions that we can fall back to compiler-builtins for, so that we can use the optimized implementations from the C library if available.

Regarding CStr functions that return types only available in alloc or std: we had some discussion about whether we should use an extension trait for this, or use a lang item or some similar mechanism to allow extending CStr from alloc and std. Our primary concern there was that an extension trait may technically be a very slightly breaking change, if someone had a different trait with the same method names; extension trait vs inherent method has different precedence. Using lang items to extend the set of inherent methods from alloc wouldn't have that issue.

Ultimately, though, we decided that we'd rather not add lang items for this, and we'd rather use an extension trait as long as that won't break anything in practice. So, summary: let's do a crater run to confirm that the extension trait doesn't break anything, and then go with the extension trait.

Copy link

Contributor

Author

petrochenkov commented on Feb 24

we had some discussion about whether we should use an extension trait for this, or use a lang item or some similar mechanism to allow extending CStr from alloc and `std

When implementing this I recalled that we had such extension traits before, but I couldn't find any of them in libcore now.
Apparently all of them were converted to lang items in 2018 (the main PR was #49896), so the existing policy seems to be preferring lang items for this.

This CStr(ing) move would require a single new lang item in liballoc.

yaahc and joshtriplett reacted with thumbs up emoji

Copy link

Member

joshtriplett commented on Mar 2

edited

@petrochenkov A lang item seems fine as well. And (EDIT: corrected) we don't need a crater run in that case.

Copy link

Contributor

bstrie commented on Mar 2

Note that #94503 has just moved c_char (among others) to core::ffi.

Copy link

Member

joshtriplett commented on Mar 2

We discussed this again in today's @rust-lang/libs meeting. We're fine with using a lang item in this case, since that's what we already do in other cases.

In general, we'd like to steer away in the future from having any inherent methods on types in core that return types from alloc or std (or inherent methods on types in alloc that return types from std, if any). Such methods make the binding between those two types more special, and make it a little harder to write drop-in replacements for types like CString/String/etc. So, for instance, in this case we should ideally steer people towards into() rather than into_c_string(), so that people can just impl From<CStr> for MyCStringType.

But none of that is a blocker for this move; this method can go ahead and use a lang item as other methods do, and we'll have the more general discussion in the future.

m-ou-se reacted with thumbs up emoji

m-ou-se

added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label).

I-libs-api-nominated Indicates that an issue has been nominated for discussion during a libs-api team meeting.

labels

on Mar 9

Copy link

Contributor

bors commented 8 days ago

pushpin Commit 6eaec56 has been approved by joshtriplett

bors

added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

8 days ago

Copy link

Contributor

bors commented 8 days ago

hourglass Testing commit 6eaec56 with merge f50f07b...

This comment has been hidden.

bors

added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

labels

8 days ago

This comment has been hidden.

petrochenkov

added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

7 days ago

Copy link

Contributor

Author

petrochenkov commented 7 days ago

@bors r=joshtriplett

Copy link

Contributor

bors commented 7 days ago

pushpin Commit f62c84e has been approved by joshtriplett

bors

added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

labels

7 days ago

Copy link

Contributor

bors commented 7 days ago

hourglass Testing commit f62c84e with merge 1e6fe58...

Copy link

Contributor

bors commented 7 days ago

sunny Test successful - checks-actions
Approved by: joshtriplett
Pushing 1e6fe58 to master...

petrochenkov, joshtriplett, and memoryruins reacted with hooray emoji

Copy link

Collaborator

rust-timer commented 7 days ago

Finished benchmarking commit (1e6fe58): comparison url.

Summary:

  • Primary benchmarks: crying_cat_face relevant regression found
  • Secondary benchmarks: mixed results

Regressions crying_cat_face
(primary) Regressions crying_cat_face
(secondary) Improvements tada
(primary) Improvements tada
(secondary) All crying_cat_facetada
(primary)

count1 1 3 0 2 1

mean2 0.8% 0.9% N/A -0.6% 0.8%

max 0.8% 1.2% N/A -0.7% 0.8%

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

Footnotes

  1. number of relevant changes

  2. the arithmetic mean of the percent change

rustbot

added the perf-regression Performance regressions label

7 days ago

lygstate

Copy link

Contributor

@lygstate lygstate commented on 5bee741 6 days ago

Can we use memchr instead of strlen?
Do we really need strlen in core? because strlen belongs libc

Copy link

Contributor

Author

petrochenkov commented 6 days ago

@lygstate
This PR implements the strategy from #94079 (comment).
If libc is linked then strlen is used from there (this is typically a fast platform-provided implementation), otherwise it's used from compiler-builtins (this is a naive implementation).

If you have issues with this setup, could you open a new issue and describe your case?

Copy link

Member

pnkfelix commented 2 days ago

Visiting for weekly performance triage

  • The report above claims this caused a 0.8% regression on primary benchmark unicode-normalization-0.1.19 (check full).
  • I was not able to reproduce the regression locally; all my runs showed an improvement.
  • This may be running afoul of rust-lang/rustc-perf#1299; so I'm going to chalk this up to noise/undeterminism, rather than a proper regression.

@rustbot label: +perf-regression-triaged

rustbot

added the perf-regression-triaged The performance regression has been triaged. label

2 days ago

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Assignees

yaahc

Labels

merged-by-bors This PR was explicitly merged by bors perf-regression Performance regressions perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Projects

None yet

Milestone

1.62.0

Development

Successfully merging this pull request may close these issues.

13 participants

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK