library: Move `CStr` to libcore, and `CString` to liballoc by petrochenkov · Pul...
source link: https://github.com/rust-lang/rust/pull/94079
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
New issue
library: Move CStr
to libcore, and CString
to liballoc
#94079
Conversation
Closes #46736
Interesting points:
- Stability:
- To make
CStr(ing)
from libcore/liballoc unusable without enabling features I had to make these structures unstable, and reexport them from libstd using stable type aliases instead ofpub use
reexports. (Because stability ofuse
items is not checked.)
- To make
- Relying on target ABI in libcore is ok:
trait CStrExt
(UPDATE: used only incfg(bootstrap)
mode, otherwise lang items are used instead)strlen
Otherwise it's just a code move + some minor hackery usual for liballoc in cfg(test)
mode.
r? @yaahc
(rust-highfive has picked a reviewer for you, use r? to override)
added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label
added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
labels
added T-libs Relevant to the library team, which will review and decide on the PR/issue.
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
labels
This comment has been hidden.
This comment has been hidden.
This is great! I'd love to see this move, to make it easier to interface with C FFI in contexts that don't necessarily want std.
Platform strlen
implementations typically have extensive optimizations, and I don't think we'd want to miss out on those. It might make sense to ensure we have a weak strlen in compiler-builtins or similar. (cc @Amanieu)
This comment has been minimized.
Relying on target ABI in libcore is ok
A lot of attention in #46736 is put on whether depending on c_char
in libcore is an acceptable thing or not.
I don't think the self-imposed restriction of not doing that is reasonable.
Obviously, libcore cannot depend on calling OS services, be it syscalls or functions or system libraries.
But why cannot it rely on the target's ABI (which includes sizes/alignments/signedness of basic C types)? It doesn't introduce any new dependencies to libcore.
Even without libcore the compiler cares about the target's ABI and keeps the relevant data in target specifications (inside rustc and LLVM).
It must know these things to generate layouts for repr(C)
data structures, and calling conventions for generating extern "C"
functions.
rustc's target specs, for example, have fields like c_int_width
and c_enum_min_bits
, even if we don't count the LLVM part.
So libcores build for different targets are already different, because the compiler uses that ABI knowledge when generating code.
If the compiler can do it, then why a higher layer like libcore cannot?
Recently core contains more of target-dependent stuff than back in 2017 when #46736 was created, because it's simply too useful to break that restriction sometimes.
A bunch of code in core::arch
depends on the target's CPU architecture, and the va_list
implementation in core::ffi
requires relying on even more ABI details.
I personally don't see issues with putting that cfg_if
that defines c_char
into libcore, but perhaps people will feel better about this if the char signedness is moved to the compiler's target specs (compiler\rustc_target\src\spec\*.rs
)?
Then the c_char
definition will look roughly like
#[cfg(target_c_char_signed)] type c_char = i8; #[cfg(not(target_c_char_signed))] type c_char = u8;
Platform
strlen
implementations typically have extensive optimizations, and I don't think we'd want to miss out on those. It might make sense to ensure we have a weak strlen in compiler-builtins or similar. (cc @Amanieu)
I am generally in favor of depending on more libc symbols in core (strlen
, memchr
, bzero
, etc) while providing fallbacks in compiler-builtins for platforms that don't have that symbol.
This comment has been hidden.
trait CStrExt
Two inherent methods on CStr
(into_c_string
and to_string_lossy
) have signatures including types unavailable in libcore (CString
, Cow
and Box
).
To keep backward compatibility I had to add an unstable trait CStrExt
with these two stable methods, and add it to prelude.
That trait is only implemented for CStr
.
strlen
CStr(ing)
needs to calculate lengths of zero-terminated byte arrays.
While this functionality is trivial, its implementation can also be optimized by using CPU architecture specific techniques.
Such optimized implementations usually live in platform's libc, this is where the current libstd takes them from.
libcore doesn't have access to platform's libc, so it has to either
- Use the naive trivial implementation of
strlen
, or - Provide the optimized version of
strlen
, similarly to how it does withmemchr
, or - Use some scheme with weak symbols as suggested above, with weak naive implementation being provided by libcore by default, but then overridable by any other linked libraries, including platform's libc.
added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label).
I-libs-api-nominated Indicates that an issue has been nominated for discussion during a libs-api team meeting.
I-libs-nominated Indicates that an issue has been nominated for discussion during a libs team meeting.
and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
labels
removed the I-libs-nominated Indicates that an issue has been nominated for discussion during a libs team meeting. label
We discussed this in today's @rust-lang/libs-api team meeting. Many people are enthusiastic to see this work happen.
Regarding relying on target ABI, there were zero objections.
Regarding strlen, there was a consensus in favor of the general approach @Amanieu suggested, of going ahead and using platform C library functions in core as long as they're functions that we can fall back to compiler-builtins for, so that we can use the optimized implementations from the C library if available.
Regarding CStr functions that return types only available in alloc or std: we had some discussion about whether we should use an extension trait for this, or use a lang item or some similar mechanism to allow extending CStr
from alloc
and std
. Our primary concern there was that an extension trait may technically be a very slightly breaking change, if someone had a different trait with the same method names; extension trait vs inherent method has different precedence. Using lang items to extend the set of inherent methods from alloc wouldn't have that issue.
Ultimately, though, we decided that we'd rather not add lang items for this, and we'd rather use an extension trait as long as that won't break anything in practice. So, summary: let's do a crater run to confirm that the extension trait doesn't break anything, and then go with the extension trait.
we had some discussion about whether we should use an extension trait for this, or use a lang item or some similar mechanism to allow extending
CStr
fromalloc
and `std
When implementing this I recalled that we had such extension traits before, but I couldn't find any of them in libcore now.
Apparently all of them were converted to lang items in 2018 (the main PR was #49896), so the existing policy seems to be preferring lang items for this.
This CStr(ing)
move would require a single new lang item in liballoc.
@petrochenkov A lang item seems fine as well. And (EDIT: corrected) we don't need a crater run in that case.
We discussed this again in today's @rust-lang/libs meeting. We're fine with using a lang item in this case, since that's what we already do in other cases.
In general, we'd like to steer away in the future from having any inherent methods on types in core that return types from alloc or std (or inherent methods on types in alloc that return types from std, if any). Such methods make the binding between those two types more special, and make it a little harder to write drop-in replacements for types like CString/String/etc. So, for instance, in this case we should ideally steer people towards into()
rather than into_c_string()
, so that people can just impl From<CStr> for MyCStringType
.
But none of that is a blocker for this move; this method can go ahead and use a lang item as other methods do, and we'll have the more general discussion in the future.
added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label).
I-libs-api-nominated Indicates that an issue has been nominated for discussion during a libs-api team meeting.
labels
Commit 6eaec56 has been approved by joshtriplett
added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
labels
This comment has been hidden.
added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
labels
This comment has been hidden.
added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
labels
@bors r=joshtriplett
Commit f62c84e has been approved by joshtriplett
added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
labels
Test successful - checks-actions
Approved by: joshtriplett
Pushing 1e6fe58 to master...
Finished benchmarking commit (1e6fe58): comparison url.
Summary:
- Primary benchmarks: relevant regression found
- Secondary benchmarks: mixed results
Regressions
(primary)
Regressions
(secondary)
Improvements
(primary)
Improvements
(secondary)
All
(primary)
count1 1 3 0 2 1
mean2 0.8% 0.9% N/A -0.6% 0.8%
max 0.8% 1.2% N/A -0.7% 0.8%
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged
along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged
label to this PR.
@rustbot label: +perf-regression
Footnotes
-
number of relevant changes
-
the arithmetic mean of the percent change
Can we use memchr
instead of strlen?
Do we really need strlen in core? because strlen
belongs libc
@lygstate
This PR implements the strategy from #94079 (comment).
If libc is linked then strlen
is used from there (this is typically a fast platform-provided implementation), otherwise it's used from compiler-builtins (this is a naive implementation).
If you have issues with this setup, could you open a new issue and describe your case?
Visiting for weekly performance triage
- The report above claims this caused a 0.8% regression on primary benchmark unicode-normalization-0.1.19 (check full).
- I was not able to reproduce the regression locally; all my runs showed an improvement.
- This may be running afoul of rust-lang/rustc-perf#1299; so I'm going to chalk this up to noise/undeterminism, rather than a proper regression.
@rustbot label: +perf-regression-triaged
added the perf-regression-triaged The performance regression has been triaged. label
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
merged-by-bors This PR was explicitly merged by bors perf-regression Performance regressions perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
None yet
Successfully merging this pull request may close these issues.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK