Experimental feature gate for `extern "crabi"` ABI · Issue #631 · rust-lang/compiler-team
source link: https://github.com/rust-lang/compiler-team/issues/631
Note: This is a fairly long proposal, halfway to an eRFC, largely to lay out the requirements and alternatives. It's been reviewed and approved by T-lang from a language experiment point of view, but not yet reviewed or approved by T-compiler. This MCP seeks approval for an experimental feature gate supporting experimentation on an `extern "crabi"` and `repr(crabi)` ABI.
There will need to be a full RFC before this can be stabilized (or marked as a non-experimental feature gate), and that full RFC will include the ABI spec. This proposal allows for experimentation to design and test that ABI spec.
Summary
This proposal introduces an experimental feature gate for developing a new ABI,
`extern "crabi"`, and a new in-memory representation, `repr(crabi)`, for
interoperability across high-level programming languages that have safe data
types.
This will use the feature gate `crabi`, which will be marked as experimental
until a subsequent RFC provides a precise definition of crABI.
This work was previously discussed under the names "safe ABI" and "interop
ABI", but was renamed to "crabi" to avoid misleadingly broad implications of
"safe" or "interop".
Motivation
Today, developers building projects incorporating multiple languages, or
calling a library written in one language from another, often have to use the C
ABI as a lowest-common-denominator for cross-language function calls. As a
result, such cross-language calls use unsafe C representations, even for types
that both languages understand. For instance, passing a string from Rust to
another high-level language will typically use an unsafe C `char *`, even if
both languages have a safe type for counted UTF-8 strings.
For popular pairs of languages, developers sometimes create higher-level
binding layers for combining those languages. However, the creation of such
binding layers requires one-off effort between every pair of programming
languages. Such binding layers also add work and overhead to the project for
each pair of languages, and may not play well together when using more than one
in the same project.
Furthermore, higher-level data types such as `Option` and `Result` currently
require translation into C-ABI-compatible types, which discourages the use of
such types in cross-language interfaces, and encourages the use of more complex
and less safe encodings (e.g. manually encoding `Option` via an invalid value
of a parameter).
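To make the "invalid value" encoding concrete, here is a minimal sketch of what a C-ABI interface often looks like today. The names `find_index` and `find_index_c` and the sentinel `-1` are hypothetical, chosen only for illustration; nothing here is part of crABI.

```rust
/// Idiomatic Rust: absence is explicit in the return type.
pub fn find_index(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Hypothetical C-ABI wrapper: `Option<usize>` is flattened into an `isize`,
/// with -1 standing in for `None`. Nothing forces the caller to check for it.
/// (A real export would also need a no-mangle attribute; omitted here.)
///
/// # Safety
/// `ptr` must point to `len` readable bytes.
pub unsafe extern "C" fn find_index_c(ptr: *const u8, len: usize, needle: u8) -> isize {
    let haystack = unsafe { std::slice::from_raw_parts(ptr, len) };
    match find_index(haystack, needle) {
        Some(i) => i as isize,
        None => -1, // sentinel: an otherwise invalid index
    }
}
```

A caller that forgets to compare the result against `-1` will silently use it as an index, which is the kind of mistake a first-class `Option` in the ABI would help avoid.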
Finally, system libraries and other shared libraries typically use the C ABI
as well. Software making a Linux `.so`, Windows DLL, or macOS `dylib` will
typically expose a C-compatible ABI, and cannot easily provide a higher-level
safe ABI without shipping language-specific high-level bindings.
crABI will define a standard way to make calls across high-level languages,
passing high-level data types, without dropping to the lowest common
denominator of C. crABI will work with any language providing a C-compatible
FFI (including C itself), and languages can also add specific higher-level
native support for crABI.
crABI aims to be a reasonable default for compiled libraries in both static and
dynamic form, including system libraries.
Requirements
The crABI experiment will include a new ABI, `extern "crabi"`, and a new
in-memory representation, `repr(crabi)`.
The crABI support for Rust will be a strict superset of the C ABI support for
Rust. This ensures that, for functionality not yet supported by crABI, users
still have the option of using their own translations to the raw C ABI, while
still using crABI for what it does support.
crABI will be defined via "lowering" to the C ABI: crABI will define how to
pass or return types not supported by C, by defining how to translate them to
types and structures supported by C. This allows any language with C FFI
support to also call functions using crABI, without requiring special language
support. However, languages may still wish to add higher-level support for
crABI, to avoid having to write a translation layer for their own native types.
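The lowering idea can be sketched with today's `extern "C"`. The struct `StrSlice`, its field layout, and the function `count_vowels` are hypothetical: the actual crABI lowering for strings has not been specified, and a pointer plus a byte count is only one plausible shape.

```rust
use std::{slice, str};

/// Hypothetical C-compatible view of a counted UTF-8 string
/// (no NUL terminator is assumed).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct StrSlice {
    pub ptr: *const u8,
    pub len: usize,
}

impl StrSlice {
    /// Borrow a Rust `&str` as a (pointer, length) pair.
    pub fn new(s: &str) -> Self {
        StrSlice { ptr: s.as_ptr(), len: s.len() }
    }

    /// # Safety
    /// `ptr` must point to `len` bytes of valid UTF-8 that stay alive for `'a`.
    pub unsafe fn as_str<'a>(self) -> &'a str {
        unsafe { str::from_utf8_unchecked(slice::from_raw_parts(self.ptr, self.len)) }
    }
}

/// Callable from any language with C FFI: the caller builds the two-field
/// struct by hand, while a language with native support could pass its own
/// string type directly.
///
/// # Safety
/// `s` must satisfy the requirements of `StrSlice::as_str`.
pub unsafe extern "C" fn count_vowels(s: StrSlice) -> usize {
    let text = unsafe { s.as_str() };
    text.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}
```

From C, the same call would only need a struct with a `const uint8_t *` and a `size_t` field, which is the sense in which no special language support is required.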
To the extent crABI supports passing ownership (e.g. strings), it must also
specify how to reclaim the associated memory. (However, future support for
objects or traits may require invoking a destructor instead.)
crABI could define a symbol naming scheme, to allow identifying symbols that
use crABI. However, crABI must be compatible with languages that only support C
FFI and do not have native crABI support, and which must thus reference the
symbol via its name; therefore, crABI should not have a complex or non-obvious
mangling scheme.
crABI should include a versioning scheme, to allow for future compatible
extensibility. crABI version 1 will handle many simple cases of widespread
interest. More complex cases, such as trait objects, or arbitrary objects with
methods, will get deferred to future versions. The versioning scheme will allow
for both compatible and incompatible changes; changes to crABI will strive to
remain compatible with previous versions when not using functionality
unsupported by those previous versions.
Rust will support defining functions using crABI, and calling
crABI functions defined elsewhere. Rust will support compiling both
static and dynamic libraries that export crABI symbols.
Rust should also support passing around function pointers to functions that use
crABI.
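As a rough illustration of what passing function pointers looks like, the sketch below uses `extern "C"` as a stand-in, since `extern "crabi"` is not available outside the experiment. The names `Callback`, `double`, and `apply_twice` are hypothetical.

```rust
/// A function-pointer type that carries its ABI in the type itself.
/// Under this proposal the ABI string would presumably be "crabi" instead.
pub type Callback = extern "C" fn(u32) -> u32;

/// A function defined with the chosen ABI.
pub extern "C" fn double(x: u32) -> u32 {
    x.wrapping_mul(2)
}

/// Takes a function pointer with that ABI and calls through it.
pub extern "C" fn apply_twice(f: Callback, x: u32) -> u32 {
    f(f(x))
}
```

Because the ABI is part of the pointer's type, a pointer to a function using a different ABI is rejected at compile time rather than miscalled at run time.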
Non-requirements
crABI does not aim to support the full richness of Rust's type system, or that
of other languages. It aims to support common cases more safely and simply.
In particular, while crABI will over time support an increasing subset of Rust
features, and specific types from the standard library will become available as
the necessary features to support them do, crABI does not aim to support the
entire Rust standard library.
crABI will not aim to support complex lifetime handling, or to fully solve
problems related to describing pointer lifetimes across different languages.
crABI may provide limited support for some subsets of this, such as "this
pointer is only valid for the duration of this call and must not be retained",
or "this pointer transfers ownership to the callee, and the caller must not
retain it".
crABI (at least in the first version) will not provide an interface description
language (IDL), in either source or compiled form; function symbols using crABI
will not provide function signature information in compiled objects. A future
version of crABI may generate and provide machine-readable interface
descriptions.
crABI does not aim to provide "translations" between the most native
representations of different languages. For instance, though different
languages may store strings in different fashions, crABI string types will have
a specific representation in memory and a specific lowering to C function
parameters/results. Languages whose native string representation does not match
crABI string representation may need to translate, or may need to treat the
crABI string object as a distinct data type and provide distinct mechanisms for
working with it. (By contrast, WebAssembly Interface Types (WIT) aims to
provide such translations in an efficient fashion, by generating translation
code as needed between formats.)
crABI cannot support arbitrary compile-time generic functions; generics will
require the use of opaque objects, trait objects, or similar. A future version
could support exporting specific instantiations of generics. (However, crABI
will support enough of generics to allow types like `Option<u64>` or
`Result<u64, ConcreteError>` or `[u8; 16]` or `[u8]` to work, such as by
supporting their use with concrete types as long as no generic parameters
remain unbound in the final function signature.)
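One way to picture a fully bound generic in a signature is a hand-written lowering of `Option<u64>`. The struct `OptionU64`, its layout, and the function `checked_div_c` are hypothetical; the real crABI encoding is still to be designed.

```rust
/// Hypothetical C-compatible lowering of `Option<u64>`: an explicit
/// discriminant followed by the payload.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct OptionU64 {
    pub is_some: bool,
    pub value: u64, // only meaningful when `is_some` is true
}

impl From<Option<u64>> for OptionU64 {
    fn from(opt: Option<u64>) -> Self {
        match opt {
            Some(value) => OptionU64 { is_some: true, value },
            None => OptionU64 { is_some: false, value: 0 },
        }
    }
}

impl From<OptionU64> for Option<u64> {
    fn from(raw: OptionU64) -> Self {
        if raw.is_some { Some(raw.value) } else { None }
    }
}

/// The signature mentions only concrete types, so it has a fixed C lowering.
pub extern "C" fn checked_div_c(a: u64, b: u64) -> OptionU64 {
    a.checked_div(b).into()
}
```

A generic `fn f<T>(x: Option<T>)` has no single lowering of this kind, which is why unbound parameters are excluded.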
crABI cannot prevent callers from passing parameters that violate the
specification, and does not claim to. More generally, crABI does not provide
sandboxing or similar functionality that would be required to interoperate with
untrusted code.
The initial version of crABI will likely not attempt to standardize destructors
or memory reclamation, though future versions may. Users of crABI will still
need to provide and use `xyz_free` functions to delegate object destruction and
reclamation back to the code that provided the object.
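The `xyz_free` convention can be sketched as follows, again using `extern "C"` as a stand-in. The type `Greeting` and the functions `greeting_new`, `greeting_len`, and `greeting_free` are hypothetical names for illustration.

```rust
/// Opaque to foreign callers: they only ever hold a pointer to it.
pub struct Greeting {
    text: String,
}

/// Allocate a `Greeting` and hand ownership to the caller as a raw pointer.
pub extern "C" fn greeting_new() -> *mut Greeting {
    Box::into_raw(Box::new(Greeting { text: String::from("hello") }))
}

/// # Safety
/// `g` must be a live pointer previously returned by `greeting_new`.
pub unsafe extern "C" fn greeting_len(g: *const Greeting) -> usize {
    unsafe { (*g).text.len() }
}

/// Give the object back to the library that allocated it, so that the same
/// allocator and destructor that created it also tear it down.
///
/// # Safety
/// `g` must be null, or a pointer from `greeting_new` that has not been freed.
pub unsafe extern "C" fn greeting_free(g: *mut Greeting) {
    if !g.is_null() {
        drop(unsafe { Box::from_raw(g) });
    }
}
```

The caller never frees the pointer with its own allocator; it only passes it back to `greeting_free`, exactly once.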
Potential functionality
This section includes some potential examples of types crABI could support.
Some of these will appear in the first version of crABI; many will get deferred
to a future version.
- Tuples, of arbitrary size.
- The "unit" type `()`.
- enums, including enum variants containing fields.
  - More specifically, `Option` and `Result`.
- Counted UTF-8 strings, (with no guarantee of a NUL terminator).
- A Unicode scalar value (Rust `char`).
- Filesystem paths, or other operating-system strings.
- Arrays, with a compile-time-known size.
- Counted slices.
- Ranges
- Owned pointers to any supported type (e.g. `Box`), as well as owned pointers
  to types that can't be passed by value.
- References, with a limited degree of lifetime support.
  - `&str`
- Closures, with a limited degree of lifetime support.
- Futures, with a limited degree of lifetime support. This would in particular
  support `extern "crabi" async fn`.
- "noreturn" functions, as expressed in Rust via `-> !`.
- Opaque objects with crABI methods, without exposing representation. (This
  would allow passing objects like `Vec` or `HashMap` or `HashSet`, without
  constraining the internals. This would also allow interoperating across
  versions of Rust.)
  - An opaque error container, for use with `Result`.
- Trait objects with crABI methods. (This may use the same mechanism as
  objects.)
Open questions
- Niches: should we support cases like `Option<bool>` without a separate
  discriminant, or should we (for simplicity) always pass a separate
  discriminant? Likely the latter. However, what about things like `Option<&T>`
  and `Option<NonZeroU32>`, for which Rust guarantees the representation of
  `None`? Those work with the C ABI, and they have to work with crABI, but can
  we make them work with crABI using the same encoding of `None`?
- What subset of lifetimes can, and should, we support? We can't enforce them
  cross-language, but they may be useful as an advisory/documentation
  mechanism. Or we could leave them out entirely.
- To what extent should crABI make any attempt to specify things that can't
  be enforced, rather than ignoring semantics entirely and only specifying
  how types get passed?
- How can we make it easy to support data structures without having to do
  translation from `repr(Rust)` to `repr(crabi)` and have parallel structures?
  Can we make that less painful to express, and ideally mostly free at runtime?
  - Related: how can we handle tuples? Do we need a way to express
    `repr(crabi)` tuples? How can we do that conveniently?
- Should we provide support for extensible enums, such that we don't assume the
  discriminant matches one of the known variants? Would doing so make using
  enums less ergonomic? Could we address that with language changes?
- For handling objects, could we avoid having to pass in-memory function
  pointers via a vtable, and instead reference specific symbols? This wouldn't
  work for generics, though. Can we do any better than a vtable?
- For ranges, should we provide a concrete range type or types, or should we
  defer that and handle ranges as opaque objects or traits?
- Do we get any value out of supporting `()`, other than completeness? Passing
  `()` by value should just be ignored as if it weren't specified. Do we want
  people using pointers to `()`, and do those have any advantage over pointers
  to void?
- Should we do anything special about `i128` and `u128`, or should we just push
  for getting those supported correctly in `extern "C"`?
- For generics, such as `Option<u64>` or `Result<u32, ConcreteError>` or
  `[u8; 16]`, does the rule "all generic parameters must be bound to concrete
  types in the function signature" suffice, or do we need a more complex rule
  than that?
- Unwinding: The default `extern "crabi"` should not support unwind, and most
  languages don't tend to have support for unwinding through C-ABI functions,
  but should we have a `crabi-unwind` variant? Would doing so provide value?
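The niche question in the first bullet can be made concrete with what Rust already guarantees today. This sketch only demonstrates the existing `extern "C"` behavior for `Option<NonZeroU32>` and `Option<&T>`; the function `next_id` is a hypothetical example, and whether crABI keeps the same encoding remains open.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// With the C ABI today, `Option<NonZeroU32>` is passed like a plain `u32`:
/// there is no separate discriminant, and the forbidden value 0 means `None`.
pub extern "C" fn next_id(current: u32) -> Option<NonZeroU32> {
    // Wrapping from u32::MAX yields 0, which `NonZeroU32::new` maps to `None`.
    NonZeroU32::new(current.wrapping_add(1))
}

/// The niche makes the `Option` wrapper free in terms of size.
pub fn niche_sizes_match() -> bool {
    size_of::<Option<NonZeroU32>>() == size_of::<u32>()
        && size_of::<Option<&u8>>() == size_of::<*const u8>()
}
```

By contrast, `Option<bool>` carries no such layout guarantee, which is why always passing a separate discriminant for it is the simpler choice.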
Prior art
Some potential sources of inspiration:
- WebAssembly Interface Types
- The `abi_stable` crate (which aims for Rust-to-Rust stability, not
  cross-language interoperation, but it still serves as a useful reference)
- `stabby`
- UniFFI
- Diplomat
- Swift's stable ABI
- C++'s various ABIs (and the history of its ABI changes). crABI should not
  strive to be a superset of any C++ ABI, though.
- Many, many interface description languages (IDLs).
- The x86-64 psABI. While we're not specifying the lowering all the way to
  specific architectures, we can still learn from how it handles various types.
Rationale and alternatives
Rather than being defined via lowering to the C ABI, crABI could directly
define how to pass parameters on underlying architectures, such as which
registers to use for which parameters and how to pass or return specific types.
This would have the advantage of allowing improvements over the C ABI. However,
this would have multiple substantial disadvantages, such as requiring dedicated
support in every programming language (rather than leveraging C FFI support),
and requiring definition for every target architecture. Instead, this proposal
suggests making such improvements at the C ABI level, such as by defining
extensions for passing or returning specific types in a more efficient fashion.
crABI could exclude portions of the C ABI considered unsafe, such as raw
pointers. This would make crABI not a strict superset of the C ABI. This
would make it difficult to handle functionality that crABI does not yet
support, while simultaneously using crABI for functionality it does support.
For instance, a program may wish to pass both an enum parameter and a raw
pointer parameter. Leaving out this functionality might encourage people to
avoid crABI or to define some functions via crABI and some via C ABI.
"crABI" serves as a neutral name identifying this ABI and its functionality.
This work previously went under the name "safe ABI", but given that the ABI
does not exclude portions of the C ABI considered unsafe, a name like "safe"
would be a misnomer. This work also previously went under the names "interop"
and "interoperable ABI"; however, the names `interop` and "interoperable ABI"
are not particularly identifying, unambiguous, or easy to talk about, and lack
other properties of a good name. In addition, "interop"/"interoperable" can imply a
greater breadth than the initial version of crABI aspires to, such as including
an IDL.
crABI does not officially stand for anything. Insert your favorite backronym.
Future work
- Debug/trace tools, such as debugger support or `ltrace` support, to decode
  crABI structures and types.
- Adding native crABI support to various languages.
- Shipping C header files defining structures for crABI.
Mentors or Reviewers
@m-ou-se and @joshtriplett are planning to work on this, and potentially coordinate other contributions.
Process
The main points of the Major Change Process are as follows:
- File an issue describing the proposal.
- A compiler team member or contributor who is knowledgeable in the area can second by writing `@rustbot second`.
  - Finding a "second" suffices for internal changes. If, however, you are proposing a new public-facing feature, such as a `-C flag`, then full team check-off is required.
  - Compiler team members can initiate a check-off via `@rfcbot fcp merge` on either the MCP or the PR.
- Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.
You can read more about Major Change Proposals on forge.
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.