Experimental feature gate proposal `interoperable_abi` by joshtriplett · Pull Re...

source link: https://github.com/rust-lang/rust/pull/105586

@joshtriplett joshtriplett commented Dec 12, 2022

Summary

This experimental feature gate proposal covers the development of a new ABI,
extern "interop", and a new in-memory representation, repr(interop), for
interoperability across high-level programming languages that have safe data
types.

This will use the feature gate interoperable_abi, which will be marked as
experimental until a subsequent RFC provides a precise definition of the
interoperable ABI.

The tag interop is a placeholder for the final name, and is likely to change
before stabilization. This work was previously discussed under the name "safe
ABI", but that was renamed to the placeholder "interop" to avoid misleading
implications of "safe".

Motivation

Today, developers building projects incorporating multiple languages, or
calling a library written in one language from another, often have to use the C
ABI as a lowest-common-denominator for cross-language function calls. As a
result, such cross-language calls use unsafe C representations, even for types
that both languages understand. For instance, passing a string from Rust to
another high-level language will typically use an unsafe C char *, even if
both languages have a safe type for counted UTF-8 strings.
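
To make the status quo concrete, here is a minimal sketch of passing a Rust string through today's C ABI. The names byte_len and call_with_rust_str are hypothetical and exist only for this illustration; the std::ffi types are the standard library's.

```rust
use std::ffi::{c_char, CStr, CString};

/// A C-ABI function that receives a NUL-terminated `char *`.
///
/// # Safety
/// `s` must be non-null and point to a valid NUL-terminated string.
pub unsafe extern "C" fn byte_len(s: *const c_char) -> usize {
    // SAFETY: guaranteed by the caller, see above. Nothing in the
    // signature lets the compiler check it.
    unsafe { CStr::from_ptr(s) }.to_bytes().len()
}

/// Calling it from safe Rust needs a copy into a NUL-terminated buffer.
pub fn call_with_rust_str(s: &str) -> Option<usize> {
    // A `&str` may contain interior NUL bytes, so the conversion can fail.
    let c_string = CString::new(s).ok()?;
    // SAFETY: `c_string` is a valid NUL-terminated string for the whole call.
    Some(unsafe { byte_len(c_string.as_ptr()) })
}
```

Both sides understand counted UTF-8, yet the call still pays for a copy, loses the length, and cannot represent a string with an interior NUL.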

For popular pairs of languages, developers sometimes create higher-level
binding layers for combining those languages. However, the creation of such
binding layers requires one-off effort between every pair of programming
languages. Such binding layers also add work and overhead to the project for
each pair of languages, and may not play well together when using more than one
in the same project.

Furthermore, higher-level data types such as Option and Result currently
require translation into C-ABI-compatible types, which discourages the use of
such types in cross-language interfaces, and encourages the use of more complex
and less safe encodings (e.g. manually encoding Option via an invalid value
of a parameter).

Finally, system libraries and other shared libraries typically use the C ABI
as well. Software built as a Linux .so, Windows DLL, or macOS dylib will
typically expose a C-compatible ABI, and cannot easily provide a higher-level
safe ABI without shipping language-specific high-level bindings.

The interoperable ABI will define a standard way to make calls across
high-level languages, passing high-level data types, without dropping to the
lowest common denominator of C. This ABI will work with any language providing
a C-compatible FFI (including C itself), and languages can also add specific
higher-level support for the interoperable ABI.

The interoperable ABI aims to be a reasonable default for compiled libraries in
both static and dynamic form, including system libraries.

Requirements

The interoperable ABI experiment will include a new ABI, extern "interop",
and a new in-memory representation, repr(interop).

The interoperable ABI will be a strict superset of the C ABI. This ensures
that, for functionality not yet supported by the interoperable ABI, users still
have the option of using their own translations to the raw C ABI, while still
using the interoperable ABI for what it does support.

The interoperable ABI will be defined via "lowering" to the C ABI: the
interoperable ABI will define how to pass or return types not supported by C,
by defining how to translate them to types and structures supported by C. This
allows any language with C FFI support to also call functions using the
interoperable ABI, without requiring special language support. However,
languages may still wish to add higher-level support for the interoperable ABI,
to avoid having to write a translation layer for their own native types.
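
As a rough illustration of what "lowering" could look like, the sketch below passes a counted UTF-8 string as a pointer-plus-length struct through a plain extern "C" function. The proposal does not define any concrete lowering yet, so the InteropStr layout and the count_chars function are assumptions made for this example only.

```rust
use std::{slice, str};

/// Hypothetical lowering of a borrowed, counted UTF-8 string: a pointer plus
/// a byte length, with no NUL terminator required.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct InteropStr {
    pub ptr: *const u8,
    pub len: usize,
}

impl InteropStr {
    /// Borrow a Rust `&str` as its lowered form. The result must not outlive `s`.
    pub fn borrowed(s: &str) -> Self {
        InteropStr { ptr: s.as_ptr(), len: s.len() }
    }
}

/// After lowering, a string-taking function is an ordinary `extern "C"`
/// function that any language with a C FFI could call.
///
/// # Safety
/// `s.ptr` must point to `s.len` readable bytes that stay valid for the call.
pub unsafe extern "C" fn count_chars(s: InteropStr) -> usize {
    // SAFETY: guaranteed by the caller, see above.
    let bytes = unsafe { slice::from_raw_parts(s.ptr, s.len) };
    // Validate rather than trust the UTF-8 claim; report 0 for invalid input.
    str::from_utf8(bytes).map_or(0, |text| text.chars().count())
}
```

A language with native support could hide the struct behind its own string type, while a language with only C FFI would fill in the two fields by hand.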

To the extent the interoperable ABI supports passing ownership (e.g. strings),
it must also specify how to reclaim the associated memory. (However, future
support for objects or traits may require invoking a destructor instead.)

The interoperable ABI may define a symbol naming scheme, to allow identifying
symbols that use the interoperable ABI. Any such scheme must remain usable from
languages that do not have native interoperable ABI support, since those
languages have to reference each symbol by its exact name.

The interoperable ABI should include a versioning scheme, to allow for future
compatible extensibility. Interoperable ABI version 1 will handle many simple cases of
widespread interest. More complex cases, such as trait objects, or arbitrary
objects with methods, will get deferred to future versions. The versioning
scheme will allow for both compatible and incompatible changes; changes to the
interoperable ABI will strive to remain compatible with previous versions when
not using functionality unsupported by those previous versions.

Rust will support defining functions using the interoperable ABI, and calling
interoperable-ABI functions defined elsewhere. Rust will support compiling both
static and dynamic libraries that export interoperable-ABI symbols.

Rust should also support passing around function pointers to functions that use
the interoperable ABI.
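
Rust already allows function pointers to carry an ABI in their type, which suggests how this requirement might look. Since extern "interop" does not exist yet, the sketch below uses extern "C" as a stand-in; Callback, double, and apply_twice are hypothetical names.

```rust
/// A function pointer type whose ABI is part of the type.
/// With the proposed feature this might read `extern "interop" fn(u32) -> u32`.
pub type Callback = extern "C" fn(u32) -> u32;

/// A function defined with the stand-in ABI.
pub extern "C" fn double(x: u32) -> u32 {
    // Wrapping arithmetic avoids a panic inside an `extern "C"` function.
    x.wrapping_mul(2)
}

/// Receives a function pointer across the ABI boundary and calls it twice.
pub extern "C" fn apply_twice(f: Callback, x: u32) -> u32 {
    f(f(x))
}
```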

Non-requirements

The interoperable ABI does not aim to support the full richness of Rust's type
system, or that of other languages. It aims to support common cases more safely
and simply.

In particular, while the interoperable ABI will over time support an increasing
subset of Rust features, and specific types from the standard library will
become available as the necessary features to support them do, the
interoperable ABI does not aim to support the entire Rust standard library.

The interoperable ABI will not aim to support complex lifetime handling, or to
fully solve problems related to describing pointer lifetimes across different
languages. The interoperable ABI may provide limited support for some subsets
of this, such as "this pointer is only valid for the duration of this call and
must not be retained", or "this pointer transfers ownership to the callee, and
the caller must not retain it".

The interoperable ABI (at least in the first version) will not provide an
interface description language (IDL), in either source or compiled form;
function symbols using the interoperable ABI will not provide function
signature information in compiled objects.

The interoperable ABI does not aim to provide "translations" between the
representations of different languages. For instance, though different
languages may store strings in different fashions, the interoperable ABI string
types will have a specific representation in memory and a specific lowering to
C function parameters/results. Languages whose native string representation
does not match the interoperable ABI string representation may need to
translate, or may need to treat the interoperable-ABI string object as a
distinct data type and provide distinct mechanisms for working with it. (By
contrast, WebAssembly Interface Types aims to provide such translations in an
efficient fashion, by generating translation code as needed between formats.)

The interoperable ABI cannot support arbitrary compile-time generic functions;
generics will require the use of opaque objects, trait objects, or similar. A
future version could support exporting specific instantiations of generics.
(However, the interoperable ABI will support enough of generics to allow
types like Option<u64> or Result<u64, ConcreteError> or [u8; 16] or
[u8] to work, such as by supporting their use with concrete types as long
as no generic parameters remain unbound in the final function signature.)
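
The "all generic parameters bound" case can be sketched for Option<u64>. The layout below (an explicit one-byte discriminant followed by the payload) is only one possible lowering, assumed here for illustration; InteropOptionU64 and checked_halve are hypothetical names.

```rust
/// Hypothetical lowering of `Option<u64>` with a separate discriminant.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct InteropOptionU64 {
    /// 0 means `None`, any other value means `Some`.
    pub is_some: u8,
    /// Only meaningful when `is_some` is non-zero.
    pub value: u64,
}

impl From<Option<u64>> for InteropOptionU64 {
    fn from(opt: Option<u64>) -> Self {
        match opt {
            Some(value) => InteropOptionU64 { is_some: 1, value },
            None => InteropOptionU64 { is_some: 0, value: 0 },
        }
    }
}

impl From<InteropOptionU64> for Option<u64> {
    fn from(raw: InteropOptionU64) -> Self {
        if raw.is_some != 0 { Some(raw.value) } else { None }
    }
}

/// The signature mentions only concrete types, so it lowers to plain C.
pub extern "C" fn checked_halve(x: u64) -> InteropOptionU64 {
    let halved = if x % 2 == 0 { Some(x / 2) } else { None };
    halved.into()
}
```

A function generic over T could not be exported this way, because the layout of the lowered struct depends on the concrete payload type.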

The interoperable ABI cannot prevent callers from passing parameters that
violate the specification, and should not claim to. More generally, the
interoperable ABI does not provide sandboxing or similar functionality that
would be required to interoperate with untrusted code.

The initial version of the interoperable ABI will likely not attempt to
standardize destructors or memory reclamation, though future versions may.
Users of the interoperable ABI will still need to provide and use xyz_free
functions to delegate object destruction and reclamation back to the code that
provided the object.
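
The xyz_free convention can be sketched with the standard library's CString, whose into_raw and from_raw methods are designed for this hand-off. The names greeting_new and greeting_free are hypothetical.

```rust
use std::ffi::{c_char, CString};

/// Producer: allocates a string and hands ownership to the caller.
pub extern "C" fn greeting_new() -> *mut c_char {
    CString::new("hello")
        .expect("literal contains no interior NUL")
        .into_raw()
}

/// Matching free function: returns the allocation to the code (and the
/// allocator) that created it. A null pointer is accepted and ignored.
///
/// # Safety
/// `p` must be null or a pointer returned by `greeting_new` that has not
/// already been freed.
pub unsafe extern "C" fn greeting_free(p: *mut c_char) {
    if !p.is_null() {
        // SAFETY: guaranteed by the caller, see above.
        drop(unsafe { CString::from_raw(p) });
    }
}
```

The caller must never release the pointer with its own allocator (for example C's free), which is exactly the kind of rule a future version of the ABI might standardize.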

Potential functionality

This section includes some potential examples of types the interoperable ABI
could support. Some of these will appear in the first version of the
interoperable ABI; many will get deferred to a future version.

  • Tuples, of arbitrary size.
  • The "unit" type ().
  • enums, including enum variants containing fields.
    • More specifically, Option and Result.
  • Counted UTF-8 strings (with no guarantee of a NUL terminator).
  • A Unicode scalar value (Rust char).
  • Filesystem paths, or other operating-system strings.
  • Arrays, with a compile-time-known size.
  • Counted slices.
  • Ranges.
  • Owned pointers to any supported type (e.g. Box), as well as owned pointers
    to types that can't be passed by value.
  • References, with a limited degree of lifetime support.
    • &str
  • Closures, with a limited degree of lifetime support.
  • Futures, with a limited degree of lifetime support. This would in particular
    support extern "interop" async fn.
  • "noreturn" functions, as expressed in Rust via -> !.
  • Opaque objects with interoperable-ABI methods, without exposing
    representation. (This would allow passing objects like Vec or HashMap or
    HashSet, without constraining the internals. This would also allow
    interoperating across versions of Rust.)
    • An opaque error container, for use with Result.
  • Trait objects with interoperable-ABI methods. (This may use the same
    mechanism as objects.)
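
For enums whose variants contain fields, Rust's existing #[repr(C)] layout already shows one way such a type can lower to C: a tag followed by a union of per-variant structs. The sketch below spells that lowering out by hand. It illustrates current repr(C) behavior, not a layout chosen by this proposal, and all type names are hypothetical.

```rust
use std::f64::consts::PI;

/// An enum with fields, using Rust's existing C-compatible representation.
#[repr(C)]
pub enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

/// The same data written out as the C-style tagged union it corresponds to.
#[repr(C)]
#[derive(Clone, Copy)]
pub enum ShapeTag {
    Circle,
    Rect,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct CircleFields {
    pub radius: f64,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct RectFields {
    pub w: f64,
    pub h: f64,
}

#[repr(C)]
pub union ShapeFields {
    pub circle: CircleFields,
    pub rect: RectFields,
}

#[repr(C)]
pub struct ShapeLowered {
    pub tag: ShapeTag,
    pub fields: ShapeFields,
}

/// Reads the lowered form the way a C caller would: check the tag first,
/// then access only the matching union member.
pub fn area(shape: &ShapeLowered) -> f64 {
    match shape.tag {
        ShapeTag::Circle => {
            // SAFETY: the tag says the `circle` member is the active one.
            let c = unsafe { shape.fields.circle };
            PI * c.radius * c.radius
        }
        ShapeTag::Rect => {
            // SAFETY: the tag says the `rect` member is the active one.
            let r = unsafe { shape.fields.rect };
            r.w * r.h
        }
    }
}
```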

Open questions

  • Niches: should we support cases like Option<bool> without a separate
    discriminant, or should we (for simplicity) always pass a separate
    discriminant? Likely the latter. However, what about things like Option<&T>
    and Option<NonZeroU32>, for which Rust guarantees the representation of
    None? Those work with the C ABI, and they have to work with the
    interoperable ABI, but can we make them work with the interoperable ABI
    using the same encoding of None?
  • What subset of lifetimes can, and should, we support? We can't enforce them
    cross-language, but they may be useful as an advisory/documentation
    mechanism. Or we could leave them out entirely.
  • To what extent should the interoperable ABI make any attempt to specify
    things that can't be enforced, rather than ignoring semantics entirely and
    only specifying how types get passed?
  • How can we make it easy to support data structures without having to do
    translation from repr(Rust) to repr(interop) and have parallel structures?
    Can we make that less painful to express, and ideally mostly free at runtime?
    • Related: how can we handle tuples? Do we need a way to express
      repr(interop) tuples? How can we do that conveniently?
  • Should we provide support for extensible enums, such that we don't assume the
    discriminant matches one of the known variants? Would doing so make using
    enums less ergonomic? Could we address that with language changes?
  • For handling objects, could we avoid having to pass in-memory function
    pointers via a vtable, and instead reference specific symbols? This wouldn't
    work for generics, though. Can we do any better than a vtable?
  • For ranges, should we provide a concrete range type or types, or should we
    defer that and handle ranges as opaque objects or traits?
  • Do we get any value out of supporting (), other than completeness? Passing
    () by value should just be ignored as if it weren't specified. Do we want
    people using pointers to (), and do those have any advantage over pointers
    to void?
  • Should we do anything special about i128 and u128, or should we just push
    for getting those supported correctly in extern "C"?
  • For generics, such as Option<u64> or Result<u32, ConcreteError> or
    [u8; 16], does the rule "all generic parameters must be bound to concrete
    types in the function signature" suffice, or do we need a more complex rule
    than that?
  • Unwinding: The default interoperable ABI should not support unwinding, and
    most languages don't tend to have support for unwinding through C-ABI
    functions, but should we have an interop-unwind variant? Would doing so
    provide value?
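
The niche question above can be made concrete with the guarantees Rust already documents: Option<NonZeroU32> has the same size as u32, and Option<&T> has the same size as a raw pointer, with None taking the otherwise invalid zero or null value. The sketch below relies only on those guarantees; next_id and to_raw are hypothetical names.

```rust
use std::num::NonZeroU32;

/// Works with today's C ABI: `Option<NonZeroU32>` is passed like a `u32`,
/// with `None` encoded as 0, so no separate discriminant is needed.
pub extern "C" fn next_id(prev: Option<NonZeroU32>) -> Option<NonZeroU32> {
    match prev {
        None => NonZeroU32::new(1),
        // `checked_add` yields `None` on overflow instead of wrapping to 0.
        Some(id) => id.checked_add(1),
    }
}

/// The integer a C caller would see for a given value.
pub fn to_raw(id: Option<NonZeroU32>) -> u32 {
    id.map_or(0, NonZeroU32::get)
}
```

A type such as Option<bool> has no comparable documented guarantee, which is why always passing a separate discriminant is the simpler rule for everything outside these special cases.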

Prior art

Some potential sources of inspiration:

  • WebAssembly Interface Types
  • The abi_stable crate (which aims for Rust-to-Rust stability, not
    cross-language interoperation, but it still serves as a useful reference)
  • Swift's stable ABI
  • C++'s various ABIs (and the history of its ABI changes). The interoperable
    ABI should not strive to be a superset of any C++ ABI, though.
  • Many, many interface description languages (IDLs).
  • The x86-64 psABI. While we're not specifying the lowering all the way to
    specific architectures, we can still learn from how it handles various types.

Rationale and alternatives

Rather than being defined via lowering to the C ABI, the interoperable ABI could
directly define how to pass parameters on underlying architectures, such as
which registers to use for which parameters and how to pass or return specific
types. This would have the advantage of allowing improvements over the C ABI.
However, this would have multiple substantial disadvantages, such as requiring
dedicated support in every programming language (rather than leveraging C FFI
support), and requiring definition for every target architecture. Instead, this
proposal suggests making such improvements at the C ABI level, such as by
defining extensions for passing or returning specific types in a more efficient
fashion.

The interoperable ABI could exclude portions of the C ABI considered unsafe,
such as raw pointers. The interoperable ABI would then no longer be a strict
superset of the C ABI, which would make it difficult to handle functionality
that the interoperable ABI does not yet support while simultaneously using the
interoperable ABI for functionality it does support. For instance, a program
may wish to pass both an enum parameter and a raw pointer parameter. Leaving
out this functionality might encourage people to avoid the interoperable ABI,
or to define some functions via interoperable ABI and some via C ABI.

The names interop and "interoperable ABI" are not particularly distinctive,
unambiguous, or easy to talk about, and lack other properties of a good name. This ABI
should get a better name before stabilization. For instance, "habi" ('h' for
'high-level'), "hli" (high-level interoperable), "spore" (systems programming
object representation enhancement, since "spores" are how rust spreads),
"crabi" (insert your favorite backronym here), or some arbitrary proper noun
with no particular meaning.

Given that the ABI does not exclude portions of the C ABI considered
unsafe, a name like "safe" would be a misnomer.

Future work

  • Debug/trace tools, such as debugger support or ltrace support, to decode
    interoperable-ABI structures and types.
  • Adding native support to various languages.
  • Shipping C header files defining structures for this ABI.