14

Rewrite it in Rust by ridiculousfish · Pull Request #9512 · fish-shell/fish-shel...

 1 year ago
source link: https://github.com/fish-shell/fish-shell/pull/9512
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Conversation

(Sorry for the meme; also this is obligatory.)

I think we should transition to Rust and aim to have it done by the next major release:

  • Nobody really likes C++ or CMake, and there's no clear path for getting off old toolchains. Every year the pain will get worse.
  • C++ is becoming a legacy language and finding contributors in the future will become difficult, while Rust has an active and growing community.
  • Rust is what we need to turn on concurrent function execution.
  • Being written in Rust will help fish continue to be perceived as modern and relevant.

This should be thought of as a "port" instead of a "rewrite" because we would not start from scratch; instead we would translate C++ to Rust, incrementally, module by module, in the span of one release. We'll use an FFI so the Rust and C++ bits can talk to each other until C++ is gone, and tests and CI keep passing at every commit.

To prove it can work, in this PR I've ported FLOG, topic monitor, wgetopt, builtin_wait, and some others to Rust. The Rust bits live in a crate that lives inside the C++ and links with it. You can just build it the usual way:

  1. Install Rust 1.67 or later
  2. cmake as usual, and it should work, by way of corrosion

It works in GH Actions CI too.

The Plan has a high level description, and the Development Guide has more technical details on how to proceed. Please consider both to be part of this PR.

PedroHLC, roland-rollo, ClydeHuibregtse, kkysen, mattrighetti, andrewbanchich, JaszonAlexzander, thebitstick, tim-field, muloka, and 634 more reacted with thumbs up emojiEmbeddedBacon, hydrobeam, danem, twpayne, Virgiel, vaygr, ipostr08, sumchattering, omar2205, german-e-a, and 52 more reacted with thumbs down emojiho-229, mati865, MiguelGuthridge, matoruru, murlakatamenka, hinell, louwers, Lin2Jing4, VallanDeMorty, adriankumpf, and 16 more reacted with laugh emojiroland-rollo, kkysen, andrewbanchich, phase, thebitstick, blindside85, pkoch, echelon, freych, PoorlyDefinedBehaviour, and 136 more reacted with hooray emojiLin2Jing4 and EchedeyLR reacted with confused emojikrobelus, JacobTravers, philpax, nazmulidris, matusf, vrmiguel-cw, CosmicHorrorDev, kamulos, Adanos020, renato145, and 232 more reacted with heart emojispacekookie, exploide, krobelus, JacobTravers, dgudim, philpax, nazmulidris, dbofmmbt, kaanyalova, vrmiguel-cw, and 200 more reacted with rocket emojiexploide, Be-ing, JacobTravers, philpax, nazmulidris, dbofmmbt, CathalMullan, j-james, michaelBelsanti, cole-h, and 74 more reacted with eyes emoji
All reactions

Member

So, a few quick comments:

  1. I basically don't know rust at all (I believe I've used it for a combined twelve minutes), so I can't speak to any of the code itself.
  2. Rust 1.67 was released literally two days ago. Compared to us sticking to C++11 that's a staggering difference in version support. Is it that much easier to get a new rust on old systems? Our typical targets are old CentOS and Debian.
  3. Following on from that, how easy is this to package from a distro perspective? We've expended quite a bit of effort on making that easy, and I'd hate to drop that aspect.

As of yet unknown compatibility story for Tier 2+ platforms (Cygwin, etc).

Cygwin is mostly a major PITA because of its wchar_t type. If we can get away from using libc that could make some things easier.

kristafari, grantshandy, aameen-tulip, SKTT1Ryze, Tyler799, omninonsense, Arzte, ilyagr, CodingKoopa, fabianhjr, and 19 more reacted with thumbs up emoji

Contributor

get away from using libc

Rust uses libc under the hood.

kevincox, betancour, tristanisham, adelarsq, and Eoic reacted with thumbs up emojihinell, benelot, rounakdatta, sigmonsays, natdm, Eoic, and vasilev-alex reacted with laugh emojirumpelsepp reacted with eyes emoji

Member

faho

commented

Jan 29, 2023

edited

Rust uses libc under the hood.

Of course technically that's true, but that's not what I'm talking about.

What I mean is that we wouldn't use e.g. libc's "iswdigit" and friends, like we're forced to do now because C++ never got around to replacing them with ones that work with its strings.

And because we use those functions, we're beholden to the libc wchar_t type where we use them, which is everywhere, which means the difference between cygwin (2 byte wchar_t) and as best as I can tell any other system (4 byte wchar_t) can appear everywhere.

And we're beholden to platform differences like "does this provide a uselocale" or "does this have wcscasecmp or std::wcscasecmp", because those directly appear in libc.

In effect what I'm saying is that my hope is rust would not rely on libc for everything and abstract some of the other libc gunk away instead of throwing it in our face like C++ does. And so this historical wchar_t difference could just go away.

ssg, omninonsense, dryya, Th3Whit3Wolf, solson, benjaminbrassart, ax-bronsen, Speykious, bb010g, grxnola, and 9 more reacted with thumbs up emoji

Member

The progress to date is impressive, and the motivation to allow concurrent mode is compelling.

I agree that packaging is important but I'd be happy to tackle it. RHEL 7 is only supported for another 18 months, and I have managed to backport GCC to RHEL in the past so it might be possible to do the same. There's a couple of ways to generate Debian packages, which are worth exploring.

I don't know any rust, but I didn't know any C++ before I started work on fish, so I'd be keen to learn.

alexxbb, wmealing, weihanglo, zzhaolei, hlmtre, tjkirch, zolrath, lu-zero, ax-bronsen, cryptoquick, and 22 more reacted with thumbs up emojicryptoquick, lefoun, erlend-sh, jk-gan, bboydflo, Yukaii, lastmjs, rounakdatta, cljoly, garfieldnate, and 3 more reacted with heart emojibbhoss, Abourass, Th3Whit3Wolf, Chris--B, mattbrandlysonos, hen-x, avarun42, apokolokyntosis, dunxen, Uzaaft, and 15 more reacted with rocket emoji

Member

(Maybe a way forward to #972 etc?)

@@ -0,0 +1,48 @@

include(FetchContent)

This is going to make builds using OBS difficult; is it possible to include Corrosion in the tree and use it as a subdirectory instead?

Also, distro packagers are not going to like this, but it's fine if you try find_package first:

find_package(Corrosion)
if (NOT Corrosion_FOUND)
   include(FetchContent)
   ...
endif()
anordal, dschmidt, and ulidtko reacted with thumbs up emoji

Also, distro packagers are not going to like this, but it's fine if you try find_package first:

find_package(Corrosion)
if (NOT Corrosion_FOUND)
   include(FetchContent)
   ...
endif()

Distro packager for Arch Linux here, this would make our lives so much easier with the ability to use system libraries.

dschmidt, ulidtko, and janhicken reacted with thumbs up emoji

I'm happy to add that, but my hope is that no distro packages fish while it uses Corrosion. This use is meant to be temporary during a single development cycle; by the next release I hope to have no CMake at all, and therefore no Corrosion.

wezm, Th3Whit3Wolf, solson, Icy-Thought, Chris--B, zanchey, thaliaarchi, avarun42, Resonious, lu-zero, and 13 more reacted with thumbs up emoji

That's an ambitious goal. Good luck.

cryptoquick, Dorsug, vvzen, ulidtko, 0xpr03, and theotherjimmy reacted with rocket emoji

This is going to make builds using OBS difficult; is it possible to include Corrosion in the tree and use it as a subdirectory instead?

Corrosion can be added as a subdirectory/ submodule.

Considering the use of a fork of Corrosion, and that it is intended as a temporary measure that won't be released, I think using Corrosion as a subdirectory would make sense.

Hello, throwing myself into this conversation with some additional info that distro maintainers might find useful. FetchContent has always supported placing the source directory for any dependency in any location and then setting -DFETCHCONTENT_SOURCE_DIR_<uppercase-name>=<path>.

In this case, this would allow package maintainers to do -DFETCHCONTENT_SOURCE_DIR_CORROSION=<path/to/corrosion/sources> and not interrupt the workflow. No conditional find_package call is required.

Additionally, starting in CMake 3.24, it's possible to tell FetchContent_MakeAvailable to call find_package first (and vice versa where find_package calls FetchContent_MakeAvailable if no file was found).

Be-ing reacted with thumbs up emoji

crate-type=["staticlib"]

[patch.crates-io]

cxx = { git = "https://github.com/ridiculousfish/cxx", branch = "fish" }

Yikes, all this is going to require internet access from build hosts, I think?

yeah we need to tell cargo to pre-fetch all dependencies, like https://kressle.in/articles/2021/packaging-rust-apps-in-obs.php

bb010g, kevincox, and lakoliu reacted with thumbs up emoji

Cargo can download all dependencies' source code prior to building. See cargo fetch and cargo vendor. The [patch] table in Cargo.toml is an easy way to tell Cargo to use a repository you've forked, which is very convenient before your changes are merged upstream.

bb010g, Speykious, sburton84, kpcyrd, and ulidtko reacted with thumbs up emojicryptoquick and vvzen reacted with heart emoji
@@ -148,6 +148,7 @@ Dependencies

Compiling fish requires:

- Rust (version 1.67 or later)

- a C++11 compiler (g++ 4.8 or later, or clang 3.3 or later)

- CMake (version 3.5 or later)

I think the use of FindContent bumps CMake to 3.11 (unless corrosion gets included in the tree as discussed elsewhere)

bb010g reacted with thumbs up emoji

Current versions of corrosion require at least CMake 3.15. Some features are only available with CMake 3.19 and newer.

## Risks

- Large amount of work with possible introduction of new bugs.

This is amazing, thanks for spearheading this work.
The plan sounds all-around sensible to me.

The port sounds like a great investment while being easier to pull off than a rewrite.
There are many fresh shells these days but I don't think they offer a comparable interactive experience (most of them focus on inventing their language).

Still, the port is a lot of work with late payoff, so it might be hard to find persistent contributors.
Somehow Rust projects seem to be doing fine in this regard.
Granted most of them are written from scratch.
One incremental port that died a slow death is https://github.com/remacs/remacs - but I don't think it's comparable to fish, it's around 6x larger, and the authors didn't have ownership of the code. IIRC they used a simple bindgen build; our toolchain seems much more powerful.

ridiculousfish, hen-x, avarun42, pgilad, cryptoquick, bb010g, ulidtko, and timvisee reacted with thumbs up emoji

"might be hard to find persistent contributions"

Just wanted to chime in and say that I love the fish shell and have been using it every day for about 6 or 7 years now. If this goes through, I will absolutely be contributing to the ongoing port and future features. I'm a full time Rust engineer and my main project at work includes quite a lot of C FFI, so I'm no stranger to that. I know I'm just one person, but figured I'd show support

timvisee, abhi-glitchhg, pieps, Xiretza, and briankung-bpn reacted with thumbs up emojiabhi-glitchhg and briankung-bpn reacted with heart emojibriankung-bpn reacted with rocket emoji

FetchContent_Declare(

Corrosion

GIT_REPOSITORY https://github.com/ridiculousfish/corrosion

GIT_TAG fish

probably we want to have a SHA here, for reproducibility

Be-ing, kristof-mattei, and ulidtko reacted with thumbs up emoji

- Other languages considered:

- Java, Python and the scripting family are ruled out for startup latency and memory usage reasons.

- Go would be an awkward fit. fork is [quite the problem](https://stackoverflow.com/questions/28370646/how-do-i-fork-a-go-process/28371586#28371586) in Go.

- Other system languages (D, Nim, Zig...) are too niche: fewer contributors, higher risk of the language becoming irrelevant.

(FWIW Java startup time can be fixed by compiling a native image)

kristof-mattei, Virgiel, Xtrem532, clonardo, 007lva, 95th, xedrac, dzvon, coral, jharrilim, and 10 more reacted with thumbs down emojijxy, avarun42, lnicola, wizard-28, EdorianDark, Adzz, and austince reacted with laugh emoji

Member

Let me put a swift lid on that (and any other discussions of other programming languages, including swift).

Rust has the momentum, some of us already know it and others are interested in it.

This exists, it is reasonably far along, it works. The decision has essentially been made.

ulidtko reacted with thumbs up emojikrobelus, pythoneer, matklad, hudson-ayers, philipbjorge, not-matthias, Titaniumtown, blindside85, cemelo, Pratyush, and 13 more reacted with laugh emojimatklad, itsbalamurali, garusis, blindside85, cemelo, Pratyush, ssg, dryya, DianaNites, xedrac, and 8 more reacted with heart emoji

This comment was marked as off-topic.

## Timeline

Handwaving, 6 months? Frankly unknown - there's 102 remaining .cpp files of various lengths. It'll go faster as we get better at it. Peter (ridiculous_fish) is motivated to work on this, other current contributors have some Rust as well, and we may also get new contributors from the Rust community. Part of the point is to make contribution easier.

I'm eager to contribute as well, probably at modest capacity.
Seems like you have already overcome the biggest initial hurdles.
I'm always motivated to help people get up to speed with fish and Rust.

It took me 3 hours to port util.cpp, which has 200 lines.
With my new knowledge I could probably do it in 1 hour.
Extrapolating the 1-hour estimate to the remaining 60k lines gives us some 300 hours of work.
I'm sure that other modules are harder and might take longer but this number leaves me optimistic.

faho, Speykious, mhwdvs, amadejpapez, Arzte, and MrZLeo reacted with hooray emojiulidtko reacted with rocket emoji

The [autocxx guidance](https://google.github.io/autocxx/workflow.html#how-can-i-see-what-bindings-autocxx-has-generated) is helpful:

1. Install cargo expand (`cargo install cargo-expand`). Then you can use `cargo expand` to see the generated Rust bindings for C++. In particular this is useful for seeing failed expansions for C++ types that autocxx cannot handle.

2. In rust-analyzer, enable Proc Macro and Proc Macro Attributes.

Nowadays rust-analyzer enables proc macros by default (but LSP clients may override the default)

let fish_src_dir = format!("{}/{}", rust_dir, "../src/");

// Where cxx emits its header.

let cxx_include_dir = format!("{}/{}", target_dir, "cxxbridge/rust/");

While this may work, I warn you that this relies on unstable implementation details of Cargo and the cxx maintainer has adamantly refused to accept any more reasonable solution for integrating cxx with any C++ build system besides Bazel or Buck: dtolnay/cxx#462

Member

faho

commented

Jan 30, 2023

edited

Just throwing an idea out there: Given that this is a big underlying technical change and needs some larger changes in packaging, would it make sense to:

  1. Call fish-in-rust "fish 4"
  2. Support the last C++ series (3.6.x?) for a bit longer with bugfixes, added completions and such

This is, of course, a change in policy and more work, but I believe it may be warranted. These releases could be source-only (just tag a 3.6.3 and send out a mail about it), for those who are unable to install rust, especially distro packagers who find themselves looking at an incompatible rustc version.

It's of course possible I'm overestimating the difficulty here, maybe it's not needed because nobody has a problem switching to 4.0 immediately.

futile, martinetd, SergeyKasmy, danielphan2003, qdot, rben01, TheOnlyMrCat, zzhaolei, Slackadays, hlmtre, and 25 more reacted with thumbs up emojimbStavola, JacobTravers, ianchanning, gsora, and Yukaii reacted with heart emojiSlackadays, Chris--B, hen-x, JacobTravers, NatanFreeman, and Yukaii reacted with rocket emoji

Member

Building on that old NetBSD VM I have lying around currently fails with

running: "ar" "s" "/home/alfa/fish-shell/build-rust/./cargo/build/x86_64-unknown-netbsd/release/build/fish-rust-6df62e1ec51440cc/out/libfish-rust.a"
  exit status: 0
  cargo:rustc-link-lib=static=fish-rust
  cargo:rustc-link-search=native=/home/alfa/fish-shell/build-rust/./cargo/build/x86_64-unknown-netbsd/release/build/fish-rust-6df62e1ec51440cc/out
  cargo:rustc-env=AUTOCXX_RS=/home/alfa/fish-shell/build-rust/fish-autocxx-gen/rs

  --- stderr

  CXX include path:
    /home/alfa/fish-shell/build-rust/./cargo/build/x86_64-unknown-netbsd/release/build/fish-rust-6df62e1ec51440cc/out/cxxbridge/include
    /home/alfa/fish-shell/build-rust/./cargo/build/x86_64-unknown-netbsd/release/build/fish-rust-6df62e1ec51440cc/out/cxxbridge/crate
  /home/alfa/fish-shell/fish-rust/../src/proc.h:103:23: error: static_assert expression is not an integral constant expression
  /home/alfa/fish-shell/fish-rust/../src/proc.h:103:23: note: cast from 'void *' is not allowed in a constant expression
  Error: 
    × the include_cpp! macro couldn't be expanded into Rust bindings to C++:
    │ Bindgen was unable to generate the initial .rs bindings for this file.
    │ This may indicate a parsing problem with the C++ headers.

--- CMakeFiles/_cargo-build_fish-rust ---
*** [CMakeFiles/_cargo-build_fish-rust] Error code 101

Removing that static_assert (which is fine) makes it go on until it eventually ends in

[ 98%] Linking CXX executable fish
ld: libfish_rust.a(nix-e8ff18d973978598.nix.35ee0c1d-cgu.15.rcgu.o): in function `nix::sys::timer::Timer::get':
nix.35ee0c1d-cgu.15:(.text._ZN3nix3sys5timer5Timer3get17h045b7bbd1ff7f87dE+0x1b): warning: warning: reference to compatibility timer_gettime(); include <time.h> to generate correct reference
ld: libfish_rust.a(nix-e8ff18d973978598.nix.35ee0c1d-cgu.15.rcgu.o): in function `nix::sys::timer::Timer::set':
nix.35ee0c1d-cgu.15:(.text._ZN3nix3sys5timer5Timer3set17h47eed997322493a6E+0x4e): warning: warning: reference to compatibility timer_settime(); include <time.h> to generate correct reference

(and a bunch more nix::sys::timer and nix::sys::aio functions)

It also absolutely requires libclang, and will fail rather late in the build process.

Rust 1.67 was released literally two days ago. Compared to us sticking to C++11 that's a staggering difference in version support. Is it that much easier to get a new rust on old systems? Our typical targets are old CentOS and Debian.

If you want to use the distro version of Rust then you'll need to use a much older rust than latest stable. You can look it up for a particular distro, but Debian's rustc (for example) is often 6 months or more behind the latest stable. If you want to ship your program in a distro you usually need to build the program using that distro's version of the compiler, so I suspect you'd need to push your "minimum supported rust verison" (MSRV) farther into the past if your project is supposed to be part of distros.

If you don't need to be shipped in a distro, it's as easy as rustup update to update to the latest compiler. It's so simple to update the compiler that it's honestly hard to convince people in the rust community to have a library support anything other than the latest stable.

avarun42, lnicola, ishanjain28, ianchanning, sandalbanditten, Lucretiel, cryptoquick, Speykious, bb010g, jakibaki, and 4 more reacted with thumbs up emoji

You can look it up for a particular distro, but Debian's rustc (for example) is often 6 months or more behind the latest stable.

AFAIK Debian updates both rustc and other packages for a while, and then freezes all of them. So it does have the latest rustc for the duration of the development cycle, and so this should not be an issue for packages shipped by the distro itself.

Third-party repos, however, will need either their own newer rustc for the build, or have all the code written in Rust support an older compiler version, including all dependencies.

Just taking a quick look at the versions in https://packages.debian.org/sid/rustc:

  • buster 1.41 (2020-01-30)
  • bullseye 1.48 (2020-11-19)
  • bookworm 1.63 (2022-08-11)
  • sid 1.63

I'm not sure what you mean by "for the duration of the development cycle", but even sid falls behind "latest stable" on a regular basis until someone goes and updates it.

so I suspect you'd need to push your "minimum supported rust verison" (MSRV) farther into the past if your project is supposed to be part of distros.

You do not. See: BurntSushi/ripgrep#1019

Essentially, distros like Debian will just ship an older version of your program, just like they do for almost everything else.

jhpratt, Shnatsel, Be-ing, roland-rollo, josephglanville, tshirtman, andrewbanchich, not-matthias, zeenix, maxdeviant, and 40 more reacted with thumbs up emojimatklad, andrewbanchich, not-matthias, echelon, joshtriplett, blindside85, 95th, DianaNites, JacobTravers, xedrac, and 14 more reacted with heart emoji

Contributor

I've been alternating between obsessed with and simply always entertaining in the back of my mind the idea of a rust-powered fish for many years now (and have discussed that with other team members), so I'm approaching this with an open mind, but also with concerns. I must say that I knew this was coming for many months now from following @ridiculousfish on GitHub (heart), and have been wondering how it would the topic would be broached.

I passionately believe that rust is the way forward for writing any new code, especially in a cross-platform context. Quite apart from all the memory and concurrency safety it brings to the table, the general approach of the language and the community to always put correctness first and foremost has built a strong culture of correct, well-designed, and fairly maintainable software engineering. The package management is great, the team is fantastic, and support in the broader software world has been extremely positive and welcoming.

To go further, I have written my own fish-compatible rewrite (not port) of fish's tokenizer and was making progress on the AST and parser when I last had my "burning desire" to see something like fish but in rust. I was maintaining it on my private git server and not GitHub, but I've pushed it to GH in case anyone cares to take a look: https://github.com/mqudsi/velvet

Having said all that, I worry that rushing any part of this could be a death knell for the project. A quality rewrite (or port) takes time, on the magnitude of years rather than days. One good example to take inspiration from is tectonic, a xelatex port that started off with machine-rewritten code and then manually made progress on actual, idiomatic rust conversions... but even that was a greenfield project and not a first-party effort and so not saddled with some of the concerns we would be saddled by. The recommended module-by-module approach is definitely a great suggestion for a starting place, and I think it has great potential.

Off the top of my head, some concerns or points:

  • Compatibility, obviously. Fish runs on archaic hardware for fun. This would be the end of that. That's not a reason that should stand in the way of progress, but it's something we can't turn a blind eye to and would have to quite clearly signal that we're ok giving up all platforms that don't run (modern versions of) rust.
  • Idiomatic rust is too high-level for fish. You don't fork in rust, using the standard library interface doesn't give you access to poll/select/epoll/kqueue/whatever natively. async io is a horrible fake cludge with tokio/async_std faking it with thread pools or whatnot and performing far worse than truly native async implementations. We'd have to go that mostly alone to keep the same code we have (thinking "port" and not "rewrite" and aiming to preserve what we have).
  • Despite whether we call it a "port" or a "rewrite" there's going to be a heck of a lot of rewriting taking place. Subtle bugs will creep in. Subtle compatibility fixes that weren't properly documented or annotated will be lost.
  • We will be in a state of partial limbo until this completes one way or the other. We won't want to make any major changes to the C++ codebase while we're trying to get to feature parity in the rust one. Depending on how things unfold this could greatly hinder further progress in the project.
  • Not all core team members are fluent in rust. This seems unfair. (This isn't just something I'm saying in passing but rather something I've given a lot of thought to and broached with others before.)
  • C++ isn't really a "legacy" language (yet), no matter how much we wish the world would start treating it like that given all its warts and shortcomings.
  • I understand what @ridiculousfish meant by "Rust is what we need to turn on concurrent function execution" but respectfully disagree. A complete rewrite (or port, if you prefer) to another language is a great time to facilitate that, but it's hardly a requirement, though I guess it could functionally be considered one if you want to take into account correctness, maintainability, etc.
  • (Really minor compared to everything else, but if we're ultimately going to be a rust project, then I think it makes sense for all types coming from the rust codebase to use idiomatic rust naming conventions, but that would cause churn cpp-side. I suggest using idiomatic rust naming conventions and then using statements to alias them with cpp-style names in the cpp headers.)
jyn514, jeannekamikaze, luongthanhlam, ipostr08, korantu, carlstoker, hlmtre, Agapurnis, ChrisThrasher, joshuamegnauth54, and 37 more reacted with thumbs up emojiavarun42, SphericalKat, and axxop reacted with thumbs down emojiaxxop reacted with confused emojimikaelmello, grantshandy, 0xpr03, dryya, peterfaria-lula, sidkang, 0xSNIKITURTUL, Delta456, JacobTravers, hen-x, and 5 more reacted with heart emoji

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

## FFI

The boundary between Rust and C++ is referred to as the FII.

Typo: "as the FII" -> "as the FFI". The foreign function interface

ulidtko and LeoDog896 reacted with thumbs up emoji

Member

  • Idiomatic rust is too high-level for fish.

I think it's fine if a select few modules remain non-idiomatic Rust.

Subtle bugs will creep in. Subtle compatibility fixes that weren't properly documented or annotated will be lost.

Yeah this is a real threat. We need to be extra careful.
Unfortunately, rewrite commits are not easy to check for equivalence so code review takes a lot of effort. Happily we have people we can trust.
I think we have done a great job documenting odd behavior with comments, tests and a fairly clean commit history which helps.

We won't want to make any major changes to the C++ codebase while we're trying to get to feature parity in the rust one.

I think it's unlikely that we'll want to make far-reaching changes. As a quick estimate, over the last 4 years we only had

  • the new parser 4494414 (Merge branch 'parser_cleanup_3', 2020-07-04)
  • 13ab9e5 (Implement new event mechanism and migrate builtin and block output, 2019-02-17)
  • 6f22aad (Merge branch 'debounce', 2020-03-06)

all by @ridiculousfish who I think is also the only one maintaining long-lived branches

  • Not all core team members are fluent in Rust. This seems unfair.

Sure but I don't see them complaining? The language is not easy but once we have ported a substantial chunk of code, they can pick it up by osmosis.

Member

Not all core team members are fluent in Rust. This seems unfair.

Sure but I don't see them complaining?

So far both @zanchey and me have explicitly declined to complain.

EvanCarroll, Be-ing, zanchey, and eugenesvk reacted with thumbs up emoji

Contributor

For those of us interested in contributing to the port, is there a plan / guide for picking up components to work on? I don't want to step on toes etc. Is it likely sufficient to just pick a random C++ builtin, reimplement it, and file a PR (after this one lands)?

jguhlin and DianaNites reacted with eyes emoji

You don't fork in rust, using the standard library interface doesn't give you access to poll/select/epoll/kqueue/whatever natively.

If you're looking for idiomatic solutions, mio provides an abstraction on top of epoll, kqueue, and Windows equivalents, along with non-blocking read/write and socket operations. mio is best known as a core dependency of tokio, but it's also quite commonly used directly.

As a lower-level alternative, another very popular crate is nix, a thin wrapper for Unix APIs that provides more Rust-friendly type signatures but leaves the semantics alone. That includes the polling APIs as well as fork.

The standard library's I/O functionality is much more limited, but in Rust it's entirely idiomatic to bypass it.

tshepang, Lucretiel, Be-ing, alexxbb, DCNick3, cryptoquick, bb010g, Speykious, mon-jai, rrbutani, and 16 more reacted with thumbs up emoji

Lin2Jing4

added a commit to 2022-09-27T00-27-09-514Z/github-drama that referenced this pull request

Jan 31, 2023

Member

For those of us interested in contributing to the port, is there a plan / guide for picking up components to work on?

@Lucretiel You'll probably first want to wait until this PR lands, and then I'd guess it's fine to file an issue stating what you want to work on.

If there is a need to do anything more complicated, we can figure that out when it's needed.

If you're looking for idiomatic solutions

@comex At the moment we aren't. The plan is to first do a fairly straight port and then figure out all the details on how to make it idiomatic, whether that's through libraries or specific "patterns" or whathaveyou.

(also native Windows isn't really on the table at the moment - this is a unix shell, and the unix assumptions run deep)

Lucretiel, ben-clegg, chipbuster, phaalonso, and jguhlin reacted with thumbs up emoji

This comment has been hidden.

Lin2Jing4

added a commit to 2022-09-27T00-27-09-514Z/github-drama that referenced this pull request

Jan 31, 2023

This comment was marked as off-topic.

cxx = "1.0"

errno = "0.2.8"

inventory = { version = "0.3.3", optional = true}

lazy_static = "1.4.0"

This comment has been hidden.

This comment has been hidden.

Member

Definitely no nightly features here, and it's fine if this isn't perfect or idiomatic.

The plan is to first do a simple and straightforward port, so we can get away from C++. After that, we can discuss the minutiae.

This comment was marked as off-topic.

Lin2Jing4

added a commit to 2022-09-27T00-27-09-514Z/github-drama that referenced this pull request

Jan 31, 2023

The [autocxx guidance](https://google.github.io/autocxx/workflow.html#how-can-i-see-what-bindings-autocxx-has-generated) is helpful:

1. Install cargo expand (`cargo install cargo-expand`). Then you can use `cargo expand` to see the generated Rust bindings for C++. In particular this is useful for seeing failed expansions for C++ types that autocxx cannot handle.

cargo expand requires nightly rustc as it uses an unstable feature of rustc to perform the macro expansion. Rust-analyzer also has the "Expand Macro Recursively" command which you can invoke on a single macro invocation. This also works on stable as it uses rust-analyzer's own macro expander rather than the rustc one.

msyyces8x95, dryya, and lnicola reacted with thumbs up emoji

autocxx = "0.23.1"

cxx = "1.0"

errno = "0.2.8"

If you only need to read the errno value you could use std::io::Error::last_os_error().raw_os_error().unwrap(). If you also need to write it, you will indeed need a crate like errno.

This comment was marked as off-topic.

epage

commented

Jan 31, 2023

edited

@faho

So my preferred MSRV would be something like the one from Debian Stable/Oldstable, maybe even down to Ubuntu LTS. Of course, me not being a big rust person I have no idea how much pain that is - hence also my question about keeping the C++ track going for a bit longer.

While I know that this port is avoiding using various crates, focusing on a direct port, I believe I saw some crates already in use and I'm assuming that will grow as the amount of Rust code grows, especially once it hits the core of fish.

The user experience around dependency management with old versions of Rust isn't quite there yet which leads to a lot of debates around what MSRV to use (e.g. the libc thread). We have enumerated some potential improvements but I'm not aware of much work being done on these yet. Until then, a few crates will work on Debian Stable but most will only have an MSRV of 3-6 months and it can be a chore to use old versions.

# We need the build dir because cxx puts our headers in there.

# Corrosion doesn't expose the build dir, so poke where we shouldn't.

if (Rust_CARGO_TARGET)

set(rust_target_dir "${CMAKE_BINARY_DIR}/cargo/build/${_CORROSION_RUST_CARGO_TARGET}")

Please note that this is not always the correct path, depending on which Generator is used. Notably, it's not correct for Visual Studio generators and other Multi-config generators (e.g. Ninja-Multiconfig).

On a related note, have you considered using cxxbridge to generate the bindings from the CMake side?
On corrosion master there is a somewhat experimental function to add cxx bindings via cxxbridge. It's probably not production ready or anything, but it should be sufficient to showcase how to generate the bindings from the CMake side.

Notably, it's not correct for Visual Studio generators

We don't support Visual Studio (or really Windows at all - it's either wsl or cygwin).

other Multi-config generators (e.g. Ninja-Multiconfig)

This seems to be a "ninja but you can do debug and release in one" thing? I think it would be fair to not support that until cmake is retired. The plan is to hopefully do that before the next release.

Unless of course you know how to easily get the correct path?

Unless of course you know how to easily get the correct path?

Ignoring Visual Studio, the path currently is as below. This should be considered an implementation detail, but is also unlikely to change in the near future.

if(CMAKE_CONFIGURATION_TYPES)
        set (rust_target_dir  "${CMAKE_BINARY_DIR}/$<CONFIG>/cargo/build/${_CORROSION_RUST_CARGO_TARGET}")
else()
        set (rust_target_dir  "${CMAKE_BINARY_DIR}/cargo/build/${_CORROSION_RUST_CARGO_TARGET}")
endif()
faho reacted with hooray emoji

This comment was marked as off-topic.

- `const wcstring &` -> `&wstr`

- `const wchar_t *` -> `&wstr`

None of the Rust string types are nul-terminated. We're taking this opportunity to drop the nul-terminated aspect of wide string handling.

This comment has been hidden.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

As a lower-level alternative, another very popular crate is nix, a thin wrapper for Unix APIs that provides more Rust-friendly type signatures but leaves the semantics alone. That includes the polling APIs as well as fork.

I also believe that this is a core issue that is being worked on by the rust team at the moment with
this blog post may not be completely relevant but there is definitely a real effort coming this year for real forking natively.

Rust's `String` / `&str` types cannot represent non-UTF8 filenames or data using the default encoding scheme. That's why all string conversions must go through fish's encoding scheme (using the private-use area to encode invalid sequences). For example, fish cannot use `File::open` with a `&str` because the decoding will be incorrect.

So instead of `String`, fish will use its own string type, and manage encoding and decoding as it does today. However we will make some specific changes:

Contributor

It would be confusing (and incorrect) to use it, since OsStr on *nix is generally byte-oriented while fish's strings are currently Unicode-32, and we would be marshalling between the two.

Member

faho

commented

Jan 31, 2023

edited

Since this PR clearly escaped our little bubble, I feel like we should add some context, because I don't think everyone caught on to the joking tone of the opening message (check https://fishshell.com/ for similar writing - we are the shell for the 90s, after all), and really got what the idea here is.

People in various internet comment section also missed the fact that this was opened by the actual maintainer of the project, @ridiculousfish. It also wasn't really a surprise to the rest of the team, so we're all good here.

I'm not going to be adding a lot of code examples, because I want to focus on the broad overview of it all and the situation that we currently have, instead of getting stuck in minutiae.

Setting the stage

Fish is a fairly old codebase. It was started in 2005 by Axel Liljencrantz, who left the project some time in 2009.
He wrote it in C, and he decided to use wchar_t, presumably because that is how one assumed Unicode would work in 2005.
(It was also kept in a darcs repo, and some of our older commit still carry a darcs id in the git commit message)

After a few years of inactivity, fish was picked back up by Peter, who goes by @ridiculousfish (the name, I understand, is a complete coincidence).
He wrote a fork initially called "fishfish", and got Axel's okay to make it the "official" fish later on, this became fish 2.0.

ridiculousfish then ported fish to C++, mainly to get some data structures to replace the typical C write-your-own cruft.

A few other notable decisions:

  1. Fish should be available on servers, which run old LTS distros - this means we build our own packages for a variety of them.
  2. Fish should be easy to build and run, because it should be easy to send in patches - we want completion scripts by people who know a tool but aren't otherwise "programmers" or really know build systems, and we ideally want them to test these with what our git repo has to offer so they can file a nice PR.

The pain of the current tech

C++ is an old standardized language with multiple implementations, married at the hip to C, an even older even more standardizederer language. Any changes take ages to get to users so we can actually use it. We moved to C++11 in 2016 (some quick calculation shows that's 5 years after the standard was released), and we're still on C++11 because the pain of upgrading is larger than the promises of C++14/17. We needed to backport compilers for our packages until, I believe, 2020.

Having multiple implementations means you always hit the worst of any of them, because you can't dictate which implementation your users use. So we have to deal a lot more with cmake than we would like, sometimes for things as awkward as "which header is this function in".

C++'s string handling is subpar, and it's much too easy to fall into passing raw wchar_t * around (and we don't have access to string_view and that just enables even more use-after-free bugs!). This is annoying, because a shell is almost entirely string handling and unix api wizardry.

Fish also uses threads, for things like the autosuggestion and syntax highlighting, and we would like to have more "threads"/"concurrency"/"run-thing-in-backgroundness" (please insert the correct term here, I have never been good at these things). One large project has been to run multiple fish builtins and functions "at the same time", to enable things like backgrounding functions (ideally without using "subshells" because those are an annoying environment boundary that shows up in surprising places in other shells), and to simply be able to pipe two builtins into each other and have them actually process both ends of the pipe "simultaneously".

C++ offers few guarantees on what can be accessed from which thread. @ridiculousfish has been trying to crack this for years, and hasn't been confident enough in his solution. We want a tech stack that helps us here, and C++ doesn't.

The other general issues with C++ (header files, memory safety, undefined behavior, compiler errors are terrible, templates are complicated) are well-known at this point so I'm not going to rehash them. We know them, we have them, we hate them.

I don't really have a lot to say about cmake - the language isn't great and finding dependencies is a pain. We have worked around more issues in it than I would like.

Some of this is, undoubtedly, "self-inflicted" (wchar is, on the whole, a bad choice), for various definitions of that term, and it might be possible that some C++ fairy could make it work better, but the reality is that that fairy never materialized. No C++-Navi told us to "Hey! Listen! Templates!" and I forgot my ocarina elsewhere.

The gist of it is that C++ has caused us quite some grief, and we're done with it, and so, we have decided to leave it and everything related to it behind.

Why Rust specifically?

In short: Because it seems to be up to the task, because we have people who know it on the team, because it has momentum and because it promises to help us with the threading problem.

Other languages either have some specific pain point (go, python, java) or simply aren't known to us - this includes D, Zig, Nim and waaaayyy too many to list. That means asking us to use them means getting everyone to learn them and so far none of us have been interested.

Another big point: This PR exists. We already have a start. That may be "unfair", and we haven't democratically opened this up to our users, but at the end of the day the programming language is mostly an implementation detail and so it should be decided by the implementors.

We're not making an Objective Review For All Uses For Everyone, we are choosing what works for us. And so, social questions matter a lot. On that note rust also has a bunch of buzz and mindshare that other languages lack - would we have had people queuing up to help if we had gone with D? I doubt it.

Of course, where the programming language isn't an implementation detail, we'll work on it. That's why I have been asking a bunch of questions related to packaging in this thread, and that is why we have spoken about platform support.

jbiason, exploide, ahhhh6980, Tmpod, Omrigan, zolrath, neh, DaviDeMeo, serban, thepudds, and 14 more reacted with thumbs up emojimqudsi, azzamsa, NTmatter, scturtle, krshrimali, rustatian, ckoehler, steveklabnik, slanterns, masklinn, and 40 more reacted with heart emojitimvisee, samhh, azzamsa, scturtle, krshrimali, steveklabnik, Slackadays, rami3l, futile, darksv, and 19 more reacted with rocket emoji

@ulidtko ulidtko

left a comment

Am submitting late after @faho clarifications clap

Did a review — nothing particularly severe spotted — just a bunch of minor nitpicks / suggestions for improvement (as usual). Resolve as you will pray

Overall, good code, LGTM! Didn't test the PR. (but daily user of fish)

I think Rustdoc won't understand the (doxygen?) syntax \returns, \p foo in doc-comments; refrained from commenting on in beside here. It's not exactly a priority I guess.


FWIW CMake is nice compared to other C++ surrogates of Cargo. Anyone seriously tried GNU Autotools, Boost.Build v2 / bjam, Google's "Build Tools"?.. They're all far worse than CMake; which at least has a decent documentation.

But I won't defend C++, it's despicable. Garbage dumpster fire of a language. Couldn't care less of the social "reasons" (most loved / most popular BS) — there're concrete technical reasons why C++ sucks where Rust doesn't.

Am not a previous contributor (apart from light discussions & small patches to neighbouring repos) — but with this transition, may as well become one! Welcome to @ping me on fish+Rust stuff, I'll respond with ~days latency.

endif()

# Tell Cargo where our build directory is so it can find config.h.

corrosion_set_env_vars(${fish_rust_target} "FISH_BUILD_DIR=${CMAKE_BINARY_DIR}" "FISH_AUTOCXX_GEN_DIR=${fish_autocxx_gen_dir}" "FISH_RUST_TARGET_DIR=${rust_target_dir}")

I understand that the CMake script is intended as short-lived and soon-to-go-away — but this overlong 170-char line can be easily split:

Suggested change
corrosion_set_env_vars(${fish_rust_target} "FISH_BUILD_DIR=${CMAKE_BINARY_DIR}" "FISH_AUTOCXX_GEN_DIR=${fish_autocxx_gen_dir}" "FISH_RUST_TARGET_DIR=${rust_target_dir}")
corrosion_set_env_vars(${fish_rust_target}
"FISH_BUILD_DIR=${CMAKE_BINARY_DIR}"
"FISH_AUTOCXX_GEN_DIR=${fish_autocxx_gen_dir}"
"FISH_RUST_TARGET_DIR=${rust_target_dir}"
)

Fish will mostly _not_ use Rust's `String/&str` types as these cannot represent non-UTF8 data using the default encoding.

fish's primary string types will come from the [`widestring` crate](https://docs.rs/widestring). The two main string types are `WString` and `&wstr`, which are renamed [Utf32String](https://docs.rs/widestring/latest/widestring/utfstring/struct.Utf32String.html) and [Utf32Str](https://docs.rs/widestring/latest/widestring/utfstr/struct.Utf32Str.html). `WString` is an owned, heap-allocated UTF32 string, `&wstr` a borrowed UTF32 slice.

In markdown source, I find out-of-line links nicer to read and edit:

Suggested change
fish's primary string types will come from the [`widestring` crate](https://docs.rs/widestring). The two main string types are `WString` and `&wstr`, which are renamed [Utf32String](https://docs.rs/widestring/latest/widestring/utfstring/struct.Utf32String.html) and [Utf32Str](https://docs.rs/widestring/latest/widestring/utfstr/struct.Utf32Str.html). `WString` is an owned, heap-allocated UTF32 string, `&wstr` a borrowed UTF32 slice.
fish's primary string types will come from the [`widestring` crate](https://docs.rs/widestring). The two main string types are `WString` and `&wstr`, which are renamed [Utf32String][] and [Utf32Str][]. `WString` is an owned, heap-allocated UTF32 string, `&wstr` a borrowed UTF32 slice.
[Utf32Str]: https://docs.rs/widestring/latest/widestring/utfstr/struct.Utf32Str.html
[Utf32String]: https://docs.rs/widestring/latest/widestring/utfstring/struct.Utf32String.html

## Development Tooling

The [autocxx guidance](https://google.github.io/autocxx/workflow.html#how-can-i-see-what-bindings-autocxx-has-generated) is helpful:

Suggested change
The [autocxx guidance](https://google.github.io/autocxx/workflow.html#how-can-i-see-what-bindings-autocxx-has-generated) is helpful:
The [autocxx guidance][] is helpful:
[autocxx guidance]: https://google.github.io/autocxx/workflow.html#how-can-i-see-what-bindings-autocxx-has-generated

— and the link can be moved to the bottom of ## Development Tooling section

# This file is automatically @generated by Cargo.

# It is not intended for manual editing.

version = 3

Will suggest a tiny tweak:

  • add **/Cargo.lock linguist-generated=true to .gitattributes

The effect of it will look like this (different lockfile, same idea):

image

I.e. GH will hide (collapse by default) any diffs in the lockfile, making future PRs a bit easier to review.

/// Parse the given \p src as an integer.

/// If mradix is not None, it is used as the radix; otherwise the radix is inferred:

/// - Leading 0x or 0X means 16.

/// - Leading 0 means 8.

/// - Otherwise 10.

/// The parse result contains the number as a u64, and whether it was negative.

fn fish_parse_radix<Chars>(ichars: Chars, mradix: Option<u32>) -> Result<ParseResult, Error>

There's no arg called src

Suggested change
/// Parse the given \p src as an integer.
/// If mradix is not None, it is used as the radix; otherwise the radix is inferred:
/// - Leading 0x or 0X means 16.
/// - Leading 0 means 8.
/// - Otherwise 10.
/// The parse result contains the number as a u64, and whether it was negative.
fn fish_parse_radix<Chars>(ichars: Chars, mradix: Option<u32>) -> Result<ParseResult, Error>
/// Parse the given \p ichars as an integer.
/// If mradix is not None, it is used as the radix; otherwise the radix is inferred:
/// - Leading 0x or 0X means 16.
/// - Leading 0 means 8.
/// - Otherwise 10.
/// The parse result contains the number as a u64, and whether it was negative.
fn fish_parse_radix<Chars>(ichars: Chars, mradix: Option<u32>) -> Result<ParseResult, Error>

topic_t::sighupint => self.sighupint,

topic_t::sigchld => self.sigchld,

topic_t::internal_exit => self.internal_exit,

_ => panic!("invalid topic"),

ditto

Suggested change
_ => panic!("invalid topic"),

pub fn any_valid(&self) -> bool {

let mut valid = false;

for gen in self.as_array() {

if gen != invalid_generation {

valid = true;

}

}

valid

}

Suggested change
pub fn any_valid(&self) -> bool {
let mut valid = false;
for gen in self.as_array() {
if gen != invalid_generation {
valid = true;
}
}
valid
}
pub fn any_valid(&self) -> bool {
self.as_array()
.iter()
.any(|&gen| gen != invalid_generation)
}

// // Whoof. Thread Sanitizer swallows signals and replays them at its leisure, at the point

// // where instrumented code makes certain blocking calls. But tsan cannot interrupt a signal

// // call, so if we're blocked in read() (like the topic monitor wants to be!), we'll never

// // receive SIGCHLD and so deadlock. So if tsan is enabled, we mark our fd as non-blocking

// // (so reads will never block) and use select() to poll it.

double-comment

Suggested change
// // Whoof. Thread Sanitizer swallows signals and replays them at its leisure, at the point
// // where instrumented code makes certain blocking calls. But tsan cannot interrupt a signal
// // call, so if we're blocked in read() (like the topic monitor wants to be!), we'll never
// // receive SIGCHLD and so deadlock. So if tsan is enabled, we mark our fd as non-blocking
// // (so reads will never block) and use select() to poll it.
// Whoof. Thread Sanitizer swallows signals and replays them at its leisure, at the point
// where instrumented code makes certain blocking calls. But tsan cannot interrupt a signal
// call, so if we're blocked in read() (like the topic monitor wants to be!), we'll never
// receive SIGCHLD and so deadlock. So if tsan is enabled, we mark our fd as non-blocking
// (so reads will never block) and use select() to poll it.

/// The topic monitor class. This permits querying the current generation values for topics,

/// optionally blocking until they increase.

/// What we would like to write is that we have a set of topics, and threads wait for changes on a

/// condition variable which is tickled in post(). But this can't work because post() may be called

/// from a signal handler and condition variables are not async-signal safe.

/// So instead the signal handler announces changes via a binary semaphore.

/// In the wait case, what generally happens is:

/// A thread fetches the generations, see they have not changed, and then decides to try to wait.

/// It does so by atomically swapping in STATUS_NEEDS_WAKEUP to the status bits.

/// If that succeeds, it waits on the binary semaphore. The post() call will then wake the thread

/// up. If if failed, then either a post() call updated the status values (so perhaps there is a

/// new topic post) or some other thread won the race and called wait() on the semaphore. Here our

/// thread will wait on the data_notifier_ queue.

type topic_bitmask_t = u8;

the doc-comment should attach to topic_monitor_t, not to topic_bitmask_t

// Clear wakeup bit and set our topic bit.

let mut newstatus = oldstatus;

newstatus &= !STATUS_NEEDS_WAKEUP; // note: bitwise not

newstatus |= topicbit;

consider core::ops::Not et al... explicit method calls can remove the syntactic confusion of ! meaning logical (not bitwise) negation.

Suggested change
// Clear wakeup bit and set our topic bit.
let mut newstatus = oldstatus;
newstatus &= !STATUS_NEEDS_WAKEUP; // note: bitwise not
newstatus |= topicbit;
// Clear wakeup bit and set our topic bit.
let newstatus = oldstatus.bitand(STATUS_NEEDS_WAKEUP.not()).bitor(topicbit);

There're of course multiple ways to write this — but a single line plays well with its comment // Clear wakeup bit and set our topic bit.

@@ -0,0 +1,47 @@

[package]

You may want to use a cargo workspace. This allows sharing the Cargo.lock and target dirs between all crates in the cargo workspace. You can do this by adding the following to this Cargo.toml:

[workspace]
members = ["path/to/crate1", "path/to/crate2"]

You can also use * as wildcard. And in this specific case I think you can omit the members key entirely as widestring-suffix is used as dependency of this crate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK