
Progress report on rustc_codegen_cranelift (Okt 2022)

source link: https://bjorn3.github.io/2022/10/12/progress-report-okt-2022.html

Oct 12, 2022

There has been a lot of progress since the last progress report: 303 commits have landed since then. @afonso360 has contributed a great deal to improve Windows and AArch64 support. (Thanks a lot for that!)

Achievements in the past four months

Windows support with the MSVC toolchain

Windows support with the MSVC toolchain has been added by @afonso360. This required a Cranelift change to add COFF-based TLS support, a rewrite in Rust of the bash scripts used for testing (as Windows doesn’t have bash), inline stack probing in Cranelift (stack probing is necessary on Windows to grow the stack) and finally a couple of minor changes to make the tests run on Windows. There are still a couple of issues, though. For example, the JIT mode just crashes. In addition, Bevy gets miscompiled, causing it to crash at runtime. An investigation into this is ongoing.
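Why stack probing matters on Windows: the OS commits a thread’s stack lazily. Below the committed pages sits a single guard page; touching it commits that page and moves the guard one page further down, but touching anything beyond the guard page is an access violation. A function whose frame spans several pages therefore has to touch each page in order, top down, before using the frame. The toy model below illustrates only that rule; `ToyStack`, `probe_frame` and the 4 KiB page size are assumptions of this sketch, not Cranelift code.

```rust
/// Page size assumed for this toy model (4 KiB, as on x86_64 Windows).
const PAGE_SIZE: usize = 4096;

/// Toy model of a lazily committed stack: pages `0..committed_pages` below the
/// stack top are usable, and page `committed_pages` acts as the guard page.
pub struct ToyStack {
    pub committed_pages: usize,
}

impl ToyStack {
    /// Touch the byte `depth` bytes below the stack top.
    pub fn touch(&mut self, depth: usize) -> Result<(), String> {
        let page = depth / PAGE_SIZE;
        if page < self.committed_pages {
            Ok(())
        } else if page == self.committed_pages {
            // Touching the guard page commits it; the guard moves one page down.
            self.committed_pages += 1;
            Ok(())
        } else {
            // Skipped past the guard page: on real Windows this would fault.
            Err(format!("access violation at depth {depth}"))
        }
    }
}

/// Hypothetical helper: probe every page of a `frame_size`-byte frame that
/// starts `sp_depth` bytes below the stack top, from the top down.
pub fn probe_frame(stack: &mut ToyStack, sp_depth: usize, frame_size: usize) -> Result<(), String> {
    if frame_size == 0 {
        return Ok(());
    }
    let first_page = sp_depth / PAGE_SIZE;
    let last_page = (sp_depth + frame_size - 1) / PAGE_SIZE;
    for page in first_page..=last_page {
        stack.touch(page * PAGE_SIZE)?;
    }
    Ok(())
}
```

For a three-page frame, writing its lowest byte directly skips the guard page and faults, while probing pages 0, 1 and 2 in order commits each one first, after which the same write succeeds.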

ABI fixes

Gankra’s abi-cafe (previously abi-checker) now runs on CI. This uncovered a couple of ABI incompatibilities between cg_clif and cg_llvm. Some were the fault of cg_clif, while others had to be fixed in Cranelift.
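To make “ABI incompatibility” concrete: when code compiled by cg_clif calls code compiled by cg_llvm, both backends must agree on the layout of every `#[repr(C)]` type and on how each argument and return value is passed. The `Pair` struct and `swap_pair` function below are hypothetical, not taken from abi-cafe. As I understand it, abi-cafe generates many such signatures, compiles caller and callee with different backends and checks that the values survive the call; calling the function from the same crate, as in the usage below, only shows the shape of such a check.

```rust
/// `#[repr(C)]` pins the layout to the platform's C rules, so code from
/// either backend must agree on size, alignment and field offsets.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pair {
    pub a: u8,  // offset 0, followed by padding up to `b`'s alignment
    pub b: u64,
}

/// Passing and returning `Pair` by value: the target's C calling convention
/// decides whether it travels in registers or through memory, and both
/// backends have to make the same choice.
pub extern "C" fn swap_pair(p: Pair) -> Pair {
    Pair { a: p.b as u8, b: p.a as u64 }
}
```

On 64-bit targets such as x86_64 and AArch64, `Pair` is 16 bytes: one byte for `a`, seven bytes of padding, then eight bytes for `b`.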

AArch64 support

Linux on AArch64 now passes the full test suite of cg_clif. It is not tested on CI, however, so support may regress in the future.

Basic s390x support

Basic support for IBM’s s390x architecture has been added by @uweigand. There is no testing on CI and there are still some test failures.

  • #1260: Ignore ptr_bitops_tagging test on s390x
  • issue #1258: s390x test failure due to unsupported stack realignment
  • issue #1259: Enabling s390x on CI

Multi-threading support

The LLVM backend has supported multi-threading during compilation from LLVM IR to object files since 2014. While the frontend is not parallelized, this can still give a non-trivial performance boost. Until recently cg_clif didn’t support this, which made compilation slower, especially on machines with many cores. After about two weeks of significant refactoring all over cg_clif, I was able to implement multi-threading support in cg_clif too. It was a lot of effort, but it was well worth it: there are now almost no cases where cg_llvm is faster than cg_clif.

Perf results:

[Figure: wall time on the rustc perf suite compared to cg_llvm, showing a significant improvement for almost all benchmarks]
  • #1264: Refactorings for enabling parallel compilation (part 1)
  • #1266: Refactorings for enabling parallel compilation (part 2)
  • #1271: Support compiling codegen units in parallel
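The core idea of compiling codegen units in parallel can be sketched with scoped threads: each codegen unit is independent, so each can be turned into an object file on its own thread. `CodegenUnit`, `ObjectFile` and `compile_cgu` are hypothetical stand-ins, not cg_clif types, and the sketch ignores details a real backend has to handle, such as limiting the number of concurrent jobs.

```rust
use std::thread;

/// Hypothetical stand-in for a codegen unit: a name plus some "IR".
pub struct CodegenUnit {
    pub name: String,
    pub items: Vec<u32>,
}

/// Hypothetical stand-in for an emitted object file.
#[derive(Debug, PartialEq)]
pub struct ObjectFile {
    pub name: String,
    pub bytes: Vec<u8>,
}

/// Placeholder for the expensive backend work: here it just truncates
/// each item to a byte.
fn compile_cgu(cgu: &CodegenUnit) -> ObjectFile {
    let bytes = cgu.items.iter().map(|&x| (x % 256) as u8).collect();
    ObjectFile { name: format!("{}.o", cgu.name), bytes }
}

/// Compile every codegen unit on its own scoped thread.
pub fn compile_parallel(cgus: &[CodegenUnit]) -> Vec<ObjectFile> {
    thread::scope(|s| {
        // Spawn one worker per codegen unit; scoped threads may borrow `cgus`.
        let handles: Vec<_> = cgus
            .iter()
            .map(|cgu| s.spawn(move || compile_cgu(cgu)))
            .collect();
        // Join in spawn order so the object list matches the CGU order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("codegen thread panicked"))
            .collect()
    })
}
```

Joining in spawn order keeps the list of objects handed to the linker independent of which thread happens to finish first.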

Inline assembly

While working on multi-threading support, I was able to remove the partial linking hack that was used to support inline assembly and incremental compilation at the same time. This hack was incompatible with macOS. Now that it is no longer necessary, inline assembly works on macOS too.

  • e45f600: Remove the partial linking hack for global asm support
  • f76ca22: Enable inline asm on macOS

Portable simd

I implemented a couple of intrinsics used by core::simd. Only simd_scatter, simd_gather and simd_arith_offset are missing now. Note that a large portion of core::arch is still unimplemented.

  • #1277: Implement a couple of portable simd intrinsics
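For reference, `simd_gather` loads each lane from its own address and `simd_scatter` stores each lane to its own address, both under a per-lane mask; masked-off lanes of a gather keep the value from a fallback vector. The scalar model below captures those semantics with one simplification that is an assumption of this sketch: lanes carry indices into a slice instead of raw pointers, so an out-of-range access panics rather than being undefined behavior. `gather` and `scatter` are hypothetical helpers, not cg_clif code.

```rust
/// Scalar model of a masked gather: lane i reads `base[idx[i]]` if `mask[i]`
/// is set, otherwise it keeps `fallback[i]`.
pub fn gather<T: Copy, const N: usize>(
    fallback: [T; N],
    base: &[T],
    idx: [usize; N],
    mask: [bool; N],
) -> [T; N] {
    let mut out = fallback;
    for lane in 0..N {
        if mask[lane] {
            out[lane] = base[idx[lane]];
        }
    }
    out
}

/// Scalar model of a masked scatter: lane i writes `values[i]` to
/// `base[idx[i]]` if `mask[i]` is set. Lanes are processed in order, so in
/// this sketch the highest lane wins when two lanes target the same index.
pub fn scatter<T: Copy, const N: usize>(
    values: [T; N],
    base: &mut [T],
    idx: [usize; N],
    mask: [bool; N],
) {
    for lane in 0..N {
        if mask[lane] {
            base[idx[lane]] = values[lane];
        }
    }
}
```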

Challenges

Many vendor intrinsics remain unimplemented. The new portable SIMD project will, however, likely use only so-called “platform intrinsics”, of which there are far fewer than the LLVM intrinsics used to implement all vendor intrinsics in core::arch. In addition, “platform intrinsics” are the common denominator between the platforms supported by rustc, so they only have to be implemented once, in cg_clif itself, and in fact most have already been implemented. Cranelift does need a definition for each platform when native SIMD is used, but emulating “platform intrinsics” using scalar instructions is pretty easy.

  • issue #171: std::arch SIMD intrinsics
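To illustrate why scalar emulation is easy: an element-wise platform intrinsic is just a scalar operation applied to each lane. In the sketch below, plain arrays stand in for SIMD vectors, and `lanewise`, `simd_add_i32x4` and `simd_eq_i32x4` are hypothetical helpers, not cg_clif code. They model the lane semantics of `simd_add` (wrapping integer addition) and `simd_eq` (each result lane is all ones when the inputs are equal, zero otherwise).

```rust
/// Apply a scalar operation lane by lane: the general shape of a scalar
/// fallback for an element-wise platform intrinsic.
fn lanewise<T: Copy, U, const N: usize>(a: [T; N], b: [T; N], op: impl Fn(T, T) -> U) -> [U; N] {
    std::array::from_fn(|i| op(a[i], b[i]))
}

/// Emulated `simd_add` on four i32 lanes: integer lanes wrap on overflow.
pub fn simd_add_i32x4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    lanewise(a, b, i32::wrapping_add)
}

/// Emulated `simd_eq` on four i32 lanes: -1 (all bits set) if equal, else 0.
pub fn simd_eq_i32x4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    lanewise(a, b, |x, y| if x == y { -1 } else { 0 })
}
```

Because `lanewise` is generic over the lane type and count, each additional element-wise intrinsic only needs its scalar operation.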

Cleanup during stack unwinding on panics

Cranelift currently doesn’t have support for cleanup during stack unwinding.
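What “cleanup during stack unwinding” means in practice: when a panic unwinds through a frame, the destructors of that frame’s live locals still have to run, and the backend has to emit the code that runs them. The example below is illustrative (`Guard`, `work_that_panics` and `cleanup_ran_during_unwind` are hypothetical names). With a backend that supports such cleanup, like cg_llvm, `Guard::drop` runs while the panic unwinds and the panic can then be caught.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// A guard whose destructor is the "cleanup" that must run while a panic
/// unwinds through the frame that owns it.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

fn work_that_panics() {
    let _guard = Guard;
    // `_guard` is dropped while the panic unwinds out of this function.
    panic!("something went wrong");
}

/// Returns true if the panic was caught and the guard's destructor ran.
pub fn cleanup_ran_during_unwind() -> bool {
    let caught = panic::catch_unwind(work_that_panics).is_err();
    caught && CLEANED_UP.load(Ordering::SeqCst)
}
```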

Distributing as rustup component

There is progress towards distributing cg_clif as a rustup component, but there are still things to be done. https://github.com/bjorn3/rustc_codegen_cranelift/milestone/2 lists the remaining tasks I know of.

Contributing

Contributions are always appreciated. Feel free to take a look at good first issues and ping me (@bjorn3) for help if you get stuck, either on the relevant GitHub issue or, preferably, on the rust-lang Zulip.

