Progress report on rustc_codegen_cranelift (July 2023)

source link: https://bjorn3.github.io/2023/07/29/progress-report-july-2023.html

Jul 29, 2023

It has been quite a while since the last progress report. A ton of progress has been made since then, but I simply didn't get around to writing a new progress report. There have been 639 commits since the last one. That is significantly more than last time, which is unsurprising given how long it has been. As such, I only skimmed the commit list for things that stood out to me, so I may have missed some important changes.

You can find a precompiled version of cg_clif at https://github.com/bjorn3/rustc_codegen_cranelift/releases/tag/dev if you want to try it out.

Achievements in the past nine months

Perf improvements

Debug assertions were accidentally enabled for the precompiled dev releases. Disabling them significantly improved performance: on one benchmark, cg_clif's improvement over cg_llvm went from ~13% to ~39%. Local builds were not affected by this issue.

  • #1347: Build CI dist artifacts without debug assertions

A lot of vendor intrinsics have been implemented. The regex crate now works on AVX2 systems without cg_clif's hack that makes is_x86_feature_detected!() hide all features other than SSE and SSE2. This hack doesn't work when the standard library is compiled using cg_llvm, as will be the case once cg_clif is distributed with rustup.

  • #1297: Implement some AArch64 SIMD intrinsics
  • #1309: Implement simd_gather and simd_scatter
  • #1378: Implement all vendor intrinsics used by regex on AVX2 systems
  • e4d0811: Implement _mm_srli_epi16 and _mm_slli_epi16
  • c09ef96: Implement _mm_shuffle_epi8
  • #1380: Implement a whole bunch more x86 vendor intrinsics
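Crates like regex pick SIMD code paths at runtime with is_x86_feature_detected!(), which is why the missing intrinsics mattered: once the check reports AVX2, the AVX2 path gets executed and every intrinsic in it must be supported by the backend. The sketch below shows that dispatch pattern with a hypothetical find_byte function. It is an illustration, not code taken from regex or its dependencies. With cg_clif's old hack, the avx2 check would return false and only the scalar fallback would run.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8, _mm256_set1_epi8,
};

/// Portable fallback: index of the first byte equal to `needle`.
fn find_byte_scalar(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// AVX2 path (hypothetical helper): compares 32 bytes per iteration.
/// Only sound to call on CPUs that support AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn find_byte_avx2(haystack: &[u8], needle: u8) -> Option<usize> {
    let mut i = 0;
    unsafe {
        // Broadcast `needle` into all 32 byte lanes.
        let splat = _mm256_set1_epi8(needle as i8);
        while i + 32 <= haystack.len() {
            // Unaligned 32-byte load; in bounds because i + 32 <= len.
            let chunk = _mm256_loadu_si256(haystack.as_ptr().add(i) as *const __m256i);
            // Matching lanes become 0xFF; movemask packs lane j's top bit into bit j.
            let mask = _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, splat)) as u32;
            if mask != 0 {
                return Some(i + mask.trailing_zeros() as usize);
            }
            i += 32;
        }
    }
    // Fewer than 32 bytes remain: finish with the scalar loop.
    find_byte_scalar(&haystack[i..], needle).map(|j| i + j)
}

/// Dispatches at runtime to the fastest implementation the CPU supports.
pub fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just checked for AVX2 support.
            return unsafe { find_byte_avx2(haystack, needle) };
        }
    }
    find_byte_scalar(haystack, needle)
}
```

Both paths must return identical results, so a backend that lacks one of the AVX2 intrinsics cannot simply be worked around by the library; it has to either hide the feature or implement the intrinsic.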

Build system rework

The build system has seen a significant rework so that it can be used to test a precompiled cg_clif version and so that everything can be vendored for offline builds. Both were requirements for testing cg_clif in rust's CI. A PR is open to run part of cg_clif's tests in rust's CI.

  • #1291: Move downloaded test project to downloads/
  • #1298: Introduce CargoProject type and use it where possible
  • #1300: Rename the build/ directory to dist/
  • #1302: Allow specifying where build artifacts should be written to
  • #1338: Avoid clobbering build_system/ and ~/.cargo/bin
  • #1339: Many build system improvements
  • #1340: Push up a lot of rustc and cargo references
  • #1341: Refactor sysroot building
  • #1374: Allow building and testing without rustup
  • 5b3bc29: Allow testing a cranelift backend built into rustc itself
  • 134dc33: Fix testing with unstable features disabled
  • #1357: Support testing of cg_clif in rust’s CI
  • rust#112701: Run part of cg_clif’s tests in CI (not yet merged)

Inline assembly

const operands for asm!() and global_asm!() are now supported. sym operands work in some cases, but if rustc decides to make the referenced function private to the codegen unit it is contained in, you will get a linker error: the inline asm ends up in a separate codegen unit, while rustc assumes it ends up in the same codegen unit as that function.

  • #1350: Implement const and sym operands for inline asm
  • #1351: Implement const and sym operands for global asm
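For readers unfamiliar with const operands: they splice a compile-time constant into the assembly template as an immediate. The following is a minimal x86_64 sketch of such an operand (sym operands are not shown); add_offset and OFFSET are hypothetical names, and it assumes a toolchain where const operands in asm!() are stable (Rust 1.82 or later).

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Compile-time constant that the assembly receives as an immediate.
const OFFSET: u64 = 42;

/// Returns `x + OFFSET`, computed in inline assembly with a `const` operand.
#[cfg(target_arch = "x86_64")]
fn add_offset(x: u64) -> u64 {
    let result: u64;
    // SAFETY: the asm only reads `src` and writes `dst`; flags are treated as
    // clobbered by default because `preserves_flags` is not specified.
    unsafe {
        asm!(
            "mov {dst}, {src}",
            "add {dst}, {n}", // `{n}` is replaced by the literal value of OFFSET
            src = in(reg) x,
            dst = out(reg) result,
            n = const OFFSET,
        );
    }
    result
}

/// Portable fallback so the example also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_offset(x: u64) -> u64 {
    x + OFFSET
}
```

Because the constant is pasted into the assembly text, the backend must evaluate it at compile time; no register is allocated for it.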

s390x support tested in CI

@afonso360 contributed CI support for testing s390x.

  • #1304: Add S390X CI Support

Archive writer

As I already pointed out in a previous progress report, I had been working on switching the archive writer from a fork of rust-ar to a Rust rewrite of LLVM's archive writer. This work has since been completed. The LLVM backend still uses LLVM's original version because a couple of regressions were found in the integration with rustc. I plan to fix those issues and switch the LLVM backend to the Rust rewrite at some point in the future.

  • #1155: Remove the ar git dependency
  • rust#97485: Rewrite LLVM’s archive writer in Rust

Benchmark improvements

Release builds of simple-raytracer are now benchmarked too. Release builds take longer to compile than debug builds, but should still compile faster than with the LLVM backend. In exchange, the resulting executables are about 20% faster, and for simple-raytracer they are even faster than those produced by LLVM in debug mode.

CI runs now also show the benchmark results if you scroll down on the overview page of the workflow run. See for example https://github.com/bjorn3/rustc_codegen_cranelift/actions/runs/5645453142.

  • #1373: Benchmark clif release builds with ./y.rs bench
  • 448b7a3: Record GHA step summaries for benchmarking

Challenges

While core::simd is fully supported through emulation using scalar operations, many platform-specific vendor intrinsics in core::arch are not supported yet. This has been improving, though: the x86 vendor intrinsics that matter most for the regex crate and its dependencies are now implemented.

  • issue #171: std::arch SIMD intrinsics
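To illustrate the difference: a portable SIMD operation has a lane-wise meaning that a backend can lower to one scalar operation per lane, whereas each vendor intrinsic in core::arch has its own exact semantics that must be implemented individually. The sketch below is a conceptual illustration only, not cg_clif's actual lowering; I32x4, simd_add_emulated and simd_add_sse2 are hypothetical names, and the SSE2 version uses the real _mm_add_epi32 intrinsic for comparison.

```rust
/// A 4 x i32 vector represented as a plain array (illustrative only).
type I32x4 = [i32; 4];

/// Lane-wise wrapping add, lowered to one scalar add per lane.
fn simd_add_emulated(a: I32x4, b: I32x4) -> I32x4 {
    let mut out = [0i32; 4];
    for lane in 0..4 {
        out[lane] = a[lane].wrapping_add(b[lane]);
    }
    out
}

/// The same operation through an x86 vendor intrinsic.
#[cfg(target_arch = "x86_64")]
fn simd_add_sse2(a: I32x4, b: I32x4) -> I32x4 {
    use std::arch::x86_64::{__m128i, _mm_add_epi32, _mm_loadu_si128, _mm_storeu_si128};
    let mut out = [0i32; 4];
    // SAFETY: SSE2 is part of the x86_64 baseline, and each pointer covers 16 bytes.
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_add_epi32(va, vb));
    }
    out
}
```

Both functions compute wrapping lane-wise addition, but a backend only gets the second one for free if it knows what _mm_add_epi32 means.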

Cleanup during stack unwinding on panics

Cranelift currently doesn’t have support for cleanup during stack unwinding. I’m working on implementing this and integrating it with cg_clif.
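"Cleanup during stack unwinding" refers to running destructors (Drop impls) of live values in every frame a panic unwinds through. The sketch below demonstrates that language-level behavior with a hypothetical Guard type under the default panic=unwind strategy; it illustrates what a backend has to support, not how cg_clif or Cranelift implement it.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

/// Sets a flag when dropped, standing in for any cleanup (freeing memory,
/// releasing a lock, ...) that must run even if a panic unwinds past it.
struct Guard<'a>(&'a Cell<bool>);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// Creates a guard and then panics; the guard is dropped during unwinding.
fn do_work(cleaned_up: &Cell<bool>) {
    let _guard = Guard(cleaned_up);
    panic!("simulated failure");
}

/// Returns true if the panic was caught and the guard's cleanup ran.
fn cleanup_ran_during_unwind() -> bool {
    let cleaned_up = Cell::new(false);
    // AssertUnwindSafe is needed because &Cell<bool> is not UnwindSafe.
    let result = panic::catch_unwind(AssertUnwindSafe(|| do_work(&cleaned_up)));
    result.is_err() && cleaned_up.get()
}
```

To support this, the compiler has to emit landing pads that call Guard's Drop impl when do_work is unwound, which is the part Cranelift is missing.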

Distributing as rustup component

Progress has been made towards distributing cg_clif as a rustup component. For example, a decent number of SIMD vendor intrinsics are now implemented, and there is an open PR to run part of cg_clif's test suite in rust's CI. There is still work to be done, though. https://github.com/bjorn3/rustc_codegen_cranelift/milestone/2 lists the remaining items I know of.

Contributing

Contributions are always appreciated. Feel free to take a look at good first issues, and if you get stuck, ping me (@bjorn3) for help either on the relevant GitHub issue or, preferably, on the rust-lang Zulip.

