A deeper look into the GCC Rust front-end

Welcome to LWN.net

Philip Herron and Arthur Cohen presented an update on the "gccrs" GCC front end for the Rust language at the 2022 Kangrejos conference. Less than two weeks later — and joined by David Faust — they did it again at the 2022 GNU Tools Cauldron. This time, though, they were talking to GCC developers and refocused their presentation accordingly; the result was an interesting look into the challenges of implementing a compiler for Rust.

Herron started by saying that he initially found the project too difficult; the Rust language was simply too volatile to try to develop a compiler for it. So he gave up for a while. He kept getting questions about when the work would be done, though, so he eventually restarted the project. The language has been mostly stable since 2015, so the task has gotten a little easier.

There are a few goals for the gccrs project beyond simply compiling Rust code. The work needs to end up in the GCC mainline once it's ready. It should reuse as much of the GNU toolchain as possible. There is also an effort to make the gccrs code as easy as possible to backport to older versions of GCC. Finally, advanced features like link-time optimization should be supported for Rust code.

The first step toward those goals was to create a parser for the language, then to start implementing Rust's data structures. Then came traits and generics; those features are complex, he said, but they are also at the core of how the language works. Control flow, and especially the match expression, came next; after that was macro expansion. Const generics are in progress now, he said, while work on intrinsics and built-ins is just beginning. No work has been done on borrow checking; it is not needed to generate code from a valid Rust program, so it can come later. Work on running the Rust test suite is also being done.

Another in-progress task is compiling the libcore library. This library not only has a number of important functions, it also defines many of the low-level features of the language. Without it, Herron said, "you can't do much". Current work is targeting an older version of libcore and is "getting there".

A look inside gccrs

One way in which the Rust front-end differs from many others in GCC is in its use of a special abstract syntax tree structure. It is needed to support features like macro expansion and name resolution. This tree is a sort of high-level, internal representation of a Rust program; at that point in the compilation, there is no distinction between functions and methods, and all macros have been expanded. It's used for type checking and error verification; once that's done, it can be translated and handed to the GCC mid-layer.

Cohen took over at this point to talk about macro expansion. Rust macros differ significantly from those supported by C or C++. They have typed arguments, can include both statements and expressions, have visibility modifiers, and more. Rust macros can use both repetition and recursion with results that are, he said, "cool but abstract". They support Kleene operators, and their specification requires follow-set ambiguity restrictions, which, he said, "is as scary as it sounds".

As a (relatively) simple example, he put up a macro that just computes the sum of its arguments:

    macro_rules! add {
        ($e:expr) => { $e };
        ($e:expr, $($es:expr),*) => { $e + add!($($es),*) };
    }

Invocation of this macro can be as simple as:

    add!(1);  // Yields 1

But it can also be more complex:

    add!(1, add!(2, 3), five(), b, 2 + 4);

It gets more complex from there. Rust macros, he said, enable the creation of complex domain-specific languages. It's a nice feature, but it also means that "Rust" is actually several languages in one, and all of them have to be implemented to actually have a Rust compiler.
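To make the expansion concrete, here is a minimal, self-contained sketch of the macro in use. The function `five()` and the value given to `b` are hypothetical stand-ins, since the talk did not define them, and `sum_demo()` is a name invented for this illustration.

```rust
// Recursive macro: one arm for a single expression, one for a
// comma-separated list of expressions.
macro_rules! add {
    ($e:expr) => { $e };
    ($e:expr, $($es:expr),*) => { $e + add!($($es),*) };
}

// Hypothetical helper standing in for the `five()` call in the example.
fn five() -> i32 {
    5
}

fn sum_demo() -> i32 {
    let b = 10; // hypothetical value for the variable `b`
    // Expands to: 1 + ((2 + 3) + (five() + (b + (2 + 4))))
    add!(1, add!(2, 3), five(), b, 2 + 4)
}

fn main() {
    println!("{}", add!(1));
    println!("{}", sum_demo());
}
```

Each recursive step peels off the first expression and hands the rest back to the macro, so the compiler has to run the expander to a fixed point before any type checking can happen.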

Herron returned to talk about the type system and why it drove the creation of a separate internal representation. Rust's type system has a number of complex features, not all of which are well documented; he had to spend a fair amount of time digging through the rustc code to figure it all out. First on the list of features is name shadowing, which allows (and even encourages) frequent redeclaration of variables with the same name; shadowing "works for Rust" but wouldn't for many other languages, he said.
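As a small illustration of name shadowing (a sketch written for this article, not code from the talk), the hypothetical function below redeclares `value` three times, changing its type along the way:

```rust
fn shadow_demo(input: &str) -> usize {
    // Each `let` introduces a new variable that reuses the name;
    // the type may change from one declaration to the next.
    let value = input;        // &str
    let value = value.trim(); // still &str, but a new binding
    let value = value.len();  // usize
    value
}

fn main() {
    println!("{}", shadow_demo("  gccrs  "));
}
```

A compiler has to treat each `value` as a distinct variable during name resolution, which is part of what the front end's internal representation must track.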

A trickier aspect is type inference; Rust allows the declaration of variables with neither a type nor an initializer, with the expectation that the compiler will eventually figure something out. He put up this sequence of code:

    let a;              // No type or initializer
    a = 123;            // a is some type of integer

    let b: u32 = 3;     // b is a 32-bit integer
    let c = a + b;      // 32-bit math all around now

The gccrs internal representation makes this sort of type inference work, he said.
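One way to see the inference at work is to pass `a` to a function that accepts only `u32`; the program compiles only if the compiler settled on that type. This is a sketch, with `takes_u32()` and `infer_demo()` being hypothetical names introduced for the illustration.

```rust
// Accepts only u32, so the call below compiles only if `a` was inferred as u32.
fn takes_u32(x: u32) -> u32 {
    x
}

fn infer_demo() -> u32 {
    let a;          // no type or initializer yet
    a = 123;        // some integer type, still undecided
    let b: u32 = 3; // explicitly a 32-bit unsigned integer
    let c = a + b;  // forces `a` (and `c`) to be u32
    takes_u32(a) + c
}

fn main() {
    println!("{}", infer_demo());
}
```

Note that the type of `a` is fixed by a statement that appears two lines after its declaration, so the checker cannot resolve types in a single forward pass.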

Then, there is the interesting concept of the Rust never type. In Rust it is valid to write code like:

    let a = return;     // a = !

The return statement does what one would expect, but the statement also has the result of assigning the never type (denoted "!") to the variable a. A more realistic example might be a return statement in one arm of a match expression. In any case, it is then legal to write code like:

    let b = a + 1;      // b = ! + integer
    let a = 123;        // ! can be coerced to other types

The never type is an unstable feature. Cohen jumped in to say that nobody would ever write code like the above, but that the never type enables interesting things.
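The more realistic case mentioned above, a `return` in one arm of a match expression, works in stable Rust today. The following sketch uses a hypothetical function, `double_or_zero()`, to show the coercion:

```rust
fn double_or_zero(text: &str) -> i32 {
    // The `Err` arm has type `!` because `return` never produces a value;
    // the compiler coerces it to i32 so that both arms agree.
    let n: i32 = match text.parse::<i32>() {
        Ok(v) => v,
        Err(_) => return 0,
    };
    n * 2
}

fn main() {
    println!("{}", double_or_zero("21"));
    println!("{}", double_or_zero("not a number"));
}
```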

Other challenges mentioned by Herron include automatic dereferencing of struct fields, another undocumented behavior that took some time for him to figure out. Monomorphization also took some work and a fair amount of special-case handling. Cohen mentioned the (somewhat) object-oriented features of Rust and the extra checks they require. Visibility, in particular, is interesting; pub makes an item visible to the entire binary into which it is linked, while pub(crate) limits visibility to the current crate, and pub(super) makes an item visible to the parent module. Managing all of this gets hard, he said. A Rust compiler must also implement unsafe, of course, which disables a lot of the checks that the compiler makes.
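The visibility rules and the automatic dereferencing of fields can both be shown in a single file. The module and function names below are hypothetical, chosen only for this sketch.

```rust
mod outer {
    pub mod inner {
        pub struct Point {
            pub x: i32,
            pub y: i32,
        }

        // Visible wherever the module path itself is reachable.
        pub fn everywhere() -> i32 {
            1
        }

        // Visible only within the current crate.
        pub(crate) fn crate_only() -> i32 {
            2
        }

        // Visible only within the parent module (`outer`).
        pub(super) fn parent_only() -> i32 {
            3
        }
    }

    // `outer` is the parent of `inner`, so it may call `parent_only`.
    pub fn via_parent() -> i32 {
        inner::parent_only()
    }
}

fn visibility_demo() -> i32 {
    // Calling outer::inner::parent_only() directly would be rejected here,
    // because this function lives outside `outer`.
    outer::inner::everywhere() + outer::inner::crate_only() + outer::via_parent()
}

fn deref_demo() -> i32 {
    let p = outer::inner::Point { x: 4, y: 5 };
    let r = &&p; // a reference to a reference
    // Field access dereferences automatically; no explicit (**r).x is needed.
    r.x + r.y
}

fn main() {
    println!("{} {}", visibility_demo(), deref_demo());
}
```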

Generating code

Faust talked briefly about challenges at the code-generation stage. The bulk of this work is translating the gccrs internal representation into the tree structures used by the GCC backend. Some structures, like if and loops, are relatively straightforward. Others are not.

He specifically called out the match expression, which he described as "switch statements on a lot of steroids — probably illegal ones". The simple cases can just map to a switch statement, but the whole point of match is that it need not be simple. Matches involving tuples, for example, must try matching a single element at a time, which is something that the GCC internal representation wasn't designed to do. Arm guards (essentially an extra if controlling whether a specific match occurs) also complicate things, since the variables set by the match must be bound before the guard expression can be executed.
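A short sketch shows why such matches do not reduce to a plain switch statement. The function `classify()` is a hypothetical example: it matches a tuple element by element and uses arm guards that depend on variables bound by the pattern.

```rust
fn classify(point: (i32, i32)) -> &'static str {
    match point {
        (0, 0) => "origin",
        // `x` is bound by the pattern before the guard `x > 0` is evaluated.
        (x, 0) if x > 0 => "positive x-axis",
        (_, 0) => "negative x-axis",
        (0, _) => "y-axis",
        // Both elements must be bound before this guard can run.
        (x, y) if x == y => "diagonal",
        _ => "elsewhere",
    }
}

fn main() {
    for p in [(0, 0), (3, 0), (-3, 0), (0, 7), (2, 2), (1, 2)] {
        println!("{:?} -> {}", p, classify(p));
    }
}
```

When a guard fails, control falls through to the next arm, so the generated code needs more than a jump table.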

Gccrs now has a good module for const evaluation; it was derived from the C++ evaluator by a Google Summer of Code student. The rustc developers recently had to update their compiler to fix a const-evaluation bug, but, much to the satisfaction of its developers, gccrs was already handling that case correctly.
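For readers unfamiliar with const evaluation, the sketch below shows the kind of work it involves: the compiler itself must execute `factorial()` to produce the constant and the array length. The names are hypothetical, and this illustrates the language feature rather than anything specific to the gccrs evaluator.

```rust
// A function that the compiler can run at compile time.
const fn factorial(n: u32) -> u32 {
    let mut result = 1;
    let mut i = 2;
    while i <= n {
        result *= i;
        i += 1;
    }
    result
}

// Both values are computed during compilation, not at run time.
const FACT_5: u32 = factorial(5);
static TABLE: [u8; factorial(3) as usize] = [0; factorial(3) as usize];

fn main() {
    println!("{} {}", FACT_5, TABLE.len());
}
```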

So, Herron continued, when will gccrs be ready? It can mostly compile libcore now, and things work. There are other core libraries, including liballoc, that are yet to be done, but that should be easier, he said. On the other hand, Cohen said, the code that implements procedural macros is going to be harder; it forces the compiler to act as a server, sending tokens to a separate libproc executable. That means implementing a remote procedure call server in the compiler front-end.

Then, Herron said, there is borrow checking, which is an inherent part of the language. Without a borrow checker, gccrs will not be a Rust compiler, and it currently does not have one. The plan here is to use Polonius (which is being developed for rustc) and avoid duplicating all of that work.

As a sort of postscript, Herron mentioned that he has been talking with the Rust-for-Linux developers about compiling kernel code. Rust versioning is based on the notion of "editions", which form the core of its compatibility guarantees. But the kernel code cannot rely on such guarantees now due to its use of a large number of unstable features, some of which have "no clear path" toward stabilization. Creating a useful compiler is hard, Herron said, when there is no language standard. The gccrs developers are working toward adding kernel modules to their test cases, but properly supporting kernel development may take some time. At the close of the session, Mark Wielaard asked whether the kernel is alone in its use of unstable features; the answer was that "everybody uses them".

[Thanks to LWN subscribers for supporting my travel to this event.]
