
How Rust 1.64 Became 10-20% Faster On Windows?

source link: https://tomaszs2.medium.com/how-rust-1-64-became-10-20-faster-on-windows-3a8bb5e81d70

If you are a Rust developer, you have most likely celebrated the news that in version 1.64, released on 22 September 2022, the compiler builds Rust code 10-20% faster on Windows.

It is one of the most important changes in that release, aside from the stabilization of some APIs.

But how did a single merge request deliver such a performance improvement?

The person behind the merge request is Rémy Rakic aka lqd (pronounced liquid), a French software engineer regularly contributing changes to the Rust compiler.

On 12 May 2022 he submitted a new merge request illustrated with these performance test results:

[Figure: performance test results from the merge request]

The results show that he was able to improve performance consistently across all kinds of tests, by up to 18.92%. Out of hundreds of tests, nearly all showed a significant improvement, and only two showed a slight decrease:

[Figure: summary of the benchmark results]

Such results are the Holy Grail of every performance hunter.

On 11 July 2022 Jakub Beránek announced the good news, after two months of heavy work by lqd to pass the tests and fine-tune the merge request.

Another two months were needed for the change to make it into a release.

That shows how much effort the result required. But what was it about?

Profile-guided optimization

The Rust compiler supports profile-guided optimization (PGO). It is a set of techniques used to prepare an application, figure out how it is actually executed, and then use that data to make the most important code run as fast as possible.

The first phase is called instrumentation, the second training, and the third optimization.

First, the app is enhanced with data-gathering points, for example to record which functions are executed and in what order.

Then, the app is run many times to gather as much data as possible.

Finally, a set of various techniques is used to figure out how the code can be improved to boost performance.
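
To make the three phases more concrete, here is a toy sketch in plain Rust. It is not how the compiler implements PGO: the counters are written by hand, and every name in it (`Profile`, `parse`, `report_error`, `train`, `hottest_function`) is made up for illustration. For real projects, rustc exposes PGO through the `-Cprofile-generate` and `-Cprofile-use` flags.

```rust
// Toy illustration of the three PGO phases. The counters below stand in for
// the probes a real compiler inserts; all names here are hypothetical.

#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Profile {
    parse_calls: u64,
    report_error_calls: u64,
}

// Phase 1, instrumentation: each function bumps its own counter.
fn parse(input: u64, profile: &mut Profile) -> u64 {
    profile.parse_calls += 1;
    input.wrapping_mul(31).wrapping_add(7)
}

fn report_error(input: u64, profile: &mut Profile) {
    profile.report_error_calls += 1;
    eprintln!("suspicious input: {input}");
}

// Phase 2, training: run a representative workload and collect the counts.
fn train(workload: &[u64]) -> Profile {
    let mut profile = Profile::default();
    for &item in workload {
        if item % 1000 == 0 {
            report_error(item, &mut profile);
        } else {
            parse(item, &mut profile);
        }
    }
    profile
}

// Phase 3, optimization: a real compiler would now reshape the code.
// Here we only report which function the profile marks as "hot".
fn hottest_function(profile: &Profile) -> &'static str {
    if profile.parse_calls >= profile.report_error_calls {
        "parse"
    } else {
        "report_error"
    }
}

fn main() {
    let workload: Vec<u64> = (0..10_000).collect();
    let profile = train(&workload);
    println!("{profile:?}, hottest: {}", hottest_function(&profile));
}
```

The point of the sketch is only that the decision about what is "hot" comes from measured runs, not from guesses made while writing the code.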

Inlining

One such technique is known as inlining. It is based on the observation that having too many function calls in the compiled application slows it down, because of the overhead of looking up each function, calling it, and returning from it.

Inlining recognizes patterns where one function calls another function often, and then copies the body of the called function into the caller. In practice the technique usually does not increase the build size too much, but it offers an important performance benefit. If you want to learn more about it, check out an article by Ankit Astana, who describes it nicely in an illustrated way.
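
As a small illustration, Rust lets you nudge the compiler by hand with the `#[inline(always)]` and `#[inline(never)]` attributes. They are hints rather than guarantees, and the function names below are made up for this sketch. With PGO, the compiler makes this kind of decision itself, based on how often each call was observed during training.

```rust
// Hint that this helper should stay a real function call.
#[inline(never)]
fn square_call(x: u64) -> u64 {
    x * x
}

// Hint that the body should be copied into every caller.
#[inline(always)]
fn square_inline(x: u64) -> u64 {
    x * x
}

fn sum_squares_with_calls(values: &[u64]) -> u64 {
    values.iter().map(|&v| square_call(v)).sum()
}

fn sum_squares_inlined(values: &[u64]) -> u64 {
    values.iter().map(|&v| square_inline(v)).sum()
}

fn main() {
    let values: Vec<u64> = (1..=10).collect();
    // Inlining changes how the code is laid out, never what it computes.
    assert_eq!(sum_squares_with_calls(&values), sum_squares_inlined(&values));
    println!("sum of squares: {}", sum_squares_inlined(&values));
}
```

Both versions return the same value; the difference shows up only in the generated machine code and, potentially, in speed.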

Machine Code Layout

To understand this technique, we need to remember that our Rust code is compiled to machine code, the instructions executed by the processor. The processor likes to execute instructions one after another, in a straight line. When the code contains no control-flow instructions such as branches, the processor can load a whole run of instructions at once and use all its power to execute the block as fast as possible.

That said, even if we are not aware of it, the way we write control-flow statements affects what the generated machine code looks like.

Of course, it is not really possible to always write code knowing which parts are hot and which are cold, or which branch will be executed most often.

That is where profile-guided optimization also comes in handy. A set of very smart machine code layout rules arranges the code into the longest straight-line chunks the processor can digest. This again improves performance "just" by reorganizing the layout of the machine code. You can read more about the technique in a nice article by Sergey Slotin.
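
Without a profile, the closest manual equivalent in Rust is the `#[cold]` attribute, which tells the compiler that a function is unlikely to be called so it can keep that code out of the hot path. The sketch below uses made-up function names and only shows the idea; how the final code is laid out is up to the compiler.

```rust
// Rarely executed error path: marked cold so the compiler can move it
// away from the frequently executed code.
#[cold]
#[inline(never)]
fn handle_invalid(byte: u8) -> u32 {
    eprintln!("unexpected byte: {byte}");
    0
}

// Hot path: the common case is handled in straight-line code.
fn digit_value(byte: u8) -> u32 {
    if byte.is_ascii_digit() {
        u32::from(byte - b'0')
    } else {
        handle_invalid(byte)
    }
}

fn sum_digits(input: &str) -> u32 {
    input.bytes().map(digit_value).sum()
}

fn main() {
    println!("sum of digits: {}", sum_digits("12345"));
}
```

PGO removes the need for such annotations in most places, because the training runs show the compiler which branches are actually cold.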

Register allocation

The third important optimization technique revolves around register allocation.

A register is the fastest storage for data. The processor can typically access a value held there within a single clock cycle, which is a fraction of a nanosecond on modern CPUs.

However, the number of registers is extremely small. We are not talking about hundreds, but about a dozen or so. On x86-64, for example, there are 16 general-purpose registers and a similar number of floating-point registers.

Register allocation is a complicated set of algorithms and rules that decide which data to put into the registers, and when, so that the processor has it at hand when needed.

By handling register allocation well, the Rust compiler is able to improve performance.

If you are interested in this optimization technique check out the Wikipedia page.
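
To give a feeling for the problem, here is a deliberately simplified sketch in the spirit of the classic linear-scan approach. It is not the algorithm used by rustc or its backend, and all types and names (`Interval`, `Location`, `allocate`, `example_intervals`) are invented for this example. Each value has a "live interval" during which it is needed. Values get a free register if one exists, and are "spilled" to memory otherwise.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Location {
    Register(usize),
    Spilled, // kept in memory because no register was free
}

#[derive(Debug, Clone, Copy)]
struct Interval {
    name: &'static str,
    start: u32, // first instruction index where the value is live
    end: u32,   // last instruction index where the value is live
}

fn allocate(intervals: &[Interval], num_registers: usize) -> Vec<(&'static str, Location)> {
    let mut sorted = intervals.to_vec();
    sorted.sort_by_key(|interval| interval.start);

    // Reversed so that pop() hands out register 0 first.
    let mut free: Vec<usize> = (0..num_registers).rev().collect();
    // Values currently held in a register, with the register they occupy.
    let mut active: Vec<(Interval, usize)> = Vec::new();
    let mut result = Vec::new();

    for current in sorted {
        // Release registers whose values are no longer live.
        active.retain(|(interval, register)| {
            if interval.end < current.start {
                free.push(*register);
                false
            } else {
                true
            }
        });

        // Simplification: if nothing is free, the new value is spilled.
        match free.pop() {
            Some(register) => {
                active.push((current, register));
                result.push((current.name, Location::Register(register)));
            }
            None => result.push((current.name, Location::Spilled)),
        }
    }
    result
}

fn example_intervals() -> Vec<Interval> {
    vec![
        Interval { name: "a", start: 0, end: 4 },
        Interval { name: "b", start: 1, end: 2 },
        Interval { name: "c", start: 3, end: 6 },
        Interval { name: "d", start: 5, end: 7 },
    ]
}

fn main() {
    for (name, location) in allocate(&example_intervals(), 2) {
        println!("{name}: {location:?}");
    }
}
```

With two registers, every value in this example fits. With only one, `b` and `c` have to be spilled, which is exactly the kind of slow path a good allocator tries to avoid for hot code.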

Profile-guided optimization and Windows

Now you know some profile-guided optimization techniques.

The Rust compiler itself is built with them. But there is one problem:

Up until now, PGO-optimized builds of the compiler were available only on Linux.

It makes sense, because Rust may be used mostly by Linux users.

However, it meant rustc was not able to offer the same performance on other systems, like Windows.

What lqd did during all the time he spent on the merge request was enable all of these techniques on Windows-powered systems.

The level of dexterity needed to pull it off is off the charts, making his merge request one of the most amazing ones I have seen lately. The well-organized commit history is also a great resource for digging deeper into compiler coding.

Faster everything

The ramifications of compiler improvements such as the described Rust 1.64 PGO for Windows go far beyond the speed.

PGO techniques make better use of processing power. That means a lower cost of processing data for datacenters, but also faster and cheaper devices that use less power and last longer on a single charge.

These things are of rising importance at a time when we are again realizing how precious energy is.

Thanks for reading the article. If you want to read more stories like that, clap, subscribe via email, upvote and share it.

Brought to you by Tom Smykowski

Source: Merge Request
