source link: https://lwn.net/SubscriberLink/889924/2f35f6746c3dd9b1/

Rustaceans at the border


Support for developing in the Rust language is headed toward the kernel, though just when it will land in the mainline is yet to be determined. The Rust patches are progressing, though, and beginning to attract attention from beyond the kernel community. When two languages — and two different development communities — come together, the result can be a sort of cultural clash. Some early signs of that are appearing with regard to Rust in the kernel; if the resulting impedance mismatches can be worked out, the result could be a better development environment for everybody involved.

The latest round of Rust patches was posted by Miguel Ojeda on March 17. This time around, Rust support has moved forward to version 1.59.0 of the Rust language, which has stabilized a couple of important (for the kernel) features. The patches add a new module abstracting access to hardware random-number generators. A CString type has been added for C strings. The spinlock implementation has been improved. All told, the patch series, which can be found in the linux-next repository, adds over 35,000 lines of code and documentation; it is a considerable body of work.

There has been no public discussion on just when these patches might be deemed ready to go into the mainline kernel. Rust support is still considered "experimental" even by its developers; that is likely to remain the case for some time (even after this work is merged into the mainline) until the language proves itself for kernel development.

Clearly, though, some developers are beginning to play with it — and they are not all traditional kernel developers. Recently, Nándor Krácser asked on the Rust-for-Linux mailing list about the possibility of including Rust modules from the crates.io repository into the kernel build. This request, seemingly, is not just for small stuff:

Currently I'm experimenting with different crates which I would like to use in my module, serialization libraries, math libraries. etc, even complex ones, are really hard to pull in as a direct source library (copy the code to the module), and if they have a transitive dependency that complicates things even more.

Shortly thereafter, Chris Suter showed up with a similar request. Rust developers working with kernel modules, it seems, want more functionality than the current kernel crate provides to them.

This should not be entirely surprising. Like many newer languages, Rust is closely tied to a language-specific package-management system and associated central repository; in this case, the Cargo package manager and crates.io. Developers in such languages quickly become accustomed to pulling in new modules (and any dependencies they may have) with a simple command, and to having the build system make dependencies magically appear when building a new program. For these developers, the idea of working in an environment where complex libraries are not obtainable with a few keystrokes starts to have a distinct lack of appeal after a while.

The kernel does not work in this way, though. To those of us who didn't grow up with that kind of development environment, it looks like a recipe for bloat, bugs, and security problems. Depending on central repositories opens up a project to problems like the famous left-pad incident or, worse, the deliberate insertion of malicious software. A lack of attention to API compatibility leads to a thicket of version requirements and dependency-resolution problems so complex that machine-learning systems are emerging to deal with them. Plus it all just looks so undisciplined and messy.

At least some of the criticisms of this mode of development are valid, but it's also not hard to detect a bit of Stockholm syndrome as well. For many of us, for much of our careers, building a new program from source was likely to involve a lengthy cycle of "try to build, figure out which dependency it wants now, install the dependency" iterations — and recursive iterations at that when the dependencies turn out to have missing dependencies of their own. This exercise helped us to understand our systems better and must somehow have helped us to build better moral character, so we can't understand why Kids These Days just don't want to live that way.

The kernel community seems more than usually likely to have developers who are resistant to newer methods of development. The kernel has to stand alone, and its developers keep a firm grip on its dependencies. The kernel repository contains all of the code needed to build a working kernel; developers can be expected to install a limited set of tools to do the build, but the idea of installing external libraries to build into the kernel would not go over well.

So when developers see a shopping list like the one posted by Suter:

Like I said, I'm interested in futures. Why it's useful: async Rust is arguably more common and easier to use than other forms of multi-threaded processing. Other crates that I'd like: anyhow, bincode, byteorder, log, once_cell, pin-project, rand, serde, slab, static_assertions, uuid plus some more esoteric ones.

The first temptation will be either to run and hide or to respond in a way that may not be compliant with anybody's code of conduct.

There are some good reasons for this. As Greg Kroah-Hartman pointed out, code that has been written to be useful in user space almost certainly does not work within the constraints imposed on kernel code. "Async Rust" knows nothing about kernel threads or how context switching is done in the kernel, for example. Kernel code must be extremely careful in how it allocates memory, must not use floating-point arithmetic, cannot store large data structures on the stack, and cannot use unbounded recursion, among many other rules. Most user-space code, which was not written with these rules in mind, will fare poorly in this environment. For this reason, Kroah-Hartman said that any functionality desired by Rust programs must be specially written and provided in the dedicated kernel crate.
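The allocation constraint is worth making concrete: user-space Rust habitually calls infallible allocators that abort the program on failure, while kernel code must handle allocation failure explicitly. Here is a minimal user-space sketch of the fallible style, using the standard library's `try_reserve`; the in-kernel Rust APIs are different, so this only illustrates the idiom, not actual kernel code:

```rust
// Sketch only: std's fallible-allocation API, illustrating the style
// kernel code must use. This is not kernel code.
fn read_buffer(len: usize) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    // try_reserve reports failure instead of aborting the process,
    // the way an infallible Vec::with_capacity would on OOM.
    buf.try_reserve(len).map_err(|e| e.to_string())?;
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    match read_buffer(4096) {
        Ok(buf) => println!("allocated {} bytes", buf.len()),
        Err(e) => println!("allocation failed: {e}"),
    }
}
```

The point of the sketch is that fallibility is visible in the type: a caller cannot forget that allocation may fail, which is exactly the property kernel code needs.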

The Rust-for-Linux developers understand this situation and are not envisioning adding the ability to pull in modules with a tool like Cargo. So it is interesting that a long-time kernel developer, Kent Overstreet, was the one to argue for a different approach. "The world is changing", he said, and perhaps it is time for the kernel community to change with it as well. There are numerous situations where it can be beneficial to run code in both user and kernel space, he said, and the fact that doing so is currently painful is a problem for developers on both sides:

The solution to problems like these are to stop thinking that kernelspace and userspace _have_ to be completely different beasts - they really don't have to be! and start encouraging different thinking and tooling improvements that will make our lives easier.

It is true that the boundary between kernel and user space has become more porous over the years. Various subsystems provide hooks that allow formerly kernel-specific tasks to be carried out in user space instead, while user space can use BPF to run code inside the kernel. But the two environments are still quite different, and code meant to run on one side generally cannot run on the other.

There has not been a lot of effort put into thinking about how to reduce that divide; perhaps it really is time for that to change. The Rust language might just be the environment in which this transformation could happen. As Overstreet put it:

Rust's conditional compilation & generics story is _much_ better than what we've had in the past in C and C++, meaning writing Rust code that works in both userspace and kernelspace is much saner than in the past.

If an initiative like this were to work, it could greatly reduce the barrier to entry for future kernel developers while making a lot of useful code available to the kernel community. It would be a different kernel project than the one we know now, but it might be a more fun and more productive one.

Interesting things tend to happen when immigrants show up in a new land. They can often create a backlash among those who are already there — the new people dress funny and their cooking smells weird, after all, and some of them even have a crab as their mascot. But they can also bring energy and ideas that shake up their new home and make it richer for everybody involved. It may just be that we will see something like that happen if and when a crowd of Rust developers descends upon the kernel community. The end result could be difficult to recognize — and perhaps better than anything we had before.



Rustaceans at the border

Posted Apr 14, 2022 19:24 UTC (Thu) by rvolgers (subscriber, #63218) [Link]

> "Async Rust" knows nothing about kernel threads or how context switching is done in the kernel, for example. Kernel code must be extremely careful in how it allocates memory, must not use floating-point arithmetic, cannot store large data structures on the stack, and cannot use unbounded recursion,

The Rust response to this would probably be "can't we just... solve that?"

Various embedded Rust folks have been doing work on things like static stack usage analysis. Tight stack requirements are not unique to the Linux kernel by any means, it is clearly desirable to have insight and control over that.

And as for memory allocation, again, people write Rust code which has no support for memory allocation at all, using libraries which are on crates.io. And they're not always super custom libraries, sometimes it's literally just a matter of enabling the "no_std" feature flag, and you can use a bog-standard Rust library (with a somewhat more limited API, of course).

More complex things like interrupt context seem more likely to remain "here be dragons" territory (just like signal handlers are, to some degree, in userspace). But Rust is hardly unique in that; not all kernel C functionality is safe to use there either.

TLDR: Some of the arguments being used smack of prejudice against Rust ecosystem libraries as being by definition userland-focused. Many are, but many are not, and some adapt well to either use case.

Rustaceans at the border

Posted Apr 14, 2022 19:47 UTC (Thu) by atnot (subscriber, #124910) [Link]

> sometimes it's literally just a matter of enabling the "no_std" feature flag, and you can use a bog-standard Rust library

To add to that, some of the requested libraries like pin_project and serde to my knowledge do not even really exist at runtime at all. There are a lot of utility crates like that which contain a bunch of fancy macro and type definitions, at most a few lines of which will actually end up in the final binary.

Rustaceans at the border

Posted Apr 14, 2022 22:13 UTC (Thu) by farnz (subscriber, #17727) [Link]

On top of that, this is a case where a perfect solution isn't needed; if automation can permit 70% of the crates that would work well in kernel space into the kernel, and provide a way for a human to audit and certify the remaining 30% of those that would work as accepted for kernel use, that's a good outcome. Which, in turn, means that a conservative approximation is good enough - rejecting 90% of crates as "not kernel compatible for $reason", accepting 7% as "kernel compatible", and leaving 3% as "not sure - needs a human involved" would work just fine as a solution.

Rustaceans at the border

Posted Apr 14, 2022 22:33 UTC (Thu) by ssokolow (guest, #94568) [Link]

> sometimes it's literally just a matter of enabling the "no_std" feature flag, and you can use a bog-standard Rust library (with a somewhat more limited API, of course).

Rust feature flags are additive, so the convention is to declare an on-by-default feature flag named std which toggles the no_std attribute.

(no_std being equivalent to GCC's -nostdlib.)
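To illustrate the convention (all names here are illustrative, not from any particular crate): the crate root opts out of std unless the on-by-default std feature is enabled, and code that sticks to `core` compiles identically either way. A sketch:

```rust
// Sketch of the conventional crate-root attribute (shown in a comment
// because this file is built here as a plain std binary):
//
//     #![cfg_attr(not(feature = "std"), no_std)]
//
// with Cargo.toml declaring:  [features] default = ["std"]
//
// The function below uses only `core` items, so it works with or
// without the standard library.
fn checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}

fn main() {
    // 97 + 98 + 99 = 294, which wraps to 38 in a u8.
    println!("{}", checksum(b"abc"));
}
```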

Rustaceans at the border

Posted Apr 15, 2022 0:05 UTC (Fri) by tialaramex (subscriber, #21167) [Link]

Note that -nostdlib is getting rid of almost everything, whereas Rust's no_std only gets rid of stuff from std itself (much of what you think of as Rust's standard library was only re-exported from std and actually lives in core or alloc).

So e.g. suppose you've got a slice (maybe an array, but however you got it, some contiguous memory) of Things and you'd like to sort them. In C without the standard library you're out of luck - code it yourself - but in Rust, lacking std only means you don't have a nice stable allocating merge sort; you still get a perfectly usable (albeit not always exactly what you needed) in-place unstable sort.

https://rust-for-linux.github.io/docs/core/primitive.slice.html#method.sort_unstable

When C was invented such things would have been too heavy, but today Rust's compiler and linker are certainly smart enough that, if you never actually call sort_unstable, the code implementing it is omitted from your binary. So the "price" of Rust's more comprehensive library is only that the documentation is a little larger; in exchange you avoid the unsettlingly common (even in Linux) discovery that six people have re-implemented some useful idea, meaning the kernel carries not one but six copies of the code - and, worse, at least one of them is actually faulty.
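For the avoidance of doubt, `sort_unstable` really is defined in `core`, so it sorts in place with no allocator at all and remains available to `#![no_std]` code. A trivial sketch:

```rust
// slice::sort_unstable comes from core, not std: it sorts in place
// without allocating, so it stays available under #![no_std].
fn main() {
    let mut things = [3u32, 1, 4, 1, 5, 9, 2, 6];
    things.sort_unstable();
    println!("{things:?}");
}
```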

Rustaceans at the border

Posted Apr 15, 2022 23:46 UTC (Fri) by Hello71 (subscriber, #103412) [Link]

> When C was invented such things would be too heavy, but today Rust's compiler and linker are certainly smart enough that if you never actually perform sort_unstable the code to implement it is omitted from your binary

afaik, unix linkers have pruned unnecessary object files from the final link for basically as long as unix has existed. this is the origin of needing to specify -l flags in dependency order. this doesn't help with dynamic linking, but rust has no magic pixie dust there either.

Rustaceans at the border

Posted Apr 16, 2022 1:00 UTC (Sat) by nybble41 (subscriber, #55106) [Link]

> unix linkers have pruned unnecessary object files from the final link for basically as long as unix has existed

Yes, but Rust goes farther. It doesn't just prune unused object files as a whole but individual functions, global data, trait implementations, and even code paths, taking advantage of static analysis across module boundaries. The closest equivalent in C would be link-time optimization, which is relatively new and not enabled by default.

Rustaceans at the border

Posted Apr 16, 2022 11:26 UTC (Sat) by smurf (subscriber, #17840) [Link]

Any C compiler worth its salt already can put each function into its own section, which the linker then skips if it's not referenced. This scheme has been supported by gcc/ld since, umm, whenever. It's hardly as comprehensive as LTO but gets the job done well enough for the kernel and, frankly, most other codebases that aren't heavily obfuscated C++.

Rustaceans at the border

Posted Apr 17, 2022 2:48 UTC (Sun) by willy (subscriber, #9762) [Link]

You might want to check whether CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is set in your kernel before making such confident assertions.

Rustaceans at the border

Posted Apr 14, 2022 19:24 UTC (Thu) by atnot (subscriber, #124910) [Link]

I think Kent Overstreet hinted at another good point in the message:

> We already have large swaths of code being imported from userspace into the kernel as-is - e.g. zstd [...]

Presumably, this is in reference to this event last November:

> Hi Linus,
>
> I am sending you a pull request to add myself as the maintainer of zstd and
> update the zstd version in the kernel, which is now 4 years out of date,
> to the latest zstd release. [...]
(https://lore.kernel.org/all/20211109013058.22224-1-nickrt...)

It sounds like it would already be pretty desirable in general to create a more formal, more automated version of the process of making vetted external dependencies available to the kernel, not just for Rust, but also for C.

Rustaceans at the border

Posted Apr 16, 2022 1:39 UTC (Sat) by flussence (subscriber, #85566) [Link]

The kernel's got a quarter of a gigabyte of binary blob sitting in the amdgpu part of the tree already, I think they've surrendered their authority to be picky at this point.

Rustaceans at the border

Posted Apr 16, 2022 1:52 UTC (Sat) by davmac (guest, #114522) [Link]

Where is that in the tree exactly? (In 5.15.32, the whole of the amdgpu directory has 45M; the containing amd directory totals 328M, but most of that seems to be include files.)

Rustaceans at the border

Posted Apr 17, 2022 10:05 UTC (Sun) by khim (subscriber, #9252) [Link]

Said include files are precisely these “binary blobs” people are talking about.

They include hundreds of thousands of flags (maybe millions by now?) which do god-knows-what and are not documented anywhere. The code which toggles these is, essentially, a decompiled binary blob.

This is still kind-of better than what nVidia is doing (you can execute that mess on a RISC-V core without a binary translator, for example), but only marginally so.

You couldn't even prove that said code doesn't affect the kernel! The comments which are there strongly hint that the code does what it is supposed to do, but you have absolutely no way to verify whether said comments are true or not!

What's worse: it's not clear if or how one can improve that situation: AMD releases new architectures too fast for anything else. Even if there were someone who decided to actually verify what all that code is doing, and AMD were supportive of that effort… I'm not even sure they have the required information internally! They are literally making the best effort they can; that does not mean we understand what the heck is happening there!

Rustaceans at the border

Posted Apr 18, 2022 5:55 UTC (Mon) by nhaehnle (subscriber, #114772) [Link]

You haven't actually pointed at anything?

Last I checked, our binary blobs were in the firmware repository, not the kernel. IIRC the kernel contains a pre-assembled copy of the shader trap handler, but the trap handler source is right next to it and LLVM's assembler can be used to rebuild a binary.

The header situation is unfortunate, but it is the closest thing to the hardware team's source of truth that could be open sourced. It's also decidedly not binary blobs -- there is no code there, just register definitions.

Rustaceans at the border

Posted Apr 19, 2022 5:58 UTC (Tue) by flussence (subscriber, #85566) [Link]

Nvidia produces a PNG with bundled libpng shim and gets flamed mercilessly for decades over it.

AMD throws a massive obfuscated XPM over the wall, and gets praised by the tech splogs as a messiah of FOSS for giving gcc oh so much valid input to chew on.

Intel, for all their faults, just gives people the damn SVG.

Rustaceans at the border

Posted Apr 19, 2022 6:12 UTC (Tue) by mjg59 (subscriber, #23239) [Link]

What's obfuscated about the AMD driver? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/... is a huge file, but it reads exactly like I'd expect a set of register definitions to read - there's just a *lot* of registers.

Rustaceans at the border

Posted Apr 19, 2022 20:53 UTC (Tue) by zdzichu (subscriber, #17118) [Link]

What's the point in including definitions of all the registers when the actual code uses only some of them? It obfuscates which registers are important. On the other hand, the actual code is not obfuscated...

Rustaceans at the border

Posted Apr 20, 2022 3:48 UTC (Wed) by ssokolow (guest, #94568) [Link]

Are you really arguing in favour of less documentation?

The Andrew Schulman Programming Series ("Undocumented DOS", "DOS Internals", "The Undocumented PC", "Undocumented Windows", "Windows Internals", etc.) would like a word with you.

Rustaceans at the border

Posted Apr 17, 2022 0:58 UTC (Sun) by developer122 (subscriber, #152928) [Link]

I'd also like to point out, nearly 100% of the code in the kernel already gets downloaded from an external repository: Linus' git.

If there were actually the proper infrastructure in place, useful rust crates could be pulled down as sub-gits or whatever along with firmware and everything else. Whether it's a good idea to auto-mirror new versions from crates.io, or how dependencies should be handled is an open question, but downloading a ton of code when you start to build a kernel isn't a new idea at all.

Rustaceans at the border

Posted Apr 14, 2022 19:39 UTC (Thu) by Gaelan (subscriber, #145108) [Link]

> Most user-space code, which was not written with these rules in mind, will fare poorly in this environment.

It's worth noting that user-space Rust will fare better than user-space C in this regard. Rust's standard library is split into three pieces: core, alloc, and std. core runs pretty much anywhere; alloc contains the interfaces that require malloc() (or something similar); and std contains the interfaces that require an operating system. Kernelspace Rust obviously can't use std (and uses a custom fork of alloc), but there's a large body of "no_std" Rust code on crates.io which uses only core and should, in principle, run fine in kernelspace.
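The split is visible even from ordinary user-space code, since `std` mostly re-exports items whose real home is `core` or `alloc`. A small sketch making the original crates explicit:

```rust
// In a no_std environment you'd write `extern crate alloc;` and supply
// a global allocator to get Vec, Box, String, etc.; `core` items need
// only the language itself. Here the same paths are used from a std
// program to show where the types actually live.
extern crate alloc;

use alloc::vec::Vec;       // allocator required (alloc crate)
use core::num::NonZeroU32; // no runtime support required (core crate)

fn main() {
    let v: Vec<u32> = (1..=3).collect();
    let n = NonZeroU32::new(7).unwrap();
    println!("{} {}", v.iter().sum::<u32>(), n);
}
```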

This already happens in practice: no_std crates are extensively used by people writing Rust for devices like microcontrollers, which (due to their small memory) are also likely to be sensitive to concerns about stack space, recursion, etc.

While there will obviously be crates that would never be useful in kernel space, and it's certainly wise to be suspicious of introducing dependencies on third-party package maintainers, there's still a lot of code out there which could be very useful in the kernel.

Rustaceans at the border

Posted Apr 14, 2022 19:59 UTC (Thu) by mb (subscriber, #50428) [Link]

I think the question is more like:
Should our kernel build depend on external sources such as crates.io?

Yes, we should allow no_std crates to be used from kernel code.
(after careful review for things like floating point, etc..)

But how should we integrate the crate?
Should we only declare the version in the kernel's Cargo.toml and fetch the latest SemVer-compatible release at kernel build time? That would result in kernels with the same version having different content.
Should we pin a specific version via Cargo.lock or similar?
Should we copy the crate into the kernel tree to be independent of crates.io?
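For reference, the three options map onto concrete Cargo mechanics (the crate names and versions below are purely illustrative): a bare SemVer requirement floats, an `=` requirement or a committed Cargo.lock pins, and `cargo vendor` copies the sources into the tree so crates.io is not needed at build time at all.

```toml
# Illustrative Cargo.toml fragment -- crates and versions are examples only.
[dependencies]
# Floating: any SemVer-compatible 1.x release satisfies this,
# so two builds of the "same" kernel could end up with different content.
byteorder = "1.4"

# Pinned: exactly this release. A checked-in Cargo.lock has the same
# effect; `cargo vendor` goes further and copies the source into the
# tree, removing the dependency on crates.io at build time.
static_assertions = { version = "=1.1.0", default-features = false }
```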

Rustaceans at the border

Posted Apr 14, 2022 20:29 UTC (Thu) by djc (subscriber, #56880) [Link]

Keeping the Cargo.lock file in the Git repository is pretty common for Rust applications, we do that at work.

I don't see the point in Rust for Linux forbidding the use of crates.io crates. Yes, you'd probably want to enforce that only no_std crates are pulled in (I don't know if that's currently supported), but otherwise it seems like a great way of sharing functionality with other parts of the ecosystem.

I can see how pulling in an async runtime would be a bigger deal, but at the same time being able to write async code instead of callback hell seems like it might be important for making code easier to understand the same way it does in userspace.

Rustaceans at the border

Posted Apr 14, 2022 22:48 UTC (Thu) by ssokolow (guest, #94568) [Link]

> Keeping the Cargo.lock file in the Git repository is pretty common for Rust applications, we do that at work.

I think it'd be more likely they'd want to use cargo vendor (which automatically vendors the specified dependencies) so the kernel repository can use third-party crates but rely on crates.io only as a means of standardizing/automating the rote steps in their existing workflow for manually OKing updates to vendored dependencies.

> I can see how pulling in an async runtime would be a bigger deal, but at the same time being able to write async code instead of callback hell seems like it might be important for making code easier to understand the same way it does in userspace.

They'd almost certainly want to write their own executor which is just an API adapter for the relevant kernel machinery.

That's the whole point of Rust's async system being designed as it is. To avoid tying everyone to a single executor.

(Yeah, most people use tokio because the APIs provided beyond the executor itself are still working toward standardization, but those are generally std things like userspace I/O anyway, so that's not relevant to kernel or embedded use-cases.)
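To make "their own executor" concrete: Rust's async machinery only requires someone to call poll() with a Waker; nothing about it presumes a user-space runtime. Below is a deliberately crude user-space sketch of the smallest possible executor. A real kernel executor would park on wait queues or kick a workqueue from its waker rather than spin:

```rust
use std::future::Future;
use std::pin::Pin;
use std::ptr;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing: this sketch just polls in a loop. A real
// executor's waker would re-queue the task for polling.
unsafe fn vt_clone(_: *const ()) -> RawWaker { noop_raw_waker() }
unsafe fn vt_wake(_: *const ()) {}
unsafe fn vt_drop(_: *const ()) {}
static VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_wake, vt_wake, vt_drop);

fn noop_raw_waker() -> RawWaker {
    RawWaker::new(ptr::null(), &VTABLE)
}

// The smallest possible executor: poll until the future completes.
fn block_on<F: Future>(mut fut: F) -> F::Output {
    // SAFETY: the vtable functions never touch the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    // SAFETY: `fut` is a local we never move again after pinning.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            Poll::Pending => std::hint::spin_loop(), // real code would park
        }
    }
}

fn main() {
    let answer = block_on(async { 21 * 2 });
    println!("{answer}");
}
```

This is exactly the pluggability the comment describes: the futures themselves don't care whether poll() is driven by tokio, by this spin loop, or by kernel scheduling primitives.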

Rustaceans at the border

Posted Apr 14, 2022 22:42 UTC (Thu) by josh (subscriber, #17465) [Link]

I absolutely *don't* think we should pull in third-party crates over the network. I do think we should have a means of "vendoring" unmodified upstream crates that work in the kernel, without having to rewrite them.

Rustaceans at the border

Posted Apr 15, 2022 0:32 UTC (Fri) by IanKelling (subscriber, #89418) [Link]

> I absolutely *don't* think we should pull in third-party crates over the network.

I strongly agree.

For multiple reasons, including the fact that when I go to https://crates.io I get a blank page which says I should enable javascript. Please don't make kernel development require remote code execution on my machine.

Rustaceans at the border

Posted Apr 15, 2022 0:55 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

I just want to be clear on this - you'll download kernel source code from the internet and type "make", resulting in a great deal of code running on your system, but you're not ok with javascript running inside a strongly sandboxed environment?

Rustaceans at the border

Posted Apr 15, 2022 8:39 UTC (Fri) by tux3 (subscriber, #101245) [Link]

Presumably a possible defense for the more general version of this position is that the review process is very different, between web pages in general and the kernel.

While I personally run most JavaScript without a second thought, some websites that are not crates.io sometimes run undesirable JS - cryptominers, supercookies, what have you. Sometimes there is third-party JS code that has clever ways of escaping the sandboxes.
I am told Firefox's content process lost connection to the X11 server just a month ago. Although the sandbox is improving as well, hardness may sometimes be a relative thing.

In contrast, I will trust Debian to update many millions of lines of code straight from the internet into a running system, much of which could end up in my $HOME.
If the kernel were to pull in more external libraries, like it did with zstd, there's more than a couple git remotes that I'd feel I could pull from without being gifted any crypto miner or other code that might try to poke at a sandbox.

Rustaceans at the border

Posted Apr 15, 2022 8:49 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

I have no reason to believe that kernel code gets significantly more review than crates.io does - I agree that the web as a whole doesn't meet that bar, but frankly most software I pull from Debian has also had much less review than the kernel does (https://lists.debian.org/debian-devel/2003/02/msg00771.html is an example of this not going well). So it feels like what's missing is a way to express what level of trust I place in any provider of code I end up executing, rather than just to assert that websites that use Javascript are unacceptable.

Rustaceans at the border

Posted Apr 15, 2022 9:08 UTC (Fri) by tux3 (subscriber, #101245) [Link]

I agree with that. As for crates.io, I have respect for the work they do and I'm happy to run their code (though I have not read it).
This may be getting off-topic, but now I'm curious if you have anything specific in mind when you write about expressing levels of trust — would that look like further sandboxing?

Rustaceans at the border

Posted Apr 15, 2022 9:27 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

Great question! I spent a while looking into whether it was feasible to apply different LSM profiles (SELinux/Apparmor/whatever) to dpkg depending on where the package was downloaded from, and unfortunately the architecture doesn't make that terribly possible. From the web perspective, I think that probably comes down to extension-level handling at the moment? In an ideal universe we'd have infrastructure to tie any given piece of javascript back to an upstream repo and make a trust decision based on things like licensing and review assurances, but that feels like kind of a lot of work.

Rustaceans at the border

Posted Apr 15, 2022 13:51 UTC (Fri) by IanKelling (subscriber, #89418) [Link]

Hah, good point. I do generally try to run make within a sandbox too, though. Websites should not require people to download and run a program simply to read a page of text. Crates.io is almost entirely used for text that could be simple HTML.

Rustaceans at the border

Posted Apr 19, 2022 8:28 UTC (Tue) by LtWorf (subscriber, #124958) [Link]

> but you're not ok with javascript running inside a strongly sandboxed environment?

The fact that there is a CVE a day or so for Chromium and Firefox contradicts your statement about "strongly sandboxed".

Rustaceans at the border

Posted Apr 19, 2022 8:50 UTC (Tue) by mjg59 (subscriber, #23239) [Link]

Very few of those CVEs involve sandbox escapes.

Rustaceans at the border

Posted Apr 19, 2022 9:29 UTC (Tue) by LtWorf (subscriber, #124958) [Link]

Only one is needed though.

Rustaceans at the border

Posted Apr 19, 2022 9:45 UTC (Tue) by mjg59 (subscriber, #23239) [Link]

Sure, but only one local privilege escalation bug in Linux is needed for any non-Javascript attack vector and there's more of those than there are Chrome sandbox escape chains.

Rustaceans at the border

Posted Apr 19, 2022 13:43 UTC (Tue) by LtWorf (subscriber, #124958) [Link]

But I trust anything coming from apt much more than anything coming from some website.

Rustaceans at the border

Posted Apr 19, 2022 13:59 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

> But I trust anything coming from apt much more than anything coming from some website.

The conversation isn't really about what you personally trust but about what ought to be considered trusted by people, and that should be based on empirical data. For the most part, distributions don't vet code from upstream for security issues, and sometimes distribution-specific patching can even introduce security holes (A notorious example is https://www.debian.org/security/2008/dsa-157).

Rustaceans at the border

Posted Apr 19, 2022 14:01 UTC (Tue) by farnz (subscriber, #17727) [Link]

Your link is truncated - I'm guessing you mean this OpenSSL predictable RNG security advisory?

Rustaceans at the border

Posted Apr 19, 2022 14:30 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

> Your link is truncated - I'm guessing you mean this OpenSSL predictable RNG security advisory?

Correct. Thanks

Rustaceans at the border

Posted Apr 19, 2022 16:58 UTC (Tue) by amacater (subscriber, #790) [Link]

This is a favourite stick to drag up to beat Debian with: what's more interesting is to actually go back and look at what happened when it came to light. The work by Luciano Bello was first class, and Debian's response was clear and relatively immediate: explain the problems with OpenSSL (and thereby OpenSSH) fully, create a tool to deny-list problem keys, and explain what needed regenerating and why.

The mistake itself came from querying the upstream maintainers about what should be done and then doing it inappropriately once too often. I haven't seen other incidents handled as well by other teams and other distributions - let alone by commercial software. It's worth looking at as something that was handled well at the time and in retrospect, not necessarily trotted out at every opportunity 14 years later.

Rustaceans at the border

Posted Apr 19, 2022 17:36 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

> This is a favourite stick to drag up to beat Debian with

As I specifically noted already, it was just an example of a distribution-specific vulnerability that is better known (it also has some significance given the OP's reference to apt, but that is mostly incidental). Your defensive reaction to that specific example does nothing to address the broader point. Replace it with an example from another distribution if that helps; here you go:

https://nvd.nist.gov/vuln/detail/CVE-2007-5962

We can certainly find more across distributions, since every time backporting or distribution-specific patching happens (even something as simple as a permissions change in the filesystem), there are deviations from upstream that introduce some potential risk of bugs, including security vulnerabilities that don't exist upstream. So the broader point is that humans are infallible and occasionally make mistakes (not even counting malicious attackers), and we shouldn't automatically rely on distributions to provide better security than well-vetted upstream projects. At least in some cases, they do worse.

Rustaceans at the border

Posted Apr 19, 2022 17:39 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

Correction: humans are not infallible.

Rustaceans at the border

Posted Apr 19, 2022 18:11 UTC (Tue) by amacater (subscriber, #790) [Link]

Well vetted upstream projects is the difficulty. "With enough eyeballs, all bugs are shallow" - not necessarily if the eyelids (or codebases) happen to be effectively closed ... and see the discussions round the security of Chrome/Chromium regularly, for example.

The case of Rust in the kernel is similar to the problems of vendored code necessary to keep some ecosystems running: Node is the obvious one that causes problems to Debian and others.

It's possible that there's just an instinctive awareness of potential problems to come.
There's _so_ much code there that's intertwined and impenetrable, and we've all been subject on occasion to being force-marched forwards by large-scale change in someone else's ecosystem that we can't control - see, for example, the init-system flamewars or Python 2 -> Python 3 (or a.out to ELF if your memory is long enough).

Rustaceans at the border

Posted Apr 19, 2022 18:28 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

> Well vetted upstream projects is the difficulty

To be clear, the original context here was visiting crates.io website compared to installing a distro package.

> The case of Rust in the kernel is similar to the problems of vendored code necessary to keep some ecosystems running

Vendored code already exists in the Linux kernel. The Rust mechanisms for tracking vendored code are strictly better than the ones we have for vendored C-based projects in the kernel.

Rustaceans at the border

Posted Apr 15, 2022 6:41 UTC (Fri) by qyliss (subscriber, #131684) [Link]

It's not a solution to any of the broader points being discussed, but I recommend https://lib.rs/, an alternative frontend to crates.io, which does not require JavaScript (and also has a much nicer interface).

Rustaceans at the border

Posted Apr 18, 2022 3:40 UTC (Mon) by ssokolow (guest, #94568) [Link]

Bear in mind that crates.io, the package index, and crates.io, the web-based browser frontend for it, are separate systems.

Think of it like GitHub or GitLab or BitBucket or Gitea. When using git with one of those sites or cargo with crates.io, you only need to enable JavaScript if you want to publish to the site and, once you've set things up and retrieved your API key(s), you don't need to log in.

Heck, comparing cargo to git is underselling it. With subcommands like search, publish, and owner, it's more like hub (a wrapper for git that adds subcommands like git issue create for GitHub-proprietary APIs, so you probably never need to use the web UI once you've set up your credentials).

Beyond that, they do have plans to enable server-side rendering for the browser frontend... it's just that the team that handles the frontend is a bit understaffed, so it's taking a while to rework it.

Rustaceans at the border

Posted Apr 15, 2022 8:23 UTC (Fri) by marcH (subscriber, #57642) [Link]

> "vendoring" unmodified upstream

Problem: everyone knows "copy/paste/diverge" sounds bad.

Solution: keep inventing new synonyms to disguise it. Like when $BIGCORP changes name after some scandal.

The real solution is of course some form of branching / forking == copy/paste under (version) control. Probably what the author of a git tool written in rust had in mind :-)

Rustaceans at the border

Posted Apr 15, 2022 16:14 UTC (Fri) by pj (subscriber, #4506) [Link]

I think it's pretty common these days to 'vendor' code via git submodules, which seems to be what you're advocating for.

Rustaceans at the border

Posted Apr 15, 2022 17:35 UTC (Fri) by smurf (subscriber, #17840) [Link]

"git submodule" doesn't vendor anything. The code is still pulled from a remote repository; it's just pinned to a specific version, which is exactly what you want in order to avoid spurious external changes that introduce more-or-less-subtle security problems.

"Real" vendoring, aka copy/paste/ignore, disconnects the copy entirely from its source. (Thanks but no thanks.)

Rustaceans at the border

Posted Apr 16, 2022 7:08 UTC (Sat) by riking (subscriber, #95706) [Link]

The 'cargo vendor' subcommand does copy-paste-track: it copies the sources into a local vendor directory, while the exact versions and checksums stay pinned in the Cargo.lock file.

Rustaceans at the border

Posted Apr 15, 2022 17:12 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

What we do is keep a separate git history for the import that is a subset of upstream (removing test suites, test data, docs, sources we don't care about, etc.) and commit that on the respective tracking branch. One can even transform the import, as with SQLite's amalgamation, to make a smaller set of sources. This then gets `-Xsubtree` merged into the main tree, and we check that:

- changes to this directory only come from such a merge
- the merge came from the "right" branch (tracked by its root commit)
- the merge did not modify the subtree in the merge commit (yay)

We also check that *all* such changes come in through this mechanism and we track our patches in a fork of upstream that we tag for each import for posterity.

Rustaceans at the border

Posted Apr 16, 2022 8:21 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

This is more like a constraint set by the kernel process rather than an opinion :-) It's in some ways analogous to out-of-tree modules ("sort of" the same, but different). We cannot import unreviewed code into the kernel. It's a brick wall.

Rustaceans at the border

Posted Apr 15, 2022 15:01 UTC (Fri) by cesarb (subscriber, #6266) [Link]

> Should we copy the crate into the kernel tree to be independent of crates.io?

As mentioned in a comment above, there's precedent with things like zstd, which is also fully copied into the kernel tree instead of being kept as an external package. So I'd expect every crate included in the kernel, with the exception of the ones which come with the compiler (i.e. the "core" crate), to be vendored directly into the kernel tree.

Rustaceans at the border

Posted Apr 16, 2022 9:39 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

I think this whole thing is in the scope of the kernel maintainers to decide, because they are the ones who have to take responsibility for using external crates. You can have an opinion, but unless you are the one who has to deal with the consequences, it unfortunately does not count for much.

Rustaceans at the border

Posted Apr 16, 2022 14:55 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

As long as pulling them does not require an Internet connection :-) One must be able to build the kernel without downloading external code. Importing is of course worth considering in some cases, but that must be decided case by case.

Rustaceans at the border

Posted Apr 16, 2022 9:33 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

Linux is also used on a wide range of targets other than just microcontrollers. How can you be sure that an external component does not break *any* of those uses? How can you, as a kernel maintainer, take responsibility for code that lives outside the kernel tree, over which you have no control?

Finally: how are you expected to fix a kernel bug if the bug is caused by an external component?

Rustaceans at the border

Posted Apr 16, 2022 12:10 UTC (Sat) by excors (subscriber, #95769) [Link]

Linux doesn't really run on microcontrollers at all - the point is that the technical constraints for kernel software on a regular desktop CPU are very similar to the constraints for application software on a microcontroller, in contrast to the very different constraints for application software on desktop CPUs. (E.g. they can't crash and expect an OS to clean up after them, so they need to think more carefully about handling heap allocation failures, and/or avoid using the heap entirely. They typically have small stacks, so they need to avoid large stack allocations or recursion. They need to handle interrupts, which are a peculiar kind of concurrency where you can't use normal mutexes. There's no pthreads. They often can't use an FPU. Code size is important. Etc.)

So there's a good chance that libraries designed for microcontroller applications will be technically suitable for the Linux kernel. Of course they'd still need to be code-reviewed and maybe adapted slightly for the specific environment they're being used in, and there's social questions about who will maintain the code and what their priorities are, but the same applies to new code that's written exclusively for Linux. Starting from an existing well-designed well-tested library should achieve the same quality with less work than starting from scratch.

Rust's no_std doesn't mean a library is necessarily suitable for microcontrollers/kernels, but it does mean it avoids some common features that would make it definitely unsuitable, so it seems a good way to filter libraries before reviewing them in more depth.

Rustaceans at the border

Posted Apr 16, 2022 14:59 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

Off-topic: to split hairs, it does. μClinux was previously a fork but was integrated into the mainline back in the 2.6 kernel.

Rustaceans at the border

Posted Apr 14, 2022 20:09 UTC (Thu) by mb (subscriber, #50428) [Link]

One (IMO) major problem with letting each kernel module choose to fetch random crates from crates.io is that it will result in multiple incompatible versions of the same crate being built into the kernel.

E.g. the module wants foo-1.0 and a dependency of the module wants foo-2.0. And another module requires foo-3.0. Therefore, three versions are included.

The kernel as a whole must agree on which crate versions to use.
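As a hypothetical illustration of that duplication, Cargo's dependency-renaming feature even lets a single manifest depend on two semver-incompatible versions of the same package side by side (the crate name foo is made up):

```toml
[dependencies]
# Both versions get compiled and linked into the final artifact --
# exactly the duplication being objected to above.
foo_v1 = { package = "foo", version = "1.0" }
foo_v2 = { package = "foo", version = "2.0" }
```

More commonly the duplication arises transitively, without anyone asking for it, when different dependencies pin incompatible major versions.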

Rustaceans at the border

Posted Apr 14, 2022 22:23 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

To an extent you'll see that happen anyway when Rust-based modules are compiled separately from the kernel. Rust code compiled for the kernel will be specialized based on how it's used in the kernel, and the same Rust code compiled for a loadable module will be specialized for the module. In Rust that can even affect things like structure layout, unless the structure has been declared #[repr(C)].

In general this isn't a problem so long as you don't try to cross kernel/module boundaries through the (unstable) Rust ABI. The module should expose C entry points and only share #[repr(C)] data with outside code—even other code written in Rust. Even if a module happens to pull in a different version of some dependency than the kernel, the dependency will be constrained to that module and will not affect the rest of the system.
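A small sketch of that boundary discipline; all names here are hypothetical. The #[repr(C)] struct has a layout both sides can agree on, while the extern "C" entry point keeps the unstable Rust ABI out of the interface:

```rust
// Data shared across the boundary gets a fixed layout: the kernel side
// and the module side agree on it regardless of compiler version.
#[repr(C)]
pub struct EventRecord {
    pub id: u32,
    pub flags: u32,
}

// Internal type with default (Rust) layout: the compiler may reorder
// its fields, so it must never cross the boundary.
#[allow(dead_code)]
struct Internal {
    small: u8,
    big: u64,
}

// C-callable entry point: stable symbol name (no mangling) and the C
// calling convention.
#[no_mangle]
pub extern "C" fn module_handle_event(ev: *const EventRecord) -> u32 {
    // Safety: the caller guarantees `ev` points at a valid EventRecord;
    // a real module would document and check its invariants.
    let ev = unsafe { &*ev };
    ev.id ^ ev.flags
}

fn main() {
    let ev = EventRecord { id: 0b1100, flags: 0b1010 };
    assert_eq!(module_handle_event(&ev), 0b0110);
    println!("ok");
}
```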

Rustaceans at the border

Posted Apr 15, 2022 19:36 UTC (Fri) by bartoc (subscriber, #124262) [Link]

You actually don't have to agree on what version to use, since the build system makes sure that you see the definitions you were expecting and everything is mangled such that you can't screw it up.

It is not good to have multiple versions of stuff for other reasons, though: code auditing, code size, and compile time.

Rustaceans at the border

Posted Apr 14, 2022 21:49 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

I don't foresee people whose mascot is a penguin having too much trouble welcoming people whose mascot is a crab.

As a C programmer I will say that Rust's standard library (not all of which, admittedly, the kernel gets; see other comments about the stack of core -> alloc -> std, of which the kernel will receive core and a somewhat custom alloc) has much more in it than I was used to from the C standard library, and in many cases more than utility libraries like GLib as well.

For programs where, in C, I'd have reached for at least some third-party utility libraries, in Rust the standard library is often more than enough. Yet it's still a very hospitable place to do the sort of low-level programming Linux is all about. For example, Rust explicitly has a char type which reflects Unicode scalar values, but it pragmatically offers predicates like is_ascii_hexdigit() both on the char type and on u8 (the unsigned byte). If you actually have bytes (say from a character device), and the only thing you care about is whether those bytes are ASCII hex digits, with no interest in whether they might be part of an Emoji or a Korean word, or anything fancy like that, then Rust doesn't waste your time.
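For instance, the byte-level predicates described above look like this (the buffer contents are invented):

```rust
fn main() {
    // Bytes straight off a character device: no UTF-8 decoding needed.
    let buf: &[u8] = b"deadBEEF42";
    assert!(buf.iter().all(u8::is_ascii_hexdigit));

    // The same predicate exists on char for already-decoded text.
    assert!('f'.is_ascii_hexdigit());
    assert!(!'g'.is_ascii_hexdigit());

    // Non-ASCII digits are deliberately rejected: locale-free behavior.
    assert!(!'٤'.is_ascii_hexdigit()); // U+0664 ARABIC-INDIC DIGIT FOUR
    println!("ok");
}
```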

Rustaceans at the border

Posted Apr 16, 2022 6:45 UTC (Sat) by bartoc (subscriber, #124262) [Link]

It's pretty embarrassing that C doesn't have is_ascii_digit and friends tbh.

Someone should write a paper, I bet it could get accepted.

Rustaceans at the border

Posted Apr 16, 2022 16:31 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

Arguably Rust got lucky here. In the 1990s locale APIs were all the rage, and perhaps a new Rust language would have adopted locale-sensitive APIs for all this stuff, as C did. After all, C is also a low-level bit-banging type of language, yet isdigit() insists on being locale sensitive.

By the early 21st century, there was more of a note of caution. An API which tells you whether this code point is arguably a digit of some sort in the character encoding somebody (mis)configured on your server is rarely what you needed, while an API which says ASCII 0 through 9 are digits, and only those, is often useful. If you'd given away the latter in order to offer the former, you looked a bit daft. Today, Rust offers only two things here: is this an ASCII digit, or (exclusively on its char type, which represents Unicode's "scalar values" only) is this in Unicode's digit class? Got EUC-JP? Big-5? Too bad, decode them into Unicode and use our Unicode APIs.

If you'd done that in 1995 there'd be howls of outrage at what seems to be cultural imperialism at work, today not so much.

Rustaceans at the border

Posted Apr 16, 2022 19:22 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

> If you'd done that in 1995 there'd be howls of outrage at what seems to be cultural imperialism at work, today not so much.

I think this was because in 1995, surrogate pairs didn't yet exist and everybody still thought of "Unicode" as a 16-bit encoding, which is completely inadequate for encoding all of CJK (without considering the fact that you also want to have enough room left over for all of the other writing systems in the world). But even in 1996, when they did theoretically have the code space for all of CJK, they hadn't done Extension A yet, let alone all of the subsequent CJK extensions, so a whole heck of a lot of characters were not actually encoded.* And then you have the Han unification controversy etc., so it's really not too surprising that people in the 1990's were not sure that Unicode was going to work out as well as it did.

* In 1989, the Consortium took the position that these characters were not "useful" enough for this to be a real problem in practice, which is how they managed to conclude that 16 bits would be enough in the first place. Unsurprisingly, if you tell a user "you can't enter your name because it uses an obscure character," people get upset.

Rustaceans at the border

Posted Apr 17, 2022 10:23 UTC (Sun) by khim (subscriber, #9252) [Link]

I wonder what would have happened in the alternate history where the IBM System/360 hadn't imposed the 8-bit byte and people had continued with 36-bit words. In such a world we might still be using 36-bit systems (very few devices even today need more than 64GiB of RAM) and Unicode would be a simple 18-bit encoding, etc.

But of course by now the 8-bit byte is too entrenched to make such a switch.

Rustaceans at the border

Posted Apr 19, 2022 2:01 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

> very few devices even today need more than 64GiB RAM

Eh, it depends what you use it for. If you have a bunch of dedicated machines running something like memcached, you probably do want to shove an unusually large amount of RAM into them. The same goes for NAS systems or anything else that has a lot of RAM pressure. I'm not sure if you'd actually hit 64 GiB per device - that probably depends on other factors like the size and shape of your traffic, the number and speed of CPU cores, etc., but OTOH there are RAM-heavy systems that are difficult to horizontally scale (e.g. traditional RDBMS's), so I could imagine scenarios where you might end up deploying such a thing, at least as a stopgap until you figure out how to do replication without having glaring consistency problems all over the place.

Rustaceans at the border

Posted Apr 19, 2022 6:34 UTC (Tue) by flussence (subscriber, #85566) [Link]

I feel like Unicode could easily fit all the current semantics into 16 bits of code points with lots of room to spare, if it were redone today as RISC (consistent use of combining sequences) instead of CISC (e.g. the entire precomposed CJK glyphs area, all the Latin-with-extra-squiggles code planes).

Still wouldn't be anywhere near simple to use, but that's how written communication is.

Rustaceans at the border

Posted Apr 19, 2022 11:06 UTC (Tue) by ssokolow (guest, #94568) [Link]

Unfortunately, it's not as simple as it sounds.

Wikipedia's commentary on Han Ideographs (Chinese, Japanese Kanji, Korean Hanja, etc.) being only offered precomposed is "However, attempts to do this for character encoding have stumbled over the fact that Chinese characters do not decompose as simply or as regularly as Hangul does."

...and I remember reading that the precomposed Latin stuff is necessary to guarantee that text strings used as opaque lookup keys (eg. filesystem paths) wouldn't get altered when round-tripping between a legacy encoding and Unicode, regardless of the circumstances.

Rustaceans at the border

Posted Apr 20, 2022 1:16 UTC (Wed) by flussence (subscriber, #85566) [Link]

I can accept that first point, but regarding latin I believe macOS munges every filename through NFD, so it's already a lost cause.

Rustaceans at the border

Posted Apr 20, 2022 3:42 UTC (Wed) by ssokolow (guest, #94568) [Link]

That's fine. Programs are supposed to assume that the filesystem may change under them. That's the cost of a shared resource which doesn't use Microsoft Visual SourceSafe-style locking overkill.

What's important is that, if the OS APIs give you a string identifier, your internal string processing can round-trip what you were given without altering it.

For comparison, I imagine that using the Windows version of Python's os.path.normcase for purposes other than in-process equality comparisons would cause i18n issues, since it uses Python's internal .lower() method and thus the Unicode case-conversion tables baked into that version of Python. NTFS lookups, by contrast, use a case-folding table baked into the partition at the time it was formatted, so that Unicode updates can't introduce case-equivalence collisions for already-existing paths.
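A small Rust illustration of why table-driven case conversion makes a poor stable lookup key: U+0130 (the Turkish dotted capital I) lowercases, per the Unicode tables compiled into the Rust release in use, to two scalar values rather than a plain 'i':

```rust
fn main() {
    // char::to_lowercase follows Unicode's SpecialCasing tables:
    // U+0130 maps to 'i' followed by U+0307 COMBINING DOT ABOVE.
    let lower: String = 'İ'.to_lowercase().collect();
    assert_eq!(lower, "i\u{307}");

    // So a "case-normalized" key derived this way depends on which
    // Unicode version produced it -- the hazard that makes filesystems
    // bake their case-folding table in at format time.
    assert_ne!(lower, "i");
    println!("{:?}", lower);
}
```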

Rustaceans at the border

Posted Apr 22, 2022 10:24 UTC (Fri) by smurf (subscriber, #17840) [Link]

Assuming that there even is a legacy encoding that has composing codepoints *and* the corresponding composed characters.

And even if there is, you could mark the offenders, e.g. by placing a combining grapheme joiner U+034F between them.

IMHO the real reason is that, at the time, font-rendering engines were not clever enough to show alternate glyphs for composed characters whose naïve superposition of their constituent parts simply doesn't work. (As in, all accented/umlauted/whatever'd capital letters.)

That, or the precedence of Latin-1 with its mountain of composed characters proved too strong and nobody even thought about solving the problem some other way until it was too late.

That, or the problem was deemed unfixable because instead of expanding Han-encoded texts by 50% (three-byte UTF-8 instead of two-byte words) you'd blow them up to >250% of their size (two bytes for radical A, two for radical B, at least one for either marking the end of a glyph or a joiner; more if there's a radical C involved), which would not have been acceptable at the time. After all, back then Weird Al chastised Microsoft that "in case you haven't noticed, four-gig drives don't grow on trees".

Rustaceans at the border

Posted Apr 22, 2022 13:48 UTC (Fri) by khim (subscriber, #9252) [Link]

> That, or the precedence of Latin-1 with its mountain of composed characters proved too strong and nobody even thought about solving the problem some other way until it was too late.

It's not even about the “precedence of Latin-1”. It's about the simple practical need to keep parts of your data in Unicode and parts in some other encoding with constant conversions between these.

It took years (about 10 to 20 years, in fact) before people, finally, stopped using legacy encodings.

If Unicode had been impossible (or very hard and inefficient) to use in that fashion, it would never have taken off.

Size considerations were also quite real: Japan persisted for years with ISO-2022-JP, both because the round trip through Unicode is not perfect and because UTF-8 made documents 50% larger.

The only big issue with Unicode was the initial assumption that 16 bits would be enough: that prompted the useless and very costly trip to UCS-2, then UTF-16, and then, finally, to UTF-8.

UCS-2 made sense, but UTF-16 has all the problems of UTF-8 without giving you any of the benefits.

If people had realized earlier that UCS-2 wouldn't work, then all that hoopla with two kinds of functions in Java, the endless bugs with UTF-16 in browsers, and other such things could have been avoided.

But oh well, we can't change the past; we can only adopt UTF-8 going forward.
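The size trade-offs in that history are easy to check in Rust, which exposes per-character code-unit counts directly:

```rust
fn main() {
    // ASCII: one unit either way.
    assert_eq!('a'.len_utf8(), 1);
    assert_eq!('a'.len_utf16(), 1);

    // A BMP Han character: three bytes in UTF-8, one 16-bit unit in
    // UTF-16 -- the 50% size penalty mentioned above.
    assert_eq!('漢'.len_utf8(), 3);
    assert_eq!('漢'.len_utf16(), 1);

    // Outside the BMP, UTF-16 needs a surrogate pair, so it is a
    // variable-width encoding too -- UTF-8's problems without its perks.
    assert_eq!('😀'.len_utf8(), 4);
    assert_eq!('😀'.len_utf16(), 2);
    println!("ok");
}
```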

Rustaceans at the border

Posted Apr 19, 2022 19:40 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

There are more than 100000 unique Han characters. They can usually be decomposed into simpler characters (radical + phonetic), but even with that simplification it's going to get uncomfortably close to 2^16 code points.

Rustaceans at the border

Posted Apr 17, 2022 21:19 UTC (Sun) by bartoc (subscriber, #124262) [Link]

Well, the real issue is that getting a feature into C is quite involved, and I can understand nobody wanting to go through all the trouble (including possible travel, with airfare and hotel costs and so on) just to standardize a one-line function.

The real problem with locales (besides them not really working with variable-width encodings, and being based on code units) is that programmers do NOT expect the behavior of many of these functions to change out from under them (printf is locale sensitive!). This is not just beginner users, either! When the C++ committee standardized formatting of dates and times (via std::format), they accidentally made it locale sensitive by basically saying “interpret the format string as strftime would”; whoops. (The std::format model is locale-invariant by default, with special specifiers to do locale things and the ability to pass in a locale object if you want to use that instead of the global one.)

C locales are so totally insufficient for actual internationalization that having everything be locale sensitive basically only results in non-user-facing stuff being mangled. I hope you like your log analytics misclassifying output from all your machines in countries with a different date order than your developers! It's totally insane.

Even if they were useful for localization, the actual specification is essentially “do whatever you want, unless it's the C locale”; it's really, really bad. And in practice implementers just phone in locales, because they aren't really useful for anything anyway. They should be deprecated and removed (or “removed” by specifying that all locales are equivalent to the C locale).

Then there are the multiple attempts at standard C encoding-conversion routines, all of which are broken.

Even if you stick to Unicode you can get into trouble with cursed/unexpected Unicode transformation formats; GB18030 is the worst (and the only really bad one in somewhat common use).

Rustaceans at the border

Posted Apr 19, 2022 15:21 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

We're diverting pretty far from the original topic, but as to it being difficult to get features in C, there isn't any actual requirement.

Consider the BSD sockets API. Why is that commonplace? Did some JTC1 sub-committee sign off on it and then all our operating systems got the same API? No, it's just the right shape and so everybody adopted it and any "standards" are subsequent and simply documenting what was de facto already the case about networking.

Rustaceans at the border

Posted Apr 14, 2022 22:32 UTC (Thu) by NHO (subscriber, #104320) [Link]

As a Gentoo user, I build kernel sometimes and customize the configuration to my hardware.

Sometimes, I make a mistake and end up with no network. If I can't rebuild my system in that situation, because the kernel needs network access to build, then what?
There is a local mirror of the Gentoo repo, sources included, in Russia. But our internet-censorship organization could get some garbage in its head if some ancient judge from Nowhere, Ural Mountains, adds crates.io to the list of sites to be blocked.
Debian prevents network access when building packages, no?

What I'm saying is that any solution that breaks the kernel build in an airgapped sandbox is bad.

Rustaceans at the border

Posted Apr 14, 2022 22:55 UTC (Thu) by ssokolow (guest, #94568) [Link]

I remember them saying they had no plans to use Cargo for the build system but, if they decide to use it for dependency handling, I imagine they'd use cargo vendor to automate what they already do with C libraries they use.

(i.e. Keep a copy in the kernel repository and manually pull in updates from upstream every now and then.)

This cargo subcommand will vendor all crates.io and git dependencies for a project into the specified directory at <path>. After this command completes the vendor directory specified by <path> will contain all remote sources from dependencies specified. Additional manifests beyond the default one can be specified with the -s option.

The cargo vendor command will also print out the configuration necessary to use the vendored sources, which you will need to add to .cargo/config.toml
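For reference, the configuration that cargo vendor prints is a source replacement of roughly this shape (the directory name assumes the default vendor/ path):

```toml
# .cargo/config.toml: redirect every crates.io lookup to the in-tree
# vendor/ directory, so the build needs no network access.
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```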

Rustaceans at the border

Posted Apr 14, 2022 23:13 UTC (Thu) by gerdesj (subscriber, #5446) [Link]

I've run Gentoo on tens of systems for well over a decade. There are plenty of other, far more straightforward, ways to break a Gentoo box! For starters, emerge will pull down the dependencies and build them first, working up the chain to the final result. Unless you go mad with eclean, you'll have all the bits on site.

I do understand where you are coming from though. It all sounds a bit ... devops. Before you know it we'll be piping curled configs through configure/make.

Rustaceans at the border

Posted Apr 15, 2022 5:19 UTC (Fri) by q_q_p_p (subscriber, #131113) [Link]

If they want kernel in rust, they should write kernel in rust like google does with fuchsia.
Rust is such a cancer... attaching to popular projects and destroying them - why don't they do this with llvm, huh ? because they would have to deal with at least dependency hell, proper compiler boostrapping procedure, etc.

Rustaceans at the border

Posted Apr 15, 2022 5:45 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> If they want kernel in rust, they should write kernel in rust like google does with fuchsia.

Fuchsia is written mainly in C++.

> Rust is such a cancer... attaching to popular projects and destroying them - why don't they do this with llvm, huh ? because they would have to deal with at least dependency hell, proper compiler boostrapping procedure, etc.

Cargolift.

Rustaceans at the border

Posted Apr 15, 2022 5:47 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Rustaceans at the border

Posted Apr 15, 2022 6:41 UTC (Fri) by q_q_p_p (subscriber, #131113) [Link]

LLVM: https://github.com/llvm/llvm-project
I can use the same argument about the kernel, just use redox (cranelift) if you don't like C: https://gitlab.redox-os.org/redox-os/redox and leave kernel(LLVM) alone ?

I was wrong about fuchsia though...

Rustaceans at the border

Posted Apr 15, 2022 6:44 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

The decision to use Rust will be made by actual Linux developers.

And nobody is forcing you to use the mainline Linux. If you so dislike Rust, then you are totally free to fork Linux and maintain your own Rust-less fork.

Rustaceans at the border

Posted Apr 15, 2022 10:04 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

> attaching to popular projects and destroying them

This is the kind of claim that needs hard citations to be anything other than a content-free "look at me see how much I hate Rust" signal flag.

Rustaceans at the border

Posted Apr 15, 2022 11:30 UTC (Fri) by fazalmajid (guest, #158016) [Link]

Totally agree. The Rust project’s cavalier attitude to language and compiler stability, the absurd compiler bootstrapping, and the limited platform support make it thoroughly unsuitable for the Linux kernel, and that’s also why I won’t touch it despite the many benefits. Not to mention their belief that “curl something | bash” is acceptable practice.

Rustaceans at the border

Posted Apr 15, 2022 12:51 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

> The Rust project’s cavalier attitude to language and compiler stability

I'm curious: What's "cavalier" about their attitude to language and compiler stability at this time?

For this question, I'm only interested in claims backed by concrete examples, relating to officially designated stable features of the compiler and language, postdating version 1.0 of the toolchain.

Rustaceans at the border

Posted Apr 15, 2022 18:14 UTC (Fri) by nix (subscriber, #2304) [Link]

Oh yeah, the system in which things are tested under nightly feature guards, getting stabilized and available to ordinary code only when ready, and only then broken if unsound or after an edition change (and the old versions continue to work under older editions), and the way that feature additions are validated against all of crates.io to make sure they don't break things before going in... it's just so totally cavalier! Nothing like the elegant and refined C world, where, say, the introduction into POSIX of getline() (and its consequent promotion into the _POSIX_C_SOURCE glibc feature flag; formerly it was only visible under _GNU_SOURCE) could not possibly break dozens of unrelated packages that just happened to use that name as an identifier, with no way to fix them other than changing all of them.

(That's not to say Rust can't improve. My personal if-wishes-were-unicorns perfect-world wish: if Rust grew the same amazing system Swift has to allow generic types with a stable ABI without heaps of monomorphization code bloat, that would be fabulous. It is an incredibly intricate system though, so I can see why it hasn't happened, at least not yet: it certainly can't be used unchanged but would need some redesign, and I'm not even sure this is possible. I know others have mused about this before, because Rust's lack of a nice way to do the shared library stable-ABI thing is probably its biggest remaining downside compared to other, lesser languages like C. C++, of course, has the exact same flaw, but unlike Rust it's much harder to apply the same cunning Swift tricks to it.)

Rustaceans at the border

Posted Apr 15, 2022 19:05 UTC (Fri) by LoganDark (guest, #158019) [Link]

At least C++ has the benefit of "well USUALLY if you use the same C++ compiler it's compatible with pretty much all C++ libraries compiled by that compiler". Sometimes it's even compatible with other compilers. With Rust it's "the ABI is 100% undefined and changes basically every commit, you can't use generic types at all because they're not present at all in the library", etc.

At least Rust has the ability to declare C functions.

Rustaceans at the border

Posted Apr 15, 2022 22:06 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

I see you've never used binary-only C++ libs.

Rustaceans at the border

Posted Apr 17, 2022 21:56 UTC (Sun) by bartoc (subscriber, #124262) [Link]

That "usually" is because of monumental sacrifices in implementation quality (I am a C++ standard library maintainer). A regex that is slower than spinning up a Java VM, doing the regex, and then shutting it down. A hash table that is 2x slower than it should be. An insane work-stealing thread implementation that just exists to get around big kernel locks that were removed 20 years ago. A mutex that is huge and deadlocks under resource starvation. A buffered IO stream that buffers like 16 characters. A deque (linked list of buckets) with a bucket size optimized for caches from 1995. A span type that is always passed on the stack because we forgot to add the attribute to say we want the sane ABI, the one we can't make the default because that would break ABI. The only data structures that are pretty OK are the vector and the r-b tree. The lists are not horrible either, but "generic" linked lists are not terrifically useful things.

The ability to provide a stable ABI is great, but C++ has gotten into a position where the first “real” release of your library has a stable ABI and boy is that not a good thing.

Even with all the care that we take not to break abi we still break it accidentally in subtle ways from time-to-time, as do other implementations. The way C++ libraries are designed and the way the standard library in particular is specified are at complete odds with the ability to make any kind of future changes without breaking ABI. You can do such libraries, but they look like Qt, POCO, or a COM library rather than your typical boost/standard library.

Swift gets this very much right. I wish it integrated with a jit of some kind to allow slowly migrating to the “unstable abi” versions of stuff though (and switching back when the dependency changes)

Fun tidbit: I recall that for Windows Phone 7 apps were distributed as bytecode in the compiler’s intermediate format and compiled to a final executable on the phone during installation.

Rustaceans at the border

Posted Apr 17, 2022 10:35 UTC (Sun) by khim (subscriber, #9252) [Link]

> If Rust grew the same amazing system Swift has to allow generic types with a stable ABI without heaps of monomorphization code bloat, that would be fabulous

It's not just Swift. Ada uses similar system, too. And it supported it for much, much, MUCH longer.

> it certainly can't be used unchanged but would need some redesign, and I'm not even sure this is possible

Rust-the-language is certainly strict enough to support it. Rustc-the-compiler… nope.

This would be a multi-year effort and someone has to fund it. But it's definitely possible and I really wish someone would fund it.

Sadly Google is not one who needs or wants such a thing (they use static linking, mostly), Apple already has Swift… who can fund it, I wonder? Maybe Microsoft?

But yeah, it would be a nice and elegant solution for the stable ABI. Rust developers are thinking about small, very limited subset, but I really wish someone would fund development of real, polymorphic traits instead.

Rustaceans at the border

Posted Apr 19, 2022 10:40 UTC (Tue) by farnz (subscriber, #17727) [Link]

Amazon are the other entity that might choose to fund it - they're getting heavily into Rust as a language for AWS, and having a stable ABI for a Rust AWS Lambda runtime would be a neat thing.

Rustaceans at the border

Posted Apr 15, 2022 6:47 UTC (Fri) by qyliss (subscriber, #131684) [Link]

I'd expect one of the big problems here to be the build system. Right now, Rust for Linux integrates Rust into the kernel build system, rather than using Cargo. I think the attraction of using lots of third-party crates would quickly fade if developers had to write Makefiles for any dependency they wanted to add (very few Rust libraries build without a build.rs file doing something fancy somewhere in their dependency graph), but Cargo really does not like co-operating with other build systems. This has made life difficult for any sort of generic build or meta-build system to add Rust support — Meson, gn, Soong, Bazel, Nix, etc.

It comes up over and over when big established projects try to adopt Rust, but no progress has been made on it yet as far as I'm aware. I think it would require some very fundamental changes to how Cargo works.

Rustaceans at the border

Posted Apr 15, 2022 11:46 UTC (Fri) by LoganDark (guest, #158019) [Link]

If this were my project (and maybe 0.1% the size), I'd migrate the rest of the existing codebase to use Cargo and Rust build scripts, rather than the other way around (trying to get Cargo to fit into makefiles/something else), since other parts of the kernel are probably going to eventually be rewritten in Rust as well.

That doesn't exactly sound feasible here, but it's a nice thought. imho, it would be an almost-all-around win for Cargo to become more interoperable with existing build systems, but it could also encourage laziness or tricky setups.

Rustaceans at the border

Posted Apr 16, 2022 7:16 UTC (Sat) by pkolloch (subscriber, #21709) [Link]

One thing that would have helped me immensely:

Have a cargo subcommand that builds a crate in isolation, with a command-line option for the directory of the output artifacts, and the ability to specify all the dependency outputs as inputs.

Rustaceans at the border

Posted Apr 16, 2022 8:35 UTC (Sat) by atnot (subscriber, #124910) [Link]

What would that provide over just using rustc directly?

Rustaceans at the border

Posted Apr 16, 2022 17:45 UTC (Sat) by pkolloch (subscriber, #21709) [Link]

There is a lot of plumbing that cargo does:

  • setting env variables
  • running build scripts
  • setting features
  • setting something like a symbol prefix to avoid clashes
  • linker flags
  • ... and a lot of other details

It's surprisingly hard to replicate. I know because I helped do it for Nix with contributions to buildRustCrate in nixpkgs and by creating crate2nix.

Rustaceans at the border

Posted Apr 15, 2022 7:31 UTC (Fri) by tlamp (subscriber, #108540) [Link]

> "Async Rust" knows nothing about kernel threads or how context switching is done in the kernel, for example.

Hmm, but it doesn't know that stuff for user space either. Async is basically sugar for handling a pinned future type; each runtime and future type (with a poll function, and Pending and Ready states) can map that to its actual, underlying execution model of choice. I mean, that's why the Rust ecosystem can have multiple async runtime libraries like tokio, async-std or smol.

https://rust-lang.github.io/async-book/02_execution/01_chapter.html
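As a minimal sketch of that sugaring (the names `Ready` and `noop_waker` are illustrative, not from any real runtime), a future is just a type whose poll() reports Pending or Ready, and an executor is whatever code drives poll() on its own schedule:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A hand-written future: the async sugar desugars to a type like this,
// whose poll() returns Pending until the value is available.
struct Ready(u32);

impl Future for Ready {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(self.0)
    }
}

// A do-nothing Waker so we can drive poll() by hand, the way any
// executor (tokio, async-std, smol, or a kernel-side one) would.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Ready(7);
    // The "executor" here is just one manual call to poll().
    assert_eq!(Pin::new(&mut fut).poll(&mut cx), Poll::Ready(7));
}
```

Nothing here cares whether the scheduling underneath is a user-space thread pool or a kernel workqueue; that's exactly the pluggability the comment describes.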

Rustaceans at the border

Posted Apr 18, 2022 3:32 UTC (Mon) by Paf (subscriber, #91811) [Link]

Ok, but none of those implementations know anything about the kernel. We’ll have to add one and if we don’t it won’t work - that’s the point being made. Can’t generally bring in user space stuff without issues.

Rustaceans at the border

Posted Apr 15, 2022 8:30 UTC (Fri) by seanyoung (subscriber, #28711) [Link]

Jonathan Corbet, I must say I am disappointed with the reporting on Rust. Rust is an entirely new language with many interesting properties: memory safety, for example, and just how expressive the language is.

Instead we get an article that focusses almost exclusively on the ability to pull in external crates, which is always going to be controversial for a kernel.

A lot of us kernel hackers will need to learn how to use rust in the kernel. How about something a bit more in-depth?

Rustaceans at the border

Posted Apr 15, 2022 8:35 UTC (Fri) by beagnach (subscriber, #32987) [Link]

> A lot of us kernel hackers will need to learn how to use rust in the kernel. How about something a bit more in-depth?

Let me google that for you...

https://www.google.com/search?q=site%3Alwn.net%20rust&...

Rustaceans at the border

Posted Apr 15, 2022 8:46 UTC (Fri) by seanyoung (subscriber, #28711) [Link]

Nothing in depth to see there.

Rustaceans at the border

Posted Apr 15, 2022 13:29 UTC (Fri) by pbonzini (subscriber, #60935) [Link]

https://lwn.net/Articles/869428/ might have something for you.

Rustaceans at the border

Posted Apr 15, 2022 10:39 UTC (Fri) by Karellen (subscriber, #67644) [Link]

Um, that's the subject that the kernel mailing list threads he was reporting on were about? As in, that's a subject that kernel developers are currently discussing, which is why it's worth summarising now.

What?

Rustaceans at the border

Posted Apr 15, 2022 12:02 UTC (Fri) by kay (subscriber, #1362) [Link]

I, in contrast, am completely happy with LWN's coverage of Rust. If I wanted to learn to develop in Rust I wouldn't expect LWN to be the place to go. My 2c

Rustaceans at the border

Posted Apr 15, 2022 13:11 UTC (Fri) by xav (subscriber, #18536) [Link]

Seconded; we see lots of Python technical articles but I don't think there's much chance of having parts of the kernel written in it anytime soon ... it'd be nice to have at least some examples of where Rust shines for kernel dev (some obvious ones: getting rid of void*, and getting rid of "if (myptr != NULL)").
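As a small illustration of that last point (the function `first_even` is made up for the example), Rust expresses a maybe-absent value as Option<&T> rather than a nullable pointer, so a forgotten NULL check becomes a compile error rather than a crash:

```rust
// Option<&i32> plays the role of a pointer that may be NULL; the
// compiler refuses to let us use the value without handling None.
fn first_even(xs: &[i32]) -> Option<&i32> {
    xs.iter().find(|x| **x % 2 == 0)
}

fn main() {
    // We must match on Some/None; there is no way to deref a None.
    assert_eq!(first_even(&[1, 3, 4]), Some(&4));
    assert_eq!(first_even(&[1, 3, 5]), None);
}
```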

Rustaceans at the border

Posted Apr 15, 2022 16:47 UTC (Fri) by thoughtpolice (subscriber, #87455) [Link]

That's because it's the topic of the mailing list discussion.

You can quite literally find dozens of articles about using Rust for things like kernel/bare-metal programming if that's what you want. They're so easy to find that I'm not sure why you're bothering to complain here. But regardless, that's not what this discussion is about.

Rustaceans at the border

Posted Apr 21, 2022 2:23 UTC (Thu) by ssokolow (guest, #94568) [Link]

A couple of things listed in that second-last URL (The Little Book of Rust Books) that I'd like to call out as relevant:

Rustaceans at the border

Posted Apr 15, 2022 19:32 UTC (Fri) by bartoc (subscriber, #124262) [Link]

At the very least they probably don't want to allow multiple versions of packages, which is very common in Rust projects (but people don't realize it, because the duplicates are hidden behind an up-to-date dependency).

Cargo doesn't really like being the passenger in the build process, it likes to drive things itself, and that's probably not going to fly in the kernel either, which means they need to interact with the Rust compiler itself, and that interface is not well documented.

At a higher level I think this is the sort of thing that could cause rust to stabilize their ABI unintentionally, without the core rust developers having a choice in the matter.

Rustaceans at the border

Posted Apr 15, 2022 20:20 UTC (Fri) by shemminger (subscriber, #5739) [Link]

Seems like getting a secure language with the insecurity of an external package system would be a bad tradeoff.

Rustaceans at the border

Posted Apr 15, 2022 23:14 UTC (Fri) by ssokolow (guest, #94568) [Link]

Which is why Cargo provides multiple mechanisms for allowing projects to choose a balance that works for them. For example:

  1. By default, it generates a lockfile that stores SHA256 hashes to ensure that an attempt to slip in an unapproved change will fail the fetch.
  2. The cargo vendor command automates the process of vendoring your dependencies so you can have the benefits of a dependency manager without having to rely on an external source for the code.
  3. Should you so choose, Cargo supports overriding package sources to map them to a mirror you control.

Ensuring security shouldn't be any more difficult than with an external C codebase like zstd that you periodically import into your repo.

Rustaceans at the border

Posted Apr 16, 2022 8:30 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

The alloc crate in Rust does not seem to scale to the kernel's requirements at all. How do you handle OOM at the call site? How do you reach `kmem_cache_*` when your needs are more specialized than `kmalloc()`?

Rustaceans at the border

Posted Apr 16, 2022 9:08 UTC (Sat) by ssokolow (guest, #94568) [Link]

I haven't kept up on the details but, last I heard, the plan was to address that by providing their own fork/adaptation of alloc that breaks some APIs. The kernel's innards would be no_std anyway, so the main purpose in not breaking alloc APIs would be keeping things familiar for developers.

Rustaceans at the border

Posted Apr 16, 2022 15:39 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

Here's the (documentation for the) Rust for Linux alloc crate: https://rust-for-linux.github.io/docs/alloc/index.html

As you will notice, in Rust for Linux the alloc crate's APIs lack functions such as new() which are infallible, and instead only provide the fallible try_new() function, whereas in conventional Rust you can choose.
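A rough userland analogue (not the Rust for Linux crate itself; `append_checked` is a made-up helper): stable Rust's Vec::try_reserve, available since 1.57, exposes the same fallible-allocation pattern that try_new() does, surfacing allocation failure as a Result instead of aborting:

```rust
use std::collections::TryReserveError;

// Reserve space first and report allocation failure to the caller,
// instead of relying on the infallible (aborting) push path.
fn append_checked(buf: &mut Vec<u8>, data: &[u8]) -> Result<(), TryReserveError> {
    buf.try_reserve(data.len())?;
    buf.extend_from_slice(data);
    Ok(())
}

fn main() {
    let mut buf = Vec::new();
    append_checked(&mut buf, b"hello").expect("tiny allocation should succeed");
    assert_eq!(buf, b"hello");
}
```

The kernel variant goes further by removing the infallible entry points entirely, so a driver author can't accidentally write the aborting version.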

Rustaceans at the border

Posted Apr 16, 2022 17:46 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

Fallible allocation is only a small portion of the problem.

Also, what is required even for drivers is direct access to kmem_cache and the gfp_t flags.

Rustaceans at the border

Posted Apr 17, 2022 11:35 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

If it's "required even for the drivers" then presumably both the C and Rust drivers presented for comparison here:

https://lwn.net/Articles/863459/

... illustrate that ?

Of course almost a year has passed since that was written, so Rust and Rust for Linux have made further improvements, but presumably you don't mean that these things somehow became mandatory in the last 12 months.

Rustaceans at the border

Posted Apr 16, 2022 10:43 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

I've added quite a few comments to this one but here's one more: when is the Rust compiler/language going to start retaining full backwards compatibility between the so-called editions?

I don't see a future for Rust in the kernel before this happens. In order to reach mainline Linux, Rust needs major changes toward the more "conservative" side of how the language and compiler are developed.

Rustaceans at the border

Posted Apr 16, 2022 11:04 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

> when is the Rust compiler/language going to start retaining full backwards compatibility between the so-called editions?

Do you have a concrete example of a case where the compatibility rules documented in https://doc.rust-lang.org/edition-guide/editions/index.html are being broken?

If so, has someone filed a bug report, and what was the response?

In the alternative, if you aren't claiming those rules are being broken, but rather that those rules do not satisfy your demands, perhaps you could clarify what the deficiency is.

Rustaceans at the border

Posted Apr 16, 2022 14:19 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

That's trivial. It's hard to orchestrate changes around the mainline kernel when the language makes changes that are not backwards compatible. Direct quote:

"There are times, however, when it is useful to be able to make small changes to the language that are not backwards compatible."

This is unacceptable.

Rustaceans at the border

Posted Apr 16, 2022 16:06 UTC (Sat) by mpr22 (subscriber, #60784) [Link]

The paragraph following the one that contains the sentence you're reacting to says:

> When we want to release a feature that would otherwise be backwards incompatible, we do so as part of a new Rust edition. Editions are opt-in, and so existing crates do not see these changes until they explicitly migrate over to the new edition. This means that even the latest version of Rust will still not treat async as a keyword, unless edition 2018 or later is chosen. This choice is made per crate as part of its Cargo.toml. New crates created by cargo new are always configured to use the latest stable edition.

So if you write all your code in Rust 2018, and label it as being in Rust 2018, it should continue to be treated as Rust 2018 by the toolchain for as long as you keep it labelled as being in Rust 2018, and any failure by the toolchain to do so is a bug which should be reported post-haste.

And then a bit further on:

> The most important rule for editions is that crates in one edition can interoperate seamlessly with crates compiled in other editions. This ensures that the decision to migrate to a newer edition is a "private one" that the crate can make without affecting others.

So your code built as Rust 2018 can interoperate with crates that were built as Rust 2021 or Rust 2015 (Rust 2015 being the initial stable edition associated with v1.0 of the toolchain, and the edition your code is treated as if you do not explicitly specify an edition); in pursuit of this goal, they even created a "raw identifiers" mechanism when defining Rust 2018, to allow a crate which declared public interfaces using identifiers that were valid under Rust 2015 but have become keywords in Rust 2018 to continue to present those public interfaces under the same name.

Rustaceans at the border

Posted Apr 16, 2022 19:31 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

I would like to see a clear explanation of how the maintenance flow would work with stable and long-term kernels and all these editions. The oldest long-term kernel still in maintenance was released in late 2016. This initially looks like a nightmare for backporting and bisecting stuff. I admit that this might be because of my lack of experience with Rust, but that's why the Rust community should be able to explain this in a way that cannot be misunderstood, instead of just referencing Rust documentation.

Rustaceans at the border

Posted Apr 16, 2022 22:53 UTC (Sat) by beagnach (subscriber, #32987) [Link]

You'll probably get a better answer in lkml, since all of this is a work in progress with issues like this still being figured out by some very dedicated people who (if you bother to follow the discussions) actually do care very much about stability, compatibility and so on.

Your chances of getting a good reply would probably be increased by reducing the amount of hostility and condescension in your tone.

Rustaceans at the border

Posted Apr 19, 2022 5:53 UTC (Tue) by jsakkine (subscriber, #80603) [Link]

I did subscribe to that now. Continuing over there. Thanks for pointing it out. I did not know that the vger list already exists.

Rustaceans at the border

Posted Apr 17, 2022 10:56 UTC (Sun) by khim (subscriber, #9252) [Link]

> I would like to see a clear explanation how the maintenance flow would work with stable and long-term kernels and all these editions.

Only the kernel community can answer that. C and C++ also have editions (standard revisions) and they also are not 100% compatible. Yet the kernel accommodates them anyway.

Rust editions happen every 3 years, their incompatibilities are usually pretty limited, and there are guides which help you go from one edition to another.

They are also enabled on a per-crate basis and it's explicitly permissible to link together crates compiled for different editions.

Thus I would assume kernel folks would stick to the oldest editions which are used in still-supported versions of the kernel. Or maybe they would allow use of the new ones in modules which are not supported in older kernels.

Frankly, in practice editions are such a minor issue that it's not even worth discussing: yes, these changes are backward-incompatible, but they are extremely minor (a half-dozen to a dozen changes every three years) and mostly superficial (as in: code written for the new edition can still be converted to old editions, and in a very mechanical fashion). A small nuisance instead of a major PITA.

Much more significant are non-edition style changes. These are numerous, happen every six weeks, and while they are backward-compatible, people embrace them quickly, which means you cannot easily backport new versions of important crates to old versions of the kernel without bumping the used version of the Rust compiler first.

You are barking up the wrong tree.

Rustaceans at the border

Posted Apr 17, 2022 22:40 UTC (Sun) by ssokolow (guest, #94568) [Link]

non-edition style changes

I think calling them style changes is a bit misleading. Generally, the stuff that bumps the minimum supported Rust version is the addition or stabilization of new standard library types/functions that people are then quick to jump on because they significantly reduce some boilerplate that has become near enough to ubiquitous to justify their creation.

(If you want to see the kind of things I'm talking about, caniuse.rs provides a quick overview.)

For example, one of the things introduced in Rust 1.56 was the rust-version field in Cargo.toml, so libraries can specify the minimum supported Rust version in a machine-readable way.

Rustaceans at the border

Posted Apr 16, 2022 16:08 UTC (Sat) by tialaramex (subscriber, #21167) [Link]

> This is unacceptable

It seems that Linus Torvalds does not agree, since the kernel did in fact take a "small change to the language that is not backwards compatible" by going from C89 to C11.

Unlike this change, Rust's editions aren't actually language version changes, as you'd have discovered if you had read more than a few sentences into that document. The allowed changes can alter the written syntax but not the abstract syntax. Thus, code from Rust 2015, Rust 2018 and Rust 2021 can be expressed together in a single abstract syntax inside the compiler, even though some things you were allowed to do in Rust 2015 are prohibited in Rust 2021, some things in Rust 2021 were impossible in Rust 2015, and the way you spell some of the things that are allowed changed if you decided to adopt a newer edition.

For example, in C11 _Bool became a keyword, and in Rust 2018 and later editions async is a keyword. So there's no way to name the identifier _Bool in C11; it's just gone, and you'll be told it was reserved anyway, so you were wrong to use it (but the kernel uses various reserved names all over the place and does not respect C's rule about reserved names). Whereas you can still name the identifier async in Rust 2021; it's just spelled r#async.
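A small sketch of that escape hatch (the function here is hypothetical, standing in for an older crate's public API that used async as a plain identifier):

```rust
// In editions where `async` is a keyword, the raw-identifier syntax
// r#async still lets us define and call something by that name, so
// old-edition interfaces remain reachable.
fn r#async() -> &'static str {
    "still callable"
}

fn main() {
    assert_eq!(r#async(), "still callable");
}
```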

Rustaceans at the border

Posted Apr 16, 2022 21:02 UTC (Sat) by jsakkine (subscriber, #80603) [Link]

Hmm... I would not compare something where the breaking change happens for the first time in 30 years, for legit reasons, to the situation with Rust. Quite a bad comparison IMHO.

Rustaceans at the border

Posted Apr 16, 2022 22:57 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

gcc has _plenty_ of non-backward-compatible changes. Try compiling the kernel with gcc 3, for example.

As it is, Linux has a minimum version requirement for gcc (or clang) that gets raised pretty often. I believe it's at 5.1 right now: https://lore.kernel.org/lkml/20210910234047.1019925-2-nde...

So in practice, the _current_ Linux policy is worse than Rust. Instead of a well-defined edition, the kernel depends on an informal compiler version.

Rustaceans at the border

Posted Apr 17, 2022 11:15 UTC (Sun) by khim (subscriber, #9252) [Link]

> Hmm... I would not compare something where the breaking change happens for the first time in 30 years, for legit reasons, to the situation with Rust. Quite a bad comparison IMHO.

It's actually a very good comparison. The situations where you upgraded GCC and the kernel stopped working were so numerous that Red Hat even had a special kgcc package for a time. And Android used a separate compiler just for the kernel for a long time, too.

Granted, it's not because the definition of C, as a language, changes, but because clang/gcc and the kernel disagree about certain minor features of the standard… but from a practical POV it's the same thing.

In theory, the same thing can happen in Rust, too… but in practice, because safe code doesn't contain UB (except when you exploit bugs in the compiler), it's much less of a problem. I don't think I have ever heard of a case where someone upgraded the compiler and a Rust program started misbehaving. Sometimes (rarely) there are compilation issues, but they are usually easy to fix.

But with Rust we have an issue similar to C++ in the last century: it's incomplete. Certain important features are not implemented yet. Yes, that doesn't affect the code in a way where you cannot use a new compiler to compile old code, but the fact that these features are slowly but surely being added to the language means that many crates force you to use a pretty new version of the compiler.

That is a serious issue in practice. Please think about what you plan to do about that; forget about editions, those are minor things.

Rustaceans at the border

Posted Apr 17, 2022 13:52 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

> Please think what you plan to do about that

We're not talking about a feature set here, just one number which goes up. If you have Rust 1.72 then by definition all the stable stuff from Rust 1.71, 1.70, 1.69, and so on is available. You may not care very much whether checked_div() is const (Rust 1.52), but if you want IntoIterator implemented for arrays (Rust 1.53), you'll get const checked_div into the bargain.
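The two features just mentioned can be sketched in a few lines (minimum compiler versions as stated in the comment; anything older fails to compile, which is exactly how using them silently raises a crate's required toolchain):

```rust
fn main() {
    // i32::checked_div usable in const context (per the comment, Rust 1.52).
    const HALF: Option<i32> = 84i32.checked_div(2);
    assert_eq!(HALF, Some(42));

    // IntoIterator implemented for arrays by value (Rust 1.53), written
    // in the edition-independent form so it behaves the same everywhere.
    let sum: i32 = IntoIterator::into_iter([1, 2, 3]).sum();
    assert_eq!(sum, 6);
}
```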

The feature switches I think could be more contentious because at the same time some people see value in enabling some feature switch, other people will have begun to depend on it not being present in some cases. But this is already something you see for C compiler flags.

Rustaceans at the border

Posted Apr 17, 2022 14:12 UTC (Sun) by khim (subscriber, #9252) [Link]

The problem is not the “some number goes up”. The problem is that Rust is still getting major features like const generics or GATs.

And because these are significant, major features, people often adopt them in a matter of weeks or months. Some "conservative" crates only use features which are at least six months old, but not all crates which the kernel may need or want will be "conservative".

And even then, six months is not much; the kernel is accustomed to timescales measured in years. GCC 5.1 is still supported and it was released seven years ago!

That impedance mismatch would be a much bigger problem than any hypothetical issue with Rust editions.

Yes, in theory Rust editions could wreak total chaos every three years. In practice that's a tempest in a teapot: they come rarely enough, and the changes are minor enough, that this formal incompatibility rarely becomes a problem.

But the requirement to use a compiler that is at most six months old can be a real PITA for many kernel users.

Rustaceans at the border

Posted Apr 17, 2022 23:13 UTC (Sun) by ssokolow (guest, #94568) [Link]

That said, if you'd like some concrete data, How often does Rust change? by Steve Klabnik is good.

Rustaceans at the border

Posted Apr 17, 2022 22:10 UTC (Sun) by bartoc (subscriber, #124262) [Link]

Rust “UB” is a smaller category than C/C++'s because there's no formal standard for the language with multiple implementers. Not all UB in C is assumed to never happen by compilers (for example: most of the preprocessor and lexer UB).

The other thing to remember is that the C standard can be obscenely vague about what implementations are allowed to do, the C++ standard tries much harder.

Rustaceans at the border

Posted Apr 19, 2022 15:44 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

> Rust “UB” is a smaller category than c/c++ because theres no formal standard for the language with multiple implementers.

It's not about multiple implementations, at most that would lead to having Implementation Defined Behaviour which is much less scary, but in many cases there wouldn't be any difference since the additional implementations would just do the same thing.

C++ in particular is riddled with _intentional_ Undefined Behaviour. But even in the mundane C standard library trivial functions like abs() have Undefined Behaviour.

Rustaceans at the border

Posted Apr 19, 2022 12:34 UTC (Tue) by farnz (subscriber, #17727) [Link]

An important point here is that while Rust 2021 code will not compile with a Rust 2015-only compiler, a future Rust compiler is expected to be able to compile all past editions, and to support interface-level interoperability between the editions at the crate level. Note that the crate is the unit of compilation in Rust, just as the translation unit is the unit of compilation in C.

So, within a crate, I must stick to one edition; but within a project such as the kernel, I can freely mix editions (because I'd expect the kernel to be made of several crates, not to be one giant crate - just as in C I'd expect the kernel to be in multiple files spread around several directories, and not in a single giant source file). The only limitation I have is that I cannot use recent editions without a bump to the MSRV; but the same applies to C, where I can't use the full language without allowing for the bugs in older versions of GCC.

Rustaceans at the border

Posted Apr 17, 2022 17:45 UTC (Sun) by wtarreau (subscriber, #51152) [Link]

I particularly despise automatic downloading of dependencies during the build, and that's unrelated to the language. Many of us have already been bitten by not being able to rebuild an old piece of software just because some dependencies were missing. For embedded systems, as well as for some mission-critical systems, you often need to be able to rebuild the same code 5 years later, to reproduce an equivalent system after a disaster, or to reproduce a production bug in a test environment. The kernel is one of those components you need to be able to reproduce, and it's really vital to be able to keep a local copy of *all* your dependencies and never rely on some $site that happily displays "this domain is for sale", or a download repository that returns 404 (as many have seen after trying to apply minimal fixes to very old Debians, for example, after the repos were moved).

Sure, it's "more convenient for us developers", but developers' convenience is exactly what brings significant costs in the field (resources and problems). What made the success of Linux is not just that it was open source, but also that it was very reliable and reasonably easy to get into. A reasonable balance needs to remain here. We don't need to see random code being injected all the time just to add a line on a resume, the way we're seeing dummy bug reports being presented as security issues to get the reporter a CVE ID.

I would like the developers to continue to make a little bit of effort to ensure that the code that builds today will build exactly the same way in 10 years if I download the same tarball and build on the same system with the same tools. Right now the kernel stands by that critical promise because it's entirely self-contained. If some want other approaches, at least they should provide scriptable methods to retrieve everything and make sure one can always rebuild exclusively from these downloads.

Rustaceans at the border

Posted Apr 17, 2022 23:08 UTC (Sun) by ssokolow (guest, #94568) [Link]

I think the "random code being injected all the time just to add a line on a resume" being undesirable is the important thing.

Cargo isn't NPM and they provide multiple layers of protection against kernel code failing to build ten years from now:

  1. Crates.io is designed so that only the admins can remove things and they don't do that unless either they're legally compelled to or the thing in question is out-and-out malicious. Developers can cargo yank crate versions with security vulnerabilities to prevent new things from depending on them, but they're still available to be downloaded by any project that has them in the lockfile that Cargo automatically maintains, and you're encouraged to commit your lockfiles to version control alongside your code.
  2. The cargo vendor command automates what the kernel devs currently do with things like zstd, providing something similar to the status quo, while still preserving machine-readable info on where the dependencies came from, which lets you streamline monitoring for updates and auditing them for manually approved inclusion.
  3. They care enough about forward compatibility and their v1.0 stability promise that, when they're considering making a change to the compiler or standard library in line with "We reserve the right to fix compiler bugs, patch safety holes, and change type inference in ways that may occasionally require new type annotations.", they've got a bot named Crater that can build and run the test suites for a chosen subset of the crates on crates.io, with "every single one" being a valid option. (Thank goodness for Microsoft donating the Azure time. It's a resource-expensive process.)
  4. C has _Bool because too many people typedef'd bool. Rust has namespaces and editions so they can introduce new keywords like async and add new things to the prelude (the set of identifiers that get pulled into scope by default) without breaking existing code, because you have to opt into newer editions... and you can mix editions in the same build. They're just different frontends to produce the same underlying AST.
  5. The compiler, standard library, etc. have an extensive test suite that gets run on every pull. (Another thing that would have become either prohibitively expensive or cripplingly slow with the growing number of platform targets without Microsoft donating Azure time.)
  6. There are ongoing efforts to improve things further.
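For readers unfamiliar with the workflow in point 2: when run, cargo vendor copies every dependency named in the lockfile into a local directory and prints a configuration fragment redirecting crates.io lookups to that directory. The sketch below shows that fragment as I understand it; the "vendor" directory name is cargo's default, not a requirement.

```toml
# Fragment printed by `cargo vendor`, to be placed in .cargo/config.toml.

# Replace the crates.io source with a local one...
[source.crates-io]
replace-with = "vendored-sources"

# ...so every dependency is resolved from ./vendor instead of the
# network, making builds reproducible from a self-contained tarball.
[source.vendored-sources]
directory = "vendor"
```

With that in place, a build with --offline fails loudly rather than reaching for the network, which is exactly the self-contained property the comment above asks for.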

Rustaceans at the border

Posted Apr 19, 2022 17:48 UTC (Tue) by jsakkine (subscriber, #80603) [Link]

In this discussion I realized that it is better to experiment with the code rather than complain from the side. I did learn about "cargo vendor" command today, and it looks like a valid option for selected components. And I'm experimenting on bringing some of the stuff that I'm maintaining in the kernel to Rust. I'm also now following the vger list, which I also learned about thanks to this article.

One thing that might become an obstacle is the state of Rust front-end support in GCC, because it is hard to imagine this ending up in mainline without being fully compilable with stock GCC. Also, using Rust in the core would probably require full coverage of arch/*. But I guess the latter is not the end of the world as far as drivers are concerned.
