There's No Such Thing As "Implicitly Atomic"

source link: https://belkadan.com/blog/2023/10/Implicity-Atomic/

If I have an aligned machine-word-sized variable (Int) and I store to it from Thread A, then I know Thread B might see the old value instead of the new value (because of per-processor caching, or the compiler “hoisting” a load to earlier in the function). But there’s no way, on a modern processor, that Thread B sees a mix of the old and new value, right? That can only happen with wider values, or unaligned values, that the code may update non-atomically, right?

This question is paraphrased from the Swift forums, though I’m not linking to it because it’s in the middle of a larger thread, and it’s something people might reasonably ask anyway. My response, lightly edited, is below; it is Swift-oriented but also applies to C, C++, and Rust.

I agree that “torn” values, with a mix of the old and new value, are very unlikely in this case—all modern processors that I know of do guarantee that reading or writing an aligned, machine-word-sized value happens as a single unit, meaning no “tearing”. However, you’ve left out the possibility that there’s more going on than just “read memory” and “write memory”. The silliest example is if the word-sized value is an instance variable, or global or static variable, in which case Swift’s dynamic exclusivity checks will kick in and potentially complain.

EDIT: Even that isn’t guaranteed. If you don’t say “atomic”, the compiler might decide to split up a store to optimize for speed or code size! This isn’t hypothetical; Greg Parker ran into this with libobjc.

But let’s assume you’re doing a direct access through an UnsafePointer, the closest you can get in Swift to emitting a single aligned load or store instruction. Do you have a guarantee of the behavior you described above? No, you still don’t. In fact, there’s an immediate way today to make that break: turn on Thread Sanitizer.[1] (The way you tell Thread Sanitizer that it’s okay for other threads to read a stale value is to use relaxed atomic operations, though note that even that may not be good enough depending on what you’re trying to do.[2])
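To make that parenthetical concrete, here is a minimal sketch in Rust, chosen only because the same reasoning applies there and its atomics ship in the standard library. The function name `observe_relaxed` and the two bit patterns are made up for illustration. Every access to the shared word goes through a relaxed atomic operation, so the reader may well see a stale value, but it is never handed a mix of the two patterns, and a race detector has nothing to flag.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: one thread flips a word between two patterns while
/// another checks that every value it loads is exactly one of them.
fn observe_relaxed(iterations: usize) -> bool {
    const OLD: usize = 0;
    const NEW: usize = usize::MAX; // every bit differs from OLD

    let shared = Arc::new(AtomicUsize::new(OLD));

    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            for i in 0..iterations {
                let value = if i % 2 == 0 { NEW } else { OLD };
                // Relaxed: atomic, but imposes no ordering on other memory.
                shared.store(value, Ordering::Relaxed);
            }
        })
    };

    let reader = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            (0..iterations).all(|_| {
                // May be stale; must be one of the stored patterns.
                let seen = shared.load(Ordering::Relaxed);
                seen == OLD || seen == NEW
            })
        })
    };

    writer.join().unwrap();
    reader.join().unwrap()
}
```

Calling `observe_relaxed(10_000)` should return `true`. The point is not that a plain `usize` would visibly tear on today's hardware, but that the atomic type is what turns "happens to work" into something the language actually promises.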

But maybe you say Thread Sanitizer is an artificial scenario that doesn’t count? For portable and futureproof code, that’s still not good enough. The OS, or hardware, or anything really, is permitted to track additional information about memory; in fact, we know it does this kind of thing on a per-page basis on most operating systems to support permission protections and virtual memory. So it could in theory do something like TSan for every program, making pages or sub-regions of pages “thread-associated”, and catching any attempt to access them across threads, for both performance and correctness reasons. This sounds like an outlandish amount of work and overhead to enable for every process by default, but arm64e exists, even if it’s only being used in limited scenarios. Other memory-safety-focused ABIs/architectures like CHERI exist as well.[3] So I wouldn’t say it’s impossible, even if today’s mainstream[4] x86_64 and arm64 processors and operating systems don’t do anything like that.

Finally, undefined behavior is undefined. I don’t want to make undefined behavior a bogeyman that will deliberately produce wrong answers, but the compiler is within its rights to say “I can prove this load races with that store and therefore I can load whatever value I want”. (This usually happens when the compiler is assuming a certain combination of conditions can’t possibly happen and therefore it should save code size rather than emit code that “should” be dead.)

So no, do not use a single non-atomic machine-word load or store to communicate across threads without any other form of synchronization in Swift. Or C. Or Rust.
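As one sketch of what "some other form of synchronization" could look like (again in Rust, and again with made-up names like `publish_and_read`), a producer can write a value and then set a flag with release ordering, while a consumer waits on the flag with acquire ordering before reading. This is only one option; a mutex or a channel would do the job too, and is often the easier thing to get right.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: hands `value` from one thread to another using a
/// release/acquire flag, and returns what the consumer read.
fn publish_and_read(value: usize) -> usize {
    let payload = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (payload, ready) = (Arc::clone(&payload), Arc::clone(&ready));
        thread::spawn(move || {
            payload.store(value, Ordering::Relaxed);
            // Release: the payload store above is visible to any thread
            // whose acquire load observes this `true`.
            ready.store(true, Ordering::Release);
        })
    };

    let consumer = {
        let (payload, ready) = (Arc::clone(&payload), Arc::clone(&ready));
        thread::spawn(move || {
            // Acquire: pairs with the release store in the producer.
            while !ready.load(Ordering::Acquire) {
                thread::yield_now();
            }
            payload.load(Ordering::Relaxed)
        })
    };

    producer.join().unwrap();
    consumer.join().unwrap()
}
```

For example, `publish_and_read(42)` returns `42`. If both flag operations were relaxed instead, the flag itself would still be atomic, but nothing would order it against the payload, which is the kind of subtlety that makes relaxed operations "not good enough" for some jobs.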

  1. Awkwardly, the link from this blog post to the Apple page about TSan is broken, because Apple has since reorganized that section of the developer docs. The new docs don’t have an easily-linkable page, though. That said, there are lots of third-party explainers for using TSan within Xcode, as well as plenty about using TSan on the command line with C or C++, and a few even for Rust.

  2. Thanks to zwarich for pointing this out. Atomics are hard, because C and Swift and Rust expose extremely subtle operations for Maximum Speed.

  3. I could have linked directly to the official CHERI website but Gankra manages to explain it way more concisely and in a much more friendly manner.

  4. “mainstream” added to satisfy Gankra. (Thank you.)

This entry was posted on October 04, 2023 and is filed under Technical. Tags: Swift, Compilers

