Data race spans by saethlin · Pull Request #2646 · rust-lang/miri · GitHub
source link: https://github.com/rust-lang/miri/pull/2646
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Contributor
oli-obk left a comment
r=me after a doc nit
Outdated Show resolved
Outdated Show resolved
Outdated Show resolved
@@ -172,7 +173,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> { | ||
// Each complete happens-before the end of the wait |
||
if let Some(data_race) = &this.machine.data_race { |
||
data_race.validate_lock_release(&mut init_once.data_race, current_thread); |
||
data_race.validate_lock_release(&mut init_once.data_race, current_thread, current_span); |
Would it make sense to move the current_span
calls behind the fn call? That seems nicer than putting the burden on all callers. Basically instead of passing current_thread and current_span it could pass &Machine
?
EDIT: Ah, that doesn't work since it overlaps with data_race
. Hm... I guess we can go ahead with separate arguments for now, but if you have a nice idea for how to avoid having to pass so many arguments, that would be great.
src/concurrency/vector_clock.rs
Outdated Show resolved
Outdated Show resolved
| |
||
LL | ... *pointer.load(Ordering::Relaxed) |
||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
||
help: The Allocate on thread `<unnamed>` is here |
Nit: is "is" the right verb? One of them happened in the past. Do we know which one? Can we make the message "Earlier, the X on thread Y happened here" and "Now, the X' on thread Y' happened here"? In fact isn't the latter span redundant with the main error span?
Contributor
Author
saethlin Dec 21, 2022
Hah I was also not sure what words to use here. This is a data race, so strictly does it make any sense to say that one of these operations happens before the other? They happen in a particular order in the interpreter, but should diagnostics be commenting on the interpreter or the program semantics (or lack thereof)?
I chose present tense because I imagine this diagnostic as a response to the current user experience, where someone would rightfully ask "But Miri, where is the read/write/allocate/deallocate?" so now Miri says "The read/write/allocate/deallocate is here"
(As I describe this I realize it is not consistent with how I worded the Stacked Borrows diagnostics. Ah well.)
With vector clocks, there could actually be considerate amounts of "real time" between the two operations, even in an observable way. But you also have a point.
What about the fact that one of these two spans is always redundant since it is the same as where the error points? I think it'd be better to just say "the other access was over there" or so?
Contributor
Author
saethlin Dec 22, 2022
Yeah, I've tried to reduce the duplication here, but I feel backed into a corner again by the limited diagnostics API. Doesn't change the fact that this is still much better than before.
This seems potentially confusing, since after the help
we have a backtrace that attaches to the "main span", not the "other access span". I think that's why I think it is good to say something about the "earlier conflicting access". Also if both accesses are of the same kind, saying "the Write" is not very informative.
So what about
error: Undefined Behavior: Data race detected between (1) Atomic Store on thread `<unnamed>` and (2) Read on thread `<unnamed>` at ALLOC. Access (2) just happened here
--> $DIR/atomic_write_na_read_race1.rs:LL:CC
|
LL | *atomic_ref.get_mut()
| ^^^^^^^^^^^^^^^^^^^^^
|
help: and access (1) occurred earlier here
--> $DIR/atomic_write_na_read_race1.rs:LL:CC
|
LL | atomic_ref.store(32, Ordering::SeqCst)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside closure at $DIR/atomic_write_na_read_race1.rs:LL:CC
Yeah it is annoying that the same text is repeated for the error title and label -- that would be #2200, which will be very annoying work since all the ui tests will need adjustment.
I agree there is potential for confusion with the backtrace. I often find the Stacked Borrows errors a bit visually confusing, and have often wanted some other way to format all this information. I'll try to keep an eye on how this plays out.
I like this numbering strategy overall, I'm just not sure about calling these an "access". That word has a particular meaning in Stacked Borrows, and feel like calling an allocation or deallocation an access might be confusing to newcomers (even though it is technically correct to use the word). I think we can just drop the word without loss of clarity?
Also it occurs to me that we're now comparing a "local" span with the topmost frame, so things could look a bit odd if the race occurs through a standard library function that isn't #[track_caller]
. I can't find any examples of this, so it may be that the way people write data races, this just isn't a concern. Just mentioning this so you're aware of it, I think we shouldn't cook up a solution until we have a real example.
Also it occurs to me that we're now comparing a "local" span with the topmost frame, so things could look a bit odd if the race occurs through a standard library function that isn't #[track_caller].
The main backtrace will also skip track_caller
frames by default so I think one needs to pass -Zmiri-backtrace=full
to get anything weird?
I pushed an attempt to clarify the backtrace situation a bit. What do you think?
I think the clarification is good. It's a marginal improvement, but an improvement.
Here is an example of a program that produces the flavor of weird output I'm referring to:
use std::sync::atomic::AtomicPtr;
fn main() {
let mut v: Vec<i32> = Vec::with_capacity(2);
let ptr = AtomicPtr::new(v.as_mut_ptr());
let handle = std::thread::spawn(move || {
let mut v = unsafe { Vec::from_raw_parts(ptr.into_inner(), 0, 2) };
v.push(0);
});
v.push(0);
let _ = handle.join();
}
error: Undefined Behavior: Data race detected between (1) Write on thread `main` and (2) Write on thread `<unnamed>` at alloc1617. (2) just happened here
--> /home/ben/.rustup/toolchains/miri/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1839:13
|
1839 | ptr::write(end, value);
| ^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) Write on thread `main` and (2) Write on thread `<unnamed>` at alloc1617. (2) just happened here
|
help: and (1) occurred earlier here
--> demo.rs:9:5
|
9 | v.push(2);
| ^^^^^^^^^
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE (of the first span):
= note: inside `std::vec::Vec::<i32>::push` at /home/ben/.rustup/toolchains/miri/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1839:13: 1839:35
note: inside closure
--> demo.rs:7:9
|
7 | v.push(1);
| ^^^^^^^^^
The wording of our diagnostic pretty clearly asks the user to compare the topmost span of the backtrace and the first help
span, but the topmost span can be dislocated from the user code by a fair bit if it is buried inside a standard library data structure or algorithm. For example, if a data race is triggered by the PartialOrd
impl used in the depths of slice::sort
. (as I write this I'm realizing I could have cooked up a more extreme example, ah well...)
Oh I see. Yeah... I think this PR here is still an improvement, feel free to file an issue to track this problem and possible ideas for how to fix it.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK