7

Data race spans by saethlin · Pull Request #2646 · rust-lang/miri · GitHub

 1 year ago
source link: https://github.com/rust-lang/miri/pull/2646
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Contributor

oli-obk left a comment

r=me after a doc nit

src/concurrency/data_race.rs

Outdated Show resolved

src/concurrency/data_race.rs

Outdated Show resolved

src/concurrency/data_race.rs

Outdated Show resolved

@@ -172,7 +173,7 @@ pub trait EvalContextExt<'mir, 'tcx: 'mir>: crate::MiriInterpCxExt<'mir, 'tcx> {

// Each complete happens-before the end of the wait

if let Some(data_race) = &this.machine.data_race {

data_race.validate_lock_release(&mut init_once.data_race, current_thread);

data_race.validate_lock_release(&mut init_once.data_race, current_thread, current_span);

Would it make sense to move the current_span calls behind the fn call? That seems nicer than putting the burden on all callers. Basically instead of passing current_thread and current_span it could pass &Machine?

EDIT: Ah, that doesn't work since it overlaps with data_race. Hm... I guess we can go ahead with separate arguments for now, but if you have a nice idea for how to avoid having to pass so many arguments, that would be great.

src/concurrency/vector_clock.rs

Outdated Show resolved

src/diagnostics.rs

Outdated Show resolved

|

LL | ... *pointer.load(Ordering::Relaxed)

| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

help: The Allocate on thread `<unnamed>` is here

Nit: is "is" the right verb? One of them happened in the past. Do we know which one? Can we make the message "Earlier, the X on thread Y happened here" and "Now, the X' on thread Y' happened here"? In fact isn't the latter span redundant with the main error span?

Contributor

Author

@saethlin saethlin Dec 21, 2022

Hah I was also not sure what words to use here. This is a data race, so strictly does it make any sense to say that one of these operations happens before the other? They happen in a particular order in the interpreter, but should diagnostics be commenting on the interpreter or the program semantics (or lack thereof)?

I chose present tense because I imagine this diagnostic as a response to the current user experience, where someone would rightfully ask "But Miri, where is the read/write/allocate/deallocate?" so now Miri says "The read/write/allocate/deallocate is here"

(As I describe this I realize it is not consistent with how I worded the Stacked Borrows diagnostics. Ah well.)

With vector clocks, there could actually be considerate amounts of "real time" between the two operations, even in an observable way. But you also have a point.

What about the fact that one of these two spans is always redundant since it is the same as where the error points? I think it'd be better to just say "the other access was over there" or so?

Contributor

Author

@saethlin saethlin Dec 22, 2022

Yeah, I've tried to reduce the duplication here, but I feel backed into a corner again by the limited diagnostics API. Doesn't change the fact that this is still much better than before.

This seems potentially confusing, since after the help we have a backtrace that attaches to the "main span", not the "other access span". I think that's why I think it is good to say something about the "earlier conflicting access". Also if both accesses are of the same kind, saying "the Write" is not very informative.

So what about

error: Undefined Behavior: Data race detected between (1) Atomic Store on thread `<unnamed>` and (2) Read on thread `<unnamed>` at ALLOC. Access (2) just happened here
  --> $DIR/atomic_write_na_read_race1.rs:LL:CC
   |
LL |             *atomic_ref.get_mut()
   |             ^^^^^^^^^^^^^^^^^^^^^
   |
help: and access (1) occurred earlier here
  --> $DIR/atomic_write_na_read_race1.rs:LL:CC
   |
LL |             atomic_ref.store(32, Ordering::SeqCst)
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE:
   = note: inside closure at $DIR/atomic_write_na_read_race1.rs:LL:CC

Yeah it is annoying that the same text is repeated for the error title and label -- that would be #2200, which will be very annoying work since all the ui tests will need adjustment.

Contributor

Author

@saethlin saethlin Dec 22, 2022

edited

I agree there is potential for confusion with the backtrace. I often find the Stacked Borrows errors a bit visually confusing, and have often wanted some other way to format all this information. I'll try to keep an eye on how this plays out.

I like this numbering strategy overall, I'm just not sure about calling these an "access". That word has a particular meaning in Stacked Borrows, and feel like calling an allocation or deallocation an access might be confusing to newcomers (even though it is technically correct to use the word). I think we can just drop the word without loss of clarity?


Also it occurs to me that we're now comparing a "local" span with the topmost frame, so things could look a bit odd if the race occurs through a standard library function that isn't #[track_caller]. I can't find any examples of this, so it may be that the way people write data races, this just isn't a concern. Just mentioning this so you're aware of it, I think we shouldn't cook up a solution until we have a real example.

Also it occurs to me that we're now comparing a "local" span with the topmost frame, so things could look a bit odd if the race occurs through a standard library function that isn't #[track_caller].

The main backtrace will also skip track_caller frames by default so I think one needs to pass -Zmiri-backtrace=full to get anything weird?

I pushed an attempt to clarify the backtrace situation a bit. What do you think?

Contributor

Author

@saethlin saethlin Dec 23, 2022

edited

I think the clarification is good. It's a marginal improvement, but an improvement.

Here is an example of a program that produces the flavor of weird output I'm referring to:

use std::sync::atomic::AtomicPtr;
fn main() {
    let mut v: Vec<i32> = Vec::with_capacity(2);
    let ptr = AtomicPtr::new(v.as_mut_ptr());
    let handle = std::thread::spawn(move || {
        let mut v = unsafe { Vec::from_raw_parts(ptr.into_inner(), 0, 2) };
        v.push(0);
    });
    v.push(0);
    let _ = handle.join();
}
error: Undefined Behavior: Data race detected between (1) Write on thread `main` and (2) Write on thread `<unnamed>` at alloc1617. (2) just happened here
    --> /home/ben/.rustup/toolchains/miri/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1839:13
     |
1839 |             ptr::write(end, value);
     |             ^^^^^^^^^^^^^^^^^^^^^^ Data race detected between (1) Write on thread `main` and (2) Write on thread `<unnamed>` at alloc1617. (2) just happened here
     |
help: and (1) occurred earlier here
    --> demo.rs:9:5
     |
9    |     v.push(2);
     |     ^^^^^^^^^
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: BACKTRACE (of the first span):
     = note: inside `std::vec::Vec::<i32>::push` at /home/ben/.rustup/toolchains/miri/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1839:13: 1839:35
note: inside closure
    --> demo.rs:7:9
     |
7    |         v.push(1);
     |         ^^^^^^^^^

The wording of our diagnostic pretty clearly asks the user to compare the topmost span of the backtrace and the first help span, but the topmost span can be dislocated from the user code by a fair bit if it is buried inside a standard library data structure or algorithm. For example, if a data race is triggered by the PartialOrd impl used in the depths of slice::sort. (as I write this I'm realizing I could have cooked up a more extreme example, ah well...)

Oh I see. Yeah... I think this PR here is still an improvement, feel free to file an issue to track this problem and possible ideas for how to fix it.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK