The Dilemma of Multithreaded Programming: Sync, Send, and 'static

薛定谔的喵

What makes a person impressive is not their labels

It is not hard to write concurrent programs, but it is hard to write concurrent programs correctly.

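Writing multithreaded programs is not hard; writing them correctly is. Programmers can write multithreaded programs, but one day (perhaps much later) they run into a strange bug, try every debugging approach they can think of, and finally discover that multithreading was the cause. So how can we get better at writing multithreaded programs correctly?

The most important thing in a multithreaded program is to avoid race conditions!

Concurrency bugs include deadlocks, data races, and other race conditions. Rust claims to prevent data races (data races specifically, not deadlocks or race conditions in general), and this article tries to analyze how it achieves that. Even if you don't program in Rust, learning how it avoids data races can still benefit you!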

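A simple multithreading bug

First, look at the program below. It spawns 10 threads, each of which adds 1 to sum, in the hope that the result will be 10.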

#include<thread>
#include<iostream>
#include<vector>
int main() {
    int sum = 0;
    auto f = [&](){
        sum += 1;  // unsynchronized read-modify-write of the shared sum: a data race
    };
    std::vector<std::thread> vec;
    for (int i = 0; i < 10; i++) {
        vec.push_back(std::thread(f));
    }    
    for (auto &t: vec) {
        t.join();
    }
    std::cout<<"sum is "<< sum<<std::endl;
}

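Compile the program with g++ -std=c++17 bug.cpp -pthread and run a.out. The output is usually 10; if you are "lucky", it will be something other than 10. To catch such a run, execute the program over and over with the following command and watch the results:

while true; do ./a.out; sleep 5; done  # bash script; works on Linux and macOS

On my machine I ran it about 30 times, and once it printed 9.

An experienced programmer will see at a glance that sum is not thread-safe here and should be changed to a std::atomic. One line of code fixes it. (Strictly speaking, unsynchronized concurrent writes to a plain int are a data race, which is undefined behavior in C++, so a wrong total is only one possible symptom.)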
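For comparison, here is a minimal Rust sketch of the same counter with that fix applied: an atomic integer whose fetch_add is one indivisible read-modify-write, so no increment can be lost. The function name count_with_atomic is illustrative, and Ordering::Relaxed is an assumption that is sufficient here because only the final total matters and join() already makes each thread's increment visible before the final load.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Spawns `n` threads that each add 1 to one shared atomic counter and returns the total.
fn count_with_atomic(n: usize) -> usize {
    let sum = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();
    for _ in 0..n {
        let sum = Arc::clone(&sum);
        handles.push(thread::spawn(move || {
            // fetch_add reads, increments and writes back as one indivisible step,
            // so two threads can never both read the same old value.
            sum.fetch_add(1, Ordering::Relaxed);
        }));
    }
    for handle in handles {
        // Once join() returns, that thread's increment is visible to us.
        handle.join().unwrap();
    }
    sum.load(Ordering::Relaxed)
}

fn main() {
    println!("sum is {}", count_with_atomic(10));
}
```

Unlike the C++ version, this always prints 10.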

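This example illustrates that, because the order in which threads run is not fixed at run time, reading and writing shared data without protecting it leads to bugs. The fix is simple (or so it seems):

Don't share data (or share only read-only data).
Protect shared data.
Make sure data is valid whenever it is accessed (in a language with GC the programmer doesn't need to ensure this; the GC takes care of it).

This article analyzes how Rust uses compile-time checks to ensure that programs follow these three rules, and in doing so deepens our understanding of multithreaded programming.

To keep programs free of such bugs, Rust builds the three rules into its type system, so the compiler can reject these bugs at compile time before they reach production.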

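Don't share data, or share only read-only data

Not sharing data means not sharing variables between threads. That is easy to understand, but in practice a variable that was once unshared easily ends up shared, because in code written by many people not everyone knows every convention. Rust enforces this in the compiler and prevents variables that must not be shared from being shared; the analysis of Send below explains how.

Sharing read-only data means that, across threads, a variable may only be read, never written.

How does Rust's type system guarantee that shared data is only read?

The answer is that Rust's type system requires references to satisfy exactly one of the following conditions at any time:

any number of shared (read-only) references;
exactly one mutable (writable) reference.

This prevents data from being written in one thread while it is read or written in another. If code violates the rule, the compiler rejects it with an error like this: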

error[E0502]: cannot borrow `var` as immutable because it is also borrowed as mutable
  --> src/main.rs:44:13
   |
40 |     s.spawn(|_| {
   |             --- mutable borrow occurs here
41 |         var += 1;
   |         --- first borrow occurs due to use of `var` in closure
...
44 |     s.spawn(|_| {
   |       ----- ^^^ immutable borrow occurs here
   |       |
   |       mutable borrow later used by call
45 |         println!("A child thread borrowing `var`: {:?}", var);
   |                                                          --- second borrow occurs due to use of `var` in closure

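The key line in this error is cannot borrow `var` as immutable because it is also borrowed as mutable: the variable var is modified in one thread and read in another. Rust refuses to compile such code, which keeps this kind of bug out of the program.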
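The error above appears to come from a scoped-thread API (the `|_|` closure argument suggests crossbeam's scope). As a sketch of what the borrow rules do allow, the following uses std::thread::scope instead, assuming Rust 1.63 or later where it is stable; readers_then_writer is an illustrative name. Several threads read var through shared references at the same time, and a thread may modify it only after all of those reads have ended.

```rust
use std::thread;

fn readers_then_writer() -> (i32, i32, i32) {
    let mut var = 1;

    // Phase 1: any number of threads may hold shared (&) references at once.
    let (r1, r2) = thread::scope(|s| {
        let h1 = s.spawn(|| var * 10);
        let h2 = s.spawn(|| var * 100);
        (h1.join().unwrap(), h2.join().unwrap())
    });

    // Phase 2: exactly one thread holds the mutable (&mut) reference,
    // and only after every shared borrow above has ended.
    // Putting this writer inside the first scope is what E0502 rejects.
    thread::scope(|s| {
        s.spawn(|| var += 1);
    });

    (r1, r2, var)
}

fn main() {
    let (r1, r2, var) = readers_then_writer();
    println!("readers saw {} and {}; var is now {}", r1, r2, var);
}
```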

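Sync, Send, and 'static

Beyond the rule that at any time there may be either several shared references or a single mutable reference (never both at once), the secret of Rust's multithreading guarantees lies in Sync, Send, and 'static: the three S's. Let's look at what each of them does.

Send

Send is a trait in Rust. You can think of a trait as roughly like an interface in Java, such as Cloneable. Like Cloneable, Send is a marker: it has no methods and only records a property of the type.

What role does Send play in a multithreaded program?

First, when data is not shared, we can move a variable straight into the spawned thread, so that the data belongs to that thread alone. For example: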

use std::thread;

fn main() {
    let v = vec![1, 2, 3];

    // `move` transfers ownership of v into the new thread's closure.
    let handle = thread::spawn(move || {
        println!("Here's a vector: {:?}", v);
    });

    handle.join().unwrap();
}

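After vector v is created in the main thread, it is moved straight into the spawned thread, and nothing but that thread can use it afterwards. If anything else tries to use the vector (for example, printing it before handle.join().unwrap()), Rust reports an error: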

 error[E0382]: borrow of moved value: `v`
  --> src/main.rs:45:20
   |
38 |     let v = vec![1, 2, 3];
   |         - move occurs because `v` has type `Vec<i32>`, which does not implement the `Copy` trait
39 | 
40 |     let handle = thread::spawn(move || {
   |                                ------- value moved into closure here
41 |         println!("Here's a vector: {:?}", v);
   |                                           - variable moved due to use in closure
...
45 |     println!("{}", v[1]);
   |                    ^ value borrowed here after move

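The error borrow of moved value: `v` tells us that we cannot use a variable after it has been moved. A move means the data is no longer shared with its original owner, and continuing to use it could cause a bug, so the compiler refuses to compile the program.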
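If the spawning thread still needs the data, two common workarounds are sketched below (clone_or_return is an illustrative name): give the thread its own clone, or move the data in and let the thread hand ownership back through the value that join() returns.

```rust
use std::thread;

fn clone_or_return() -> (i32, Vec<i32>) {
    let v = vec![1, 2, 3];

    // Option 1: move a clone into the thread; the original v stays with us.
    let v_clone = v.clone();
    let summer = thread::spawn(move || v_clone.iter().sum::<i32>());

    // Option 2: move v itself in, and have the thread return it when done.
    let pusher = thread::spawn(move || {
        let mut v = v;
        v.push(4);
        v // ownership travels back through the JoinHandle
    });

    let total = summer.join().unwrap();
    let v = pusher.join().unwrap();
    (total, v)
}

fn main() {
    let (total, v) = clone_or_return();
    println!("total = {}, v = {:?}", total, v);
}
```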

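To be moved between threads, data must implement the Send trait. If the variable we move does not implement Send, Rust refuses to compile the program. For example, moving an Rc<i32> into a thread produces this error: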

error[E0277]: `Rc<i32>` cannot be sent between threads safely
   --> src/main.rs:52:22
    |
52  |           let handle = thread::spawn(move || {
    |  ______________________^^^^^^^^^^^^^_-
    | |                      |
    | |                      `Rc<i32>` cannot be sent between threads safely
53  | |             //sum +=1
54  | |             println!("sum is {}", sum);
55  | |
56  | |         });
    | |_________- within this `[closure@src/main.rs:52:36: 56:10]`
    | 
   ::: /Users/name/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/mod.rs:607:8
    |
607 |       F: Send + 'static,
    |          ---- required by this bound in `spawn`

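The key line is: `Rc<i32>` cannot be sent between threads safely.

Why does Rust reject moving a non-Send value between threads? Because if a type is not Send, moving a value of it to another thread and continuing to use it can cause undefined behavior. Rc in the error above is an example: its reference count is not updated atomically, so if clones of the same Rc lived in different threads, concurrent clone and drop calls could corrupt the count and free the value too early or never. The thread-safe counterpart of Rc is Arc.

So Send guarantees that only thread-safe values can be moved (sent) between threads, never thread-unsafe ones, which keeps us from introducing bugs by moving thread-unsafe values across threads.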
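A minimal sketch of the fix for the Rc error: replace Rc with Arc, whose reference count is updated atomically, so Arc<i32> is Send and each thread can own its own clone. share_with_arc is a hypothetical name and the value 3 is arbitrary.

```rust
use std::sync::Arc;
use std::thread;

// Shares one heap-allocated value with `n` threads via Arc.
fn share_with_arc(n: usize) -> (Vec<i32>, usize) {
    let sum_arc = Arc::new(3);
    let mut handles = Vec::new();
    for _ in 0..n {
        // Arc::clone only bumps the atomic reference count; the i32 itself is not copied.
        let sum_arc = Arc::clone(&sum_arc);
        handles.push(thread::spawn(move || *sum_arc));
    }
    let seen: Vec<i32> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    // Every thread's clone has been dropped by now, so only our handle remains.
    (seen, Arc::strong_count(&sum_arc))
}

fn main() {
    let (seen, count) = share_with_arc(10);
    println!("threads saw {:?}; strong count is back to {}", seen, count);
}
```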

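Sync

When shared data is both read and written, it must be protected, and only protected data may be used across threads.

How does Rust ensure that shared data that is read and written is protected? The answer is that Rust requires such data to implement the Sync trait (see the explanation of traits above). If data that does not implement Sync is shared between threads, the compiler reports an error. Why does implementing Sync mean the data is protected across threads?

Because a type implements Sync only if it is safe to access through shared references from several threads at once: either it cannot be mutated through a shared reference at all (like i32), or it synchronizes that mutation internally (like Mutex or the atomic types). For example, using a RefCell across threads produces this error: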

error[E0277]: `RefCell<usize>` cannot be shared between threads safely
   --> src/main.rs:73:22
    |
73  |         let handle = thread::spawn( move || {
    |                      ^^^^^^^^^^^^^ `RefCell<usize>` cannot be shared between threads safely
    | 
   ::: /Users/x/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/thread/mod.rs:607:8
    |
607 |     F: Send + 'static,
    |        ---- required by this bound in `spawn`

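The key message is: `RefCell<usize>` cannot be shared between threads safely.

RefCell tracks its borrows with a counter that is not synchronized, so it cannot be used across threads. To read and write the data from several threads we need to protect it with a Mutex, that is, replace RefCell<usize> with Mutex<usize> (wrapped in an Arc when used with thread::spawn). Mutex<RefCell<usize>> would also compile, but the Mutex already provides interior mutability, so the extra RefCell is unnecessary.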
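To see Sync at work directly, the sketch below hands the same &Mutex<usize> to several threads. It assumes Rust 1.63 or later for std::thread::scope, and increment_shared is an illustrative name. This only compiles because Mutex<usize> is Sync; swapping in RefCell<usize> makes the compiler reject it with the same cannot be shared between threads safely error. With thread::spawn instead of scoped threads, the Mutex would also need an Arc to satisfy 'static.

```rust
use std::sync::Mutex;
use std::thread;

fn increment_shared(n: usize) -> usize {
    // Mutex<usize> is Sync, so &Mutex<usize> may be used by several threads at once.
    let counter = Mutex::new(0usize);
    thread::scope(|s| {
        for _ in 0..n {
            s.spawn(|| {
                // lock() blocks until this thread has exclusive access;
                // the guard releases the lock at the end of this statement.
                *counter.lock().unwrap() += 1;
            });
        }
    }); // all scoped threads are joined here
    counter.into_inner().unwrap()
}

fn main() {
    println!("counter = {}", increment_shared(10));
}
```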

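The relationship between Sync and Send is subtle. Sync can be seen as a companion to Send, and its precise definition is: a type T is Sync if and only if &T is Send. For details, see std::marker::Sync - Rust.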
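This definition can be checked by the compiler itself. The sketch below uses two empty generic functions, assert_send and assert_sync (hypothetical helpers, not standard library items), whose only purpose is to make the compiler prove a trait bound. Uncommenting any of the commented-out lines produces a compile error.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};

// Calling these only compiles if T satisfies the bound; they do nothing at run time.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn compile_time_checks() {
    assert_send::<Arc<Mutex<i32>>>();
    assert_sync::<Arc<Mutex<i32>>>();

    // Mutex<i32> is Sync, so &Mutex<i32> is Send: "T is Sync iff &T is Send".
    assert_sync::<Mutex<i32>>();
    assert_send::<&Mutex<i32>>();

    // RefCell<i32> is Send (it may be moved to another thread) but not Sync.
    assert_send::<RefCell<i32>>();
    // assert_sync::<RefCell<i32>>();     // error: cannot be shared between threads safely
    // assert_send::<&RefCell<i32>>();    // error for the same reason: &RefCell<i32> is not Send
    // assert_send::<std::rc::Rc<i32>>(); // error: cannot be sent between threads safely
}

fn main() {
    compile_time_checks();
    println!("every uncommented bound above was proven at compile time");
}
```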

'static

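Send and Sync ensure that only thread-safe data is used across threads. In a language without garbage collection (GC), a multithreaded program must also make sure that shared data is still valid whenever it is accessed. Accessing invalid data, for example a use-after-free, is undefined behavior. How does Rust use its type system to prevent this?

The answer lies in Rust's ownership system. In Rust every value has an owner, and when the owner goes away the value is freed/dropped. So Rust requires that data used by a spawned thread either be owned by that thread alone (the data is moved into the thread, as described earlier) or satisfy the 'static lifetime.

What does the 'static lifetime mean? A value that satisfies 'static can live for as long as anyone needs it to. Three kinds of data satisfy it.

The first is data that lives as long as the program containing it, that is, data freed/dropped only when the program exits: static variables, which exist from when the program is loaded until it is unloaded, and string literals, which are stored in the program's binary.

The second is an owned type. As the name suggests, an owned type fully owns its data: String, Vec, and primitive types such as usize and i64. So when writing multithreaded programs you can move owned values into a thread (the first rule: don't share data); when ownership must be shared, use reference counting together with Send and Sync.

The third is a value that contains references, but only references that are themselves 'static. Such a value does not own the data it points to, but since every reference it holds is 'static, it too can live for as long as it likes.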
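The three cases can be sketched with a small helper that, like thread::spawn, requires its argument to be Send + 'static. run_on_thread, GREETING and Config are hypothetical names used only for illustration.

```rust
use std::fmt::Debug;
use std::thread;

// Case 1: a static variable lives from program load to program exit.
static GREETING: &str = "hello";

// Case 3: a type whose only reference is itself 'static.
#[derive(Debug)]
struct Config {
    name: &'static str,
}

// Same Send + 'static bound that thread::spawn places on its closure:
// `value` is moved into a new thread and formatted there.
fn run_on_thread<T: Debug + Send + 'static>(value: T) -> String {
    thread::spawn(move || format!("{:?}", value)).join().unwrap()
}

fn main() {
    println!("{}", run_on_thread(GREETING)); // static variable
    println!("{}", run_on_thread("a string literal")); // literal stored in the binary
    println!("{}", run_on_thread(String::from("owned"))); // case 2: owned types
    println!("{}", run_on_thread(vec![1_i64, 2, 3]));
    println!("{}", run_on_thread(Config { name: "app" })); // case 3

    // Rejected: `local` is dropped at the end of main, so &local is not 'static.
    // let local = String::from("local");
    // run_on_thread(&local);
}
```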

If the 'static requirement is not met, Rust reports an error. For example, when a spawned thread reads a local variable, Rust reports:

error[E0373]: closure may outlive the current function, but it borrows `sum`, which is owned by the current function
  --> src/main.rs:71:37
   |
71 |         let handle = thread::spawn( || {
   |                                     ^^ may outlive borrowed value `sum`
72 |            println!("sum is {}",  sum);
   |                                   --- `sum` is borrowed here
   |
note: function requires argument type to outlive `'static`
  --> src/main.rs:71:22

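The key message is closure may outlive the current function, but it borrows `sum`: a local variable is used by another thread, and that local may be freed while the thread is still accessing it. How do we fix it? Either move sum into the thread, or use reference counting so that sum is allocated on the heap (or use scoped threads, which this article does not cover). Either way, sum is guaranteed to be alive while the thread uses it. A complete, working program is given in the examples at the end of this article. (Closures are one of the harder concepts in Rust and deserve an article of their own.)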
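A minimal sketch of both fixes (fixed_local_sum is an illustrative name; sum = 3 is arbitrary). Because i32 is Copy, move gives every closure its own copy and sum remains usable afterwards; the Arc variant is what you would reach for when the value is not Copy and has to be shared.

```rust
use std::sync::Arc;
use std::thread;

fn fixed_local_sum() -> (i32, i32) {
    let sum = 3;

    // Fix 1: `move` makes each closure own its own copy of `sum`.
    // Since i32 is Copy, `sum` is still usable in this function afterwards.
    let mut handles = Vec::new();
    for _ in 0..10 {
        handles.push(thread::spawn(move || sum));
    }
    let total: i32 = handles.into_iter().map(|h| h.join().unwrap()).sum();

    // Fix 2: put the value on the heap behind a thread-safe reference count,
    // so the thread co-owns it instead of borrowing a local.
    let shared = Arc::new(sum);
    let worker = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || *shared * 2)
    };

    (total, worker.join().unwrap())
}

fn main() {
    let (total, doubled) = fixed_local_sum();
    println!("total = {}, doubled = {}", total, doubled);
}
```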

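The analysis above shows that a multithreaded program needs to do the following three things (a GC language only needs the first two):

Don't share data (or share it read-only).
If shared data is both read and written, protect it.
Make sure data is valid whenever it is accessed.

To make sure programmers do these three things, Rust builds them into the type system, so that it can find potential bugs for the programmer at compile time. Compile-time checking lets programmers write multithreaded programs more reliably. But as readers have probably noticed, this approach also makes the learning curve steeper and the code more verbose than in GC languages (we need many wrapper types, such as Arc, to provide these guarantees), and it rejects some programs that are logically correct but do not satisfy the type system. To learn more about Rust, see the article "Rust那些难理解的点 (持续更新)" ("The hard parts of Rust, continuously updated").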

References

How Rust Achieves Thread Safety
Fearless Concurrency with Rust | Rust Blog
std::marker::Sync - Rust
Common Rust Lifetime Misconceptions (pretzelhammer/rust-blog): https://github.com/pretzelhammer/rust-blog/blob/master/posts/common-rust-lifetime-misconceptions.md#2-if-t-static-then-t-must-be-valid-for-the-entire-program
Send and Sync in The Rustonomicon
Races in The Rustonomicon

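The code examples used in this article follow. If you know Rust, I suggest running and modifying them yourself to better understand the error messages and what they are telling you.

Rc cannot be used across threads, because it does not implement Send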


use std::rc::Rc;
use std::thread;

// This program intentionally fails to compile (E0277): Rc<i32> is not Send.
fn main() {
    let mut handles = vec![];
    let sum_rc = Rc::new(3);

    for _ in 0..10 {
        let sum_rc = sum_rc.clone();
        let handle = thread::spawn(move || {
            println!("sum Rc is {}", sum_rc.as_ref());
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    println!("sum is {}", sum_rc.as_ref());
}

error[E0277]: `Rc<i32>` cannot be sent between threads safely
   --> src/main.rs:73:22
    |
73  |           let handle = thread::spawn( move || {
    |  ______________________^^^^^^^^^^^^^__-
    | |                      |
    | |                      `Rc<i32>` cannot be sent between threads safely
74  | |            println!("sum ref cell is {}",  sum_rc.as_ref());
75  | |
76  | |         });
    | |_________- within this `[closure@src/main.rs:73:37: 76:10]`

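RefCell cannot be shared by multiple threads, because it is not Sync

use std::cell::RefCell;
use std::sync::Arc;
use std::thread;

// This program intentionally fails to compile (E0277): RefCell<i32> is not Sync,
// so Arc<RefCell<i32>> is not Send and cannot be moved into the spawned threads.
fn main() {
    let mut handles = vec![];
    let sum_ref_cell = Arc::new(RefCell::new(3));

    for _ in 0..10 {
        let sum_ref_cell = sum_ref_cell.clone();
        let handle = thread::spawn(move || {
            println!("sum RefCell is {}", sum_ref_cell.as_ref().borrow());
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    println!("sum is {}", sum_ref_cell.as_ref().borrow());
}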

error[E0277]: `RefCell<i32>` cannot be shared between threads safely
   --> src/main.rs:88:22
    |
88  |         let handle = thread::spawn( move || {
    |                      ^^^^^^^^^^^^^ `RefCell<i32>` cannot be shared between threads safely

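Shared data must be 'static: the error when a local variable is shared

use std::thread;

// This program intentionally fails to compile (E0373): the closure borrows the local `sum`,
// but thread::spawn requires a 'static closure.
fn main() {
    let mut handles = vec![];
    let sum = 3;

    for _ in 0..10 {
        let handle = thread::spawn(|| {
            println!("sum is {}", sum);
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    println!("sum is {}", sum);
}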

error[E0373]: closure may outlive the current function, but it borrows `sum`, which is owned by the current function
  --> src/main.rs:71:37
   |
71 |         let handle = thread::spawn( || {
   |                                     ^^ may outlive borrowed value `sum`
72 |            println!("sum is {}",  sum);
   |                                   --- `sum` is borrowed here
   |
note: function requires argument type to outlive `'static`

The correct, runnable program is:


use std::sync::{Mutex, Arc};
use std::thread;

fn main() {
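    // Arc gives every thread shared ownership of the counter; the Mutex guards the value inside.
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        // Each thread gets its own Arc handle pointing at the same Mutex.
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            // lock() waits for exclusive access; the lock is released when `num` is dropped.
            let mut num = counter.lock().unwrap();

            *num += 1;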
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap());
}
