Exploring ways to make async Rust easier

source link: https://carllerche.com/2021/06/17/six-ways-to-make-async-rust-easier/

Published Thu, Jun 17, 2021 by Carl Lerche

Estimated reading time: 15 min

Asynchronous Rust is powerful but has a reputation for being hard to learn. There have been various ideas on how to fix the trickiest aspects, though with my focus being on Tokio 1.0, I had not been able to dedicate much time to those topics. However, Niko’s async vision effort has recently restarted the discussion, so I thought I would take some time to participate.

In this article, I collect some previously proposed ideas and offer some new ones, tying them together to explore what could be. This exploration isn’t a proposal but a thought experiment: what could we do if we didn’t have to worry about the status quo? Making a significant change to Rust would be disruptive. We would need a rigorous way to weigh the pros and cons and decide whether the churn is worth it. I also urge you to approach the article with an open mind. I expect some aspects will generate an immediate adverse reaction; try to set it aside.

Before exploring different paths asynchronous Rust could take, we should first understand when to use asynchronous programming styles. After all, asynchronous programming is more challenging than just using threads. So what is the benefit? Some might say the reason is performance and that asynchronous code is faster because threads are too heavyweight. Reality is more nuanced. Using threads for I/O-based applications can be faster depending on the details. For example, a threaded echo server is faster than an asynchronous version when there are fewer than about 100 concurrent connections. After that, the threaded version starts dropping off, but not drastically.

I believe that the better reason for asynchronous programming is that it enables modeling complex flow control efficiently. For example, patterns like pausing or canceling an in-flight operation are challenging without asynchronous programming. Furthermore, with threads, coordinating between connections requires synchronization primitives and starts adding contention. With asynchronous programming, we avoid adding synchronization by operating on multiple connections on the same thread.

Rust’s asynchronous model does a phenomenal job of enabling us to model complex flow control. For example, mini-redis’ subscribe command implementation is very concise and elegant. Yet, it isn’t all sunshine and rainbows. When getting started, users of asynchronous Rust report a confusing learning curve. While getting started feels straightforward, it is easy to stumble on unexpected sharp edges. Niko and others have been doing a fantastic job cataloging the sharp edges as part of the async vision effort. While there are opportunities to improve on several fronts, I believe that the biggest issue with asynchronous Rust is that it can violate the principle of least surprise.

Let’s start with an example. Alan is learning Rust, has read the Rust book and the Tokio guide, and wants to write a toy chat server. He opts for a simple line-based protocol and encodes each line with a length prefix. His line parsing function looks like this:

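async fn parse_line(socket: &TcpStream) -> Result<String, Error> {
    // Read the length prefix, then exactly that many bytes.
    let len = socket.read_u32().await?;
    let mut line = vec![0; len as usize];
    socket.read_exact(&mut line).await?;
    // Validate the payload as UTF-8 and take ownership of it.
    let line = String::from_utf8(line)?;
    Ok(line)
}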

This code looks very much like blocking Rust code except for the async and await keywords. Even though Alan has never written Rust before, he can read this function and understand how it behaves, or so he thinks. When testing locally, his chat server appears to work, so he sends a link to Barbara. Unfortunately, after chatting for a bit, the server crashes with an “invalid UTF-8” error. Alan is baffled; he inspects his code and finds no apparent bugs.

So what is the problem? It turns out that the task uses a select! higher up in the call stack:

loop {
    select! {
        line_in = parse_line(&socket) => {
            if let Some(line_in) = line_in {
                // ... handle line_in ...
            } else {
                // connection closed, exit loop
                break;
            }
        }
        line_out = channel.recv() => {
            write_line(&socket, line_out).await;
        }
    }
}

Suppose a message is received on channel while parse_line is waiting for more data. The select! statement then aborts the parse_line operation, losing the in-progress parsing state. In the following loop iteration, parse_line is called again and starts from the middle of a frame, resulting in reading gibberish.

Therein lies the problem: any async Rust function may stop running at any time if the caller cancels it, and unlike with blocking Rust, cancellation is a typical asynchronous operation. Worse, no affordance guides new users to discover this behavior.
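The failure can be reproduced without a network or a runtime. What follows is a minimal sketch, not Alan's actual server: FakeSocket, ReadExact, and NoopWaker are hypothetical stand-ins built only on the standard library, the length prefix is shortened to one byte, and the future is polled by hand and then dropped, which is what a losing select! branch amounts to. The payload is deliberately chosen so that the byte at which parsing resumes looks like a valid length.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical in-memory stand-in for a socket.
struct FakeSocket {
    incoming: VecDeque<u8>,
}

/// Resolves once `want` bytes have been taken off the socket. Bytes already
/// taken live inside the future and vanish if the future is dropped.
struct ReadExact<'a> {
    socket: &'a mut FakeSocket,
    want: usize,
    got: Vec<u8>,
}

impl Future for ReadExact<'_> {
    type Output = Vec<u8>;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Vec<u8>> {
        let this = self.get_mut();
        while this.got.len() < this.want {
            match this.socket.incoming.pop_front() {
                Some(byte) => this.got.push(byte),
                None => return Poll::Pending, // wait for more data
            }
        }
        Poll::Ready(std::mem::take(&mut this.got))
    }
}

fn read_exact(socket: &mut FakeSocket, want: usize) -> ReadExact<'_> {
    ReadExact { socket, want, got: Vec::new() }
}

/// Same shape as Alan's function: a length prefix, then the payload.
async fn parse_line(socket: &mut FakeSocket) -> String {
    let prefix = read_exact(socket, 1).await;
    let len = prefix[0] as usize;
    let line = read_exact(socket, len).await;
    String::from_utf8_lossy(&line).into_owned()
}

fn retry_after_cancel_unbuffered() -> String {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // One frame: length 5, payload "a\x02ok!". Only two bytes have arrived.
    let mut socket = FakeSocket { incoming: VecDeque::from(vec![5, b'a']) };
    {
        let mut first = Box::pin(parse_line(&mut socket));
        assert!(first.as_mut().poll(&mut cx).is_pending());
        // `first` is dropped here. The length prefix and the byte `a`
        // that it had already consumed are gone with it.
    }

    // The rest of the frame arrives.
    socket.incoming.extend([2, b'o', b'k', b'!']);

    // The retry starts mid-frame and treats the byte 2 as a length prefix.
    let mut second = Box::pin(parse_line(&mut socket));
    match second.as_mut().poll(&mut cx) {
        Poll::Ready(line) => line,
        Poll::Pending => unreachable!("enough bytes are available"),
    }
}
```

Calling retry_after_cancel_unbuffered() yields "ok" rather than the five-byte line that was sent: the state held by the dropped future is simply lost.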

A Shiny Future

What if we could fix this so that asynchronous Rust meets the learner’s expectations at each step? If behavior must deviate from expectations, then there must be an affordance to point the learner in the right direction. Additionally, we want to minimize surprises during the learning process, especially early on.

Let’s start by fixing the unexpected cancellation problem by saying that async functions always run to completion (first proposed here). With completion-guaranteed futures, Alan learns that asynchronous Rust behaves like blocking Rust, but with the addition of the async and await keywords. Spawning tasks adds concurrency, and channels coordinate between tasks. Instead of select! taking arbitrary async statements, it works with channels and channel-like types (e.g., JoinHandle).

With completion-guaranteed futures, Alan’s chat server looks like this.

async fn handle_connection(socket: TcpStream, channel: Channel) {
    let reader = Arc::new(socket);
    let writer = reader.clone();
    let read_task = task::spawn(async move {
        while let Some(line_in) = parse_line(&reader).await {
            // ... handle line_in ...
        }
    });
    loop {
        // `channel` and JoinHandle are both "channel-like" types.
        select! {
            res = read_task.join() => {
                // The connection closed, exit loop
                break;
            }
            line_out = channel.recv() => {
                write_line(&writer, line_out).await;
            }
        }
    }
}

The code is similar to the earlier example, but because all async statements must complete and select! only accepts channel-like types, the call to parse_line() moves to a spawned task. This change prevents the error Alan encounters with today’s asynchronous Rust. Should Alan try to call parse_line() in the select! statement, he would get a compiler error with a recommendation to spawn a task. Select’s channel-like requirement ensures that it is safe to abort losing branches. Channels can store values, and receiving the value is atomic. Losing select branches does not lose data on cancelation.


What happens if writing encounters an error? As written, read_task keeps running. Instead, Alan wants an error to result in gracefully closing the connection and all tasks. Unfortunately, this is where we start running into design challenges. When we can drop any async statement at any time, cancellation is trivial: drop the future. With completion-guaranteed futures, that option is gone, yet we still need a way to cancel an in-flight operation, as this ability is one of the primary reasons to use asynchronous programming. To achieve this, JoinHandle provides a cancel() method:

async fn handle_connection(socket: TcpStream, channel: Channel) {
    let reader = Arc::new(socket);
    let writer = reader.clone();
    let read_task = task::spawn(async move {
        while let Some(line_in) = parse_line(&reader).await? {
            // ... handle line_in ...
        }
    });
    loop {
        // `channel` and JoinHandle are both "channel-like" types.
        select! {
            _ = read_task.join() => {
                // The connection closed or we encountered an error,
                // exit the loop
                break;
            }
            line_out = channel.recv() => {
                if write_line(&writer, line_out).await.is_err() {
                    read_task.cancel();
                    break;
                }
            }
        }
    }
}

But what does cancel() do? It cannot immediately abort the task because async statements are now guaranteed to complete. We do want the canceled task to stop processing and return as soon as possible. So instead, all resource types within the canceled task stop operating and return an “interrupted” error. Further attempts to use them also result in errors. This strategy is similar to Kotlin’s, except that Kotlin raises an exception. If read_task is awaiting socket.read_u32() in parse_line() when canceled, then read_u32() returns immediately with an io::Error of kind io::ErrorKind::Interrupted. The ? operator propagates the error up through the task, and the task terminates.
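A rough way to picture this behavior with today's standard library is a shared flag that every resource checks before doing work. In the sketch below, an OS thread stands in for the spawned task, and CancelFlag, Source, and read_task are hypothetical names; the proposed JoinHandle::cancel() does not exist, so this only illustrates the error flow, not a real API.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical flag shared between a task handle and the task's resources.
#[derive(Clone, Default)]
struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical resource: produces values until its task is cancelled.
struct Source {
    next: u32,
    flag: CancelFlag,
}

impl Source {
    fn read_u32(&mut self) -> io::Result<u32> {
        // Once cancelled, every use of the resource fails.
        if self.flag.is_cancelled() {
            return Err(io::ErrorKind::Interrupted.into());
        }
        thread::yield_now();
        self.next = self.next.wrapping_add(1);
        Ok(self.next)
    }
}

/// The task body never exits on its own; only an error ends it.
fn read_task(mut source: Source) -> io::Result<()> {
    loop {
        // `?` carries the "interrupted" error out of the task.
        let _value = source.read_u32()?;
    }
}

fn cancel_running_task() -> io::ErrorKind {
    let flag = CancelFlag::default();
    let source = Source { next: 0, flag: flag.clone() };

    // An OS thread stands in for the spawned task.
    let handle = thread::spawn(move || read_task(source));

    // Stand-in for calling cancel() on the task's handle.
    flag.cancel();

    let result = handle.join().expect("reader thread panicked");
    result.unwrap_err().kind()
}
```

The task is never torn down from the outside. It observes the error at its next read, unwinds through ordinary error handling, and returns, which is what makes the flow traceable.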

At first glance, this behavior may also seem like the task arbitrarily stops running, but this is not the case. To Alan, the current async Rust abort behavior looks as if the task hangs indefinitely. Because resources, such as sockets, are forced to return an error on cancellation, Alan can follow the cancellation flow. Alan may choose to add println! statements or use other debugging strategies to understand better how the canceled task terminates.


Unbeknownst to Alan, his chat server is avoiding most syscalls by using io-uring. Asynchronous Rust can transparently leverage the io-uring API thanks to completion-guaranteed futures and AsyncDrop. When Alan drops the TcpStream value at the end of handle_connection(), the socket is asynchronously closed. To do this, the TcpStream includes the following AsyncDrop implementation.

impl AsyncDrop for TcpStream {
    async fn drop(&mut self) {
        // Asynchronously close the socket.
    }
}

Niko already has a plausible proposal for using async fn in traits. The remaining open question is how to handle the implicit .await point. Currently, awaiting a future requires an explicit .await. An AsyncDrop trait would have the compiler insert a hidden yield point whenever a value goes out of scope within an async context. This behavior would violate the principle of least surprise. Why would some await points be implicit when all others are explicit?

One proposal to solve the “sometimes implicit drop” problem requires an explicit function call to perform the drop asynchronously.

my_tcp_stream.read(&mut buf).await?;
async_drop(my_tcp_stream).await;

Of course, what happens if the user forgets the async drop call? After all, the compiler handles dropping with blocking Rust, and this is a powerful feature. Also, note how the snippet above has a subtle bug: the ? operator skips the async_drop call on a read error. The Rust compiler could provide warnings that catch the problem, but what is the fix? Is there a way to make ? compatible with an explicit async_drop?

Get rid of .await

What if, instead of requiring explicit calls to async drop, we remove the await keyword? Alan would no longer have to use .await after calling an async function (e.g., socket.read_u32().await). When calling an async function within an async context, the .await becomes implicit.

This may seem like a big departure from today’s Rust, and it is. But it is good for us to question our assumptions. It’s interesting to look at .await in light of the criteria laid out in Aaron’s “Reasoning footprint” blog post. Implicit .await has limited applicability and context-dependence by only occurring within async statements. Alan only has to look at the function definition to know he is within an async context. Additionally, it would be easy for an IDE to highlight yield points.

Removing .await has another benefit: it brings async Rust in line with blocking Rust. The concept of blocking is already implicit. With blocking Rust, we don’t write my_socket.read(buffer).block?. When he writes his asynchronous chat server, the only noticeable difference to Alan becomes the need to annotate his functions with the async keyword. Alan can intuit the execution of the async code. The issue of “lazy futures” goes away, and Alan cannot accidentally do the following and wonder why “two” prints first.

async fn my_fn_one() {
    println!("one");
}

async fn my_fn_two() {
    println!("two");
}

async fn mixup() {
    let one = my_fn_one();
    let two = my_fn_two();
    join!(two, one);
}
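The laziness behind this surprise can be observed with nothing but the standard library. This is a minimal sketch under a few assumptions: run_ready is a hypothetical one-poll executor that only works for futures that never wait, the functions record into a shared log instead of printing, and two sequential awaits stand in for join!.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical executor for futures that complete on their first poll.
fn run_ready<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => panic!("this sketch only drives futures that never wait"),
    }
}

async fn my_fn_one(log: &RefCell<Vec<&'static str>>) {
    log.borrow_mut().push("one");
}

async fn my_fn_two(log: &RefCell<Vec<&'static str>>) {
    log.borrow_mut().push("two");
}

async fn mixup(log: &RefCell<Vec<&'static str>>) {
    // Calling an async fn only builds a future; neither body has run yet.
    let one = my_fn_one(log);
    let two = my_fn_two(log);
    log.borrow_mut().push("created");

    // The bodies run only when awaited, in await order.
    two.await;
    one.await;
}

fn execution_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    run_ready(mixup(&log));
    log.into_inner()
}
```

execution_order() returns "created", "two", "one": the order in which the functions were called plays no role, only the order in which their futures are driven.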

The “.await” RFC did include some discussion of implicit await. At that time, the most compelling argument against implicit awaits was that the await calls annotated points at which the async statement could be aborted. This argument would apply less with completion-guaranteed futures. Of course, with abort-safe async statements, should the await keyword remain? This question would need to be answered. Regardless, removing “.await” would be a big change and would need to be approached cautiously. Usability studies would need to demonstrate that the pros outweigh the cons.

Scoped tasks

So far, Alan can build his chat server with asynchronous Rust without learning many new concepts or hitting unexpected behavior. He learned about select! but the compiler enforces selecting over channel-like types. Besides that, Alan added async to his functions, and it just worked. He shows his code to Barbara and asks about needing to wrap the socket with an Arc. Barbara suggests he look into scoped tasks as a way to avoid the allocation.

A scoped task is the asynchronous equivalent of crossbeam’s scoped threads: a task that can borrow data owned by the spawner. Alan can use a scoped task to avoid the Arc in his connection handler.

async fn handle_connection(socket: TcpStream, channel: Channel) {
    task::scope(async |scope| {
        let read_task = scope.spawn(async || {
            while let Some(line_in) = parse_line(&socket)? {
                // ... handle line_in ...
            }
        });
        loop {
            // `channel` and JoinHandle are both "channel-like" types.
            select! {
                _ = read_task.join() => {
                    // The connection closed or we encountered an error,
                    // exit the loop
                    break;
                }
                line_out = channel.recv() => {
                    if write_line(&socket, line_out).is_err() {
                        read_task.cancel();
                        break;
                    }
                }
            }
        }
    });
}

The key to ensuring safety is guaranteeing that the scope outlives all tasks spawned within the scope, or, in other words, ensuring that async statements complete. There is a downside. Enabling scoped tasks requires making “Future::poll” unsafe as polling the future to completion becomes required for memory safety (covered in Matthias’ RFC). The addition of unsafe would make implementing Future by hand much more difficult. As a mitigation, we need to remove the need for virtually all users to implement Future by hand, including when implementing traits like AsyncRead and AsyncIterator. I believe this is an achievable goal.

Besides scoped tasks, ensuring async statement completion enables passing pointers from the task value to the kernel when using io-uring or integrating with C++ futures. In some cases, it may also be possible to avoid the allocation when spawning sub-tasks, which could be helpful in embedded contexts, though it would require a slightly different API than above.

Add concurrency by spawning

With today’s asynchronous Rust, applications can add concurrency by spawning a new task, using select! or FuturesUnordered. So far, we have discussed spawning and select!. I propose removing FuturesUnordered as it is a common source of bugs. When using FuturesUnordered, it is easy to spawn tasks into it, expecting they will run in the background, then be surprised when they don’t make any progress (see status quo story).

Instead, we can provide a similar utility using scoped tasks.

let greeting = "Hello".to_string();

task::scope(async |scope| {
    let mut task_set = scope.task_set();
    for i in 0..10 {
        task_set.spawn(async {
            println!("{} from task {}", greeting, i);
        });
    }
    async for res in task_set {
        println!("task completed {:?}", res);
    }
});

Each spawned task runs concurrently and borrows data from the spawning task, and the TaskSet utility provides an API similar to that of FuturesUnordered without the hazard. Other primitives, such as buffered streams, can also be implemented on top of scoped tasks.
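The blocking counterpart of this pattern already exists in the standard library as std::thread::scope, so it can serve as a runnable point of comparison. The sketch below uses threads rather than tasks, and greet_from_scoped_threads is a hypothetical name; it is not the proposed TaskSet API, only an illustration of borrowing from the spawner without an Arc.

```rust
use std::thread;

fn greet_from_scoped_threads() -> Vec<String> {
    let greeting = "Hello".to_string();
    // A plain reference is enough: no Arc and no clone of the String.
    let greeting_ref = &greeting;

    thread::scope(|scope| {
        let mut handles = Vec::new();
        for i in 0..10 {
            // `move` copies the reference and `i` into the thread.
            handles.push(scope.spawn(move || format!("{} from task {}", greeting_ref, i)));
        }

        // Collect results in spawn order. The scope would also join any
        // thread left unjoined before `thread::scope` returns.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .collect::<Vec<String>>()
    })
}
```

The borrow is sound only because the scope cannot return while a spawned thread is still running. Completion-guaranteed futures would give the asynchronous version the same property.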

There is an opportunity to explore new concurrency primitives on top of these primitives. For example, structured concurrency, similar to Kotlin, would now be possible. Matthias has previously explored this space, but asynchronous Rust’s current model makes it impossible to achieve. By switching asynchronous Rust to guarantee completion, we unlock this whole area.

What about select!?

At the beginning of the article, I claimed that using asynchronous programming enables us to model complex flow control efficiently. The most effective primitive we have today is select!. I also proposed restricting select! to accept only channel-like primitives, forcing Alan to spawn two tasks per connection to read and write concurrently. While spawning a task does help prevent bugs when canceling the read operation, it also would have been possible to rewrite the read operation to tolerate unexpected cancellation. For example, when parsing frames in mini-redis, we first store received data in a buffer. When canceling the read operation, no data is lost because it is in the buffer. The next call to read will resume from where we left off. The mini-redis read operation is abort-safe.

What if, instead of limiting select! to channel-like types, we restrict it to abort-safe operations? Receiving from a channel is abort-safe, but so is reading from a buffered I/O handle. The point here is that, instead of assuming that all asynchronous operations are abort-safe, we require the developer to opt in by adding #[abort_safe] (or async(abort)) to their function definition. This opt-in strategy has a few advantages. First, when Alan is learning asynchronous Rust, he doesn’t need to know anything about abort safety. It is possible to implement everything without the concept by spawning to get concurrency.

#[abort_safe]
async fn read_line(&mut self) -> io::Result<Option<String>> {
    loop {
        // Consume a full line from the buffer
        if let Some(line) = self.parse_line()? {
            return Ok(Some(line));
        }

        // Not enough data has been buffered to parse a full line
        if 0 == self.socket.read_buf(&mut self.buffer)? {
            // The remote closed the connection.
            if self.buffer.is_empty() {
                return Ok(None);
            } else {
                return Err(io::Error::new(
                    io::ErrorKind::ConnectionReset,
                    "connection reset by peer",
                ));
            }
        }
    }
}

Abort safety is no longer the default; it becomes opt-in. This opt-in strategy follows the existing pattern around unwind safety. When a new developer jumps into the code, the annotation informs them that the function must uphold the abort safety guarantees. The Rust compiler may even provide additional checks and warnings for functions annotated with #[abort_safe].

Now Alan can use his read_line() function from a loop with “select!”.

loop {
    select! {
        line_in = connection.read_line()? => {
            if let Some(line_in) = line_in {
                // ... handle line_in ...
            } else {
                // connection closed, exit loop
                break;
            }
        }
        line_out = channel.recv() => { /* ... write line_out ... */ }
    }
}
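To see why keeping progress in the connection rather than in the future makes a read abort-safe, here is a minimal sketch using only the standard library. Connection, ReadBuf, and NoopWaker are hypothetical stand-ins, the length prefix is a single byte, errors are left out, and the future is polled by hand and dropped mid-frame to imitate a losing select! branch.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical connection: `incoming` plays the socket, `buffer` holds
/// bytes that were received but not yet parsed.
struct Connection {
    incoming: VecDeque<u8>,
    buffer: Vec<u8>,
}

/// Moves whatever the socket currently has into the connection's buffer.
struct ReadBuf<'a>(&'a mut Connection);

impl Future for ReadBuf<'_> {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<usize> {
        let conn = &mut *self.get_mut().0;
        if conn.incoming.is_empty() {
            return Poll::Pending; // wait for more data
        }
        let n = conn.incoming.len();
        conn.buffer.extend(conn.incoming.drain(..));
        Poll::Ready(n)
    }
}

impl Connection {
    /// Take one length-prefixed line out of the buffer if a full one is there.
    fn parse_line(&mut self) -> Option<String> {
        let len = *self.buffer.first()? as usize;
        if self.buffer.len() < 1 + len {
            return None;
        }
        let frame: Vec<u8> = self.buffer.drain(..1 + len).collect();
        Some(String::from_utf8_lossy(&frame[1..]).into_owned())
    }

    /// All progress is stored in `self.buffer`, none in the future itself.
    async fn read_line(&mut self) -> String {
        loop {
            if let Some(line) = self.parse_line() {
                return line;
            }
            ReadBuf(&mut *self).await;
        }
    }
}

fn retry_after_cancel_buffered() -> String {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    // Frame "hello" with length prefix 5; only part of it has arrived.
    let mut conn = Connection {
        incoming: VecDeque::from(vec![5, b'h', b'e']),
        buffer: Vec::new(),
    };
    {
        let mut first = Box::pin(conn.read_line());
        assert!(first.as_mut().poll(&mut cx).is_pending());
        // Dropped mid-frame; the received bytes stay in `conn.buffer`.
    }

    // The rest of the frame arrives and a fresh call picks up where
    // the cancelled one stopped.
    conn.incoming.extend(*b"llo");
    let mut second = Box::pin(conn.read_line());
    match second.as_mut().poll(&mut cx) {
        Poll::Ready(line) => line,
        Poll::Pending => unreachable!("the full frame is available"),
    }
}
```

retry_after_cancel_buffered() returns "hello" even though the first attempt was dropped halfway through the frame, because the dropped future held no parsing state of its own.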

Mixing abort-safe and non-abort-safe

The #[abort_safe] annotation introduces two variants of asynchronous statements. Mixing abort-safe with non-abort-safe statements requires special consideration. Calling an abort-safe function is always possible, whether from an abort-safe or non-abort-safe context. However, the Rust compiler will prevent calling non-abort-safe functions from an abort-safe context and provide a helpful error message.

async fn must_complete() { ... }

#[abort_safe]
async fn can_abort() {
    // Invalid call => compiler error
    must_complete();
}

async fn must_complete() { ... }

#[abort_safe]
async fn can_abort() {
    // Valid call
    spawn(async { must_complete() }).join();
}

The developer can always spawn a new task to bridge a non-abort-safe function with an abort-safe context.

Including two flavors of asynchronous statements adds complexity to the language, but this complexity appears late in the learning curve. When learning asynchronous Rust, the default asynchronous statement is non-abort-safe. From this context, the learner can call asynchronous functions regardless of abort-safety. Abort-safety will appear in late chapters of asynchronous Rust tutorials as an optional advanced topic.

Getting there

Shifting from the current, abort-safe by default, asynchronous model to completion-guaranteed will require a new Rust edition. For discussion’s sake, let’s say that Rust edition 2026 introduces this change. The Future trait in this edition would be changed to model completion-guaranteed futures and would not be compatible with the trait from older versions. Instead, the old trait exists in the 2026 edition as AbortSafeFuture (naming TBD).

With the 2026 edition, adding #[abort_safe] to an asynchronous statement would generate an AbortSafeFuture implementation instead of Future. Any async function written with pre-2026 editions implements the AbortSafeFuture trait, making all existing async code compatible with the new edition (recall that an abort-safe statement is callable from any context).

Upgrading an old codebase to completion-guaranteed async Rust requires adding #[abort_safe] to all async statements. This transition is mechanical, and a tool can automate it. Runtimes like Tokio will probably need to issue a breaking change to support the new completion-guaranteed asynchronous Rust.

Parting thoughts

I discussed several potential changes for Rust. To briefly recap, they were:

  • Async functions guarantee completion.
  • Remove the await keyword.
  • Introduce an #[abort_safe] annotation to mark async functions that are safe to abort.
  • Limit select! to accept only abort-safe branches.
  • Cancel spawned tasks by disabling resources from completing.
  • Support scoped tasks.

I believe these changes could significantly simplify asynchronous Rust, though making the change comes at the cost of disrupting the status quo. Before making any decisions, we need more data. What percentage of today’s asynchronous code is abort-safe? Can we run usability studies to evaluate the benefit of these changes? How much cognitive load would having two flavors of async statements add to Rust?

I’m also hoping that this exploration will spark discussion, and perhaps others will propose alternative directions. Now is the time to come up with wild ideas for where Rust could go. Either write up articles or submit shiny future PRs to the wg-async-foundations repository.

And finally…

What if we remove await
