The Path to Rust :: Jon Gjengset

The Path to Rust (23 min. read)

Posted on May 25, 2016 — shared on Hacker News Twitter Reddit Lobsters

About six months ago, I started my first large-scale Rust-based project. I’d dabbled with the language in its early days, but back then it was a different beast, and not particularly approachable. I decided to try again, and I’m glad I did. Rust is quickly becoming my favorite language for all systems work (which is most of what I do anyway), and has largely replaced Go, Python, and C/C++ in my day-to-day.

Rust helps you avoid a lot of silly mistakes while also being expressive, flexible, and fast. However, that’s not what is most important to me. I like writing programs in Rust. It’s the first time in quite a long time that I am excited to be coding in a language — I actively want to convert old projects to Rust. YMMV of course, but I urge you to give it a shot!

Rust is not the most beginner-friendly language out there — the compiler is not as lenient and forgiving as that of most other languages (Go, I’m looking at you), and will regularly reject your code (albeit usually for good reasons). This creates a relatively high barrier to entry, even for people with extensive programming backgrounds. In particular, Rust’s “catch bugs at compile time” mentality means that you often do not see partial progress — either your program doesn’t compile, or it runs and does the right thing. Obviously, this is not always true, but it can make it harder to learn by doing than in other, less strict languages.

This post is not meant to be a comprehensive introduction to Rust. If you want to learn Rust, you should go read the excellent Rust book. Instead, I will attempt to give an evaluation of Rust for developers coming from other systems languages (Go and C/C++ in particular), and to point out why they may or may not want to try Rust. At the end, I’ll also point out some tips and gotchas for those who are interested in that kind of stuff.

Why is Rust better for me?

When researching a new language, developers (like you, dear reader) will inevitably focus on how the language in question is different from the one they are currently using. In particular, they want to know whether (and if so, how) the new language is better. The Rust website has a list of Rust “features”, but that’s not all that helpful if you’re trying to decide whether the new language is better for you. So, let’s go through some of the ways Rust might make your life easier.

Fewer runtime bugs

If debugging is the process of removing bugs, then programming must be the process of putting them in.

Edsger W. Dijkstra [citation needed]

Code is very rarely correct the first time it is written, especially for complex systems code. Even for seasoned developers, a large portion of programming time is spent debugging why code doesn’t do what it’s supposed to.

One of the Tor developers recently did a retrospective on what kinds of bugs had crept into the Tor onion router over the past couple of years, as well as how they could have been avoided. While reading through the list of identified issues, I noticed that many of these would be caught at compile-time in Rust. To note a few:

2.1 would be caught by Rust’s overflow detection.
2.2 would be a non-issue in Rust — void* is not available, and the more expressive type system (generics for example) would render them unnecessary anyway.
2.4 speaks for itself.
3.1: Rust’s lifetimes are designed to address exactly this issue.
3.3: Pattern matching in Rust (think of it as a switch on steroids) is checked for completeness (i.e., all possible cases are handled) at compile-time.
4.1: Propagating Results with try! is a common pattern in Rust, which would effectively provide exactly this kind of behavior.
9.1: Rust has strong conventions for return values that can return errors (basically, use Result), and callers must deal with the fact that a function can error.
10.2/10.3: Rust’s borrow checker enforces that data can’t be simultaneously read and written, making this kind of bug impossible.
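
To make two of these points a bit more concrete, here is a minimal sketch of an exhaustively checked match and of error propagation through Result. The Command enum and the describe and parse_port functions are made up for this illustration and have nothing to do with the Tor code base; the ? operator used below is the newer shorthand for the try! macro.

```rust
use std::num::ParseIntError;

// Hypothetical command type, used only for this example.
enum Command {
    Start,
    Stop,
    Restart,
}

// Every variant has to be handled: deleting one of the arms below
// turns into a compile-time error rather than a runtime surprise.
fn describe(cmd: &Command) -> &'static str {
    match cmd {
        Command::Start => "starting",
        Command::Stop => "stopping",
        Command::Restart => "restarting",
    }
}

// The caller cannot get at the port without dealing with the possible error.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    // `?` returns early with the error if parsing fails (same idea as try!).
    let port = s.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    println!("{}", describe(&Command::Restart));
    match parse_port("8080") {
        Ok(p) => println!("port {}", p),
        Err(e) => println!("bad port: {}", e),
    }
}
```

Because parse_port returns a Result, a caller that forgets to handle the error case gets a type error (or at least a compiler warning) instead of silently using a garbage value.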

Go, which has increasingly been adopted as a systems language, solves some of these issues, but far from all. It also introduces its own set of issues, such as type-casting from the interface{} type or data races between Goroutines, which are non-existent in Rust.

Safe concurrency

This latter point is particularly interesting; the Rust compiler will not compile a program that has a potential data race in it. Unless you explicitly mark your code as unsafe (which you rarely, if ever, need to do), your code simply cannot have data races. Rust checks this using the borrow checker, which enforces two simple rules:

First, any borrow must last for a scope no greater than that of the owner. Second, you may have one or the other of these two kinds of borrows, but not both at the same time:

one or more references (&T) to a resource,

exactly one mutable reference (&mut T).

The first rule ensures that you never use a value after it has gone out of scope (eradicating use-after-free and double-free in one fell swoop). The second rule guarantees that you have no data races, since you cannot have two mutable references to the same data, nor can you have one thread modify while another thread reads. This might seem restrictive at first, but all the solutions you would use to avoid races in regular code are fully supported, and their correctness is checked at compile-time: you can add locks to allow two threads mutable access to a variable, or use atomic operations to implement RCU and other algorithms that allow concurrent reads and writes.
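
As a rough sketch of the lock-based approach, the following lets several threads mutate one counter through the standard library's Arc and Mutex. The parallel_count helper and its parameters are invented for this example; the point is only that the shared value is reachable exclusively through the lock.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: `workers` threads each add 1 to a shared
// counter `per_worker` times, and the final value is returned.
fn parallel_count(workers: usize, per_worker: usize) -> usize {
    // Arc gives shared ownership across threads, Mutex gives exclusive access.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // The integer can only be touched while holding the lock.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker to finish before reading the result.
    for h in handles {
        h.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```

If you drop the Mutex and try to increment a plain integer from several threads instead, the program is rejected at compile time rather than racing at runtime.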

Performance without sacrifice

Some of the bugs found by the Tor developers are handled in other higher-level languages as well. Unfortunately, higher-level languages are often not a great fit for systems code. Systems code is often performance critical (e.g., kernels, databases), so the developer wants predictable performance, and tight control over memory allocation/de-allocation and data layout. This can be hard to achieve in higher-level languages or when using a garbage collector.

Rust provides features that are often associated with high-level languages (such as automatic memory free-ing when values go out of scope, pattern matching, functional programming abstractions, a powerful type system), as well as powerful features like the borrow checker, with no runtime cost. This might seem unbelievable (and it admittedly still feels that way to me), but Rust’s claim to achieve performance comparable to that of C++ seems to be supported in multiple benchmarks.

Furthermore, Rust gives the developer control over when memory is allocated, and how it is laid out. This in turn allows straightforward and efficient interaction with C APIs (and other languages) through the Foreign Function Interface, and makes it easy to interact with high-performance libraries like BLAS, or low-level toolkits like DPDK, which may not be available natively in Rust (yet).

Expressivity and productivity

One of the reasons developers often report to be more productive in higher-level languages is the availability of higher-level primitives. Consider the case of constructing an inverted index for a given string. In C (or C++), you might write something like this (there are examples in other languages there too).

Let’s have a look at how you might implement something similar in Rust. Note that it’s not actually the same as the C++ example, since that also implements a Trie. For a more apples-to-apples comparison, consider this C++ variant written by a Redditor.

fn main() {
  use std::io;
  use std::fs;
  use std::env;
  use std::collections::{HashMap, HashSet};
  use std::io::BufRead;

  let args = env::args().skip(1).collect::<Vec<_>>();
  let idx = args
    // iterate over our arguments
    .iter()
    // open each file
    .map(|fname| (fname.as_str(), fs::File::open(fname.as_str())))
    // check for errors
    .map(|(fname, f)| {
      f.and_then(|f| Ok((fname, f)))
        .expect(&format!("input file {} could not be opened", fname))
    })
    // make a buffered reader
    .map(|(fname, f)| (fname, io::BufReader::new(f)))
    // for each file
    .flat_map(|(f, file)| {
      file
        // read the lines
        .lines()
        // split into words
        .flat_map(|line| {
          line.unwrap().split_whitespace()
            .map(|w| w.to_string()).collect::<Vec<_>>().into_iter()
            // NOTE: the collect+into_iter here is icky
            // have a look at the flat_map entry
            // in the Appendix for why it's here
        })
        // prune duplicates
        .collect::<HashSet<_>>()
        .into_iter()
        // and emit inverted index entry
        .map(move |word| (word, f))
    })
    .fold(HashMap::new(), |mut idx, (word, f)| {
      // absorb all entries into a vector of file names per word
      idx.entry(word)
        .or_insert(Vec::new())
        .push(f);
      idx
    });

  println!("Please enter a search term and press enter:");
  print!("> ");

  let stdin = io::stdin();
  for query in stdin.lock().lines() {
    match idx.get(&*query.unwrap()) {
      Some(files) => println!("appears in {:?}", files),
      None => println!("does not appear in any files"),
    };
    print!("> ");
  }
}

If you are familiar with functional programming, you might find the above both readable and straightforward. If you aren’t, you can substitute the expression starting at let idx = above with:

let mut idx = HashMap::new();
for fname in &args {
  let f = match fs::File::open(fname) {
    Ok(f) => io::BufReader::new(f),
    Err(e) => panic!("input file {} could not be opened: {}", fname, e),
  };
  let mut words = HashSet::new();
  for line in f.lines() {
    for w in line.unwrap().split_whitespace() {
      if words.insert(w.to_string()) {
        // new word seen
        idx.entry(w.to_string()).or_insert(Vec::new()).push(fname);
      }
    }
  }
}

Crucially, these are both valid Rust programs, and you can mix and match between the different styles as you want (you can see more examples in the Hacker News discussion linked to at the top of this post). Furthermore, both result in reasonably efficient code (each file is processed as a stream), terminate nicely with an error if a file could not be opened, and exit cleanly if the user closes the input stream (e.g., with ^D).

The code above shows examples of functional programming and pattern matching in Rust. These are neat, but you can approximate something similar in many other languages. One feature that is relatively unique to Rust, and also turns out to be really useful, is lifetimes. Say, for example, that you want to write a helper function that returns the first and last name of a struct User that contains the user’s full name. You don’t want to copy strings unnecessarily, and instead just want pointers into the existing memory. Something along the lines of:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct User {
  char full_name[255];
};

// returns a pointer into u->full_name (and modifies it, see below)
char* first_name(struct User *u) {
  return strtok(u->full_name, " ");
}
// returns a pointer to just past the last space in u->full_name
char* last_name(struct User *u) {
  char *last = strrchr(u->full_name, ' ')+1;
  return last;
}

int main() {
  struct User *u = malloc(sizeof(struct User));
  strcpy(u->full_name, "Jon Gjengset");
  char* last = last_name(u);
  char* first = first_name(u);
  // ...
  printf("first: %s, last: %s\n", first, last);
  return 0;
}

The caller now has three pointers into struct User, first, last, and u. What happens if the program now calls free(u) and tries to print first or last? Oops. Since C strings are null-terminated, the code also breaks u->full_name once first_name has been called, because strtok will replace the first space in the string with a null terminator.

Let’s see what this code would look like in Rust:

struct User {
  full_name: String,
}

impl User {
  fn first_name<'a>(&'a self) -> &'a str {
    self.full_name.split_whitespace().next().unwrap()
  }
  fn last_name<'a>(&'a self) -> &'a str {
    self.full_name.split_whitespace().last().unwrap()
  }
}

fn main() {
  let u = User { full_name: "Jon Gjengset".to_string() };
  let first = u.first_name();
  let last = u.last_name();
  println!("first: {}, last: {}", first, last);
}

Notice the weird 'a thing? That’s a Rust lifetime. For first_name and last_name, it says that the returned string reference (&str) can’t outlive the reference to self. Thus, if the programmer tried to call drop(u) (Rust’s equivalent of an explicit free), the compiler would check that they did not later try to use first or last. Pretty neat! Also, since Rust strings aren’t null-terminated (array references instead store their length), we can safely use u.full_name after calling first_name. In fact, the caller knows that this is safe, because first_name takes an immutable reference (&) to the object, and thus can’t modify it.
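
Here is a small sketch of the check described above, using a free function instead of a method. The longest_word function is a made-up example; the commented-out line marks the use that the compiler would refuse.

```rust
// Returns a slice of `text`; the lifetime 'a ties the result to the input,
// so the returned &str may not outlive the string it points into.
fn longest_word<'a>(text: &'a str) -> &'a str {
    let mut best = "";
    for w in text.split_whitespace() {
        if w.len() > best.len() {
            best = w;
        }
    }
    best
}

fn main() {
    let text = String::from("the quick brown fox");
    let word = longest_word(&text);
    println!("{}", word);

    drop(text); // fine: `word` is never used after this point
    // println!("{}", word); // uncommenting this line makes the program fail
    //                       // to compile, because `word` would then be used
    //                       // after the String it borrows from is gone
}
```

No copy of the string is made anywhere: word is just a pointer and a length into the memory owned by text.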

A great build system

Rust comes with a build tool called Cargo. Cargo is similar to npm, go get, pip and friends; it lets you declare dependencies and build options, and then automatically fetches and builds those when you build your project. This makes it easy for yourself and others to build your code, including third-party testing services like Travis.

Cargo is pretty featureful compared to many of its siblings in other languages. It has versioned dependencies, supports multiple build profiles (e.g., debug vs. release), and can even link against C code. It also has built-in support for uploading and updating packages on crates.io, generating documentation, and running tests.

These latter two points are worth elaborating on. First, Rust makes writing documentation very easy. Comments that start with three slashes (i.e., /// instead of //) are documentation comments, and are automatically associated with the following statement. The contents are rendered as Markdown, and code examples are automatically compiled and run as tests. The idea is that code examples should always build (of course, you can override this for any given code block), and if it doesn’t, that should be considered a test failure. The rendered documentation automatically links between different modules (including the standard library documentation), and is really easy to use.
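
A documentation comment might look roughly like the sketch below. The word_count function and the crate name mycrate are placeholders. The example inside the comment is delimited with ~~~ here only so that it nests cleanly inside this post's own code block; most Rust code uses triple backticks for the same purpose.

```rust
/// Counts the whitespace-separated words in `text`.
///
/// # Examples
///
/// ~~~
/// // `mycrate` is a placeholder for the name of the enclosing library crate.
/// assert_eq!(mycrate::word_count("hello big world"), 3);
/// ~~~
pub fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    println!("{}", word_count("hello big world"));
}
```

When a function like this lives in a library crate, cargo doc renders the comment as Markdown, and cargo test compiles and runs the example inside it along with the regular tests.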

Most of the standard library has been very well documented by now, and thanks to the ease of writing documentation, most of the available Rust crates (the name of Rust packages) are also pretty well covered (have a look at the documentation for this Rust SQL ORM crate for example). In many cases, the Rust documentation is even better than the documentation for similar C++ or Go features — for example, compare Rust’s Vec, C++’s std::vector, and Go’s slices (part2, part3, part4).

Why would I not choose Rust?

By now, I hope I have convinced you that Rust has some pretty attractive features. However, I bet you are still thinking “okay, but it can’t all be rainbows and roses”. And you are right. There are some things that you might dislike about Rust.

First, Rust is still fairly young, and so is its ecosystem. The Rust team has done an excellent job at building a welcoming community, and the language is improving constantly, but Rome wasn’t built in a day. There are still some unfinished features in the language itself (although surprisingly few), and some documentation is still missing (though this is being rapidly addressed). A number of useful libraries are still in their early stages, and the tools for the ecosystem are still being developed. There is a lot of interest and engagement from developers though, so the situation is improving daily.

Second, since Rust is not garbage collected, there will be times when you have to fall back to reference counting, just like you have to do in C/C++. Closely related is the fact that Rust does not have green threads similar to Go’s goroutines. Rust dropped green threads early on in favor of moving this to an external crate, which means the integration is not as neat as it is in Go. You can still spawn threads, and the borrow checker will protect you from much of the nastiness of concurrency bugs, but these will essentially be pthreads, not CSP-style co-routines.
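
For a sense of what falling back to reference counting looks like, here is a minimal single-threaded sketch using the standard library's Rc (Arc is the thread-safe counterpart). The Config and Worker types and the make_workers function are invented for this example.

```rust
use std::rc::Rc;

// Hypothetical types: two workers that both need to own the same config.
struct Config {
    name: String,
}

struct Worker {
    config: Rc<Config>,
}

fn make_workers(name: &str) -> (Worker, Worker) {
    let config = Rc::new(Config { name: name.to_string() });
    // Rc::clone copies the pointer and bumps the count; the Config is not copied.
    let a = Worker { config: Rc::clone(&config) };
    let b = Worker { config };
    (a, b)
}

fn main() {
    let (a, b) = make_workers("db");
    println!("{} owners of {}", Rc::strong_count(&a.config), a.config.name);
    drop(b);
    // The Config is freed only once the last owner goes away.
    println!("{} owner left", Rc::strong_count(&a.config));
}
```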

Third, Rust is a fairly complex language compared to C or Go (it’s more comparable to C++), and the compiler is a lot pickier. It can be tricky for a newcomer to the language to get even relatively simple programs to compile, and the learning curve remains steep for quite some time. However, the compiler usually gives extremely helpful feedback when your code doesn’t compile, and the community is very friendly and responsive (I suggest visiting #rust-beginners when you’re starting out). Furthermore, once your code compiles, you’ll find (at least I have) that it is much more likely to be correct (i.e., do the right thing) than if you tried to write similar C, C++, or Go code.

Finally, compared to C (and to some extent C++), Rust’s complexity can also make it harder to understand exactly what the runtime behavior of your code is. That said, as long as you write idiomatic Rust code, you’ll probably find that your code turns out to be as fast as you can expect in most cases. With time and practice, estimating the performance of your code also becomes easier, but it is certainly trickier than in simpler languages like C.

Concluding remarks

I’ve given you what I believe to be a pretty fair and comprehensive overview of Rust compared to C and Go, based on my experience from the past six months. Rust has impressed me immensely, and I urge you to give it a shot. This is especially true if you tried it a while ago and didn’t like it — the language has matured a lot over the past year! If you can think of additional pros and cons for switching to Rust, please let me know either in HN comments, on Twitter, or by e-mail!

Appendix A: Tips & Gotchas

String derefs to &str, and through deref coercion you can call all the methods on &str directly on a String. This is neat, but there is one case where it doesn’t work as you’d hope: if you use a String to index into a HashMap where the keys are &str. This is because deref coercion applies to a reference to a String (&String), not to the String itself, and it does not kick in when the expected type is generic, as it is for HashMap lookups. Prefixing your String with a plain & inside [] therefore only helps when the keys are themselves String; for &str keys, get a &str explicitly by prefixing the String with &* (or by calling .as_str()), which also comes in handy at other times.
If you ever use flat_map, you may get weird lifetime complaints from the compiler about the thing you are iterating over inside the flat_map closure not living long enough. This is usually because you have an IntoIter (i.e., an iterator that owns what it’s iterating over), and since iterators are lazily evaluated, the owned value may already have been dropped by the time the inner iterator is actually consumed. The easiest (though not most efficient) way to overcome this is to write your code like this:
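A small sketch of the &* trick with a HashMap keyed by &str follows; the lookup function and the fruit names are made up for the example.

```
use std::collections::HashMap;

// Hypothetical helper: look up an owned String in a map keyed by &str.
fn lookup(map: &HashMap<&str, u32>, key: String) -> Option<u32> {
    // `&*key` turns the String into a &str explicitly;
    // `key.as_str()` would do the same thing.
    map.get(&*key).cloned()
}

fn main() {
    let mut stock: HashMap<&str, u32> = HashMap::new();
    stock.insert("apples", 3);

    let key = String::from("apples");
    // Both forms index with a &str.
    println!("{}", stock[&*key]);
    println!("{}", stock[key.as_str()]);
    println!("{:?}", lookup(&stock, String::from("pears")));
}
```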
```
// ...
.flat_map(|e| {
  e.into_iter()
   .map(|ee| {
     // ...
   })
   .collect::<Vec<_>>().into_iter()
})
// ...
```
The collect forces the iterator to be evaluated immediately, executing the closure. The resulting list is then converted to an iterator with no borrows, which can safely be returned by the flat_map closure without lifetime issues.
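For a complete, runnable version of the same pattern, here is a sketch that splits owned lines into owned words. The words function is a made-up example.

```
// Splits a list of owned lines into a flat list of owned words.
fn words(lines: Vec<String>) -> Vec<String> {
    lines
        .into_iter()
        .flat_map(|line| {
            // `line` is owned by this closure and dropped when it returns, so
            // the borrowing iterator from split_whitespace() can't be returned.
            line.split_whitespace()
                .map(|w| w.to_string())
                .collect::<Vec<_>>() // evaluate now, while `line` is alive
                .into_iter() // owns its Strings and borrows nothing
        })
        .collect()
}

fn main() {
    let lines = vec!["hello big".to_string(), "world".to_string()];
    println!("{:?}", words(lines));
}
```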
If you have an iterator and you want to add an element (say 1) to the end, you can do this using the following trick:
```
for x in iter.chain(Some(1).into_iter()) {}
```
This exploits the fact that an Option can be turned into an iterator, and chains that single-element iterator onto the existing one, giving you an iterator that yields an extra element after the original iterator ends.
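The same thing can be written with std::iter::once, which some people find easier to read. Both variants are sketched below; the function names are made up for the example.

```
use std::iter;

// Appends `extra` using a single-element iterator from iter::once.
fn with_trailing(items: Vec<i32>, extra: i32) -> Vec<i32> {
    items.into_iter().chain(iter::once(extra)).collect()
}

// Same result using the Option trick: chain() accepts anything that
// can be turned into an iterator, so the Option can be passed directly.
fn with_trailing_option(items: Vec<i32>, extra: i32) -> Vec<i32> {
    items.into_iter().chain(Some(extra)).collect()
}

fn main() {
    println!("{:?}", with_trailing(vec![1, 2], 3));
    println!("{:?}", with_trailing_option(vec![1, 2], 3));
}
```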
Since the removal of thread::scoped, it has become tricky to spawn threads that borrow from their environment. This is often useful if you want to run a pool of workers that need to share access to some resource. You can often get around this using reference counting, but that’s not always a desirable option. Instead, you should use the scoped-pool crate, which supports scoped workers, or crossbeam::spawn which provides the same functionality without requiring a pool.
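Since this post was written, the standard library has gained scoped threads again in the form of std::thread::scope (stable since Rust 1.63). The sketch below uses that rather than either of the crates mentioned above; the sum_in_halves function is a made-up example.

```
use std::thread;

// Sums a slice using two threads that borrow directly from `data`,
// with no reference counting and no copying.
fn sum_in_halves(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // Both threads are guaranteed to finish before scope() returns,
        // which is why they are allowed to borrow `left` and `right`.
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}

fn main() {
    println!("{}", sum_in_halves(&[1, 2, 3, 4, 5]));
}
```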
Rust currently (as far as I’m aware) does not have a nice way of talking about only one variant within an enum. That is, you cannot write a function that operates on only a particular enum variant, or have a variable that Rust knows is of a particular variant. This can lead to a bunch of code along the lines of:
```
if let MyEnum::ThisVariant(x) = x {
  // do something with x
} else {
  unreachable!();
}
```
The try! macro and the unwrap()/expect() methods mitigate this pain when working with Result or Option types, but do not generalize. If anyone knows of a cleaner way of dealing with this, please let me know!
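One workaround I'd tentatively suggest is to wrap the check in a small accessor method that returns an Option, so the if let lives in exactly one place. This is only a sketch of that convention, not a language feature; the Shape enum and the as_circle method are invented for the example.

```
// Hypothetical enum for illustration.
#[derive(Debug)]
enum Shape {
    Circle(f64),
    Square(f64),
}

impl Shape {
    // Returns the radius if this is a Circle, and None for any other variant.
    fn as_circle(&self) -> Option<f64> {
        match *self {
            Shape::Circle(r) => Some(r),
            _ => None,
        }
    }
}

fn main() {
    let shapes = vec![Shape::Circle(2.0), Shape::Square(1.0)];
    for s in &shapes {
        println!("{:?} -> {:?}", s, s.as_circle());
    }
    // Where the variant is known, expect() documents the assumption.
    let r = shapes[0].as_circle().expect("expected a circle");
    println!("radius {}", r);
}
```

This does not make the type system aware of the variant, but it lets callers reuse unwrap(), expect(), and the other Option combinators instead of repeating the if let/unreachable! dance.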
