
The Path to Rust :: Jon Gjengset

source link: https://thesquareplanet.com/blog/the-path-to-rust/

The Path to Rust (23 min. read)

Posted on May 25, 2016 — shared on Hacker News Twitter Reddit Lobsters

About six months ago, I started my first large-scale Rust-based project. I’d dabbled with the language in its early days, but back then it was a different beast, and not particularly approachable. I decided to try again, and I’m glad I did. Rust is quickly becoming my favorite language for all systems work (which is most of what I do anyway), and has largely replaced Go, Python, and C/C++ in my day-to-day.

Rust helps you avoid a lot of silly mis­takes while also being ex­pres­sive, flex­i­ble, and fast. How­ever, that’s not what is most im­por­tant to me. I like writ­ing pro­grams in Rust. It’s the first time in quite a long time that I am ex­cited to be cod­ing in a lan­guage — I ac­tively want to con­vert old pro­jects to Rust. YMMV of course, but I urge you to give it a shot!

Rust is not the most be­gin­ner-friendly lan­guage out there — the com­piler is not as le­nient and for­giv­ing as that of most other lan­guages (Go, I’m look­ing at you), and will reg­u­larly re­ject your code (al­beit usu­ally for good rea­sons). This cre­ates a rel­a­tively high bar­rier to entry, even for peo­ple with ex­ten­sive pro­gram­ming back­grounds. In par­tic­u­lar, Rust’s “catch bugs at com­pile time” men­tal­ity means that you often do not see par­tial progress — ei­ther your pro­gram doesn’t com­pile, or it runs and does the right thing. Ob­vi­ously, this is not al­ways true, but it can make it harder to learn by doing than in other, less strict lan­guages.

This post is not meant to be a comprehensive introduction to Rust. If you want to learn Rust, you should go read the excellent Rust book. Instead, I will attempt to give an evaluation of Rust for developers coming from other systems languages (Go and C/C++ in particular), and to point out why they may or may not want to try Rust. At the end, I’ll also point out some tips and gotchas for those who are interested in that kind of stuff.

Why is Rust bet­ter for me?

When researching a new language, developers (like you, dear reader) will inevitably focus on how the language in question is different from the one they are currently using. In particular, they want to know whether (and if so, how) the new language is better. The Rust website has a list of Rust “features”, but that’s not all that helpful if you’re trying to decide whether the new language is better for you. So, let’s go through some of the ways Rust might make your life easier.

Fewer run­time bugs

If de­bug­ging is the process of re­mov­ing bugs, then pro­gram­ming must be the process of putting them in.

Eds­ger W. Dijk­stra [ci­ta­tion needed]

Code is very rarely cor­rect the first time it is writ­ten, es­pe­cially for com­plex sys­tems code. Even for sea­soned de­vel­op­ers, a large por­tion of pro­gram­ming time is spent de­bug­ging why code doesn’t do what it’s sup­posed to.

One of the Tor de­vel­op­ers re­cently did a ret­ro­spec­tive on what kinds of bugs had crept into the Tor onion router over the past cou­ple of years, as well as how they could have been avoided. While read­ing through the list of iden­ti­fied is­sues, I no­ticed that many of these would be caught at com­pile-time in Rust. To note a few:

  • 2.1 would be caught by Rust’s over­flow de­tec­tion.
  • 2.2 would be a non-is­sue in Rust — void* is not avail­able, and the more ex­pres­sive type sys­tem (gener­ics for ex­am­ple) would ren­der them un­nec­es­sary any­way.
  • 2.4 speaks for it­self.
  • 3.1: Rust’s life­times are de­signed to ad­dress ex­actly this issue.
  • 3.3: Pat­tern match­ing in Rust (think of it as a switch on steroids) is checked for com­plete­ness (i.e., all pos­si­ble cases are han­dled) at com­pile-time.
  • 4.1: Prop­a­gat­ing Results with try! is a com­mon pat­tern in Rust, which would ef­fec­tively pro­vide ex­actly this kind of be­hav­ior.
  • 9.1: Rust has strong con­ven­tions for re­turn val­ues that can re­turn er­rors (ba­si­cally, use Result), and callers must deal with the fact that a func­tion can error.
  • 10.2/10.3: Rust’s bor­row checker en­forces that data can’t be si­mul­ta­ne­ously read and writ­ten, mak­ing this kind of bug im­pos­si­ble.
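A few of these points can be sketched in a handful of lines. The example below is only an illustration, not code from Tor: the `CellKind` enum and both functions are hypothetical names, and it uses the `?` operator, which later versions of Rust provide as shorthand for `try!`.

```rust
use std::num::ParseIntError;

// Hypothetical message type, used only for illustration.
enum CellKind {
    Data,
    Relay,
    Destroy,
}

// The match must cover every variant; removing an arm is a compile error.
fn describe(kind: &CellKind) -> &'static str {
    match kind {
        CellKind::Data => "data",
        CellKind::Relay => "relay",
        CellKind::Destroy => "destroy",
    }
}

// Parse two numbers and add them. Parse errors propagate to the caller
// through `?`, and overflow is reported as `None` rather than wrapping.
fn parse_and_add(a: &str, b: &str) -> Result<Option<u8>, ParseIntError> {
    let a: u8 = a.parse()?;
    let b: u8 = b.parse()?;
    Ok(a.checked_add(b))
}
```

Here `parse_and_add("200", "100")` yields `Ok(None)` because 300 does not fit in a `u8`, and `parse_and_add("x", "2")` yields an `Err` that the caller has to handle.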

Go, which has in­creas­ingly been adopted as a sys­tems lan­guage, solves some of these is­sues, but far from all. It also in­tro­duces its own set of is­sues, such as type-cast­ing from the interface{} type or data races be­tween Gor­ou­tines, which are non-ex­is­tent in Rust.

Safe con­cur­rency

This lat­ter point is par­tic­u­larly in­ter­est­ing; the Rust com­piler will not com­pile a pro­gram that has a po­ten­tial data race in it. Un­less you ex­plic­itly mark your code as unsafe (which you rarely, if ever, need to do), your code sim­ply can­not have data races. Rust checks this using the bor­row checker, which en­forces two sim­ple rules:

First, any bor­row must last for a scope no greater than that of the owner. Sec­ond, you may have one or the other of these two kinds of bor­rows, but not both at the same time:

  • one or more ref­er­ences (&T) to a re­source,
  • ex­actly one mu­ta­ble ref­er­ence (&mut T).

The first rule en­sures that you never use a value after it has gone out of scope (erad­i­cat­ing use-af­ter-free and dou­ble-free in one fell swoop). The sec­ond rule guar­an­tees that you have no data races, since you can­not have two mu­ta­ble ref­er­ences to the same data, nor can you have one thread mod­ify while an­other thread reads. This might seem re­stric­tive at first, but all the so­lu­tions you would use to avoid races in reg­u­lar code are fully sup­ported, and their cor­rect­ness is checked at com­pile-time: you can add locks to allow two threads mu­ta­ble ac­cess to a vari­able, or use atomic op­er­a­tions to im­ple­ment RCU and other al­go­rithms that allow con­cur­rent reads and writes.
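To make the lock-based option concrete, here is a minimal sketch using the standard library's `Arc` and `Mutex`. The function name `parallel_count` and its parameters are made up for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `workers` threads that each add to a shared counter `increments`
// times, and return the final value of the counter.
fn parallel_count(workers: usize, increments: usize) -> usize {
    // Arc gives every thread shared ownership; Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The value is only reachable through the lock guard.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for all workers to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Trying to hand the threads a plain `&mut usize` instead would be rejected by the compiler, which is the point of the second rule.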

Per­for­mance with­out sac­ri­fice

Some of the bugs found by the Tor de­vel­op­ers are han­dled in other higher-level lan­guages as well. Un­for­tu­nately, higher-level lan­guages are often not a great fit for sys­tems code. Sys­tems code is often per­for­mance crit­i­cal (e.g., ker­nels, data­bases), so the de­vel­oper wants pre­dictable per­for­mance, and tight con­trol over mem­ory al­lo­ca­tion/de-al­lo­ca­tion and data lay­out. This can be hard to achieve in higher-level lan­guages or when using a garbage col­lec­tor.

Rust provides features that are often associated with high-level languages (such as automatic memory freeing when values go out of scope, pattern matching, functional programming abstractions, a powerful type system), as well as powerful features like the borrow checker, with no runtime cost. This might seem unbelievable (and it admittedly still feels that way to me), but Rust’s claim to achieve performance comparable to that of C++ seems to be supported in multiple benchmarks.
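As a small illustration of the functional abstractions mentioned here, the sketch below writes the same computation twice, once as an iterator chain and once as a hand-written loop. The function names are hypothetical, and the example only shows that the two styles are interchangeable; it does not measure anything.

```rust
// Sum of the squares of the even numbers in `xs`, as an iterator chain.
fn sum_even_squares_iter(xs: &[u64]) -> u64 {
    xs.iter().filter(|x| **x % 2 == 0).map(|x| x * x).sum()
}

// The same computation as an explicit loop.
fn sum_even_squares_loop(xs: &[u64]) -> u64 {
    let mut total = 0;
    for &x in xs {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}
```

Both functions return 20 for the input `[1, 2, 3, 4]`.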

Fur­ther­more, Rust gives the de­vel­oper con­trol over when mem­ory is al­lo­cated, and how it is laid out. This in turn al­lows straight­for­ward and ef­fi­cient in­ter­ac­tion with C APIs (and other lan­guages) through the For­eign Func­tion In­ter­face, and makes it easy to in­ter­act with high-per­for­mance li­braries like BLAS, or low-level toolk­its like DPDK, which may not be avail­able na­tively in Rust (yet).
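A rough sketch of what that control looks like in practice, assuming a made-up `Header` type: `#[repr(C)]` fixes the field order and padding to what a C compiler would produce, and `Vec::with_capacity` makes the allocation happen at a point you choose.

```rust
use std::mem::{align_of, size_of};

// Hypothetical header type. `#[repr(C)]` asks for C's field order and
// padding rules, so the struct has the layout a C API would expect.
#[repr(C)]
struct Header {
    tag: u8,
    length: u32,
    flags: u8,
}

// Size and alignment of `Header` in bytes on the current platform.
fn header_layout() -> (usize, usize) {
    (size_of::<Header>(), align_of::<Header>())
}

// Reserve space for `n` headers up front; pushing up to `n` elements
// afterwards will not reallocate.
fn preallocate(n: usize) -> Vec<Header> {
    Vec::with_capacity(n)
}
```

On common platforms where `u32` is 4-byte aligned, `header_layout()` returns `(12, 4)`: the two `u8` fields are each padded out to keep `length` aligned, exactly as in C.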

Ex­pres­siv­ity and pro­duc­tiv­ity

One of the reasons developers often report being more productive in higher-level languages is the availability of higher-level primitives. Consider the case of constructing an inverted index for a given string. In C (or C++), you might write something like this (there are examples in other languages there too).

Let’s have a look at how you might im­ple­ment some­thing sim­i­lar in Rust. Note that it’s not ac­tu­ally the same as the C++ ex­am­ple, since that also im­ple­ments a Trie. For a more ap­ples-to-ap­ples com­par­i­son, con­sider this C++ vari­ant writ­ten by a Red­di­tor.

fn main() {
  use std::io;
  use std::fs;
  use std::env;
  use std::collections::{HashMap, HashSet};
  use std::io::BufRead;

  let args = env::args().skip(1).collect::<Vec<_>>();
  let idx = args
    // iterate over our arguments
    .iter()
    // open each file
    .map(|fname| (fname.as_str(), fs::File::open(fname.as_str())))
    // check for errors
    .map(|(fname, f)| {
      f.and_then(|f| Ok((fname, f)))
        .expect(&format!("input file {} could not be opened", fname))
    })
    // make a buffered reader
    .map(|(fname, f)| (fname, io::BufReader::new(f)))
    // for each file
    .flat_map(|(f, file)| {
      file
        // read the lines
        .lines()
        // split into words
        .flat_map(|line| {
          // NOTE: the collect+into_iter here is icky
          // have a look at the flat_map entry
          // in the Appendix for why it's here
          line.unwrap().split_whitespace()
            .map(|w| w.to_string()).collect::<Vec<_>>().into_iter()
        })
        // prune duplicates
        .collect::<HashSet<_>>()
        .into_iter()
        // and emit inverted index entry
        .map(move |word| (word, f))
    })
    .fold(HashMap::new(), |mut idx, (word, f)| {
      // absorb all entries into a vector of file names per word
      idx.entry(word)
        .or_insert(Vec::new())
        .push(f);
      idx
    });

  println!("Please enter a search term and press enter:");
  print!("> ");

  let stdin = io::stdin();
  for query in stdin.lock().lines() {
    match idx.get(&*query.unwrap()) {
      Some(files) => println!("appears in {:?}", files),
      None => println!("does not appear in any files"),
    };
    print!("> ");
  }
}

If you are fa­mil­iar with func­tional pro­gram­ming, you might find the above both read­able and straight­for­ward. If you aren’t, you can sub­sti­tute the ex­pres­sion start­ing at let idx = above with:

let mut idx = HashMap::new();
for fname in &args {
  let f = match fs::File::open(fname) {
    Ok(f) => io::BufReader::new(f),
    Err(e) => panic!("input file {} could not be opened: {}", fname, e),
  };
  let mut words = HashSet::new();
  for line in f.lines() {
    for w in line.unwrap().split_whitespace() {
      if words.insert(w.to_string()) {
        // new word seen
        idx.entry(w.to_string()).or_insert(Vec::new()).push(fname);
      }
    }
  }
}

Cru­cially, these are both valid Rust pro­grams, and you can mix and match be­tween the dif­fer­ent styles as you want (you can see more ex­am­ples in the Hacker News dis­cus­sion linked to at the top of this post). Fur­ther­more, both re­sult in rea­son­ably ef­fi­cient code (each file is processed as a stream), ter­mi­nate nicely with an error if a file could not be opened, and exit cleanly if the user closes the input stream (e.g., with ^D).

The code above shows ex­am­ples of func­tional pro­gram­ming and pat­tern match­ing in Rust. These are neat, but you can ap­prox­i­mate some­thing sim­i­lar in many other lan­guages. One fea­ture that is rel­a­tively unique to Rust, and also turns out to be re­ally use­ful, is life­times. Say, for ex­am­ple, that you want to write a helper func­tion that re­turns the first and last name of a struct User that con­tains the user’s full name. You don’t want to copy strings un­nec­es­sar­ily, and in­stead just want point­ers into the ex­ist­ing mem­ory. Some­thing along the lines of:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct User {
  char full_name[255];
};

// Returns a pointer to the first name. Note that strtok modifies
// u->full_name in place.
char* first_name(struct User *u) {
  return strtok(u->full_name, " ");
}
// Returns a pointer to the character after the last space.
char* last_name(struct User *u) {
  char *last = strrchr(u->full_name, ' ')+1;
  return last;
}

int main() {
  struct User *u = malloc(sizeof(struct User));
  strcpy(u->full_name, "Jon Gjengset");
  char* last = last_name(u);
  char* first = first_name(u);
  // ...
  printf("first: %s, last: %s\n", first, last);
  return 0;
}

The caller now has three point­ers into struct User, first, last, and u. What hap­pens if the pro­gram now calls free(u) and tries to print first or last? Oops. Since C strings are null-ter­mi­nated, the code also breaks u->full_name once first_name has been called, be­cause strtok will re­place the first space in the string with a null ter­mi­na­tor.

Let’s see what this code would look like in Rust:

struct User {
  full_name: String,
}

impl User {
  fn first_name<'a>(&'a self) -> &'a str {
    self.full_name.split_whitespace().next().unwrap()
  }
  fn last_name<'a>(&'a self) -> &'a str {
    self.full_name.split_whitespace().last().unwrap()
  }
}

fn main() {
  let u = User { full_name: "Jon Gjengset".to_string() };
  let first = u.first_name();
  let last = u.last_name();
  println!("first: {}, last: {}", first, last);
}

No­tice the weird 'a thing? That’s a Rust life­time. For first_name and last_name, it says that the re­turned string ref­er­ence (&str) can’t out­live the ref­er­ence to self. Thus, if the pro­gram­mer tried to call drop(u) (Rust’s equiv­a­lent of an ex­plicit free), the com­piler would check that they did not later try to use first or last. Pretty neat! Also, since Rust strings aren’t null-ter­mi­nated (array ref­er­ences in­stead store their length), we can safely use u.full_name after call­ing first_name. In fact, the caller knows that this is safe, be­cause first_name takes an im­mutable ref­er­ence (&) to the ob­ject, and thus can’t mod­ify it.
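Here is a small sketch of that check in action. It assumes a trimmed-down `User` and a hypothetical helper `first_name_len`; it also relies on lifetime elision, which lets you leave out the `'a` annotations when a method returns a reference derived from `self`.

```rust
struct User {
    full_name: String,
}

impl User {
    // Elided form of `fn first_name<'a>(&'a self) -> &'a str`.
    fn first_name(&self) -> &str {
        self.full_name.split_whitespace().next().unwrap_or("")
    }
}

// Takes ownership of the user and returns the length of the first name.
fn first_name_len(u: User) -> usize {
    let first = u.first_name(); // `first` borrows from `u`
    // drop(u); // rejected here: `first` is still used on the next line
    let n = first.len();
    drop(u); // accepted: the borrow ended with the last use of `first`
    n
}
```

Uncommenting the first `drop(u)` turns the use-after-free from the C version into a compile error instead of a runtime surprise.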

A great build sys­tem

Rust comes with a build tool called Cargo. Cargo is similar to npm, go get, pip and friends; it lets you declare dependencies and build options, and then automatically fetches and builds those when you build your project. This makes it easy for you and others to build your code, including third-party testing services like Travis.

Cargo is pretty fea­ture­ful com­pared to many of its sib­lings in other lan­guages. It has ver­sioned de­pen­den­cies, sup­ports mul­ti­ple build pro­files (e.g., debug vs. re­lease), and can even link against C code. It also has built-in sup­port for up­load­ing and up­dat­ing pack­ages on crates.io, gen­er­at­ing doc­u­men­ta­tion, and run­ning tests.

These latter two points are worth elaborating on. First, Rust makes writing documentation very easy. Comments that start with three slashes (i.e., /// instead of //) are documentation comments, and are automatically associated with the item that follows them. The contents are rendered as Markdown, and code examples are automatically compiled and run as tests. The idea is that code examples should always build (of course, you can override this for any given code block), and if one doesn’t, that should be considered a test failure. The rendered documentation automatically links between different modules (including the standard library documentation), and is really easy to use.

Most of the stan­dard li­brary has been very well doc­u­mented by now, and thanks to the ease of writ­ing doc­u­men­ta­tion, most of the avail­able Rust crates (the name of Rust pack­ages) are also pretty well cov­ered (have a look at the doc­u­men­ta­tion for this Rust SQL ORM crate for ex­am­ple). In many cases, the Rust doc­u­men­ta­tion is even bet­ter than the doc­u­men­ta­tion for sim­i­lar C++ or Go fea­tures — for ex­am­ple, com­pare Rust’s Vec, C++’s std::vector, and Go’s slices (part2, part3, part4).

Why would I not choose Rust?

By now, I hope I have con­vinced you that Rust has some pretty at­trac­tive fea­tures. How­ever, I bet you are still think­ing “okay, but it can’t all be rain­bows and roses”. And you are right. There are some things that you might dis­like about Rust.

First, Rust is still fairly young, and so is its ecosystem. The Rust team has done an excellent job at building a welcoming community, and the language is improving constantly, but Rome wasn’t built in a day. There are still some unfinished features in the language itself (although surprisingly few), and some documentation is still missing (though this is being rapidly addressed). A number of useful libraries are still in their early stages, and the tools for the ecosystem are still being developed. There is a lot of interest and engagement from developers though, so the situation is improving daily.

Second, since Rust is not garbage collected, there will be times when you have to fall back to reference counting, just like you have to do in C/C++. Closely related is the fact that Rust does not have green threads similar to Go’s goroutines. Rust dropped green threads early on in favor of moving this to an external crate, which means the integration is not as neat as it is in Go. You can still spawn threads, and the borrow checker will protect you from much of the nastiness of concurrency bugs, but these will essentially be pthreads, not CSP-style co-routines.

Third, Rust is a fairly com­plex lan­guage com­pared to C or Go (it’s more com­pa­ra­ble to C++), and the com­piler is a lot pick­ier. It can be tricky for a new­comer to the lan­guage to get even rel­a­tively sim­ple pro­grams to com­pile, and the learn­ing curve re­mains steep for quite some time. How­ever, the com­piler usu­ally gives ex­tremely help­ful feed­back when your code doesn’t com­pile, and the com­mu­nity is very friendly and re­spon­sive (I sug­gest vis­it­ing #rust-be­gin­ners when you’re start­ing out). Fur­ther­more, once your code com­piles, you’ll find (at least I have) that it is much more likely to be cor­rect (i.e., do the right thing) than if you tried to write sim­i­lar C, C++, or Go code.

Fi­nally, com­pared to C (and to some ex­tent C++), Rust’s com­plex­ity can also make it harder to un­der­stand ex­actly what the run­time be­hav­ior of your code is. That said, as long as you write id­iomatic Rust code, you’ll prob­a­bly find that your code turns out to be as fast as you can ex­pect in most cases. With time and prac­tice, es­ti­mat­ing the per­for­mance of your code also be­comes eas­ier, but it is cer­tainly trick­ier than in sim­pler lan­guages like C.

Con­clud­ing re­marks

I’ve given you what I be­lieve to be a pretty fair and com­pre­hen­sive overview of Rust com­pared to C and Go, based on my ex­pe­ri­ence from the past six months. Rust has im­pressed me im­mensely, and I urge you to give it a shot. This is es­pe­cially true if you tried it a while ago and didn’t like it — the lan­guage has ma­tured a lot over the past year! If you can think of ad­di­tional pros and cons for switch­ing to Rust, please let me know ei­ther in HN com­ments, on Twit­ter, or by e-mail!

Ap­pen­dix A: Tips & Gotchas

  • String derefs to &str, and through deref co­er­cion you can call all the meth­ods on &str di­rectly on a String. This is neat, but there is one case where it doesn’t work as you’d hope: if you use a String to index into a HashMap where the keys are &str. This is be­cause Deref is de­fined on &String, not String. You can pre­fix your String with a & when using it in­side [] to over­come this. In gen­eral, you can also get a &str of a String by pre­fix­ing it with &*, which comes in handy at times.

  • If you ever use flat_map, you may get weird lifetime complaints from the compiler about the thing you are iterating over inside the flat_map closure not living long enough. This is usually because you have an IntoIter (i.e., an iterator that owns what it’s iterating over), and since iterators are lazily evaluated, the owned value may no longer exist by the time the closure runs. The easiest (though not most efficient) way to overcome this is to write your code like this:

    // ...
    .flat_map(|e| {
      e.into_iter()
       .map(|ee| {
         // ...
       })
       .collect::<Vec<_>>().into_iter()
    })
    // ...
    

    The collect forces the it­er­a­tor to be eval­u­ated im­me­di­ately, ex­e­cut­ing the clo­sure. The re­sult­ing list is then con­verted to an it­er­a­tor with no bor­rows, which can safely be re­turned by the flat_map clo­sure with­out life­time is­sues.

  • If you have an it­er­a­tor and you want to add an el­e­ment (say 1) to the end, you can do this using the fol­low­ing trick:

    for x in iter.chain(Some(1).into_iter()) {}
    

    This ex­ploits the fact that an Option can be turned into an it­er­a­tor, and chains that sin­gle-el­e­ment it­er­a­tor onto the ex­ist­ing one, giv­ing you an it­er­a­tor that yields an extra el­e­ment after the orig­i­nal it­er­a­tor ends.

  • Since the re­moval of thread::scoped, it has be­come tricky to spawn threads that bor­row from their en­vi­ron­ment. This is often use­ful if you want to run a pool of work­ers that need to share ac­cess to some re­source. You can often get around this using ref­er­ence count­ing, but that’s not al­ways a de­sir­able op­tion. In­stead, you should use the scoped-pool crate, which sup­ports scoped work­ers, or crossbeam::spawn which pro­vides the same func­tion­al­ity with­out re­quir­ing a pool.

  • Rust cur­rently (as far as I’m aware) does not have a nice way of talk­ing about only one vari­ant within an enum. That is, you can­not write a func­tion that op­er­ates on only a par­tic­u­lar enum vari­ant, or have a vari­able that Rust knows is of a par­tic­u­lar vari­ant. This can lead to a bunch of code along the lines of:

    if let MyEnum::ThisVariant(x) = x {
      // do something with x
    } else {
      unreachable!();
    }
    

    The try! macro and the unwrap()/expect() meth­ods mit­i­gate this pain when work­ing with Result or Option types, but do not gen­er­al­ize. If any­one knows of a cleaner way of deal­ing with this, please let me know!
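Later versions of Rust (1.65 and up) added `let ... else`, which shortens the enum-variant pattern above somewhat. A minimal sketch, with `MyEnum` and `value_of` as stand-in names; note that it still does not let the type system know which variant a value holds.

```rust
// Hypothetical enum, mirroring the snippet above.
enum MyEnum {
    ThisVariant(u32),
    OtherVariant,
}

fn value_of(x: MyEnum) -> u32 {
    // Bind the payload, or diverge if `x` is any other variant.
    let MyEnum::ThisVariant(v) = x else {
        unreachable!("caller guarantees ThisVariant");
    };
    v
}
```

The happy path is no longer nested inside an `if let` block, but passing any other variant still panics at runtime rather than failing to compile.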

