Writing Well-Documented Code - Learn from Examples - JOYK Joy of Geek, Geek News, Link all geek

Writing Well-Documented Code - Learn from Examples

04 September 2021

Who doesn’t like well-documented code? I greatly appreciate when the author of the code I’m reading took his/her time to explain it.

While there’s a never-ending debate about how to write comments and how to write code that’s easy to read without comments, perhaps it can be generally accepted that documenting one’s code one way or another is an important skill for a programmer.

The reason we started Code Catalog is because we believe that we can improve our coding skills by reading good code written by others. Commenting code is no exception. So let’s look at a few examples of good code with helpful comments.

Exhibit A - Firecracker

Firecracker is an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services that provide serverless operational models. Firecracker runs workloads in lightweight virtual machines, called microVMs, which combine the security and isolation properties provided by hardware virtualization technology with the speed and flexibility of containers.

Take a look at Firecracker’s rate limiter responsible for throttling operations and bandwidth. It’s written in Rust and is based on the Token Bucket algorithm. We recently wrote about it.

Expand (very long block of code)

This code sparks joy, doesn’t it? Mostly due to how well it is documented. It starts with an extensive doc comment explaining what it does and the overall approach, and then the rest of the implementation is documented in abundant details.

E.g. consider the method below. It would still be rather readable even without comments, but they certainly make understanding it even easier.

/// Attempts to consume `tokens` from the bucket and returns whether the action succeeded.
pub fn reduce(&mut self, mut tokens: u64) -> BucketReduction {
    // First things first: consume the one-time-burst budget.
    if self.one_time_burst > 0 {
        // We still have burst budget for *all* tokens requests.
        if self.one_time_burst >= tokens {
            self.one_time_burst -= tokens;
            self.last_update = Instant::now();
            // No need to continue to the refill process, we still have burst budget to consume from.
            return BucketReduction::Success;
        } else {
            // We still have burst budget for *some* of the tokens requests.
            // The tokens left unfulfilled will be consumed from current `self.budget`.
            tokens -= self.one_time_burst;
            self.one_time_burst = 0;
        }
    }

    if tokens > self.budget {
        // Hit the bucket bottom, let's auto-replenish and try again.
        self.auto_replenish();

        // This operation requests a bandwidth higher than the bucket size
        if tokens > self.size {
            error!(
                "Consumed {} tokens from bucket of size {}",
                tokens, self.size
            );
            // Empty the bucket and report an overconsumption of
            // (remaining tokens / size) times larger than the bucket size
            tokens -= self.budget;
            self.budget = 0;
            return BucketReduction::OverConsumption(tokens as f64 / self.size as f64);
        }

        if tokens > self.budget {
            // Still not enough tokens, consume() fails, return false.
            return BucketReduction::Failure;
        }
    }

    self.budget -= tokens;
    BucketReduction::Success
}

Can it be rewritten to make some comments redundant? Perhaps something like this:

// First things first: consume the one-time-burst budget.
if self.one_time_burst > 0 {
  // ...
}

->

let has_remaining_one_time_burst = self.one_time_burst > 0;
if has_remaining_one_time_burst {
  // ...
}

It is better? I’m not sure.

Some comments might be a little unnecessary, e.g.

// Compute time passed since last refill/update.
let time_delta = self.last_update.elapsed().as_nanos() as u64;

- surely it’s perfectly clear already - but overall they make the code so easy to read.

Exhibit B - Protobuf

Protocol buffers (Protobuf) are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Let’s look at Protobuf’s tokenizer. While the entire file is rather well-documented, what really stands out is the top-level comment about what the code does, why it is the way it is, what design alternatives were considered but rejected and why it exists in the first place. Comments like this are invaluable to understanding the code.

The tone of the comments is unusually personal, but I can’t complain. The only concern is if the code is updated by someone else, what should they do about the comments.

// Here we have a hand-written lexer.  At first you might ask yourself,
// "Hand-written text processing?  Is Kenton crazy?!"  Well, first of all,
// yes I am crazy, but that's beside the point.  There are actually reasons
// why I ended up writing this this way.
//
// The traditional approach to lexing is to use lex to generate a lexer for
// you.  Unfortunately, lex's output is ridiculously ugly and difficult to
// integrate cleanly with C++ code, especially abstract code or code meant
// as a library.  Better parser-generators exist but would add dependencies
// which most users won't already have, which we'd like to avoid.  (GNU flex
// has a C++ output option, but it's still ridiculously ugly, non-abstract,
// and not library-friendly.)
//
// The next approach that any good software engineer should look at is to
// use regular expressions.  And, indeed, I did.  I have code which
// implements this same class using regular expressions.  It's about 200
// lines shorter.  However:
// - Rather than error messages telling you "This string has an invalid
//   escape sequence at line 5, column 45", you get error messages like
//   "Parse error on line 5".  Giving more precise errors requires adding
//   a lot of code that ends up basically as complex as the hand-coded
//   version anyway.
// - The regular expression to match a string literal looks like this:
//     kString  = new RE("(\"([^\"\\\\]|"              // non-escaped
//                       "\\\\[abfnrtv?\"'\\\\0-7]|"   // normal escape
//                       "\\\\x[0-9a-fA-F])*\"|"       // hex escape
//                       "\'([^\'\\\\]|"        // Also support single-quotes.
//                       "\\\\[abfnrtv?\"'\\\\0-7]|"
//                       "\\\\x[0-9a-fA-F])*\')");
//   Verifying the correctness of this line noise is actually harder than
//   verifying the correctness of ConsumeString(), defined below.  I'm not
//   even confident that the above is correct, after staring at it for some
//   time.
// - PCRE is fast, but there's still more overhead involved than the code
//   below.
// - Sadly, regular expressions are not part of the C standard library, so
//   using them would require depending on some other library.  For the
//   open source release, this could be really annoying.  Nobody likes
//   downloading one piece of software just to find that they need to
//   download something else to make it work, and in all likelihood
//   people downloading Protocol Buffers will already be doing so just
//   to make something else work.  We could include a copy of PCRE with
//   our code, but that obligates us to keep it up-to-date and just seems
//   like a big waste just to save 200 lines of code.
//
// On a similar but unrelated note, I'm even scared to use ctype.h.
// Apparently functions like isalpha() are locale-dependent.  So, if we used
// that, then if this code is being called from some program that doesn't
// have its locale set to "C", it would behave strangely.  We can't just set
// the locale to "C" ourselves since we might break the calling program that
// way, particularly if it is multi-threaded.  WTF?  Someone please let me
// (Kenton) know if I'm missing something here...
//
// I'd love to hear about other alternatives, though, as this code isn't
// exactly pretty.

Read more about this piece of code in our article.

Exhibit C - AWS CLI

Universal Command Line Interface for Amazon Web Services.

This shorthand parser in AWS CLI demonstrates that good comments don’t need to be verbose.

The parser (learn more about it in our article), which is already quite easy to understand (short methods doing one thing each, good names), is made even more readable by short comments with the expected format of the input in the beginning of most parsing methods. E.g.

def _parameter(self):
    # parameter = keyval *("," keyval)
    params = {}
    key, val = self._keyval()
    params[key] = val
    last_index = self._index
    while self._index < len(self._input_value):
        self._expect(',', consume_whitespace=True)
        key, val = self._keyval()
        # If a key is already defined, it is likely an incorrectly written
        # shorthand argument. Raise an error to inform the user.
        if key in params:
            raise DuplicateKeyInObjectError(
                key, self._input_value, last_index + 1
            )
        params[key] = val
        last_index = self._index
    return params

def _keyval(self):
    # keyval = key "=" [values]
    key = self._key()
    self._expect('=', consume_whitespace=True)
    values = self._values()
    return key, values

# parameter = keyval *("," keyval) and # keyval = key "=" [values] really help, don’t they? Even though _keyval is almost trivial, the comment makes it even clearer.

Disclaimer

The usual disclaimer: terms like good and helpful are subjective. If you disagree that these code examples are documented well, let’s debate. Leave a comment or shoot me an email.

Edit this page on GitHub.

Writing Well-Documented Code - Learn from Examples