
Go: When deferring a lock smells

source link: https://medium.com/@deckarep/go-when-deferring-a-lock-smells-b265381a9f65

[Photo: many locks on a fence, by Markus Spiske on Unsplash]

Perhaps you have a Go codebase that synchronizes some shared data using one or more plain sync.Mutex values, or their read/write variant, sync.RWMutex, from the standard library.

What’s the plural of mutex by the way?

We know that Go's threading model is based on goroutines: coroutines that other languages sometimes call fibers or green threads. The point is that these are not OS-level threads, which are comparatively heavyweight and do not scale anywhere close to what Go's goroutines are capable of.

In fact, it's normal for an application to have tens of thousands or hundreds of thousands of goroutines, and single applications running millions have been reported. It really depends on the nature of the app: if the app is heavily IO bound then yes, you can get away with having a massive number of goroutines if needed. But should you?

Okay, so what's the point of all of this? The point is that goroutines are extremely lightweight, and context switching between them is cheap, so it stands to reason that synchronization in its many forms is also cheap and lightweight. This is all very good for developers like you and me: for the most part we don't have to worry too much about the cost of using these concurrency constructs.
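
To make the "cheap" claim concrete, here is a minimal, self-contained sketch (the counter name and the 100,000 figure are mine, not from the original post): it launches 100,000 goroutines that each take a plain mutex to bump a shared counter, and on most machines the whole thing completes in a fraction of a second.

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var (
        mu      sync.Mutex
        counter int
        wg      sync.WaitGroup
    )

    start := time.Now()
    for i := 0; i < 100_000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock()
            counter++ // the critical section is tiny, so contention stays low
            mu.Unlock()
        }()
    }
    wg.Wait()
    fmt.Printf("counter=%d in %v\n", counter, time.Since(start))
}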

Now let’s zoom in a little bit and analyze how sometimes using a mutex can get us into trouble in terms of lock contention.

Here are some operations that are extremely lightweight and cheap in general:

// It’s inexpensive to Lock() and Unlock()
// Perhaps we could just use an atomic variable, but we’re talking about locks
func code() {
    mu.Lock()
    myVal += 1
    mu.Unlock()
}

// Also relatively inexpensive to RLock() and RUnlock() on a sync.RWMutex
func code() {
    mu.RLock()
    // read-only related code
    mu.RUnlock()
}

// Using defer also can be very cheap...and usually not a problem
func code() {
    mu.Lock()
    defer mu.Unlock()
    myVal += 100
}

// What's cool about defer is it helps us avoid trouble!
// Here's a case where it is not leveraged at all.
func code() error {
    // Without defer, we set up a deadlock: if getStats() fails we return early
    // and never call Unlock(), so the next Lock() will block forever.
    mu.Lock()

    someCounter += 1

    stats, err := getStats()
    if err != nil {
        // oops, forgot to Unlock() here
        return err
    }

    // ... use stats ...

    mu.Unlock()
    return nil
}

// defer is our friend and helps us avoid this problem.
// Instead, we don't have to sprinkle Unlock() before every early return.
func code() error {
    // With defer, we're covered no matter how many early returns we have.
    // Of course, we're covered at the final return as well.
    mu.Lock()
    defer mu.Unlock()

    someCounter += 1

    stats, err := getStats()
    if err != nil {
        // the deferred mu.Unlock() runs here
        return err
    }

    // ... use stats ...

    // the deferred mu.Unlock() runs here as well
    return nil
}

This is all well and good, and the practice of pairing defer with Unlock() is encouraged in general. It helps us avoid deadlock scenarios, which can be tricky to troubleshoot and track down. Yes, the Go runtime has partial deadlock detection, but it will not catch everything.
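
For what it's worth, here is a minimal sketch of the kind of deadlock that detection does catch: the only goroutine in the program blocks forever on a second Lock(), so the runtime aborts with its "all goroutines are asleep - deadlock!" fatal error. If even one other goroutine were still runnable, the detector would stay silent, which is why it's only partial.

package main

import "sync"

func main() {
    var mu sync.Mutex
    mu.Lock()
    // Locking an already-held sync.Mutex from the only goroutine blocks forever.
    // With nothing else runnable, the runtime exits with:
    //   fatal error: all goroutines are asleep - deadlock!
    mu.Lock()
}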

Also, today we’re actually talking about lock contention so please keep reading.

Let's take a step back and remind ourselves: if goroutines, context switching, and synchronization are all so cheap, then why do we need to worry about lock contention in the first place?

I want you to think about locking in a different way: think about lock lifetimes. Another way to say this: how long do you hold a lock for? The actual act of locking or unlocking is fast and will always remain fast; locks themselves are not really slow at all in Go. How long a lock is held is what can be slow.
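
One way to make "lock lifetime" tangible is to measure it. Here is a small sketch of that idea; timedMutex and its one-millisecond budget are hypothetical names of mine, not anything from the article or the standard library.

package main

import (
    "log"
    "sync"
    "time"
)

// timedMutex wraps a mutex, records when each critical section began,
// and complains whenever one runs past a budget.
type timedMutex struct {
    mu       sync.Mutex
    acquired time.Time
    budget   time.Duration
}

func (t *timedMutex) Lock() {
    t.mu.Lock()
    t.acquired = time.Now()
}

func (t *timedMutex) Unlock() {
    held := time.Since(t.acquired)
    if held > t.budget {
        log.Printf("lock held for %v (budget %v)", held, t.budget)
    }
    t.mu.Unlock()
}

func main() {
    tm := &timedMutex{budget: time.Millisecond}
    tm.Lock()
    time.Sleep(5 * time.Millisecond) // pretend we did slow work under the lock
    tm.Unlock()                      // logs: lock held for ~5ms (budget 1ms)
}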

In that very last piece of code, did you see anything troubling? I'll copy it here so you can review it once more. I know this code is artificial and contrived, and you don't have the full picture of what it is actually doing, but please analyze it closely to see where lock contention could be introduced.

// We're using defer like good gophers but...
// could this code possibly introduce lock contention???
func code() error {
    mu.Lock()
    defer mu.Unlock()

    someCounter += 1

    stats, err := getStats()
    if err != nil {
        // the deferred mu.Unlock() runs here
        return err
    }

    // ... use stats ...

    // the deferred mu.Unlock() runs here as well
    return nil
}

Well, I have not shared the implementation of getStats(), but perhaps the function does this:

func getStats() (*SimpleStats, error) {
    s := &SimpleStats{}
    // Just imagine simple code here
    // ...
    return s, nil
}

In the above contrived code, getStats() simply returns a pointer to a SimpleStats and does practically nothing useful. No real heavy lifting is done in the form of computation. No IO work is done. Big deal; this code is fast because it's not doing much at all.

Now what if getStats() was doing something like this?

func getStats() (*SimpleStats, error) {
    resp, err := http.Get("http://www.isitlunchyet.com")
    if err != nil {
        return nil, err
    }

    // Here's our friendly defer again...
    defer resp.Body.Close()

    // Yes, I'm aware of the json decoder...
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }

    var stats SimpleStats
    err = json.Unmarshal(body, &stats)
    if err != nil {
        return nil, err
    }

    return &stats, nil
}

Let's get back to analyzing the code in question. Now do we see how lock contention could be introduced?

// Hmmm...
func code() error {
    mu.Lock()
    defer mu.Unlock()

    someCounter += 1

    stats, err := getStats()
    if err != nil {
        // the deferred mu.Unlock() runs here
        return err
    }

    // ... use stats ...

    // the deferred mu.Unlock() runs here as well
    return nil
}

Do you see it?

We're now at the pinnacle of this article, and something I hope is becoming obvious by now. If getStats() is, for example, an IO-bound function making an HTTP request, the latency of that request will be orders of magnitude higher than anything else our code is doing. In fact, it could be very slow. It might have to fetch stats from a server on the other side of the world. Or that server could be busy aggregating the stats data and spend a long time just serializing a response to our request. Many things can contribute to its slowness.

This means that the deferred Unlock() gets us into trouble, because the defer only runs when the function returns, and the function now cannot return until the getStats() call does. What originally looked like a best practice is now degrading our service significantly, because any other code that wants to acquire the lock is left waiting for a very long time.

Bad, bad, bad! We've introduced lock contention because the lock's lifetime has grown by orders of magnitude: we're now holding it while waiting on an IO response from a function that can easily be slow relative to the cost of the rest of the code.
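
If you'd like to see the effect rather than take my word for it, here is a contrived sketch where time.Sleep stands in for the slow getStats() call: one goroutine holds the lock across the fake IO, while a second goroutine, which only wants the lock for an instant, measures how long its Lock() blocks.

package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var mu sync.Mutex
    var wg sync.WaitGroup

    // Goroutine 1: holds the lock across a slow, IO-like call.
    wg.Add(1)
    go func() {
        defer wg.Done()
        mu.Lock()
        defer mu.Unlock()
        time.Sleep(500 * time.Millisecond) // stand-in for getStats() over the network
    }()

    time.Sleep(10 * time.Millisecond) // let goroutine 1 grab the lock first

    // Goroutine 2: only wants to bump a counter, but now waits ~half a second.
    wg.Add(1)
    go func() {
        defer wg.Done()
        start := time.Now()
        mu.Lock()
        fmt.Printf("waited %v just to touch a counter\n", time.Since(start))
        mu.Unlock()
    }()

    wg.Wait()
}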

These are the things that a Go programmer working on a concurrent codebase has to think through. As a programmer writing concurrent, multi-threaded code you must get into the habit of thinking about your code in terms of logic, in terms of event ordering, and in terms of lifetimes in order to avoid problems like this.

Can the code be fixed? Yes! Let's just introduce some finer-grained locking.

// Fixed now!
func code() error {
    // Forget defer, we're in control; let's make the locking more granular.
    // After all, we only need to synchronize this stupid counter.
    mu.Lock()
    someCounter += 1
    mu.Unlock()

    // If getStats() takes a while...the lock has already been released!
    // No contention!
    stats, err := getStats()
    if err != nil {
        return err
    }

    // ... use stats ...
    return nil
}

Friends, don't forget you are in final control, and sometimes code must be refactored to mitigate problems like this. I like defer too, but in this case it gets us into serious trouble, and our critical section can be managed more tightly by simply calling Lock() and immediately following up with Unlock().
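
If you're attached to defer, one compromise (a sketch of mine, not the article's code) is to push the critical section into its own tiny helper, so the defer fires as soon as that helper returns, long before the slow IO begins:

// bumpCounter is a hypothetical helper, not from the original post.
func bumpCounter() {
    mu.Lock()
    defer mu.Unlock() // fires when bumpCounter returns, not when code() returns
    someCounter += 1
}

func code() error {
    bumpCounter()

    // The lock is already released by the time we do slow IO.
    stats, err := getStats()
    if err != nil {
        return err
    }

    // ... use stats ...
    return nil
}

And since this particular critical section is just a counter, sync/atomic (for example atomic.AddInt64) could remove the lock entirely, which the comment in the very first snippet hinted at.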

Here are some helpful mutex/locking guidelines to consider:

  • First off, why are you using a lock in the first place? Maybe a channel is better?
  • Think about how long a lock is held.
  • Avoid locking over IO, and possibly over other expensive operations.
  • Even waiting on a channel could extend a lock's lifetime.
  • Keep your locks as granular as possible for a snappy application!
  • But not too granular; you can introduce other problems.
  • Keep locks encapsulated.
  • In read-heavy scenarios, use a sync.RWMutex.
  • Avoid deadlocks, lock contention, and priority inversion.
  • Repeat after me: locks aren't really slow at all in Go.
  • Defer the Unlock() when it makes sense.
  • None of this means anything if you don't instrument your code to really see what's going on (see the sketch after this list).
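
On that last point, Go has a built-in mutex contention profile you can lean on. Here is a minimal sketch (the sampling rate of 5 and the port are arbitrary choices of mine): enable the profile, expose it via net/http/pprof, and then inspect it with go tool pprof to see which locks your goroutines are actually fighting over.

package main

import (
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers
    "runtime"
)

func main() {
    // Sample roughly 1 in 5 mutex contention events.
    runtime.SetMutexProfileFraction(5)

    // ... start the rest of your application here ...

    // Then inspect with: go tool pprof http://localhost:6060/debug/pprof/mutex
    http.ListenAndServe("localhost:6060", nil)
}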

Hey, if this blog post helped you catch a bug in your codebase, I want to know about it in the comments! Please share the dirty details. This problem, although contrived here for demonstration purposes, comes up now and again. I encourage you to think about lock lifetimes to hopefully catch problems like this.

Now for a poem:

In Go, be aware, take your stand,
For lock contention is at hand.
Mutex locks may seem quite benign,
But they can tangle code in twine.

Threads all clamoring for a share,
In a dance of caution, beware.
A race for resources, quite intense,
A deadlock could be the consequence.

So design with care, do not be rash,
Or else your program may just crash.
Take heed, be wise, don't skip the lesson,
Avoid the plight of lock contention.

By: Gideon Percival Thorne the IV

That’s it for now and if you read the entire article and found it helpful I would appreciate some feedback or comments!

Cheers,

-deckarep

