9

Moving Away from UUIDs

 1 year ago
source link: https://neilmadden.blog/2018/08/30/moving-away-from-uuids/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Moving away from UUIDs

Woman throwing dice in the air

If you need an unguessable random string (for a session cookie or access token, for example), it can be tempting to reach for a random UUID, which looks like this:

 88cf3e49-e28e-4c0e-b95f-6a68a785a89d

This is a 128-bit value formatted as 36 hexadecimal digits separated by hyphens. In Java and most other programming languages, these are very simple to generate:


import java.util.UUID;


String id = UUID.randomUUID().toString();

Under the hood this uses a cryptographically secure pseudorandom number generator (CSPRNG), so the IDs generated are pretty unique. However, there are some downsides to using random UUIDs that make them less useful than they first appear. In this note I will describe them and what I suggest using instead.

How random is a UUID anyway?

As stated on the Wikipedia entry, of the 128 bits in a random UUID, 6 are fixed variant and version bits leaving 122 bits of actual random. 122 bits is still quite a lot of random, but is it actually enough? If you are generating OAuth 2 access tokens, then the spec says no:

The probability of an attacker guessing generated tokens (and other
credentials not intended for handling by end-users) MUST be less than
or equal to 2^(-128) and SHOULD be less than or equal to 2^(-160).

Well, even if the attacker only makes a single guess, the probability of guessing a 122-bit random value can never be less than 2-122, so strictly speaking a random UUID is in violation of the spec. But does it really matter?

To work out how long it would take an attacker to guess a valid token/cookie, we need to consider a number of factors:

  • How quickly can the attacker make guesses? An attacker that can try millions of candidate tokens per second can find a match much faster than somebody who can only try a hundred. We will call this rate (in guesses per second) A.
  • How many bits of randomness are in each token? A 128-bit random token is more difficult to guess than a 64-bit token. We will label the number of random bits B.
  • How many tokens are valid at any given time? If you have issued a million active session cookies then an attacker can try and guess any of them, making their job easier than if there was just one. Such batch attacks are often overlooked. We will label the number of valid tokens in circulation at any one time S.

Given these parameters, OWASP give a formula for calculating how long it will take an attacker to guess a random token as:

(2^B+1)/(2AS)

Let’s plug in some numbers and see what we get. But what are reasonable numbers? Well, for security we usually want to push the numbers well beyond what we think is actually possible to be really sure. So what is actually possible now?

When we consider how fast a well resourced attacker could guess a token, we can use the Bitcoin hash rate as a reasonable upper-bound approximation. A lot of people are investing a lot of time and money into generating random hashes, and we can view this as roughly equivalent to our attacker’s task. When I looked back in February (you can see how long my blog queue is!), the maximum rate was around 24293141000000000000 hashes per second, or around 264.

That’s a pretty extreme number. It’s fairly unlikely that anyone would direct that amount of resource at breaking your site’s session tokens, and you can use rate limiting and other tactics. But it is worth considering the extremes. After all, this is clearly possible with current technology and will only improve over time.

How many tokens might we have in circulation at any time? Again, it is helpful to consider extremes. Let’s say your widely successful service issues access tokens to every IoT (Internet of Things) device on the planet, at a rate of 1 million tokens per second. As you have a deep instinctive trust in the security of these devices (what could go wrong?), you give each token a 2-year validity period. At a peak you will then have around 63 trillion (246) tokens in circulation.

If we plug these figures into the equation from before, we can see how long our 122-bit UUID will hold out:

A = 264
B = 122
S = 246

img_1550.jpg?w=840

That comes out as … 2048 seconds. Or a bit less than 35 minutes. Oh.

OK, so those extreme numbers look pretty terrifying, but they are also quite extreme. The Bitcoin community spend enormous sums of money (certainly in the tens of millions of dollars) annually to produce that kind of output. Also, testing each guess most likely requires actually making a request to one of your servers, so you are quite likely to notice that level of traffic – say by your servers melting a hole through the bottom of your datacentre. If you think you are likely to attract this kind of attention then you might want to carefully consider which side of the Mossad/not-Mossad threat divide you live on and maybe check your phone isn’t a piece of Uranium.

All this is to say that if you have deployed random UUIDs in production, don’t panic! While I would recommend that you move to something better (see below) at some point, plugging more likely numbers into the equation should reassure you that you are unlikely to be at risk immediately. An attacker would still have to invest considerable time and money into launching such an attack.

Other nits

The borderline acceptable level of entropy in a random UUID is my main concern with them, but there are others too. In the standard string form, they are quite inefficient. The dash-separated hexadecimal format takes 36 characters to represent 16 bytes of data. That’s a 125% expansion, which is pretty terrible. Base64-encoding would instead use just 24 characters, and just 22 if we remove the padding, resulting in just 37.5% expansion.

Finally, a specific criticism of Java’s random UUID implementation is that internally it uses a single shared SecureRandom instance to generate the random bits. Depending on the backend configured, this can acquire a lock which can become heavily contended if you are generating large numbers of random tokens, especially if you are using the system blocking entropy source (don’t do that, use /dev/urandom). By rolling your own token generation you can use a thread-local or pool of SecureRandom instances to avoid such contention. (NB – the NativePRNG uses a shared static lock internally, so this doesn’t help in that case, but it also holds the lock for shorter critical sections so is less prone to the problem).

What should we use instead?

My recommendation is to use a 160-bit (20 byte) random value that is then URL-safe base64-encoded. The URL-safe base64 variant can be used pretty much anywhere, and is reasonably compact. In Java:

import java.security.SecureRandom;
import java.util.Base64;

public class SecureRandomString {
    private static final SecureRandom random = new SecureRandom();
    private static final Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();

    public static String generate() {
        byte[] buffer = new byte[20];
        random.nextBytes(buffer);
        return encoder.encodeToString(buffer);
    }
}

This produces output values like the following:

Xl3S2itovd5CDS7cKSNvml4_ODA

This is both shorter than a UUID and also more secure having 160 bits of entropy. I can also make the SecureRandom into a ThreadLocal if I want.

So how long would it take our extreme attacker to find a 160-bit random token? Around 17.9 million years. By tweaking the format of our tokens just a little we can move from worrying about attacker capabilities and resources to inner peace and happiness. It is so far beyond the realm of possible that we can stop worrying.

Why not go further? Why not 256 bits? That would push the attack costs into even more absurd territory. I find that 160 bits is a sweet spot of excellent security while still having a compact string representation.

Author: Neil Madden

Security Architect at ForgeRock. Experienced software engineer with a PhD in computer science. Interested in application security, applied cryptography, logic programming and intelligent agents. View all posts by Neil Madden


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK