
An elegant data type for a more civilized age


In a recent blog post, I recommended that you learn C++ for the deep understanding you’ll get of every other language you ever use. One of the key points is an understanding of how much it costs to create and manage an object, and how that affects the modern trend of deprecating primitives. In Ruby, for instance, numbers are objects just like everything else, and can be referenced using the same dot-notation (e.g., “65.chr”). Even Java is getting in on the act, with plans to completely remove support for primitives in favor of objects by Java 10 (cf. slide 41). And why not? After all, memory is cheap, processors are powerful, and primitives are really just an anachronistic special case, throwbacks to a time when 640K seemed like more than enough RAM.

Well, maybe yes, maybe no. Let’s take a deep dive into what it costs to move from a primitive to the corresponding object.

Stack vs. Heap

Before we get into the differences between primitives and objects, it’s worthwhile to give a quick overview of where memory gets allocated. The stack is where memory is allocated for local/short-term use – conceptually, it’s a big block of memory that’s completely empty when the program is run. There’s a pointer to the very bottom of the stack, and every time a function is called, the pointer is incremented to include space for a return value (if necessary), local variables, and the pointer’s previous value. A “stack frame” is the entire area on the stack allocated for a single function call. Once a function returns, its stack frame is “popped” from the stack, and any values that were in it are lost.
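
As a quick illustration (hypothetical code, not from the original post), locals exist only as long as the frame they belong to:

class StackDemo {
    static int square(int x) {      // a new stack frame is pushed for this call
        int result = x * x;         // 'x' and 'result' live in that frame
        return result;              // the value is copied out to the caller...
    }                               // ...and the frame is popped; 'result' is gone

    public static void main(String[] args) {
        int y = square(7);          // 'y' lives in main's frame
        System.out.println(y);      // prints 49
    }
}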

The heap is where memory is allocated for long-term use. Again, conceptually it’s a big block of memory – but it’s controlled by a memory manager that has to handle allocation of memory, deallocation of memory, fragmentation/compaction, and (in the case of most modern languages) garbage collection. All of these combine to make heap-based memory allocation a more complicated, costlier process than stack-based memory allocation. As an aside, one trick we used when I was in the video game industry was to allocate a large block of memory (on the heap) for everything that had to persist for the duration of a single frame, treat it as a stack and allocate objects on it over the course of the frame, and just blow it away (i.e., move the pointer to the bottom of the block of memory) when starting a new frame. This gave us the ability to allocate memory without the cost of standard memory management.
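
You can’t hand out raw memory in Java, but a hypothetical sketch of the same idea – a pool that’s “freed” by resetting a single index – looks like this:

// A toy per-frame arena: allocate one big block up front, hand out slots
// during the frame, then reclaim everything at once by resetting an index.
// (The game-engine version did this with a raw block of C++ heap memory.)
class FrameArena {
    private final float[] pool;  // one big allocation, made once at startup
    private int top = 0;         // behaves like a stack pointer

    FrameArena(int capacity) {
        pool = new float[capacity];
    }

    // "Allocate" n floats; returns the offset of the block within the pool.
    int alloc(int n) {
        if (top + n > pool.length) throw new OutOfMemoryError("arena full");
        int offset = top;
        top += n;
        return offset;
    }

    float[] backing() { return pool; }

    // New frame: everything allocated last frame is gone in one assignment.
    void reset() { top = 0; }
}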

But I digress.

Actual Memory Usage

So, let’s take a more concrete look at how memory gets allocated. In Java, you have the following two declarations:

int prim = 1000;
Integer boxed = 1000;

These don’t appear that different, and in fact boxed has the advantage of also having a “null” value which can indicate an unset value. Many people exploit this in interviews, as I described in another blog post. There are two main contexts in which these variable declarations can occur. The first is as local variables:

void foo() {
    int prim = 1000;
    Integer boxed = 1000;  // autoboxing: compiles to boxed = Integer.valueOf(1000);
    ...
}

prim is allocated on the stack, as is boxed (in recent JVMs, thanks to escape analysis), assuming the JIT can determine there’s no chance it needs to survive the popping of the current stack frame. If there’s the potential it could be externally referenced (e.g., if it’s returned, or assigned into a field of an object that could be externally referenced), then boxed has to be allocated on the heap.
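
A hypothetical sketch of the distinction (the method names are mine):

class EscapeDemo {
    // 'boxed' never leaves this method, so the JIT's escape analysis can
    // eliminate the heap allocation entirely (scalar replacement).
    static int noEscape() {
        Integer boxed = 1000;
        return boxed + 1;  // unboxed here; the object never escapes
    }

    // 'boxed' is returned, so it may outlive this stack frame and must
    // live on the heap.
    static Integer escapes() {
        Integer boxed = 1000;
        return boxed;
    }
}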

The second context is as member variables in a class, which will cause them to be allocated in the heap:

class Bar {
    int prim = 0;
    Integer boxed = 0;
}
Bar bar = new Bar();

Putting aside the overhead for the container object (bar), the memory footprints for prim and boxed are very different (we’ll assume a 32-bit JVM; all sizes are approximate):

  • prim (total = 4 bytes)
    • int: 4 bytes
  • boxed (total = 20 bytes)
    • Pointer to boxed memory location: 4 bytes
    • boxed object:
      • Object overhead: 12 bytes
      • int: 4 bytes

Note 1: the calculated space for boxed doesn’t include additional memory required by the memory manager to keep track of the boxed object.

Note 2: “approximately”? What the hell do you mean “approximately”??? Well, every JVM handles things a little differently, so this could differ based on your platform (sigh).

Note 3: though this analysis is nominally specific to Java, and every language handles objects in its own way, it’s safe to assume that no matter what the language, every object will need to keep a pointer back to its class, plus some kind of local metadata (flags, locking information, etc.).
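
To see the gap at scale, a rough, hypothetical experiment: a million ints in an int[] should cost about 4 MB, while a million distinct boxed values behind an Integer[] should cost roughly 20 MB under the assumptions above. A quick-and-dirty way to eyeball it (free-memory arithmetic is only a ballpark, but the difference is hard to miss):

class MemoryDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        // int[]: ~4 bytes per element, so ~4 MB for a million ints.
        // int[] data = new int[1_000_000];

        // Integer[]: a ~4-byte reference plus a ~16-byte object per element.
        Integer[] data = new Integer[1_000_000];
        for (int i = 0; i < data.length; i++)
            data[i] = 1000 + i;  // values >= 1000, so no Integer cache hits

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("~%.1f MB used%n", (after - before) / 1e6);
    }
}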

Garbage Collection

The second thing to consider is that boxed will have to be garbage collected, whereas prim will not. This isn’t a big deal when you’re dealing with individual objects, but when you scale up it can cause significant performance issues, even to the point of bringing down your site. As an example, a couple of years ago we had a situation at TripAdvisor in which one of our services was putting hundreds of thousands of ints into a Set, and int/int pairs into a Map – every second – then throwing them away. Because we were using the standard java.util.HashMap and java.util.HashSet classes, the ints were getting auto-boxed – i.e., converted to Integer objects – and the system ground to a halt in under 60 seconds when put under full site load due to constant garbage collection (a.k.a. “GC hell”).

The problem was two-fold. First, we were creating and throwing away hundreds of thousands of Integer objects every second. Second, the standard Java HashMap and HashSet classes resolve collisions by maintaining arrays of linked lists (chaining), so the linked-list nodes and the Map.Entry objects also had to be allocated and garbage collected. Together these created a crushing burden on the garbage collector, which ended up taking down the system.
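
The boxing is easy to miss, because the source never mentions Integer at all. A condensed, hypothetical sketch of the shape of code that causes this (not the actual service code):

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class GcHellDemo {
    public static void main(String[] args) {
        // Each iteration allocates garbage: every int is auto-boxed to an
        // Integer, and every map insert allocates an internal entry node.
        Set<Integer> seen = new HashSet<>();
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < 500_000; i++) {
            seen.add(i);            // boxes i
            counts.put(i, i + 1);   // boxes key and value, allocates an entry
        }
        // ...then throw it all away and start over, every second.
        seen.clear();
        counts.clear();
    }
}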

The solution was to use a special set of classes specifically designed to avoid both of these problems. First, they only use primitives, not generics, which means that there’s no boxing and unboxing required. Second, they use open addressing instead of chaining, so there’s no need to maintain linked lists, and allocate space for nodes, etc. The downside is inconvenience for the data structure implementor (i.e., there are a lot of classes with names like “OpenIntIntHashMap”, “OpenDoubleIntHashMap”, “OpenIntDoubleHashMap”, etc.). Luckily, the work had already been done, and we were able to get the functionality we needed out of the box (let’s take a moment here to be very grateful for open source software).
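
To make the technique concrete, here’s a toy sketch of an open-addressed int-to-int map: two parallel primitive arrays and linear probing, so there’s no boxing and no per-entry node objects to collect. This is a minimal illustration (fixed capacity, no removal, non-negative keys only), not the library implementation we actually used:

// Open addressing: on collision, probe the next slot instead of chaining.
// Keys and values live in flat primitive arrays, so there is nothing here
// for the garbage collector to track per entry.
class IntIntMap {
    private static final int EMPTY = -1;  // sentinel; assumes keys >= 0
    private final int[] keys;
    private final int[] values;

    IntIntMap(int capacity) {
        keys = new int[capacity];
        values = new int[capacity];
        java.util.Arrays.fill(keys, EMPTY);
    }

    void put(int key, int value) {  // assumes the table never fills up
        int i = key % keys.length;
        while (keys[i] != EMPTY && keys[i] != key)
            i = (i + 1) % keys.length;  // linear probe to the next open slot
        keys[i] = key;
        values[i] = value;
    }

    int get(int key, int defaultValue) {
        int i = key % keys.length;
        while (keys[i] != EMPTY) {
            if (keys[i] == key) return values[i];
            i = (i + 1) % keys.length;
        }
        return defaultValue;
    }
}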

What’s the lesson here?

There are a couple of questions that have to be asked at this point:

  • How often do these kinds of issues come up in practice?
  • If you have a catastrophic problem as described above, doesn’t it indicate a problem with the underlying algorithm?
  • Memory is getting cheaper, processors faster, and garbage collection algorithms better – is this really such a big deal?

None of these questions, of course, have definitive answers. Favoring objects over primitives probably won’t bring down your site (or, as in our case, one of our backend services), but it will exact a performance tax throughout your code base. There might be better ways to organize things to avoid catastrophic failures, or you might have to resort to incredibly ugly hacks to avoid an otherwise elegant implementation that does bad things to GC. And more memory, faster machines, and better GC might solve your problems, until you scale up.

For the purposes of this conversation, though, these are the wrong questions. Or rather, they’re good questions, but they don’t get to a deeper truth. Which is that understanding how things work, and developing a discipline of using the cleanest, simplest, most efficient solution, is a necessary step in becoming a great coder. You can’t be a great chess player unless you understand why you’re making the moves you’re making. You can’t be a great jazz musician unless you have a deep understanding of the music you’re playing. You can’t be great at something if you’re basing your decisions on opinions, faith, or trust. You have to know.

The choice of primitives vs. objects is interesting in this context specifically because it usually doesn’t matter. Most of the time, you could choose either one and there wouldn’t be a noticeable difference in performance or functionality. The reason it matters is because most people don’t understand what the difference is, and so “choose” one or the other based on entirely irrelevant considerations. As a result, if they ever do run into a problem, they don’t have the tools to understand what’s causing it.

Using primitives is a good discipline, specifically because your default programming techniques should be clean, simple, and efficient. The point isn’t that you should be prematurely optimizing your code, but rather that when given a choice between two nearly identical approaches, you shouldn’t choose the less performant one without a good reason. If your language’s standard libraries included both quicksort and bubblesort, you should never choose the latter.

Last thoughts

LISP is worth learning for a different reason — the profound enlightenment experience you will have when you finally get it. That experience will make you a better programmer for the rest of your days, even if you never actually use LISP itself a lot. – Eric Raymond, How to Become a Hacker

Ultimately, most languages no longer give you the option to use primitives, and that’s fine – Erlang isn’t competing with C++ on execution speed, it’s trying to be the best language for parallelization. Rails isn’t trying to be as fast as Java, it’s trying to give you tools to build sites quickly. Lisp is competing on parentheses. The world is a big place, and there’s room for lots of different languages that optimize along orthogonal axes. The key is to understand what the tradeoffs are, and what they mean. Learning to program in C++ will help you understand these things in a way that other languages have been specifically designed to hide.

Correction 1: In a previous version of this post, I had assigned prim and boxed to zero, but it was correctly pointed out that low Integer values are cached (cf. the Flyweight pattern).

Correction 2: In a previous version of this post, I incorrectly ascribed higher memory usage to 64-bit JVMs, not taking compressed pointers into account.

