How to write about web performance

Posted September 12, 2021 by Nolan Lawson in performance, Web. Tagged: performance.

I’ve been writing about performance for a long time. I like to think I’ve gotten pretty good at it, but sometimes I look back on my older blog posts and cringe at the mistakes I made.

This post is an attempt to distill some of what I’ve learned over the years to offer as advice to other aspiring tinkerers, benchmarkers, and anyone curious about how browsers actually work when you put them to the test.

Why write about web performance?

The first and maybe most obvious question is: why bother? Why write about web performance? Isn’t this something that’s better left to the browser vendors themselves?

In some ways, this is true. Browser vendors know how their product actually works. If some part of the system is performing slowly, you can go knock on the door of your colleague who wrote the code and ask them why it’s slow. (Or send them a DM, I suppose, in our post-pandemic world.)

But in other ways, browser vendors really aren’t in a good position to talk frankly about web performance. Browsers are in the business of selling browsers. Web performance is often used in marketing, with claims like “Browser X is 25% faster than Browser Y,” which might need to get approved by the marketing department, the legal department, not to mention various owners and stakeholders…

And that’s only if your browser is the fast one. If you run a benchmark and it turns out that your browser is the slow one, or it’s a mixed bag, then browser vendors will keep pretty quiet about it. This is why whenever a browser vendor releases a new benchmark, surprise surprise! Their browser wins. So the browser vendors’ hands are pretty tied when it comes to accurately writing about how their product actually works.

Of course, there are exceptions to this rule. Occasionally you will find someone’s personal blog, or a comment on a public bugtracker, which betrays that their browser is actually not so great in some benchmark. But nobody is going to go out of their way to sing from the mountaintops about how lousy their browser is in a benchmark. If anything, they’ll talk about it after they’ve done the work to make things faster, meaning the benchmark already went through several rounds of internal discussion, and was maybe used to evaluate some internal initiative to improve performance – a process that might last years before the public actually hears about it.

Other times, browser vendors will release a new feature, announce it with some fanfare, and then make vague claims about how it improves performance without delving into any specifics. If you actually look into these claims, though, you might find that the performance improvement is pretty meager, or it only manifests in a specific context. (Don’t expect the team who built the feature to eagerly tell you this, though.)

By the way, I don’t blame the browser vendors at all for this situation. I worked on the performance team at Microsoft Edge (back in the EdgeHTML days, before the switch to Chromium), and I did the same stuff. I wrote about scrolling performance because, at the time, our engine was the best at scrolling. I wrote about input responsiveness after we had already made it faster. (Not before! Definitely not before.) I designed benchmarks that explicitly showed off the improvements we had made. I worked on marketing videos that showed our browser winning in experiments where we already knew we’d win.

And if you think I’m picking on Microsoft, I could easily find examples of the other browser vendors doing the same thing. But I choose not to, because I’d rather pick on myself. (If you work for a browser vendor and are reading this, I’m sure some examples come to mind.)

Don’t expect a car company to tell you that their competitor has better mileage. Don’t expect them to admit that their new model has a lousy safety rating. That’s what Consumer Reports is for. In the same way, if you don’t work at a browser vendor (I don’t, anymore), then you are blessedly free to say whatever you want about browsers, and to honestly assess their claims and compare them to each other in fair, unbiased benchmarks.

Plus, as a web developer, you might actually be in a better position to write a benchmark that is more representative of real-world code. Browser developers spend most of their day writing C, C++, and Rust, not necessarily HTML, CSS, and JavaScript. So they aren’t always familiar with the day-to-day concerns of working web developers.

The subtle science of benchmarking

Okay, so that was my long diatribe about why you’d bother writing about web performance. So how do you actually go about doing it?

First off, I’d say to write the benchmark before you start writing your blog post. Your conclusions and high-level takeaways may be vastly different depending on the results of the benchmark. So don’t assume you already know what the results are going to be.

I’ve made this mistake in the past! Once, I wrote an entire blog post before writing the benchmark, and then the benchmark completely upended what I was going to say in the post. I had to scrap the whole thing and start from scratch.

Benchmarking is science, and you should treat it with the seriousness of a scientific endeavor. Expect peer review, which means – most importantly! – publish your benchmark publicly and provide instructions for others to test it. Because believe me, they will! I’ve had folks notify me of a bug in my benchmark after I published a post, so I had to go back and edit it to correct the results. (This is annoying, and embarrassing, but it’s better than willfully spreading misinformation.)

Since you may end up running your benchmark multiple times, and even generating your charts and tables multiple times, make an effort to streamline the process of gathering the data. If some step is manual, try to automate it.

These days, I like Tachometer because it automates a lot of the boring parts of benchmarking – launching a browser, taking a measurement, taking multiple measurements, taking enough measurements to achieve statistical significance, etc. Unfortunately it doesn’t automate the part where you generate charts and graphs, but I usually write small scripts to output the data in a format where I can easily import it into a spreadsheet app.
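For that last step, even something as bare-bones as this does the job (a minimal sketch, where the samples array stands in for whatever timings you’ve collected):

```js
// A minimal sketch: given an array of timing samples (however you collected
// them), print CSV that can be pasted straight into a spreadsheet app.
const samples = [12.4, 11.9, 13.1, 12.2]; // hypothetical measurements, in ms

console.log('iteration,duration_ms');
samples.forEach((duration, i) => {
  console.log(`${i + 1},${duration}`);
});
```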

This also leads to an important point: take accurate measurements. A common mistake is to use Date.now() – instead, you should use performance.now(), since this gives you a high-resolution timestamp. Or even better, use performance.mark() and performance.measure() – these are also high-resolution, but with the added benefit that you can actually see your measurements laid out visually in the Chrome DevTools. This is a great way to double-check that you’re actually measuring what you think you’re measuring.
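Here’s roughly what that looks like (a minimal sketch, where renderWidget() is a stand-in for whatever work you’re actually measuring):

```js
// Sketch of the User Timing API. renderWidget() is a hypothetical
// stand-in for the work under test.
performance.mark('render-start');
renderWidget();
performance.mark('render-end');
performance.measure('render', 'render-start', 'render-end');

// The measure shows up in the DevTools "User Timing" track, and you can
// also read the duration programmatically:
const [measure] = performance.getEntriesByName('render', 'measure');
console.log(`render took ${measure.duration} ms`);
```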

[Screenshot: a performance trace in Chrome DevTools, with an annotation in the User Timing section]

Note: Sadly, the Firefox and Safari DevTools still don’t show performance marks/measures in their profiler traces. They really should; IE11 had this feature years ago.

As mentioned above, it’s also a good idea to take multiple measurements. Benchmarks will always show variance, and you can prove just about anything if you only take one sample. For best results, I’d say take at least three measurements and then calculate the median, or better yet, use a tool like Tachometer that will use a bunch of fancy statistics to find the ideal number of samples.
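If you’re doing it by hand, the basic loop might look like this (a sketch, with runBenchmark() standing in for the code under test):

```js
// Sketch: take several samples and report the median, which is less
// sensitive to outliers than a single run or the mean.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const samples = [];
for (let i = 0; i < 5; i++) {
  const start = performance.now();
  runBenchmark(); // hypothetical function under test
  samples.push(performance.now() - start);
}
console.log(`median: ${median(samples).toFixed(1)} ms`);
```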

Humility

Writing about web performance is really hard, so it’s important to be humble. Browsers are incredibly complex, so you have to accept that you will probably be wrong about something. And if you’re not wrong, then you will be wrong in 5 years when browsers update their engines to make your results obsolete.

There are a few ways you can limit your likelihood of wrongness, though. Here are a few strategies that have worked well for me in the past.

First off, test in multiple browser engines. This is a good way to figure out if you’ve identified a quirk in a particular browser, or a fundamental truth about how the web works. Heck, if you write a benchmark where one browser performs much more poorly than the other ones, then congratulations! You’ve found a browser bug, and now you have a reproducible test case that you can file on that browser.

(And if you think they won’t be grateful or won’t fix the problem, then prepare to be surprised. I’ve filed several such bugs on browsers, and they usually at least acknowledge the issue if not outright fix it within a few releases. Sometimes browser developers are grateful when you file a bug like this, because they might already know something is a problem, but without bug reports from customers, they weren’t able to convince management to prioritize it.)

Second, reduce the variables. Test on the same hardware, if possible. (For testing the three major browser engines – Blink, Gecko, and WebKit – this sadly means you’re always testing on macOS. Do not trust WebKit on Windows/Linux; I’ve found its performance to be vastly different from Safari’s.) Browsers can differ based on whether the device is plugged into power or has low battery, so make sure that the device is plugged in and charged. Don’t run other applications or browser windows or tabs while you’re running the benchmark. If networking is involved, use a local server if possible to eliminate latency. (Or configure the server to always respond with a particular delay, or use throttling, as necessary.) Update all the browsers before running the test.
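For the networking part, a local server that always responds with the same artificial delay can be as simple as this (a Node.js sketch; the 100ms delay is arbitrary):

```js
// Sketch of a local Node.js server that always responds after a fixed
// artificial delay, so network latency is identical on every run.
const http = require('http');
const DELAY_MS = 100; // arbitrary fixed latency for the benchmark

http.createServer((req, res) => {
  setTimeout(() => {
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end('<!doctype html><title>bench</title><h1>hello</h1>');
  }, DELAY_MS);
}).listen(8080, () => console.log('Benchmark server on http://localhost:8080'));
```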

Third, be aware of caching. It’s easy to fool yourself if you run 1,000 iterations of something, and it turns out that the last 999 iterations are all cached. JavaScript engines have JIT compilers, meaning that the first iteration can be different from the second iteration, which can be different from the third, etc. If you think you can figure out something low-level like “Is const faster than let?”, you probably can’t, because the JIT will outsmart you. Browsers also have bytecode caching, which means that the first page load may be different from the second, and the second may even be different from the third. (Tachometer works around this by using a fresh browser tab each iteration, which is clever.)
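One way to see this for yourself is to log every iteration separately rather than just the total (another sketch, with workUnderTest() standing in for the code you’re testing):

```js
// Sketch: logging each iteration separately often makes JIT warm-up and
// caching visible: the first few runs tend to be slower than the later ones.
for (let i = 0; i < 10; i++) {
  const start = performance.now();
  workUnderTest(); // hypothetical function under test
  console.log(`iteration ${i + 1}: ${(performance.now() - start).toFixed(2)} ms`);
}
```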

My point here is that, for all of your hard work to do rigorous, scientific benchmarking, you may just turn out to be wrong. You’ll publish your blog post, you’ll feel very proud of yourself, and then a browser engineer will contact you privately and say, “You know, it only works like this on a 60FPS monitor.” Or “only on Intel CPUs.” Or “only on macOS Big Sur.” Or “only if your DOM size is greater than 1,000 and the layer depth is above 10 and you’re using a trackball mouse and it’s a Tuesday and the moon is in the seventh house.”

There are so many variables in browser performance, and you can’t possibly capture them all. The best you can do is document your methodology, explain what your benchmark doesn’t test, and try not to make grand sweeping statements like, “You should always use const instead of let; my benchmark proves it’s faster.” At best, your benchmark proves that one number is higher than another in your very specific benchmark in the very specific way you tested it, and you have to be satisfied with that.

Conclusion

Writing about browser performance is hard, but it’s not fruitless. I’ve had enough successes over the years (and enough stubbornness and curiosity, I guess) that I keep doing it.

For instance, I wrote about how bundlers like Rollup produced faster JavaScript files than bundlers like Webpack, and Webpack eventually improved its implementation. I filed a bug on Firefox and Chrome showing that Safari had an optimization they didn’t, and both browsers fixed it, so now all three browsers are fast on the benchmark. I wrote a silly JavaScript “optimizer” that the V8 team used to improve their performance.

I bring up all these examples less to brag, and more to show that it is possible to improve things by simply writing about them. In all three of the above cases, I actually made mistakes in my benchmarks (pretty dumb ones, in some cases), and had to go back and fix them later. But if you can get enough traction and get the right people’s attention, then the browsers and bundlers and frameworks can change, without you having to actually write the code to do it. (To this day, I can’t write a line of C, C++, or Rust, but I’ve influenced browser vendors to write it for me, which aligns with my goal of spending more time playing Tetris than learning new programming languages.)

My point in writing all this is to try to convince you (if you’ve read this far) that it is indeed valuable for you to write about web performance. Even if you don’t feel like you really understand how browsers work. Even if you’re just getting started as a web developer. Even if you’re just curious, and you want to poke around at browsers to see how they tick. At worst you’ll be wrong (which I’ve been many times), and at best you might teach others about performant programming patterns, or even influence the ecosystem to change and make things better for everyone.

There are plenty of upsides, and all you need is an HTML file and a bit of patience. So if that sounds interesting to you, get started and have fun benchmarking!

