
5 mistakes to avoid when optimising your web app performance

source link: https://engineering.thetrainline.com/5-mistakes-to-avoid-when-optimising-your-web-app-performance-6554021e5cde

Photo by Hugo Kemmel on Unsplash

Are you getting lost with all the theory, tools and metrics and don’t know where to start to optimise your web app performance? Then this article is for you!

At Trainline, we have been working on optimising our pages as much as we can, and we keep doing so every day by advocating web performance standards and metrics throughout our Engineering teams.

In this article, I’d like to share what we’ve learned from our experience optimising our marketing landing pages and web apps, in the hope that it will save you some time and guide you in this endeavour.

These are 5 mistakes you should avoid when optimising your web app performance.

1. Aiming for 100% Lighthouse score

Lighthouse is an amazing tool that can help you understand how your web app is performing, giving an overall score based on standard, weighted performance metrics.

Lighthouse gives you a lot of recommendations, and it’s natural to aim for a 100% score in every metric.

We used to have that ambition as well when we first began.

Every performance metric has different meaning and importance, which is reflected in the way they are weighted.

Figure 1: The weighting of each Lighthouse performance metric (source: https://bit.ly/3fb6vfV)

This does not mean you should never optimise or care about the lower-weighted ones, just that you need to weigh the effort of optimising them against the benefit to your customers.
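To make the weighting concrete, here is a minimal sketch of how a weighted overall score can be computed. The weights below are illustrative, roughly matching the Lighthouse v6-era weighting in Figure 1; check the Lighthouse documentation for the current values.

```typescript
// Sketch of Lighthouse-style weighted scoring. Each metric score is assumed
// to be already normalised to the 0..1 range, as Lighthouse does internally.
// Weights are illustrative (approximately Lighthouse v6), not authoritative.
const weights: Record<string, number> = {
  "first-contentful-paint": 0.15,
  "speed-index": 0.15,
  "largest-contentful-paint": 0.25,
  "interactive": 0.15,
  "total-blocking-time": 0.25,
  "cumulative-layout-shift": 0.05,
};

function overallScore(metricScores: Record<string, number>): number {
  let total = 0;
  for (const [metric, weight] of Object.entries(weights)) {
    total += weight * (metricScores[metric] ?? 0);
  }
  return Math.round(total * 100); // 0..100, as shown in the Lighthouse report
}

// A page with great paint metrics but heavy blocking time still scores only ~79:
console.log(overallScore({
  "first-contentful-paint": 1,
  "speed-index": 0.9,
  "largest-contentful-paint": 0.95,
  "interactive": 0.8,
  "total-blocking-time": 0.4,
  "cumulative-layout-shift": 1,
}));
```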

If you have enough budget, you can invest as much as you want and reach a 100% score. When that’s not the case, what are the right metrics to invest in?

How to decide what to invest in

Clue: It’s all about user experience.

What metrics would have the biggest impact, if optimised, on your user experience?

You can give those a priority and an importance relevant to your business and your customers.

All metrics matter, but if you have to pick, focus on the top ones (Largest Contentful Paint, Cumulative Layout Shift, First Input Delay), as improving them will impact your customer experience the most.

2. Using Google Page Speed Insights to track performance over time

Google Page Speed Insights is an online tool, built on top of Lighthouse, which gives you a consistent environment to test your app: every test runs on the same simulated hardware, giving you a consistent reading without noise.

Our Technical SEO team suggested we use it to track performance over time, as it’s a Google-approved tool.

However…

There are 2 types of environments in which you can collect performance metrics: lab and field.

  • Lab environment (Google Page Speed Insights) can be used to inform you of any regression in comparison with a previous test, for example when you release a new change. This does not necessarily reflect what your customers are experiencing, so it should be used as a relative comparison (i.e. regressed by 10%, rather than 100ms).
  • Field environment (RUM: Real User Monitoring data, collected via Google Analytics or tools like SpeedCurve LUX) can instead be used to look at what your customers are actually experiencing every day, which will help you understand how user experience is impacting your business metrics as well.
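As a concrete example of field collection, Google’s web-vitals library reports Core Web Vitals from real user sessions. Here is a minimal sketch using its v3 API surface (older versions exported getCLS/getFID/getLCP instead); the /analytics endpoint is a placeholder for your own collector, whether that’s Google Analytics, SpeedCurve LUX, or something in-house:

```typescript
// Field (RUM) collection sketch using Google's web-vitals library (v3 API).
import { onCLS, onFID, onLCP } from "web-vitals";
import type { Metric } from "web-vitals";

function sendToAnalytics(metric: Metric): void {
  const body = JSON.stringify({ name: metric.name, value: metric.value, id: metric.id });
  // sendBeacon survives page unloads, which matters for late-reported metrics.
  if (!navigator.sendBeacon || !navigator.sendBeacon("/analytics", body)) {
    fetch("/analytics", { method: "POST", body, keepalive: true });
  }
}

onCLS(sendToAnalytics);
onFID(sendToAnalytics); // First Input Delay only exists in the field;
                        // note: newer web-vitals versions replace onFID with onINP
onLCP(sendToAnalytics);
```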

Are the majority of your customers really experiencing the same results as you can see in Google Page Speed Insights?

By only testing in a lab environment, you’re making assumptions about your customers’ connections, devices and behaviours, which might lead you to optimise for a small niche.

Additionally, some metrics cannot be evaluated in a lab environment, like First Input Delay.

Tracking metrics over time

You should ask yourself, who and what am I optimising for? The answer is “my customers”.

Your customers may all have powerful enough connections and devices for your app to perform just fine.

Google Page Speed Insights, like Lighthouse, can only test a customer’s first visit to your pages, but for your product, repeat visits might matter more.

You should use a combination of lab and field environment metrics and prioritise what to invest in.

3. Only looking at the 95th percentile

As we learned to give the right value to real user metrics in our investigations and analysis, we now have a lot of data to go through. We can expect every experience to be slightly different, so it’s normal to see some noise.

To help us, most of the analysis tools for RUM data use percentiles.
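For reference, a percentile is just a cut point over your sorted samples; a minimal sketch (the sample values below are illustrative):

```typescript
// Nearest-rank percentile over RUM samples (e.g. LCP values in milliseconds).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.min(index, sorted.length - 1)];
}

const lcpSamples = [1200, 1400, 1500, 1800, 2100, 2600, 3400, 9800];
console.log(percentile(lcpSamples, 50)); // 1800: the typical experience
console.log(percentile(lcpSamples, 75)); // 2600
console.log(percentile(lcpSamples, 95)); // 9800: dominated by the slowest sessions
```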

At the beginning, it feels instinctive to focus on where you’re seeing the slowest results, the 95th percentile, on the basis that improvements there will have a knock-on impact for customers in the other percentiles.

Why is it a mistake?

When we started doing this, we built a backlog of things to tackle, running tests to validate our assumptions before starting the actual work.

We later found a few issues with this strategy:

  • 😭 High effort - little result
    A lot of work can go into improving such a high percentile, and although it might seem like quick wins would be most impactful for those users, we discovered they often translated into little improvement and a poor return on investment.
  • 😡 Disruption
    Investing in massive changes to move this high percentile can become unreasonable and disruptive while the rest of the team is developing new features on your applications.
  • 🤷‍♂️ Hard to make an impact
    Even if you do move some metrics, does it really translate into a benefit? For example, if your page loads in 10s for these users, is removing 500ms of FCP (First Contentful Paint) going to feel like a significant difference to them?
  • 📊 Isolated Impact
    Impact made in the 95th percentile does not automatically translate, proportionally, into the same impact in other percentiles, as there is high variability in the conditions contributing to these users’ experience, which don’t necessarily apply to other users.

The first question we should ask ourselves is: why are these users having a slower experience? Can we analyse the data and understand more about them?

Secondly, is our page ever going to be fast enough for these users just by optimising it? Or would it require an entirely different strategy? And how expensive would it be?

There is a high probability that these users have bad network conditions and low-end devices, which means our actions will have limited impact.

Invest in the right percentiles

The better you understand your user base by analysing RUM data, the better you’ll know where to invest to make the most impact where it really matters.

Prioritise improvements to the 50th and 75th percentiles first, as this can deliver good results for the majority of your users. You can then consider investing in the slowest experiences, after a proper cost vs. benefit analysis.

4. Aiming for TTI < 5s

TTI measures how long it takes a page to become fully interactive (you can read more about it at https://web.dev/tti/).

TTI’s original name was TTCI, where the C stood for “consistently”: TTI marks the point in time after which the browser is idle, waiting for user input.

Initially, one of the most important indications of success in our journey was having a TTI of less than 5 seconds for all percentiles.

Why is it a mistake?

TTI tries to identify the point in time at which the browser is ready to accept our input without delays caused by work happening behind the scenes. This is mainly how it relates to user experience.

Figure 2: The TTI algorithm explained (source: https://bit.ly/2QbbjcO)

This has value, but does this mean our users will only be able to interact after this point? What if they interact with the page before? What experience do they have?

Due to the way TTI is calculated, it can be overly pessimistic: it captures the last long JavaScript task, which might happen after a long period of inactivity, perhaps when a third party triggers some script. Before then, a user might interact and have a perfectly good experience.

Therefore, users interacting before TTI may not be impacted negatively. Most of our users interacted with our page 2 seconds after LCP (https://web.dev/lcp/), way before TTI happened.
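If you want to see this for your own users, a hypothetical sketch like the one below records the first click or key press relative to navigation start, which you can then compare against your LCP and TTI numbers (the event names and the reporting call are illustrative):

```typescript
// Sketch: record the user's first interaction, relative to navigation start.
const interactionEvents = ["pointerdown", "keydown"];

function onFirstInteraction(event: Event): void {
  const elapsedMs = Math.round(performance.now());
  console.log(`first ${event.type} at ${elapsedMs}ms`); // swap in your RUM reporter
  for (const name of interactionEvents) {
    removeEventListener(name, onFirstInteraction, true);
  }
}

for (const name of interactionEvents) {
  addEventListener(name, onFirstInteraction, { capture: true });
}
```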

You can try to optimise TTI, but there is a high possibility that it won’t reflect your customers’ behaviours or their experience.

What are we measuring instead?

A combination of Total Blocking Time, which tells us how much of the JavaScript in our page is blocking, and First Input Delay, a very good indication of what your users are experiencing in the wild. Correlated with first interaction time, these can tell us what to invest in.

You will be able to see when most of your users are interacting with your pages and how much delay they are getting: data that can be correlated with a waterfall view of your main-thread execution in tools like SpeedCurve and WebPageTest to understand what might be causing long delays.

We log the point at which our application has done most of its processing as a custom performance metric, so that we can link all this information to a specific phase of our app’s initial load and tackle it.
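As a sketch of what such a custom metric might look like, the browser’s User Timing API lets you mark and measure your own phases. The mark and measure names below are ours, purely illustrative; RUM tools such as SpeedCurve can pick up user timing marks alongside the standard metrics:

```typescript
// Sketch: a custom "app processing done" metric via the User Timing API.
performance.mark("app-bootstrap-start");

// ... bootstrap, hydrate, attach event handlers ...

performance.mark("app-processing-done");
performance.measure("app-initial-processing", "app-bootstrap-start", "app-processing-done");

const [measure] = performance.getEntriesByName("app-initial-processing");
console.log(`initial processing took ${Math.round(measure.duration)}ms`);
```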

5. Focusing on uncompressed JS size

JS size is one of the main culprits of slow-performing pages and a poor user experience. This is because our code needs to be first delivered to the client and then executed.

The size of any assets can be reported using different metrics:

  • uncompressed: the original size in bytes
  • compressed: the size after gzip or brotli compression, significantly smaller than the original
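You can check both numbers for a bundle locally; here is a minimal Node.js sketch (the file path is a placeholder):

```typescript
// compare-sizes.ts: raw vs. compressed size of a built bundle (Node.js).
import { readFileSync } from "fs";
import { gzipSync, brotliCompressSync } from "zlib";

const raw = readFileSync(process.argv[2] ?? "dist/main.js"); // placeholder path
const kb = (bytes: number) => `${(bytes / 1024).toFixed(1)} KB`;

console.log(`uncompressed: ${kb(raw.length)}`);
console.log(`gzip:         ${kb(gzipSync(raw).length)}`);
console.log(`brotli:       ${kb(brotliCompressSync(raw).length)}`);
```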

While working on our pages’ optimisation, we analysed the size of the biggest libraries using different tools, with the aim of removing them or replacing them with lightweight alternatives.

Our initial analysis was based on uncompressed size.

What’s wrong with it?

What matters in our analysis is matching the production environment as closely as possible, to better understand the impact of a change on our customers’ experience.

As the major impact of our JS is on slow connections and low-end devices, compression is a crucial factor and can make a big difference.

Removing 100KB of uncompressed code might ultimately save less than 10KB over the wire.

As a result, most of the improvements we made, however big they looked in uncompressed size, only produced small improvements after compression.

A better way

Instead of looking at uncompressed size when evaluating the heaviest things in your pages, it’s best to look at compressed size.

“But, if everything becomes small after compression, what should we remove?”

It’s always a matter of cost vs. benefit analysis.

You can decide whether something is worth removing by weighing the effort of replacing that code with something lightweight against the compressed-size reduction you’ll get. It’s also worth noting that a small change (~5KB) in compressed size can make a difference when bundled with other similarly small changes.

To understand the impact of a change on bundle size, while catering for compression, we built an open-source tool at Trainline called Webpack Bundle Delta. It runs in your CI and reports size changes for every bundle in your pull requests compared to a baseline. Go check it out!

Additionally, I recommend looking at the percentage of unused code via Chrome DevTools’ code coverage feature, to measure how much code our customers are downloading and evaluating without actually needing it.
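The Coverage tab works interactively in DevTools; if you’d rather collect the same data in a script, Puppeteer exposes it too. A minimal sketch (the URL is a placeholder):

```typescript
// Sketch: report the share of unused JavaScript via Puppeteer's coverage API.
import puppeteer from "puppeteer";

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.coverage.startJSCoverage();
  await page.goto("https://www.example.com", { waitUntil: "networkidle0" });
  const coverage = await page.coverage.stopJSCoverage();

  for (const entry of coverage) {
    const total = entry.text.length;
    const used = entry.ranges.reduce((sum, r) => sum + (r.end - r.start), 0);
    console.log(`${entry.url}: ${((1 - used / total) * 100).toFixed(1)}% unused`);
  }

  await browser.close();
})();
```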

Conclusions

It’s important to fail fast. Failure is required to understand, to learn and do better.

Sometimes you need to fail first to understand what works for you, as your situation might be entirely different and unique to you.

Retrace your steps, reflect on what didn’t go well and start again, with the newfound knowledge and perhaps more weapons at your disposal to succeed the next time.

And don’t forget to have fun in the process 😎

To understand how Trainline prepared to speed up web performance, check out Paul’s blog here.

