
How likely is it that the next visitor will find your app easy to use? A humanistic approach to usability.

“How easy is it to use?” is something we would like to hear from leadership, but ultimately it is the question we keep asking ourselves (as UX designers, analysts, product people…). Now, measuring usability is a highly researched subject that we will only briefly touch on, as a means of starting the discussion. What we will discuss is how to present usability in a more humanistic way.

Conventionally, when measuring usability we take a number of individual scores given by people who went through the system in question (participants) and average them. This average is the system’s usability score. A usability score, most of the time, has a meaning behind it: does the system have good usability? Mediocre usability? Bad usability? The score comes with an approximation of its accuracy, sometimes expressed as the minimum number of participants required to reach a target accuracy; other times the accuracy is tied to statistical measures.

[Image: An illustration of how the average sums individual scores]
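To make the convention concrete, here is a minimal sketch of that averaging step, assuming SUS-style scores on a 0–100 scale (the numbers are made up for illustration):

```python
# Conventional usability measurement: average the participants' individual scores.
# Hypothetical SUS-style scores (0-100) from eight participants.
scores = [82.5, 75.0, 90.0, 55.0, 85.0, 80.0, 47.5, 87.5]

average = sum(scores) / len(scores)
print(f"System usability score: {average:.1f}")  # 75.3
```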

Averages are a great means of summarising a large set of numbers, but although they give a good overview of the system, they take us away from the individual. A system with good usability means that the average user will find it has good usability, but as one finds out sooner or later, there is no average user; except, perhaps, the Everyman. As we increasingly base decisions on metrics, rethinking those metrics to emphasise the human instead of the system will naturally help us make more humane decisions. So instead of asking “does the system have good usability?” we can investigate “How likely is it that the next visitor will find our system easy to use?” or, more importantly, “How likely is it that the next visitor will not find our system easy to use?”.

Approaching uncertainty

Let’s first take a look at the simplest scenario: we have performed zero usability measurements of the procedure in question, so we have maximum uncertainty about its usability. If we have no previous knowledge, the best we can do is treat the event of “someone going through the procedure and finding it usable” as completely random. That means it is 50% likely that the next visitor will find our app easy to use.

As you can imagine, this is not the most helpful result, since the humans going through the procedure most likely do not produce random data. Nonetheless, we are getting an answer to our question, despite knowing nothing about what users “think” of the system in question. That’s why it is important to keep in mind the number of users a result was calculated upon (i.e. its certainty). This is a reality we have to deal with in statistics: being able to calculate a number doesn’t mean that the number is correct.

Although there are numerous tools for measuring usability, time and cost constraints, the nature of the measuring tool itself, and the unavoidable variability of humans will create uncertainty about the exact results; this is fine and we will leave it as it is for now. Keep in mind, however, that the way of presenting usability results described below suffers from the same uncertainty-related pitfalls as the conventional presentation.

Getting some understanding of the system’s ease of use

Let’s say that the procedure we are trying to measure exists on a website; it could be, for example, the sign-up flow. The people who will go through that procedure are not randomly selected from a pool of people with evenly distributed characteristics, past experiences, and knowledge. Furthermore, a big percentage of those people operate in similar contexts (viewing the page on a mobile screen, typing on a keyboard, etc.). The perceived usability of the procedure will differ for each individual, but the perceptions will be related to one another. Except for some outliers, most usability scores will revolve around some specific numbers. If we asked ten participants whether the flow was easy to use on a scale from 1 (hard) to 5 (easy), we would be more likely to get the responses “1, 4, 5, 5, 5, 4, 1, 4, 5, 4” than “1, 5, 1, 5, 1, 5, 1, 5, 1, 5”.

[Image: The above example on a scatter plot]

Another way to view it: if 200 people rated the usability as “good” and 10 as “bad”, intuitively we would expect the next person who comes along to find the procedure’s usability “good”.

Measuring usability

We now understand that if we gather some initial information, we can better estimate the ease of use of the process. Depending on the scope and budget of the research, that can mean aiming for near-certainty or just getting a good-enough estimate. Either way, we need to recruit a tool for measuring usability. A good tool would be reliable and valid; furthermore, it would give some sort of qualitative verdict on the usability of the system in question. Usability questionnaires (like the PSSUQ or the SUS) are a great tool for gathering the initial data, but a simpler measure like the Net Promoter Score *, or any other measurement whose result can be segmented into “good” vs “bad” usability, will do.

Usability and the rising of the sun

Going back to the intuition that the next user is more likely to find the procedure usable if 200 people rated the usability as “good” and only 10 as “bad”: we can very well go back in time, to the 18th century to be exact. Pierre-Simon Laplace, at the time, was trying to answer: “If we repeat an experiment that we know can result in a success or failure, n times independently, and get s successes, and n − s failures, then what is the probability that the next repetition will succeed?” [1] He ended up creating a mathematical formula that he used to calculate “the likelihood that the sun will rise tomorrow, based on the fact that it has risen each day the previous 5000 years” (around 99.9999453%).
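Out of curiosity, we can reproduce Laplace’s sunrise number with his formula, (s + 1) / (n + 2). A small sketch, assuming 5000 years of uninterrupted daily sunrises (the exact day count is an approximation, so the last decimal may differ from the quoted figure):

```python
# Laplace's sunrise example via the rule of succession: (s + 1) / (n + 2).
# Assumption: 5000 years ~= 5000 * 365.2426 days, every one of them a success.
n = s = round(5000 * 365.2426)          # 1_826_213 observed sunrises
p_next = (s + 1) / (n + 2)
print(f"P(the sun rises tomorrow) = {p_next:.6%}")  # ~99.999945%
```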

The likelihood of good usability

A person who goes through the procedure we are testing can either find it “good” or “not good” in terms of ease of use. How we differentiate “good” from “not good” is up to the tool we used to measure individual perceptions. For example, a System Usability Scale score in the 80s indicates a system with good usability, and individuals who score 9 or 10 are considered promoters by the NPS.

First we want to find how many individuals rated our system’s usability as “good”. Let’s call that number s (for success). For example, if we used the SUS questionnaire and 5 out of 8 individuals scored it at around 80, then we have 5 “good” observations: s = 5.

We will call the total number of individuals who participated in the study n (for number); n = 8 in our example.

Then we can calculate the likelihood of the next person finding that the procedure has “good” usability as (s + 1) / (n + 2). That is 6 / 10 = 60% in our example. More importantly, we can calculate the reverse: how likely it is that the next person will not find the process usable, will experience some hiccup, or will fail to complete the process (lost revenue, in terms the business cares about). That is, of course, 100% − 60% = 40% of the individuals.
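Putting the formula into a few lines of code makes it easy to reuse; a minimal sketch (the helper name p_next_good is mine):

```python
def p_next_good(s: int, n: int) -> float:
    """Likelihood that the next visitor finds the process usable,
    given s "good" ratings out of n participants (rule of succession)."""
    return (s + 1) / (n + 2)

# The SUS example from the text: 5 "good" ratings out of 8 participants.
p = p_next_good(s=5, n=8)
print(f"P(next visitor finds it usable):     {p:.0%}")      # 60%
print(f"P(next visitor does not find it so): {1 - p:.0%}")  # 40%

# The earlier intuition of 200 "good" ratings out of 210 participants:
print(f"With 200 good out of 210: {p_next_good(200, 210):.1%}")  # ~94.8%
```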

But why?

A question that might pop into someone’s head [it sure popped into mine] is: why plus 1 and plus 2? Why not 5 / 8 = 62.5%? The logic is quite intuitive [I hope]. Five out of eight (5 / 8) is our current state of knowledge: we already know how many people used the process and how many found its usability “good”. What we want to find concerns the future: how likely it is that the next user will find the process’s usability “good”. A practical way to think about this is the following:

  1. Currently, n participants have given their feedback
  2. Out of those n, s found the process to have “good” usability. This is our knowledge: s out of n participants find the process usable
  3. If a new participant uses the application, they can either find the process “good” or “not good”
  4. We don’t know which of the two the next participant will say, but we want to know how much either outcome would affect our knowledge [2]
  5. We therefore assume that, of the next 2 participants to give feedback, one will say the usability is “good” and the other “not good”
  6. So, in the future, we will have s + 1 participants saying that the process has “good” usability, while in total n + 2 participants will have given their feedback
  7. Finally, the likelihood of the next participant giving good feedback is (s + 1) / (n + 2); the sketch below shows how this correction behaves as the numbers grow
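Here is the promised sketch: we hold the “good” ratio fixed at 5/8 = 62.5% and scale the number of participants, to see how strongly the +1/+2 correction pulls the estimate at different sample sizes (the sample sizes are arbitrary):

```python
# How the rule of succession behaves as evidence accumulates.
# The "good" ratio is held at 5/8 = 62.5% while n grows.
for n in (8, 16, 80, 800):
    s = round(n * 5 / 8)
    estimate = (s + 1) / (n + 2)
    print(f"n = {n:3d}, s = {s:3d}: raw ratio {s / n:.1%}, estimate {estimate:.1%}")
# n =   8, s =   5: raw ratio 62.5%, estimate 60.0%
# n =  16, s =  10: raw ratio 62.5%, estimate 61.1%
# n =  80, s =  50: raw ratio 62.5%, estimate 62.2%
# n = 800, s = 500: raw ratio 62.5%, estimate 62.5%
```

The fewer participants we have, the more the estimate is pulled towards the 50% of maximum uncertainty; with more data, the next participant matters less and less.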

Inductive reasoning, anecdotal generalisation, and future of the metric

As you may have guessed, this formula wasn’t devised just for usability studies. It can be applied to any two-outcome (mathematical) system for which some prior observations exist but no other knowledge about the likelihood of either outcome is available (if generalisation is your thing and you like the video format: you are in for a treat). To quote Laplace’s own words when presenting the example of the probability of the sun rising the next day: “But this number is far greater for him who, seeing in the totality of phenomena the principle regulating the days and seasons, realizes that nothing at the present moment can arrest the course of it.”

Indeed, if there were a deterministic way to combine all of a system’s attributes into a single number representing its usability [3], then this metric would be redundant and not that appealing an option.

The process used to calculate this metric falls under inductive reasoning, and if someone has to be careful about one thing in inductive reasoning, it is anecdotal generalisation. As we discussed in the “Approaching uncertainty” section, this method will give a result regardless of how certain that result is. In other words, one has to be cautious about saying that there is a 66.6% likelihood of the next person finding the process usable if the result is based on a single participant. A way to mitigate this risk is to include a measure of uncertainty alongside the result, so that at any point the person viewing the report understands its validity.
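One way to attach such an uncertainty measure (my suggestion, not something prescribed by the rule itself) relies on the fact that (s + 1) / (n + 2) is the mean of a Beta(s + 1, n − s + 1) distribution under a uniform prior, so we can report a credible interval alongside the point estimate, assuming SciPy is available:

```python
from scipy.stats import beta

def estimate_with_uncertainty(s: int, n: int, level: float = 0.95):
    """Rule-of-succession point estimate plus a credible interval,
    from the Beta(s + 1, n - s + 1) posterior of a uniform prior."""
    point = (s + 1) / (n + 2)
    low, high = beta.interval(level, s + 1, n - s + 1)
    return point, low, high

# One participant who rated the process "good": the 66.7% headline
# number hides enormous uncertainty.
point, low, high = estimate_with_uncertainty(s=1, n=1)
print(f"Estimate {point:.1%}, 95% credible interval [{low:.1%}, {high:.1%}]")
# Estimate 66.7%, 95% credible interval [15.8%, 98.7%]
```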

Conclusion

As Goodhart famously stated, “When a measure becomes a target, it ceases to be a good measure.” Indeed, this metric didn’t come out of nowhere to be followed blindly. It simply presents the same information in a different way: instead of responding to the question “Does the system have good usability?”, we respond to “Will the next user find the system usable?”. The metric is independent of the tool used to measure usability and utilises Laplace’s Rule of Succession to calculate a number that we consider more humanistic. Putting the emphasis on the percentage of users who will not find the system usable could drive research and development towards minimising the usability issues of the people affected by them, rather than finding comfort in a system with “good usability”.

* Okay okay, the NPS doesn’t directly measure usability; we can rephrase the question as “How likely is it that the next visitor will be a promoter?”.

[1] Taken from: https://en.wikipedia.org/wiki/Rule_of_succession

[2] This is the point where the reasoning can go two ways. An alternative approach would go along the lines of: if there are two outcomes for the next person rating the process, then the combined probability is the average of the two specific outcomes (given that both outcomes are equally likely). That would be (s + 0.5) / (n + 1). Both results are close, and ultimately they describe the same underlying effect: the more prior knowledge we have, the less the next person will affect that knowledge. We follow Laplace’s reasoning for historical purposes.
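For the curious, a quick numeric comparison of the two variants (the choice of sample values is mine):

```python
# Footnote [2]: Laplace's (s + 1) / (n + 2) vs the alternative (s + 0.5) / (n + 1).
def laplace(s, n):
    return (s + 1) / (n + 2)

def alternative(s, n):
    return (s + 0.5) / (n + 1)

for s, n in [(1, 1), (5, 8), (200, 210)]:
    print(f"s = {s:3d}, n = {n:3d}: Laplace {laplace(s, n):.1%}, "
          f"alternative {alternative(s, n):.1%}")
# s =   1, n =   1: Laplace 66.7%, alternative 75.0%
# s =   5, n =   8: Laplace 60.0%, alternative 61.1%
# s = 200, n = 210: Laplace 94.8%, alternative 95.0%
```

The two agree more and more as n grows, which is exactly the underlying effect described above.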

[3] Of course, deterministic systems that assess usability indirectly do exist. There are models that calculate the time it will take a user to go through a process, or that estimate overall UX with metrics like First Input Delay, visual stability, etc. Here we are talking about direct deterministic measurement of a system’s usability.

