Error Codes And The Law Of Least Astonishment

 2 years ago
source link: https://hackaday.com/2021/12/17/error-codes-and-the-law-of-least-astonishment/

Do you know the law of least astonishment? I am not sure of its origin, but I first learned it from the excellent “Tao of Programming.” Simply put, it is the principle that software should always respond to the users in a way that least astonishes them. In other words, printing a document shouldn’t erase it from your file system.

Following the law of least astonishment, what should a program do when it hits a hard error? You might say that it should let the user know. Unfortunately, many systems just brush it under the rug these days.

I think it started with Windows. Or maybe the Mac. The thinking goes that end users are too stupid for, or too afraid of, error codes and detailed messages, so developers simply leave them out. Case in point: My wife’s iPhone wouldn’t upload pictures. I’m no expert since I carry an Android device, but I agreed to look at it. No matter what I tried, I got the same useless message: “Can’t upload photos right now. Please try again later.” Not only is this not very informative, but it also implies the problem lies in something that might fix itself later, like the network.

The real culprit? The iCloud terms of service had changed and she had not accepted the new contract. I have a feeling it might have popped up asking her to do that at some point, but for whatever reason she missed it. Until you dug into the settings and checked the box to agree to those terms, “later” was never going to happen.

But it isn’t just iPhones. Windows is full of things like that, and you can only hope there will be a log in the event viewer with more details. I also see more of it now on Linux, although there is usually a log file somewhere if you know how to find it. While I get that a program hitting an error risks astonishing the user, it is even more astonishing if there’s no explanation of what’s wrong. Imagine if your bank sent you a note: there is a problem with your account. So you respond: “Did I overdraw?” They reply, “No.” Now what? That’s the state of many software errors today.

There’s really no excuse on desktop systems or websites. However, you might want to forgive tiny embedded systems. Don’t! I recently ported the 3D printer firmware Marlin to an ANET A8 board — an 8-bit processor with little memory — that had been on Repetier firmware for many years. The first time I tried to do an autolevel probe I got the message: Probing failed. That’s it.

I’ll grant you that you can turn on autolevel debugging to get more information, but I’m already at 98% flash utilization, so that would require temporarily removing a bunch of features and rebuilding the code. But why not do what we did in the old days:

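unsigned int global_error=0;   /* 0 means no error */
void do_something(void) {
   global_error=1;              /* step 1 is about to run */
   if (process1()==FAIL) return;
   global_error++;              /* step 2 is about to run */
   if (process2()==FAIL) return;
. . .

   global_error=0;              /* every step succeeded */
   return;
}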

This doesn’t take much space. Now you can report something like Probing failed (8) and I can at least go to the code and determine what the 8th step was that failed. I’m sure someone would even post a list of codes and what they meant in a case like that.
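
To make that concrete, here is a minimal, self-contained sketch of the step-counter idea. The step names (deploy_probe, move_to_point, read_height), the OK/FAIL values, and the report_probe helper are all hypothetical stand-ins, not anything taken from Marlin; the second step is hard-wired to fail so the counter has something to report.

```c
#include <stdio.h>

/* Hypothetical status values, for illustration only. */
enum { OK = 0, FAIL = -1 };

static unsigned int global_error = 0;  /* 0 = success, N = step N failed */

/* Hypothetical stand-ins for real work; step 2 is forced to fail. */
static int deploy_probe(void)  { return OK; }
static int move_to_point(void) { return FAIL; }
static int read_height(void)   { return OK; }

static void do_probe(void) {
    global_error = 1;                       /* step 1 is about to run */
    if (deploy_probe() == FAIL) return;
    global_error++;                         /* step 2 */
    if (move_to_point() == FAIL) return;
    global_error++;                         /* step 3 */
    if (read_height() == FAIL) return;
    global_error = 0;                       /* every step passed */
}

/* Runs the probe, writes the user-facing message into buf,
   and returns the step number that failed (0 on success). */
static unsigned int report_probe(char *buf, size_t len) {
    do_probe();
    if (global_error != 0)
        snprintf(buf, len, "Probing failed (%u)", global_error);
    else
        snprintf(buf, len, "Probing OK");
    return global_error;
}
```

Calling report_probe(buf, sizeof buf) and sending buf to the display or serial port is all the reporting side needs. The cost is one counter variable and an increment per step, which is why this kind of scheme can fit even when flash is nearly full.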

Too much overhead? Tell me the program counter where the error happened. That used to be a pretty common practice. Granted, it requires you to have a memory map file and know how to read it but it is still better than nothing.
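
One possible way to capture a code address, sketched under the assumption that you are building with GCC or Clang: the `__builtin_return_address(0)` builtin returns the address the current function will return to, which lands inside whichever function reported the error. The names record_error_here, failing_step, run_steps, and format_error are hypothetical, and how well the builtin behaves on a particular embedded target is something to check against your toolchain.

```c
#include <inttypes.h>
#include <stdio.h>

static uintptr_t last_error_pc = 0;  /* 0 = no error recorded */

/* noinline so the return address really points into the caller. */
__attribute__((noinline)) static void record_error_here(void) {
    last_error_pc = (uintptr_t)__builtin_return_address(0);
}

/* Hypothetical step that always fails, for illustration. */
static int failing_step(void) { return -1; }

static int run_steps(void) {
    if (failing_step() != 0) {
        record_error_here();   /* remember where we gave up */
        return -1;
    }
    return 0;
}

/* Writes a message carrying the recorded address into buf. */
static int format_error(char *buf, size_t len) {
    return snprintf(buf, len, "Probing failed at 0x%" PRIxPTR, last_error_pc);
}
```

The printed address means nothing by itself; it only becomes useful once you look it up in the linker map file or feed it to a tool such as addr2line against the matching build.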

We spend a lot of time thinking about how projects and software should work. But we need to spend time thinking, too, about what happens when they don’t work. It is fine that we can do in-circuit debugging or hook up a logic analyzer, but that won’t help our users. Even if it is just for you, why not make it a little easier on yourself?

As we have said before, “There’s no such thing as too much information.” In addition to guarding against system errors, you can also help users not to astonish themselves.

Image Credit: [Elisa Ventur] via Unsplash.com

