Ctrl-C: Why Programmers Can’t "Reset" Programs With Ctrl-C, but Used to Be Able To, and Why They Should Be Able to Again

August 2022

When a programmer presses ctrl-c inside of a command-line program, that program should stop what it's doing, return to the nearest sensible restart position, and allow you to continue where you left off.

This isn't much different from insisting on "one-button builds," except it's an older phenomenon. Programs actually used to work this way: you could press ctrl-c to get them to give you back control. Only now, they don't really. More often than not I find myself having to kill the running process from an external app, such as the shell, after first figuring out what the process ID is. Not only is this awkward and inconvenient, but it guarantees that you lose everything that you were working on. When you hit ctrl-c, a program should return control to you while keeping as much of your work intact as possible.

These days what usually happens is the program gets stuck in some sort of tight C loop, such as a for loop, and it becomes unreachable to the outside world. Even if the original author was clever and added yields() at checkpoints inside the app, invariably some optimization is going to require a tight C loop, and that loop isn't going to be looking for an opportunity to break. Inevitably the tight C loop is exactly where you'll wish you had a checkpoint because in practice that's where the program is going to get stuck. Maybe you can get around this by forcing your app to perpetually roundtrip checkpoints, but then your app is slow.
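To make the tradeoff concrete, here is a minimal sketch of a checkpointed loop: a flag set by a SIGINT handler (installed elsewhere; a handler sketch appears further down) and polled every few thousand iterations. The names are hypothetical, and the polling interval is the knob you never get right: check too often and the hot loop slows down, too rarely and ctrl-c feels dead.

```c
#include <signal.h>
#include <stddef.h>

/* Set from a SIGINT handler installed elsewhere; the name is hypothetical. */
static volatile sig_atomic_t interrupt_requested = 0;

int run_job(size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Poll the flag only every 4096 iterations so the branch stays
           cheap; this is the "roundtrip" the tight loop has to pay for. */
        if ((i & 0xFFF) == 0 && interrupt_requested)
            return -1;                       /* unwind to a restart point */
        /* ... one unit of work ... */
    }
    return 0;
}
```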

The explanation for why we have gotten ourselves into this mess is complicated but I will do my best to explain.

Terminating the process with a kill command is always going to work (eventually) for a couple of reasons: for one, the operating system insists on having some way to end processes, and it makes sense to expose that ability to the user; for another, to get into the guts of it, on POSIX systems the SIGKILL signal cannot be caught or ignored. Pressing ctrl-c generates a different signal, SIGINT, which can be caught or ignored, and if it isn't, it terminates your program by default.

So, in the old days, what you needed was to attend to the SIGINT signal and add a handler for when it occurred. Of course, you also had to structure your program around recovering from the interruption. This can be a big ask, but for most standard architectures, such as an interpreter loop, say, it's going to be straightforward if not trivially easy. The real trick there, I suppose, is not leaking or creating inconsistent data, and that is a reasonably big ask. We don't want our ctrl-c to leak memory.
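For the classic single-threaded case, the skeleton looks something like the sketch below: catch SIGINT and jump back to the top of the read-eval-print loop. This is only a sketch; the evaluation step is a placeholder, and, as the next paragraphs explain, the jump by itself does nothing to protect whatever the program was in the middle of.

```c
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

static sigjmp_buf restart_point;

static void on_sigint(int sig)
{
    (void)sig;
    /* Abandon whatever was running and return to the top of the loop.
       This is the "from anywhere" recovery we want, and exactly why the
       following paragraphs worry about leaks and inconsistent data.    */
    siglongjmp(restart_point, 1);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);

    char line[256];
    for (;;) {
        if (sigsetjmp(restart_point, 1))
            puts("^C");                      /* arrived here via ctrl-c */
        printf("> ");
        fflush(stdout);
        if (!fgets(line, sizeof line, stdin))
            break;                           /* EOF: quit normally */
        /* evaluate(line) would go here -- the interpreter loop is the
           "nearest sensible restart position"                          */
    }
    return 0;
}
```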

Since a ctrl-c can be delivered at any line of code, that means providing guarantees, often in the form of critical sections with a signal mask, wherever it's possible to lose or corrupt data. If you allocate a piece of memory, you need to store a pointer to it somewhere reachable from the object graph, and both of those operations need to occur inside the same critical section. Otherwise, if you get interrupted right after the allocation, there won't be any way to reach your memory and it will leak.
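Here is a minimal sketch of that kind of critical section, assuming a hypothetical object graph with a single head pointer: block SIGINT across the allocation and the link so an interrupt can never strand the new node.

```c
#include <signal.h>
#include <stdlib.h>

struct node { struct node *next; /* ... payload ... */ };
static struct node *graph_head;               /* hypothetical reachable root */

int append_node(void)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGINT);

    sigprocmask(SIG_BLOCK, &block, &old);     /* enter critical section */
    struct node *n = malloc(sizeof *n);
    if (n) {
        n->next = graph_head;                 /* link before anything can interrupt */
        graph_head = n;
    }
    sigprocmask(SIG_SETMASK, &old, NULL);     /* leave: a pending SIGINT fires now */
    return n ? 0 : -1;
}
```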

It was never especially easy not to leak memory and so most older applications compromised by just leaking it: we'll let you recover from anywhere, just don't expect us not to pollute all of heap memory and available RAM. I suppose the understanding was that you'd use your "extra life" to finish up what you were doing, maybe debug or save your workspace, and then you would eventually restart the application.

Where things really broke was when multithreaded applications became commonplace. That's when ctrl-c basically stopped working at all. The reason for this is powerful but subtle. Prior to multithreading, apps were necessarily single-threaded, and any parallelization model they used was something older, like process signals or fork(). The issue is that, for backwards compatibility reasons, pthreads were layered on top of those older models, and as it turns out it is extraordinarily difficult to mix parallelization models in this way. The response you will receive if you ask about mixing signals and threads, if the person even understands what you are talking about, is "don't." It can be done, however, and I think it's time for more people to acquire this skill.

What typically goes wrong in mixing signals and threads is that the signal must first be corralled into one specific thread, which must then deactivate all of the running threads. Even literally following that prescription can be a multi-day task; following its spirit is an order of magnitude harder. Procedures for ending threads do not behave the way most people would guess, so that's an early challenge. Any program with multithreaded interaction is necessarily going to have its own complicated system of gating and mutexes, and is going to require threads to be trackable and stoppable at any point. All the cross-thread interactions must stay consistent in the face of a SIGINT, and each thread must itself behave as its own ctrl-c-safe running program. The entire thing can be interrupted at any time, in any configuration. The potential interactions can be too much to think about, and so nobody does.
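The usual corralling technique, for what it's worth, is to block SIGINT in every thread and dedicate one thread to sigwait(). A minimal sketch, with a hypothetical stop_all_workers() standing in for all the hard parts described above:

```c
#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical: ask every worker thread to park itself at a safe point
   (set flags, broadcast condition variables, and so on).              */
static void stop_all_workers(void) { }

static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int sig;
    for (;;) {
        sigwait(set, &sig);                  /* only this thread ever sees SIGINT */
        if (sig == SIGINT) {
            puts("interrupt: asking workers to stop");
            stop_all_workers();
        }
    }
    return NULL;
}

int main(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);

    /* Block SIGINT before creating any threads; the mask is inherited,
       so no worker can be interrupted asynchronously.                  */
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    pthread_t tid;
    pthread_create(&tid, NULL, signal_thread, &set);

    /* ... create worker threads and run the program ... */

    pthread_join(tid, NULL);
    return 0;
}
```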

Multithreading and signals are one of the darker areas of operating systems, and you will find that they are not well trodden even by the operating systems programmers assigned to them. I routinely see code that cuts corners just to get the problem finished and nominally correct, when it isn't outright wrong or nonconforming. No OS seems willing to invest as much effort in parallelization or in the standards as it does in the critical single-threaded path. The lack of robust platform code and of newly produced standards is one of the reasons we don't have good ctrl-c support in the first place, and that prevents us from having other nice things, like command-line interfaces with internal debuggers. We'd be better off if we embraced interactions between multithreading and signals as real first-class features instead of dragging our feet and hoping they'll just disappear.

We've raised the issue of single-process multithreading, but you can easily see how this issue worsens when extended to multithreaded child processes, remote worker processes, etc. All apps I have ever seen architected this way rely on you to kill the whole suite. Imagine how nice it would be to press ctrl-c on your terminal and have the whole networked cluster gracefully yield to you.
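As a rough sketch of the local half of that, the controlling process can catch SIGINT itself and forward it to the workers it knows about. (Workers in the same foreground process group get the terminal's SIGINT anyway; explicit forwarding matters for detached or remote workers, where the "signal" becomes a message over whatever control channel exists.) Everything here is hypothetical scaffolding:

```c
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 4
static pid_t child_pids[NWORKERS];
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int sig) { (void)sig; got_sigint = 1; }

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigaction(SIGINT, &sa, NULL);

    for (int i = 0; i < NWORKERS; i++) {
        pid_t pid = fork();
        if (pid == 0) {                      /* worker: handler is inherited */
            while (!got_sigint)
                sleep(1);                    /* stand-in for real work */
            _exit(0);                        /* worker yields cleanly */
        }
        child_pids[i] = pid;
    }

    while (!got_sigint)
        sleep(1);                            /* parent's main loop */

    for (int i = 0; i < NWORKERS; i++)
        kill(child_pids[i], SIGINT);         /* forward to local workers */
    /* remote workers would get an equivalent "interrupt" message over
       the control connection here (transport-specific, not shown)      */

    for (int i = 0; i < NWORKERS; i++)
        waitpid(child_pids[i], NULL, 0);
    puts("all workers yielded");
    return 0;
}
```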

Anyway, this is not impossible, it's just hard, and to do it you have to bake it in from the beginning. Most programmers are not going to be capable of working on such a feature. This is a challenge only for wizards. Even though ctrl-c shouldn't be a concern for every app, there are many apps where programmers spend lots of time, and those are the ones where it should be implemented. It definitely applies to interpreters, database-style terminal interfaces, REPLs, consoles, calculators, command lines, and other categories I've unintentionally left out.

When a programmer presses ctrl-c:

  1. User control must return promptly, from anywhere in the app. Promptly means below the threshold of human perception (~30ms), barring a good explanation.

  2. Relevant jobs halt or pause. If dying makes sense then die.

  3. Memory does not leak, except in negligible amounts. An amount is negligible if no reasonable human repetition of ctrl-c can make it non-negligible.

  4. The global data graph remains consistent. That means no internal data format violations and no crashing.

  5. If the app advertises a stronger guarantee than consistency, ctrl-c doesn't affect that guarantee: an ACID-compliant database with atomic rollbacks still rolls back in the presence of ctrl-c.

  6. Critical sections also return promptly. Intentionally postponing a signal is not a license to dilly-dally.

  7. The ctrl-c recovery process remains repeatable so that ctrl-c can be invoked again and again. The program remains usable. As much progress as possible is preserved.

  8. All of this applies to all threads and subprocesses including networked subprocesses. Everything resets cleanly, as appropriate, in response to the listening thread.

  9. This functionality is available in optimized production code and does not depend on hobbled development builds.

Supporting ctrl-c in this way reduces delay at a crucial part of the development feedback loop and makes everyone more productive.

Most apps probably can't be retrofitted with ctrl-c. Any sufficiently mature project that didn't bake it in from the beginning is probably too late to save. I'm trying to think about the practical effort involved in doing this in a popular project, and it seems to be an "if aliens have threatened to destroy Earth..." type of problem. Adding ctrl-c is going to be infeasible after the fact if it's already a heroic effort to put it in from the beginning. I don't think it makes a lot of sense to needle projects in this state, but I do think it makes sense to encourage project maintainers to build ctrl-c support into their projects going forward...if they are capable of doing so. The time to build in ctrl-c support is at the start of the project. That makes it much easier to carry all the way through.

Ack ColTim, ngn, eris, dzaima, Marshall, DiscoDoug, Alve, loke.

