Retro-Porting to OS/2 1.0

A few weeks ago I embarked on a somewhat crazy side project: Make the Open Watcom debugger work on OS/2 1.0. This project was not entirely successful, but I learned a couple of things along the way.

The Open Watcom debugger components for OS/2 are 32-bit modules, but didn’t use to be. At least up to and including Watcom C/C++ 11.0, the OS/2 debugger components were almost entirely 16-bit. That allowed them to work on 32-bit OS/2 but also on 16-bit OS/2 version 1.x.

Needless to say, debugging 32-bit programs with a 16-bit debugger required a bit of extra work. An additional interface module (os2v2hlp.exe) called the 32-bit DosDebug API on 32-bit OS/2 and interfaced with the 16-bit debugger ‘trap file’ std32.dll.

Building the 16-bit components again was not particularly difficult. And they even mostly worked on OS/2 1.2 and 1.1. But I wanted to get the code working on OS/2 1.0 as well. Because why not.

Running the Open Watcom debugger on OS/2 1.1

The OS/2 debug trap file contains code that is supposed to refuse to run unless the host is at least OS/2 1.2. It looks like this:

    DosGetVersion( &os2ver );
    if( os2ver < 0x114 ) {
        strcpy( err, TRP_OS2_not_1p2 );
        return( ver );
    }

The OS/2 major version is in the high byte and the minor version in the low byte. The check was clearly meant to reject versions below 1.20. But… the major version returned by OS/2 1.x is actually 10, not 1. This was done so that the OS/2 major version would be higher than the DOS versions of the time (3.x, 4.0). OS/2 1.0 therefore reports 0x0A00, which is already higher than 0x114, so the check can never fire. As a result, the debug trap file didn’t refuse to load on OS/2 1.0 when it should have. But it certainly did not work.
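
The arithmetic is easy to check with a small C sketch. It assumes the version word is the one DosGetVersion stores (major in the high byte, minor in the low byte); `OS2_VER`, `buggy_is_too_old`, and `is_too_old` are hypothetical names, not taken from the Open Watcom sources.

```c
#include <stdbool.h>

/* Build a version word the way OS/2 reports it: major in the high byte,
 * minor in the low byte. OS/2 1.x reports major 10, so 1.0 is 10.00
 * (0x0A00), 1.1 is 10.10 (0x0A0A), and 1.2 is 10.20 (0x0A14). */
#define OS2_VER(major, minor) ((unsigned short)(((major) << 8) | (minor)))

/* The original test: 0x114 means "1.20", a value no OS/2 1.x system
 * ever reports, so this never returns true on a real system. */
static bool buggy_is_too_old(unsigned short os2ver)
{
    return os2ver < 0x114;
}

/* Corrected test: compare against 10.20, which is how OS/2 1.2 reports itself. */
static bool is_too_old(unsigned short os2ver)
{
    return os2ver < OS2_VER(10, 20);
}
```

With the corrected constant (0x0A14), OS/2 1.0 and 1.1 are rejected while 1.2 and 1.3 pass.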

How does one debug a debugger? Either with a different, functioning debugger, or the old-fashioned way: debug print statements and such. I went the second route, and given the difficulties I ran into, I’m not at all sure the first option is really workable on OS/2 1.0.

Before going into details, I will say that this kind of project may now be easier than ever before. Certainly much easier than it would have been 20 or so years ago. The reason is that antique programming documentation from IBM and Microsoft is now much easier to find than it used to be.

And for reasons that no one ever really understood, Microsoft’s and IBM’s programming documentation for OS/2 1.x was completely different, and there was always some information that could be found only in one documentation set and not the other.

The first problem I ran into was that the trap file was crashing OS/2 1.0 hard. The cause turned out to be calls to the DosPTrace API executed before a program was loaded. OS/2 1.0 does not appear to detect this situation and dies. Newer OS/2 versions just return an error and don’t do anything harmful. Easy to fix.
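
The fix amounts to a guard in front of every trace request. Below is a minimal C sketch, assuming the trap file tracks whether a debuggee has been started; `guarded_ptrace`, `trap_state`, and `ERR_NO_DEBUGGEE` are hypothetical names, and a counting stub stands in for the real DosPTrace call so the logic can be exercised on a modern host.

```c
#include <stdbool.h>

/* Hypothetical error code returned when no debuggee has been started yet. */
#define ERR_NO_DEBUGGEE 1

/* Hypothetical stand-in for the function that actually issues DosPTrace;
 * the real API's parameter block is not modeled here. */
typedef int (*ptrace_fn)(void *buf);

struct trap_state {
    bool      debuggee_loaded;  /* set once the debuggee has been started */
    ptrace_fn ptrace;           /* issues the actual trace request */
};

/* Refuse trace requests until a program is loaded, because OS/2 1.0
 * crashes instead of returning an error in that situation. */
static int guarded_ptrace(const struct trap_state *st, void *buf)
{
    if (!st->debuggee_loaded)
        return ERR_NO_DEBUGGEE;
    return st->ptrace(buf);
}

/* Counting stub used in place of DosPTrace for testing. */
static int stub_calls;

static int stub_ptrace(void *buf)
{
    (void)buf;
    ++stub_calls;
    return 0;
}
```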

The second problem was that the program to be debugged could not be started. The DosStartSession API (which a debugger must use) takes a parameter structure, but OS/2 1.0 expects a smaller version of that structure, and the call failed when the trap file passed the newer, larger one. Again, easy enough to fix.
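
One way to handle this is to report a shorter structure length on OS/2 1.0. The C sketch below assumes the parameter block begins with a length field (as the toolkit’s STARTDATA does) and that the 1.0 layout is a prefix of the later one. The field names are loosely modeled on STARTDATA, pointer sizes differ from the real 16:16 far pointers, and exactly where the 1.0 layout ends is an assumption to verify against the 1.0 documentation.

```c
#include <stddef.h>

/* Hypothetical parameter block loosely modeled on STARTDATA. */
struct start_data {
    unsigned short length;       /* size of the block the caller passes */
    unsigned short related;
    unsigned short fg_bg;
    unsigned short trace_opt;
    const char    *pgm_title;
    const char    *pgm_name;
    const char    *pgm_inputs;
    const char    *term_q;
    /* Assumption: the fields below were added after OS/2 1.0. */
    const char    *environment;
    unsigned short inherit_opt;
    unsigned short session_type;
};

/* Choose the length to report, given the DosGetVersion word. */
static unsigned short start_data_length(unsigned short os2ver)
{
    if (os2ver < 0x0A0A)  /* OS/2 1.0 reports 10.00 */
        return (unsigned short)offsetof(struct start_data, environment);
    return (unsigned short)sizeof(struct start_data);
}
```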

Another problem had to do with rather interesting logic that the Watcom debugger uses. It injects a small piece of code into the debugged process and runs it. The code does little more than invoke the DosLoadModule API to load a copy of the trap file. The debugger can later use this copy to perform various magic tricks, like writing to the debugged program’s console or redirecting files within the debugged process.

The code, quite sensibly, queried the full path of the trap DLL and used that path to load a copy of the DLL into the debugged process. This failed on OS/2 1.0. After briefly wondering why, I reviewed the documentation and found that on OS/2 1.0, DosLoadModule does not accept a full path.

On OS/2 1.0, DosLoadModule accepts only a module name of up to 8 characters. The system appends the .DLL extension to this name and searches for the file along LIBPATH. This is obviously quite restrictive, but it’s all OS/2 1.0 can do.
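
The conversion a trap file would need on OS/2 1.0 can be sketched in C as follows. This is a minimal sketch of the rules described above, not the fix that went into the trap file; `module_name_from_path` is a hypothetical helper, and it assumes the DLL’s directory is on LIBPATH, which the sketch does not verify.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Turn a full DLL path into the bare module name OS/2 1.0 expects:
 * strip the drive/directory and a trailing ".DLL", then enforce the
 * 8-character limit. Returns false if no valid name can be produced. */
static bool module_name_from_path(const char *path, char out[9])
{
    const char *base = path;
    const char *p;
    size_t len;

    /* Skip past the last drive or directory separator. */
    for (p = path; *p != '\0'; ++p) {
        if (*p == '\\' || *p == '/' || *p == ':')
            base = p + 1;
    }

    len = strlen(base);
    /* Drop a trailing ".DLL", compared case-insensitively. */
    if (len > 4 && base[len - 4] == '.' &&
        toupper((unsigned char)base[len - 3]) == 'D' &&
        toupper((unsigned char)base[len - 2]) == 'L' &&
        toupper((unsigned char)base[len - 1]) == 'L')
        len -= 4;

    if (len == 0 || len > 8)
        return false;
    memcpy(out, base, len);
    out[len] = '\0';
    return true;
}
```

For example, a path ending in STD16.DLL yields the name STD16, while a base name longer than 8 characters is rejected.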

The next problem caused me quite a bit of grief. I used remote debugging (over an emulated serial port) to debug the GUI (really text UI) version of the OS/2 debugger. The debugger stubbornly kept failing to load a support file (cpp.prs). But if I ran the debugger directly (not trying to debug it), there was no problem loading the file!

Remote debugging to OS/2 1.0 works. Mostly.

I thought the problem was perhaps due to case sensitivity, or because OS/2 1.0 didn’t accept some specific combination of DosOpen flags. To narrow down the problem, I used the remote debugger to modify the DosOpen flags and re-run DosOpen repeatedly.

Well, I tried. Whenever I modified the flags in the debugged process, the memory contents changed… to junk. After some head scratching, I found that the memory-write function in the std16.dll trap file had a subtle bug: it took the size of the wrong structure when calculating internal offsets. The bug had likely been present for a very, very long time.
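
To show how an error of this kind turns written data into junk, here is a hypothetical C sketch. The request layouts are invented for illustration and are not the actual std16.dll structures; the only assumption carried over is that a write request is a fixed header immediately followed by the bytes to write.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request headers; unsigned char members avoid padding,
 * so the two headers differ in size (7 vs. 9 bytes in practice). */
struct write_mem_req {
    unsigned char cmd;
    unsigned char addr[6];   /* 32-bit offset + 16-bit selector */
};

struct read_mem_req {
    unsigned char cmd;
    unsigned char addr[6];
    unsigned char len[2];    /* byte count to read */
};

/* Correct: the data to write starts right after the write header. */
static void apply_write(unsigned char *target, const unsigned char *pkt, size_t n)
{
    memcpy(target, pkt + sizeof(struct write_mem_req), n);
}

/* Buggy: sizing the header with the wrong structure skips into the
 * payload, so the target receives shifted bytes plus whatever follows. */
static void apply_write_buggy(unsigned char *target, const unsigned char *pkt, size_t n)
{
    memcpy(target, pkt + sizeof(struct read_mem_req), n);
}
```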

With the remote debugger fixed, I could attack DosOpen again. But I could not find any combination of flags that would work.

Eventually I had a flash of inspiration and realized that the problem was something completely different: when the debugged program started, it was running in the wrong directory! That’s why it couldn’t find a file that was expected to be in the current directory. I believe the problem is that in OS/2 1.0, a new session always inherits open files, the current directory, and so on from the shell; in newer versions, the debugger can (and does) request that the new session inherit from the debugger instead. This is another area where OS/2 1.0 is different and more limited than later versions.

In the end, I managed to get remote debugging somewhat working on OS/2 1.0… but not particularly well. I can load a simple hello world program, run it, set breakpoints, step through it, and inspect and modify its memory. I also verified that the debugger can properly catch #GP faults in a simple program.

As an aside, the debug interface in OS/2 1.0 has odd limitations. For example, a debugger can intercept general protection faults, but it cannot intercept divide overflow faults. An integer divide by zero always terminates the program, and the debugger can’t do anything about it.

But my attempts to debug a more complex program (the GUI debugger) failed miserably. The debugger, when executed directly (not under a debugger), always crashed, no doubt because of yet another subtle difference between OS/2 1.0 and later versions.

And when I tried to run it under the remote debugger, all I could achieve was crashing OS/2. I could still load the program, set breakpoints, and step through it, but when it crashed, it took the whole OS with it.

All in all, it’s apparent that OS/2 1.0 was a work in progress. OS/2 1.1 was notably improved, clearly building on users’ experience with version 1.0. It was only OS/2 1.2 that finally included all the major features OS/2 1.x was supposed to have, including installable file systems (IFSs) and an improved Presentation Manager. It’s no surprise that OS/2 1.0 was rough around the edges.
