PC DOS 1.1 From Scratch

A number of years ago, the Computer History Museum together with Microsoft released the source code for MS-DOS 1.25 (very close to PC DOS 1.1) and MS-DOS 2.11. I never did anything with it beyond glancing at the code, in no small part because the release was rather poorly organized.

PC DOS 1.1 rebuilding itself

Now I finally decided to look at the code for DOS 1.1 and see how far I could get with it. For both DOS 1.1 and 2.0, there are ‘object’ and ‘source’ directories. The ‘object’ directory for DOS 1.1 simply contains a copy of PC DOS 1.1, which is not particularly revealing or useful on its own (and strictly speaking I’m not even sure why the CHM thought it could publish those files).

The ‘source’ directory is much more interesting and contains the following files:

05/09/1983 09:59 AM   63,781 ASM.ASM
05/17/1983 06:19 PM   67,064 COMMAND.ASM
07/02/1982 11:33 AM    3,625 HEX2BIN.ASM
08/03/1982 12:29 AM   36,882 IO.ASM
05/17/1983 06:15 PM  114,253 MSDOS.ASM
05/17/1983 06:20 PM      649 STDDOS.ASM
07/01/1982 11:54 PM   16,223 TRANS.ASM

This turns out to be an interesting mix, and an included 2013 e-mail from Tim Paterson explains its origin: Those files are the source code for what SCP (Seattle Computer Products) shipped to its customers. ASM, HEX2BIN, and TRANS were SCP’s development tools used for initial DOS development. MSDOS.ASM and COMMAND.ASM are source code for core DOS components. IO.ASM is source code for SCP’s IO.SYS (i.e. IBMBIO.COM equivalent).

Assembly-time switches called MSVER and IBMVER are used to build either IBM-style or Microsoft-style COMMAND.COM and DOS kernel (MSDOS.SYS/IBMDOS.COM). Interestingly, MSDOS.ASM and COMMAND.ASM are clearly intended to be assembled with MASM, not with the included SCP assembler. IO.ASM on the other hand is written for SCP’s ASM. It can be assumed that at some point around 1982, Microsoft converted the DOS kernel and COMMAND.COM from SCP’s assembler to MASM.

The obvious gaping hole is the lack of any source code for IBMBIO.COM. I do not know exactly what arrangement IBM and Microsoft had at the time, but in the days of DOS 1.x and 2.x OEMs did not get the source code for IBMBIO.COM/IO.SYS suitable for PC compatibles.

I toyed with the idea of writing my own IBMBIO.COM replacement, but eventually gave up because it’s not a totally trivial piece of code and I had no real documentation to work with (until much later). The MSDOS.ASM source code obviously uses the IBMBIO interface, but makes no attempt to document it. The provided IO.ASM source is quite useful, but SCP’s hardware was different enough from the IBM PC that it is of limited utility.

So, disassembler it was, and I produced reconstructed source code for PC DOS 1.1 IBMBIO.COM. Actually assembling it turned out to be a bit of an adventure; more on that below.

COMMAND.COM

Building COMMAND.COM was not difficult once I found the right assembler. Very quickly I established that although it must have been originally written for SCP’s ASM, the COMMAND.COM source used numerous directives (e.g. SEGMENT, GROUP) that ASM does not support. MASM was the obvious suspect and after going through a couple of MASM versions, MASM 1.00 from 1982 looked like a good match (IBM’s MASM 1.0 appears to work as well, but its runtime suffers from bugs which may cause MASM to hang).

The CHM source code for COMMAND.COM contains the following section:

IBMVER  EQU     FALSE   ;Switch to build IBM version of Command
MSVER   EQU     TRUE    ;Switch to build MS-DOS version of Command

HIGHMEM EQU     TRUE    ;Run resident part above transient (high memory)

This needs to be flipped around to build an IBM style COMMAND.COM, i.e. IBMVER must be TRUE and MSVER and HIGHMEM must be FALSE. Note that the code can be built as is to produce a Microsoft-style COMMAND.COM which does run on top of IBMBIO.COM and IBMDOS.COM, but exhibits numerous differences.

COMMAND.COM can be built as follows:

MASM COMMAND;
LINK COMMAND;
EXE2BIN COMMAND .COM

The result of this is rather interesting. The fresh COMMAND.COM is 4,959 bytes long, just like the one in PC DOS 1.1, and it is almost identical. There is a three byte difference at offset 161h in the file, which corresponds to the following line in the source code:

MOV     [COMFCB],AL             ;Use default drive

In the COMMAND.COM shipped with PC DOS 1.1, that instruction is not exactly missing but rather replaced with NOPs. That strongly suggests someone at either IBM or Microsoft patched COMMAND.COM after it was built, rather than removing the line from source code and rebuilding.

The fact that the file was otherwise identical is good evidence that Microsoft in fact did use MASM 1.00 or something very close.
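A comparison like this is easy to script. The following is a minimal sketch in Python, not a tool used in the original work; the function name `diff_runs` and the sample data are hypothetical stand-ins for the rebuilt and shipped COMMAND.COM files.

```python
def diff_runs(a: bytes, b: bytes):
    """Return (offset, length) runs where two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i          # a new run of differing bytes begins
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, len(a) - start))
    return runs


if __name__ == "__main__":
    # Synthetic data only: the second copy has three bytes at offset 161h
    # replaced with NOP (90h), mimicking the kind of patch described above.
    rebuilt = bytes(range(256)) * 2
    shipped = bytearray(rebuilt)
    shipped[0x161:0x164] = b"\x90\x90\x90"
    for offset, length in diff_runs(rebuilt, bytes(shipped)):
        print(f"{length} byte(s) differ at offset {offset:X}h")
```

With real files, the two byte strings would simply be read from disk in binary mode before calling the function.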

IBMDOS.COM

Reconstructing the PC DOS 1.1 version of IBMDOS.COM was a little more involved. As Tim Paterson’s e-mail explains, the provided source code is MS-DOS 1.25, but PC DOS 1.1 corresponds to MS-DOS 1.24. The revision history in MSDOS.ASM shows:

; 1.25 03/03/82 Put marker (00) at end of directory to speed searches

The changes are not identified in the source code, so I had to disassemble IBMDOS.COM and readjust the source code to match. That took some time even though the resulting modifications turned out to be fairly small.

Once again the assembly had to be modified to set MSVER FALSE and IBM TRUE (with the HIGHMEM and DSKTEST switches remaining FALSE).

Once this was done, I ended up with an IBMDOS.COM file that was 6,076 bytes long, while the PC DOS 1.1 original is 6,400 bytes long (an exact multiple of 256 but not a multiple of 512). The first 6,076 bytes were identical, but at this point I do not know why IBM’s version is longer. The extra bytes in IBM’s file are mostly zeros, but there’s also a hundred bytes or so of what appears to be junk, more or less random data copied from a buffer that hadn’t been zeroed.

That said, my shorter IBMDOS.COM appears to be working just fine. The padding at the end of IBMDOS.COM that IBM shipped should have no functional significance.
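The size relationship can be checked with a few lines of Python. This is only a sketch: `describe_tail` is a hypothetical helper, and the demonstration uses synthetic data with the file sizes quoted above rather than the real files.

```python
def describe_tail(rebuilt: bytes, shipped: bytes):
    """Report whether `rebuilt` is a prefix of `shipped`, plus tail statistics.

    Returns (prefix_matches, tail_length, nonzero_bytes_in_tail).
    """
    prefix_matches = shipped[:len(rebuilt)] == rebuilt
    tail = shipped[len(rebuilt):]
    nonzero = sum(1 for byte in tail if byte != 0)
    return prefix_matches, len(tail), nonzero


if __name__ == "__main__":
    REBUILT_SIZE = 6076   # size of the rebuilt IBMDOS.COM (from the text)
    SHIPPED_SIZE = 6400   # size of the PC DOS 1.1 original (from the text)

    print(SHIPPED_SIZE - REBUILT_SIZE)   # number of extra bytes in the shipped file
    print(SHIPPED_SIZE % 256)            # 0: exact multiple of 256
    print(SHIPPED_SIZE % 512)            # nonzero: not a multiple of 512

    # Synthetic stand-ins: identical prefix, then some junk, then zeros.
    rebuilt = bytes(i % 251 for i in range(REBUILT_SIZE))
    junk = b"\xAA" * 100
    padding = bytes(SHIPPED_SIZE - REBUILT_SIZE - len(junk))
    shipped = rebuilt + junk + padding
    print(describe_tail(rebuilt, shipped))
```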

IBMBIO.COM

Although I had no source code for IBMBIO.COM at all, I did have SCP’s IO.ASM, as well as this very handy document.

My first attempt was to produce IBMBIO.ASM that could be built with SCP’s ASM. That turned out to be… interesting. SCP’s ASM uses a syntax that is not unlike MASM and other PC assemblers, but exhibits quite a few differences.

For example, ‘SHL AL,1’ must be coded simply as ‘SHL AL’, leaving out the immediate. On the other hand, ‘MUL CX’ must be coded as ‘MUL AX,CX’, explicitly mentioning the accumulator.

Unlike MASM, SCP’s assembler does not try to be clever and guess what ‘MOV AX,LABEL’ might mean. In ASM, it means ‘move offset of LABEL into AX’. To move the word at LABEL into AX, one must write ‘MOV AX,[LABEL]’.

To resolve size ambiguity, ASM does not use the BYTE PTR or WORD PTR syntax but rather B or W pseudo-operands, such as ‘MOV B,[FOO],5’ to indicate a byte-sized operation is intended.

Segment overrides are coded differently and instead of ‘MOV ES:[BX],AX’, one must write ‘SEG ES’ as a separate “instruction”, followed by ‘MOV [BX],AX’. This reflects the fact that segment overrides are encoded as prefixes separate from the instruction itself.

Some forms of the XCHG instruction ended up being backwards, e.g. ‘XCHG BX,SI’ came out as what would be written in MASM as ‘XCHG SI,BX’. This has no impact on the behavior of the code and of course without seeing the original source, I don’t know how the XCHG was written.

After massaging the source code, I was left with a curious problem. Instructions such as ‘CMP DL,100’ ended up with a different encoding, namely the ‘S’ (sign extend) bit was set. Since the instruction does not set the ‘W’ (word data) bit, the ‘S’ bit is irrelevant. I was able to confuse ASM into producing the same encoding as found in IBMBIO.COM by using ‘CMP DL,100-256’, which takes advantage of a quirk in SCP’s ASM. But in this case, it’s reasonably certain that the original source code did not contain such weird constructs.
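To make the two encodings concrete, here is a small Python sketch of how an 8086 ‘CMP reg8,imm8’ is encoded with and without the ‘S’ bit. The helper name `encode_cmp_reg8_imm8` is made up for illustration, and the sketch does not claim which of the two forms appears in IBMBIO.COM; it only shows that they differ in a single opcode bit.

```python
# 8-bit register numbers as used in the ModRM byte.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}
CMP_EXT = 7  # opcode extension (/7) selecting CMP in the immediate group


def encode_cmp_reg8_imm8(reg: str, imm: int, s_bit: bool) -> bytes:
    """Encode CMP reg8,imm8 using opcode 100000sw with w=0."""
    opcode = 0x80 | (0x02 if s_bit else 0x00)    # s is bit 1, w is bit 0
    modrm = 0xC0 | (CMP_EXT << 3) | REG8[reg]    # mod=11 means register operand
    return bytes([opcode, modrm, imm & 0xFF])


if __name__ == "__main__":
    print(encode_cmp_reg8_imm8("DL", 100, s_bit=False).hex(" "))
    print(encode_cmp_reg8_imm8("DL", 100, s_bit=True).hex(" "))
    # 100-256 is negative, but truncated to 8 bits it is the same immediate.
    print((100 - 256) & 0xFF == 100)
```

Both byte sequences compare DL against 100; with the ‘W’ bit clear, the processor treats them identically, which is why the difference only shows up in a byte-for-byte comparison.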

In the end, I convinced myself that SCP’s ASM was not what was used to build IBMBIO.COM in PC DOS 1.1. So I went ahead and converted the source code back to MASM style, which was not nearly as straightforward as I thought it’d be and I learned much about MASM’s phase errors and similar nonsense. I also gained appreciation for why so many programmers disliked MASM and why TASM and other assemblers quickly gained a following.

I was fairly certain that MASM was used to build IBMBIO.COM because various quirks matched exactly, such as pointless NOPs after certain MOV instructions (see here for an explanation where those come from).

I was able to re-create IBMBIO.ASM that produced an exact match for IBMBIO.COM in PC DOS 1.1, although again there was some random-looking junk in the middle and perhaps extra zeros at the end.

DOS Boot Notes

While reconstructing IBMDOS.COM, I was forced to acquaint myself closer with how the PC DOS 1.1 boot process works.

The files IBMBIO.COM and IBMDOS.COM must be the first two files on a bootable floppy, in that order, and stored in consecutive sectors right after the end of the root directory. The boot sector does not fully parse the directory entries but verifies that IBMBIO.COM and IBMDOS.COM are the first two files. The boot sector “knows” that the root directory is on sector 4 of the disk, right after the boot sector (at sector 1) and two one-sector copies of the FAT. This is true for both single-sided 160K and double-sided 320K floppies.
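The check described above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a disassembly of the boot sector: it assumes a flat sector-by-sector disk image with 512-byte sectors, 1-based sector numbers as used in the text, and the standard 32-byte directory entry whose first 11 bytes hold the space-padded 8.3 name. The function names are hypothetical.

```python
SECTOR_SIZE = 512
ENTRY_SIZE = 32            # one FAT directory entry
ROOT_DIR_SECTOR = 4        # 1-based, per the layout described in the text


def root_dir_offset() -> int:
    """Byte offset of the root directory in a flat image (assumed layout)."""
    return (ROOT_DIR_SECTOR - 1) * SECTOR_SIZE


def first_two_names(root_sector: bytes):
    """Return the raw 11-character names of the first two directory entries."""
    names = []
    for index in range(2):
        entry = root_sector[index * ENTRY_SIZE:(index + 1) * ENTRY_SIZE]
        names.append(entry[:11].decode("ascii", errors="replace"))
    return names


def looks_bootable(root_sector: bytes) -> bool:
    """True if the system files are the first two entries, in that order."""
    return first_two_names(root_sector) == ["IBMBIO  COM", "IBMDOS  COM"]


if __name__ == "__main__":
    # Build a synthetic root directory sector with the two system files first.
    sector = bytearray(SECTOR_SIZE)
    sector[0:11] = b"IBMBIO  COM"
    sector[ENTRY_SIZE:ENTRY_SIZE + 11] = b"IBMDOS  COM"
    print(root_dir_offset(), first_two_names(bytes(sector)))
    print(looks_bootable(bytes(sector)))
```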

After that, the boot sector loads the first 20 sectors (10 KB) from the disk’s data area as a single blob starting at address 60:0. The assumption is that 10 KB covers all of IBMBIO.COM and IBMDOS.COM (and anything extra won’t cause harm). Keep in mind that the boot sector itself is loaded at 0:7C00, just below 32K. The boot sector then jumps to 60:0.
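The addresses involved are easier to follow as linear values. The sketch below, in Python, converts the real-mode segment:offset pairs from the text and confirms that the 10 KB blob ends well below the boot sector; `linear` is a hypothetical helper name.

```python
SECTOR_SIZE = 512


def linear(segment: int, offset: int) -> int:
    """Convert a real-mode segment:offset pair to a linear address."""
    return (segment << 4) + offset


if __name__ == "__main__":
    load_start = linear(0x60, 0x0000)            # where the blob is loaded
    load_end = load_start + 20 * SECTOR_SIZE     # first byte past the blob
    boot_sector = linear(0x0000, 0x7C00)         # where the boot sector runs

    print(f"blob:        {load_start:05X}h .. {load_end - 1:05X}h")
    print(f"boot sector: {boot_sector:05X}h")
    print("no overlap:", load_end <= boot_sector)
```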

IBMBIO.COM (hereafter just IBMBIO) is split into several sections. Near the end (offset 650h) there’s early initialization code which does the following:

Install a DPT (Disk Parameter Table) at 50:70 and point interrupt vector 1Eh at it
Jump into the second initialization phase in the middle of IBMBIO

The second initialization stage is in a 512-byte area which later becomes a disk buffer. It does the following:

Set stack pointer to 0:600, i.e. 60:0, right below IBMBIO
Reset disks (INT 13h/00)
Initialize serial ports and printer
Install interrupt vectors 1, 3, and 4
Clear print screen flag at 50:0
Move 8KB down from E0:0 to BF:0; this is IBMDOS.COM, overwriting the no longer needed tail of IBMBIO
Detect installed drives and memory
Call the initialization entry point at the beginning of relocated IBMDOS.COM
DOS initialization returns with DS pointing to segment where COMMAND.COM will load
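The relocation step in the list above moves a block to a lower address that overlaps the source. A rough Python model of that move follows; it uses the addresses from the text, but `move_down` is a hypothetical helper and the sketch makes no claim about which instructions IBMBIO actually uses for the copy.

```python
def linear(segment: int, offset: int) -> int:
    """Convert a real-mode segment:offset pair to a linear address."""
    return (segment << 4) + offset


def move_down(mem: bytearray, src: int, dst: int, count: int) -> None:
    """Copy `count` bytes to a lower (possibly overlapping) address.

    Copying in ascending order is safe when dst <= src, because every
    source byte is read before the copy can overwrite it.
    """
    if dst > src:
        raise ValueError("this sketch only handles moves to a lower address")
    for i in range(count):
        mem[dst + i] = mem[src + i]


if __name__ == "__main__":
    src = linear(0xE0, 0)       # where IBMDOS.COM sits after the boot load
    dst = linear(0xBF, 0)       # where it is relocated to
    count = 8 * 1024

    mem = bytearray(0x4000)     # toy model of low memory
    payload = bytes(i % 251 for i in range(count))
    mem[src:src + count] = payload

    move_down(mem, src, dst, count)
    print("copy intact:", bytes(mem[dst:dst + count]) == payload)
    print("moved down by", src - dst, "bytes")
    print("space left for IBMBIO at 60:0:", dst - linear(0x60, 0), "bytes")
```

The last figure is simply the distance between 60:0 and BF:0, i.e. the room the resident part of IBMBIO has below the relocated DOS kernel.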

At this point, DOS will be functional but IBMBIO isn’t done yet. It further does the following:

Install interrupt vectors 25h/26h (absolute disk read/write)
Set DOS DTA (Disk Transfer Area) to DS:100h, just after the PSP
Open COMMAND.COM using a statically defined FCB
Read the entire COMMAND.COM file
Set DOS DTA to DS:80h as programs expect by default
Jump to DS:100h (start of COMMAND.COM code)

Now the IBMBIO initialization is done. Again, the entire second stage runs out of a 512-byte bounce buffer which is later used to handle DMA 64K boundary crossing.

What’s the deal with DMA 64K boundary crossings? To recap, the floppy drive in the IBM PC uses DMA, but the DMA controller can only access memory within a single 64K aligned block during one operation. If DOS needs to read or write a sector to/from memory that crosses a 64K boundary, IBMBIO needs to use the bounce buffer.
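The boundary test itself is simple arithmetic on linear addresses. Here is a minimal Python sketch; the names `crosses_64k` and `choose_dma_target` are made up for illustration and do not correspond to labels in IBMBIO.

```python
def crosses_64k(linear_addr: int, length: int) -> bool:
    """True if a transfer of `length` bytes spans two 64K-aligned blocks."""
    if length <= 0:
        return False
    first_block = linear_addr >> 16
    last_block = (linear_addr + length - 1) >> 16
    return first_block != last_block


def choose_dma_target(linear_addr: int, length: int, bounce_addr: int) -> int:
    """Pick the address the DMA controller should use for one transfer.

    If the caller's buffer straddles a 64K boundary, the transfer goes
    through the bounce buffer and the data is copied separately.
    """
    if crosses_64k(linear_addr, length):
        return bounce_addr
    return linear_addr


if __name__ == "__main__":
    for addr in (0x0FE00, 0x0FF00, 0x10000):
        print(f"{addr:05X}h, 512 bytes: crosses = {crosses_64k(addr, 512)}")
```

A 512-byte sector that ends exactly at FFFFh stays within one block, while one starting at FF00h does not, so only the latter would need the bounce buffer.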

How can this work, you ask? How can the initialization code read a random COMMAND.COM file while residing inside a bounce buffer that disk reads may need? Easy: In PC DOS 1.1, it is guaranteed that COMMAND.COM will be loaded within the first 64K. In fact PC DOS 1.1 needs much less than 32K of memory to run.

That said, if the IBM-provided COMMAND.COM were replaced with a user-supplied executable that is 60 or so kilobytes big, DOS boot might fail due to the IBMBIO initialization code overwriting itself while handling a 64K DMA boundary crossing. This is very unlikely to have caused problems in practice.

Once initialization is completed, IBMBIO takes up less than 1.5K memory.

It is noteworthy but unsurprising that the interface “exported” by IBMBIO for use by DOS resembles a CP/M BIOS module. There’s a jump table at the beginning of IBMBIO which DOS calls into using far calls. Interestingly, CP/M-86 merged the BIOS, BDOS, and CCP (COMMAND.COM equivalent) into a single CPM.SYS module.

The boot process also changed at some point in the early life of DOS (aka QDOS/86-DOS). Initially, disks had a large reserved area (typically two tracks or so) at the beginning, containing the boot sector and the BIOS + DOS components. This was changed so that IO.SYS/IBMBIO.COM and MSDOS.SYS/IBMDOS.COM would show up as files and the reserved area was reduced to the bare minimum, i.e. one boot sector. The internal logic changed very little however, and the BIOS and DOS files still had to occupy sequential, consecutive sectors at the beginning of the disk’s data area, which is why they were marked as system/hidden to avoid being moved/fragmented.

That new arrangement changed very little for the boot loader, which only needed to start loading from a different sector, but had two advantages: The system files were easy to replace (as long as the newer files didn’t occupy more clusters on the disk), and they became optional. A non-system floppy didn’t need to waste space with boot files, which was roughly 5% of its capacity (about 8 K out of 160 K).

The Result

I was able to put together a bootable 320K floppy running something very close to PC DOS 1.1 with source code and tools that allow rebuilding IBMBIO.COM, IBMDOS.COM, and COMMAND.COM. Batch files (MKBIO.BAT, MKDOS.BAT, MKCOM.BAT) are provided to simplify the effort. Everything is done “in place” with very little room left on the disk.

The floppy (image) should run on any PC compatible or emulator.

Note that when building IBMBIO.COM on top of DOS 1.x, the user must enter ‘60’ when prompted “Fix-ups needed – base segment (hex):”. When building on DOS 2.0 or later, file redirection does that automatically… but DOS 1.x did not support that yet. I do not know if it’s possible to write the code such that MASM/LINK would do the work that EXE2BIN otherwise needs to do.

In MSDOS.ASM, there are several instances of ‘IFDEF NEWVER’ bracketing code that was apparently added in DOS 1.25. Since NEWVER is not defined by default, code corresponding to DOS 1.24 is built, which happens to match PC DOS 1.1.

A version of MASM 1.00 dated 1-05-82 is used to build the source code. This is about the oldest MASM version I could find which is capable of building the source without errors. It also happens to be older than the source code, which means it could have been used back in the day, although it is not known what tools exactly Microsoft used, or if the DOS components were typically built on a PC at all. Clearly they at least could have been.

Happy retro development!
