Misconceptions on Top of Misconceptions

 7 months ago
source link: http://www.os2museum.com/wp/misconceptions-on-top-of-misconceptions/

While researching the precise meaning of the Ctrl-Z (26 decimal, hex 1Ah, ASCII SUB) character in DOS, I was somewhat taken aback by this article which purports to correct a common misconception.

The article is, for the most part, entirely correct. The handle-based file API in DOS 2.0 and onward deals with pure binary files and no byte has any special meaning. Any special meaning that the Ctrl-Z character might have in such files is generally implemented in application programs and run-time libraries.

However, the statement that “MS-DOS didn’t have an End-Of-File character of any sort” is grossly misleading. The statement that “The treatment of character 26 and the handling of “text” files was a shared delusion, […] wholly layered above DOS itself” is simply untrue.

First of all, claiming that COMMAND.COM is “just an application” requires stretching the definition of what is “DOS” to the breaking point, and likely beyond. Sure, COMMAND.COM is not part of DOS… now try booting DOS without it.

And COMMAND.COM certainly ascribes special meaning to Ctrl-Z — for example when interpreting batch files, processing stops at a Ctrl-Z character. Again, claiming that batch files are merely an application construct, not part of DOS, is contrary to most people’s understanding of what DOS is.

But it’s not just COMMAND.COM. The DOS kernel itself (MSDOS.SYS in Microsoft’s releases) very much does ascribe special meaning to Ctrl-Z. One look at DOS 1.25 MSDOS.ASM is enough to ascertain that DOS does treat Ctrl-Z (look for ‘1AH’ in the source code) specially.

The catch is that Ctrl-Z has no particular meaning in the DOS file API but rather in the device I/O code.

When writing to the AUX device (typically a serial port), Ctrl-Z is written but terminates the output. When writing to the CON or LST devices (typically the console and a printer, respectively), Ctrl-Z terminates the output and isn’t written.

Analogously when reading from an AUX device, Ctrl-Z is stored and terminates the input. Console input is treated similarly, but the logic is much more complex.
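
The output side of this behavior can be modeled in a few lines. The following Python sketch is purely illustrative and is not derived from the DOS source; the function name device_write and the use of strings to identify devices are assumptions made for the example. It returns the bytes that would actually reach the device for a given output buffer. The more complex console input logic is not modeled.

```python
CTRL_Z = 0x1A  # ASCII SUB, 26 decimal


def device_write(data: bytes, device: str) -> bytes:
    """Return the bytes that would reach the device (hypothetical model).

    AUX:       Ctrl-Z is written, then output stops.
    CON / LST: output stops at Ctrl-Z, which is not written.
    """
    pos = data.find(CTRL_Z)
    if pos == -1:
        return data              # no Ctrl-Z: everything is written
    if device == "AUX":
        return data[:pos + 1]    # include the Ctrl-Z itself
    if device in ("CON", "LST"):
        return data[:pos]        # drop the Ctrl-Z
    raise ValueError(f"unknown device: {device}")
```

With the buffer b"AB\x1aCD", this model yields b"AB\x1a" for AUX and b"AB" for CON or LST; in both cases "CD" is never output.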

But Why?

Now that we’ve established that the DOS kernel does in fact treat Ctrl-Z as a special character, the logical next question is, why? What’s the point of all this?

To understand the purpose of Ctrl-Z, one has to consider early versions of CP/M, as well as early pre-releases of 86-DOS which, after all, mimicked CP/M.

What CP/M versions 1.x/2.x as well as 86-DOS 0.x had in common is that file sizes were not stored with byte granularity. Instead, file sizes were only tracked in terms of 128-byte “records”, which typically happened to correspond to 128-byte floppy disk sectors.

For executable programs, this was not an issue. When loading a program, CP/M or 86-DOS loaded a certain number of records/sectors. If there was some junk in the last record (very likely), it didn’t matter because it was never executed as code and may have been overwritten by data.

However, for text files, or possibly other data files, this was a problem. No one wanted up to 127 bytes of junk displayed on the screen or sent to the printer. CP/M, like old DEC operating systems, adopted the ASCII SUB (substitute) character in order to solve the problem. The SUB character is defined in the ASCII standard as “a control character used in the place of a character that has been found to be invalid or in error”. This means that SUB should never appear in valid (ASCII) text, which makes it safe to use as a marker.

One possible approach was to pre-fill the 128-byte record buffer with Ctrl-Z (ASCII SUB) characters, write to the buffer as many characters of text as there were available, and then write the buffer to disk. When reading, a Ctrl-Z character indicated an end of file.

In the canonical usage, there was no requirement that a text file must be terminated with Ctrl-Z. If a text file was an exact multiple of 128 bytes, there was no need to add another record containing just a Ctrl-Z. When reading the file, the program already knew that there were no more records and that the file had been completely read.
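
The padding scheme is easy to sketch. The Python below is a minimal illustration under stated assumptions, not code from CP/M or 86-DOS: the names to_records and from_records are invented for the example, and the "disk" is simply a bytes object whose length is always a whole number of 128-byte records.

```python
RECORD_SIZE = 128
CTRL_Z = b"\x1a"  # ASCII SUB


def to_records(text: bytes) -> bytes:
    """Pad text to whole records, filling the tail of the last one with Ctrl-Z."""
    remainder = len(text) % RECORD_SIZE
    if remainder == 0:
        return text  # exact multiple of 128: no extra Ctrl-Z record is needed
    return text + CTRL_Z * (RECORD_SIZE - remainder)


def from_records(stored: bytes) -> bytes:
    """Recover the text: stop at the first Ctrl-Z, or when the records run out."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]
```

A 5-byte text is stored as one 128-byte record (5 bytes of text followed by 123 Ctrl-Z characters) and read back as the original 5 bytes. A 256-byte text is stored unchanged, and the reader stops simply because there are no more records.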

By necessity, much of the Ctrl-Z handling was the responsibility of applications, not least because applications (rather than the OS) knew whether they were dealing with text or binary files. Applications processing text files considered Ctrl-Z to be an End-Of-File (EOF) marker on input, and added a Ctrl-Z at the end of output.

Byte Granular Files

Circa April 1981, 86-DOS version 1.0 was refined to track the file size on the basis of bytes rather than records. This removed most of the need for Ctrl-Z, but not all. Any existing text files, as well as text files transferred from CP/M disks, still had file sizes that were multiples of 128 bytes and almost certainly contained junk at the end. DOS-based applications therefore continued treating Ctrl-Z as an EOF marker in text files.

Likewise, for the sake of transferring files to other systems, many DOS-based applications continued adding a Ctrl-Z to the end of text files.

The built-in Ctrl-Z handling in the DOS kernel for console/AUX/printer I/O didn’t go away, and neither did the Ctrl-Z processing in COMMAND.COM. It was propagated to DOS-like operating systems such as OS/2 or Windows 9x and Windows NT.

Consider the following (on a Windows 10 workstation):

M:\dos\86dos>type "86-DOS v0.34 #221 - 81-02-20.imd"
IMD 1.18: 27/12/2023 16:20:11
Generated by Applesauce Fast Imager 1.88.3

Although the IMD floppy image format is binary, the IMD header only contains ASCII text. Because it ends with a Ctrl-Z character, TYPEing an IMD file will only show the header text and none of the binary data. This trick was used by a number of binary formats — Ctrl-Z is not an EOF marker in this usage, but it is an end of text marker.

In other words, the TYPE command still treats Ctrl-Z as an EOF marker, even in modern Windows versions.
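
What a TYPE-like tool does with such a file can be approximated as follows. This is a hedged Python sketch, not the actual TYPE implementation: the function name text_portion is invented, and the sample bytes are a made-up stand-in for a file with a text header, not a real IMD image.

```python
def text_portion(data: bytes) -> bytes:
    """Return everything before the first Ctrl-Z (1Ah), or all data if none."""
    head, _, _ = data.partition(b"\x1a")
    return head


# Hypothetical file contents: ASCII header, Ctrl-Z, then binary payload.
sample = b"IMD 1.18: header text\r\n" + b"\x1a" + bytes([0x05, 0x00, 0xFF])
header = text_portion(sample)
```

Here header holds only the ASCII line and its CR/LF; the three binary bytes after the Ctrl-Z are never seen by the consumer of the text.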

There is another place where Ctrl-Z is not only used but necessary. Many DOS users know that in the absence of a text editor, it is possible to create a text file by using COPY CON FOO.TXT. But how does one terminate the file input? By using Ctrl-Z of course.

Summary

It is true that as far as the DOS file API is concerned, files on disk are purely binary and Ctrl-Z (or any other byte) has no special meaning. However, it is categorically untrue that DOS has no concept of Ctrl-Z as an EOF marker. When performing I/O to or from the console, AUX, or PRN device, Ctrl-Z is in fact an EOF marker, and this logic is built into the DOS kernel itself.

The command shell, COMMAND.COM, is an integral part of DOS and treats Ctrl-Z as an EOF marker in text files, including batch files.

The reason why Ctrl-Z is understood to be an EOF marker in text files is historical; Ctrl-Z was necessary on systems which did not track the exact file size and only managed fixed size (usually 128-byte) records.

Addendum: Manual Support

To further underscore the point, here are several quotes from the MS-DOS 3.21 User’s Reference manual (chosen simply because that’s the first I had on hand in electronic copy).

The /A switch of the COPY command is documented as follows:

Causes the file to be treated as an ASCII (text) file. Data in the file is copied up to but not including the first end-of-file mark (in edlin this is CONTROL-Z). The remainder of the file is not copied.

Documenting the F6 key functionality in the DOS input line editor (NB: F6 still behaves the same way in Windows 10):

Puts a CONTROL-Z (1AH) end-of-file character in the new template.

Explaining how to use COPY CON to create a batch file:

After the last line, press CONTROL-Z and then press RETURN to save the batch file. MS-DOS displays the message “1 File(s) copied” to show that it created the file.

While the first example, the COPY command, is implemented in COMMAND.COM, the other two (input line editing and file termination when copying from the console) are built into the DOS kernel itself.

It is clear that Microsoft understood Ctrl-Z to be an end-of-file character in DOS text files, and that the appropriate logic was built into the DOS kernel itself, even though the DOS file API does not treat any character specially.

