COM file (DOS)
   HOME

TheInfoList



OR:

A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM was used as a filename extension for text files containing commands to be issued to the operating system (similar to a batch file). With the introduction of Digital Research's
CP/M CP/M, originally standing for Control Program/Monitor and later Control Program for Microcomputers, is a mass-market operating system created in 1974 for Intel 8080/ 85-based microcomputers by Gary Kildall of Digital Research, Inc. Initial ...
(a
microcomputer A microcomputer is a small, relatively inexpensive computer having a central processing unit (CPU) made out of a microprocessor. The computer also includes memory and input/output (I/O) circuitry together mounted on a printed circuit board (PC ...
operating system), the type of files commonly associated with COM extension changed to that of executable files. This convention was later carried over to DOS. Even when complemented by the more general
EXE Exe or EXE may refer to: * .exe, a file extension * exe., abbreviation for executive Places * River Exe, in England * Exe Estuary, in England * Exe Island, in Exeter, England Transportation and vehicles * Exe (locomotive), a British locomotive ...
file format A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free. Some file formats ...
for executables, the compact COM files remained viable and frequently used under DOS. The .COM file name extension has no relation to the
.com The domain name .com is a top-level domain (TLD) in the Domain Name System (DNS) of the Internet. Added at the beginning of 1985, its name is derived from the word ''commercial'', indicating its original intended purpose for domains registere ...
(for "commercial") top-level Internet domain name. However, this similarity in name has been exploited by
malware Malware (a portmanteau for ''malicious software'') is any software intentionally designed to cause disruption to a computer, server, client, or computer network, leak private information, gain unauthorized access to information or systems, depri ...
writers.


DOS binary format

The COM format is the original binary executable format used in
CP/M CP/M, originally standing for Control Program/Monitor and later Control Program for Microcomputers, is a mass-market operating system created in 1974 for Intel 8080/ 85-based microcomputers by Gary Kildall of Digital Research, Inc. Initial ...
(including
SCP SCP may refer to: Organizations Political parties * Soviet Communist Party, the leading political party in the former Soviet Union * Syrian Communist Party * Sudanese Communist Party * Scottish Christian Party Companies * Seattle Computer Produ ...
and MSX-DOS) as well as DOS. It is very simple; it has no header (with the exception of CP/M 3 files), and contains no standard
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
, only code and data. This simplicity exacts a price: the binary has a maximum size of 65,280 (FF00 h) bytes (256 bytes short of 64 KB) and stores all its
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...
and data in one segment. Since it lacks relocation information, it is loaded by the operating system at a pre-set address, at offset 0100h immediately following the PSP, where it is executed (hence the limitation of the executable's size): the entry point is fixed at 0100h. This was not an issue on 8-bit machines since they can address 64k of memory max, but 16-bit machines have a much larger address space, which is why the format fell out of use. In the Intel 8080 CPU architecture, only 65,536 bytes of memory could be addressed (address range 0000h to FFFFh). Under CP/M, the first 256 bytes of this memory, from 0000h to 00FFh were reserved for system use by the zero page, and any user program had to be loaded at exactly 0100h to be executed. COM files fit this model perfectly. Before the introduction of MP/M and Concurrent CP/M, there was no possibility of running more than one program or command at a time: the program loaded at 0100h was run, and no other. Although the file format is the same in DOS and CP/M, .COM files for the two operating systems are not compatible; DOS COM files contain x86 instructions and possibly DOS system calls, while CP/M COM files contain 8080 instructions and CP/M system calls (programs restricted to certain machines could also contain additional instructions for
8085 The Intel 8085 ("''eighty-eighty-five''") is an 8-bit microprocessor produced by Intel and introduced in March 1976. It is software-binary compatible with the more-famous Intel 8080 with only two minor instructions added to support its added i ...
or Z80). .COM files in DOS set all x86 segment registers to the same value and the SP (stack pointer) register to the offset of the last word available in the first 64 KB segment (typically FFFEh) or the maximum size of memory available in the block the program is loaded into for both, the program plus at least 256 bytes stack, whatever is smaller, thus the stack begins at the very top of the corresponding memory segment and works down from there. In the original DOS 1.x API, which was a derivative of the CP/M API, program termination of a .COM file would be performed by calling the INT 20h (Terminate Program) function or else INT 21h Function 0, which served the same purpose, and the programmer also had to ensure that the code and data segment registers contained the same value at program termination to avoid a potential system crash. Although this could be used in any DOS version, Microsoft recommended the use of INT 21h Function 4Ch for program termination from DOS 2.x onward, which did not require the data and code segment to be set to the same value. It is possible to make a .COM file to run under both operating systems in form of a fat binary. There is no true compatibility at the instruction level; the instructions at the entry point are chosen to be equal in functionality but different in both operating systems, and make program execution jump to the section for the operating system in use. It is basically two different programs with the same functionality in a single file, preceded by code selecting the one to use. Under CP/M 3, if the first byte of a COM file is C9h, there is a 256-byte header; since C9h corresponds to the 8080 instruction RET, this means that the COM file will immediately terminate if run on an earlier version of CP/M that does not support this extension. (Because the instruction sets of the 8085 and Z80 are supersets of the 8080 instruction set, this works on all three processors.) C9h is an
invalid opcode An illegal opcode, also called an unimplemented operation, unintended opcode or undocumented instruction, is an instruction to a CPU that is not mentioned in any official documentation released by the CPU's designer or manufacturer, which nev ...
on the 8088/8086, and it will cause a processor-generated interrupt 6 exception in v86 mode on the
386 __NOTOC__ Year 386 ( CCCLXXXVI) was a common year starting on Thursday (link will display the full calendar) of the Julian calendar. At the time, it was known as the Year of the Consulship of Honorius and Euodius (or, less frequently, year 113 ...
and later x86 chips. Since C9h is the opcode for LEAVE since the
80188 The Intel 80188 microprocessor was a variant of the Intel 80186. The 80188 had an 8-bit external data bus instead of the 16-bit bus of the 80186; this made it less expensive to connect to peripherals. The 16-bit registers and the one megabyte add ...
/ 80186 and therefore not used as the first instruction in a valid program, the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding a crash. Files may have names ending in .COM, but not be in the simple format described above; this is indicated by a magic number at the start of the file. For example, the
COMMAND.COM COMMAND.COM is the default command-line interpreter for MS-DOS, Windows 95, Windows 98 and Windows Me. In the case of DOS, it is the default user interface as well. It has an additional role as the usual first program run after boot (init proc ...
file in DR DOS 6.0 is actually in DOS executable format, indicated by the first two bytes being ''MZ'' (4Dh 5Ah), the initials of
Mark Zbikowski Mark "Zibo" Joseph Zbikowski (born March 21, 1956) is a former Microsoft Architect and an early computer hacker. He started working at the company only a few years after its inception, leading efforts in MS-DOS, OS/2, Cairo and Windows NT. In 2006 ...
.


Large programs

Under DOS there is no memory management provided for COM files by the
loader Loader can refer to: * Loader (equipment) * Loader (computing) ** LOADER.EXE, an auto-start program loader optionally used in the startup process of Microsoft Windows ME * Loader (surname) * Fast loader * Speedloader * Boot loader ** LOADER.COM ...
or execution environment. All memory is simply available to the COM file. After execution, the operating system command shell,
COMMAND.COM COMMAND.COM is the default command-line interpreter for MS-DOS, Windows 95, Windows 98 and Windows Me. In the case of DOS, it is the default user interface as well. It has an additional role as the usual first program run after boot (init proc ...
, is reloaded. This leaves the possibilities that the COM file can either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system. An example of a complex program is COMMAND.COM, the DOS shell, which provided a loader to load other COM or
EXE Exe or EXE may refer to: * .exe, a file extension * exe., abbreviation for executive Places * River Exe, in England * Exe Estuary, in England * Exe Island, in Exeter, England Transportation and vehicles * Exe (locomotive), a British locomotive ...
programs. In the .COM system, larger programs (up to the available memory size) can be loaded and run, but the system loader assumes that all code and data is in the first segment, and it is up to the .COM program to provide any further organization. Programs larger than available memory, or large data segments, can be handled by dynamic linking, if the necessary code is included in the .COM program. The advantage of using the .COM rather than .EXE format is that the binary image is usually smaller and easier to program using an assembler. Once compilers and linkers of sufficient power became available, it was no longer advantageous to use the .COM format for complex programs.


Platform support

The format is still
executable In computing, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instruction (computer science), instructi ...
on many modern Windows NT-based platforms, but it is run in an MS-DOS-emulating subsystem, NTVDM, which is not present in 64-bit variants. COM files can be executed also on DOS emulators such as DOSBox, on any platform supported by these emulators.


Use for compatibility reasons

Windows NT-based operating systems use the .com extension for a small number of commands carried over from MS-DOS days although they are in fact presently implemented as
.exe .exe is a common filename extension denoting an executable file (the main execution point of a computer program) for Microsoft Windows, OS/2, and DOS. File formats There are numerous file formats which may be used by a file with a extensi ...
files. The operating system will recognize the .exe file header and execute them correctly despite their technically incorrect .com extension. (In fact any .exe file can be renamed .com and still execute correctly.) The use of the original .com extensions for these commands ensures compatibility with older DOS batch files that may refer to them with their full original filenames. These commands are DISKCOMP, DISKCOPY,
FORMAT Format may refer to: Printing and visual media * Text formatting, the typesetting of text elements * Paper formats, or paper size standards * Newspaper format, the size of the paper page Computing * File format, particular way that informatio ...
, MODE,
MORE More or Mores may refer to: Computing * MORE (application), outline software for Mac OS * more (command), a shell command * MORE protocol, a routing protocol * Missouri Research and Education Network Music Albums * ''More!'' (album), by Booka ...
and TREE.


Execution preference

In DOS, if a directory contains both a COM file and an
EXE Exe or EXE may refer to: * .exe, a file extension * exe., abbreviation for executive Places * River Exe, in England * Exe Estuary, in England * Exe Island, in Exeter, England Transportation and vehicles * Exe (locomotive), a British locomotive ...
file with same name, when no extension is specified the COM file is preferentially selected for execution. For example, if a directory in the system path contains two files named foo.com and foo.exe, the following would execute foo.com: C:\>foo A user wishing to run foo.exe can explicitly use the complete filename: C:\>foo.exe Taking advantage of this default behaviour, virus writers and other malicious programmers have used names like notepad.com for their creations, hoping that if it is placed in the same directory as the corresponding EXE file, a command or batch file may accidentally trigger their program instead of the text editor notepad.exe. Again, these .com files may in fact contain a .exe format executable. On Windows NT and derivatives ( Windows 2000, Windows XP, Windows Vista, and Windows 7), the variable is used to override the order of preference (and acceptable extensions) for calling files without specifying the extension from the command line. The default value still places .com files before .exe files. This closely resembles a feature previously found in JP Software's line of extended command line processors
4DOS 4DOS is a command-line interpreter by JP Software, designed to replace the default command interpreter COMMAND.COM in Microsoft DOS and Windows. It was written by Rex C. Conn and Tom Rawson and first released in 1989. Compared to the default, ...
,
4OS2 4OS2 is the OS/2 analogue of 4NT and 4DOS by JP Software, Inc. JP Software discontinued 4OS2, TCMDOS2 and TCMD16, making version 3.0, 2.0, 2.0 the final version of these. The code for 4OS2 has been released, and is maintained, first by SciTech ...
, and 4NT.


Malicious usage of the .com extension

Some computer virus writers have hoped to take advantage of modern computer users' likely lack of knowledge of the file extension and associated binary format, along with their more likely familiarity with the
.com The domain name .com is a top-level domain (TLD) in the Domain Name System (DNS) of the Internet. Added at the beginning of 1985, its name is derived from the word ''commercial'', indicating its original intended purpose for domains registere ...
Internet domain name. E-mails have been sent with attachment names similar to "www.example.com". Unwary
Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
users clicking on such an attachment would expect to begin browsing a site named http://www.example.com/, but instead would run the attached binary command file named www.example, giving it full permission to do to their machine whatever its author had in mind. There is nothing malicious about the COM file format itself; this is an exploitation of the coincidental name collision between .com ''com''mand files and .com ''com''mercial web sites.


See also

* DOS API * CMD file (CP/M) *
Comparison of executable file formats This is a comparison of binary executable file formats which, once loaded by a suitable executable loader, can be directly executed by the CPU rather than being interpreted by software. In addition to the binary application code, the executables ma ...
* Fat binary *
Executable compression Executable compression is any means of compressing an executable file and combining the compressed data with decompression code into a single executable. When this compressed executable is executed, the decompression code recreates the original ...


Notes


References


External links


COM 101 – a DOS executable walkthrough
{{Digital Research DOS files DOS technology CP/M files CP/M technology Executable file formats Filename extensions