HOME

TheInfoList



OR:

A COM file is a type of simple
executable file In computer science, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a da ...
. On the
Digital Equipment Corporation Digital Equipment Corporation (DEC ), using the trademark Digital, was a major American company in the computer industry from the 1960s to the 1990s. The company was co-founded by Ken Olsen and Harlan Anderson in 1957. Olsen was president until ...
(DEC)
VAX VAX (an acronym for virtual address extension) is a series of computers featuring a 32-bit instruction set architecture (ISA) and virtual memory that was developed and sold by Digital Equipment Corporation (DEC) in the late 20th century. The V ...
operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...
s of the 1970s, .COM was used as a
filename extension A filename extension, file name extension or file extension is a suffix to the name of a computer file (for example, .txt, .mp3, .exe) that indicates a characteristic of the file contents or its intended use. A filename extension is typically d ...
for
text file A text file (sometimes spelled textfile; an old alternative name is flat file) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In ope ...
s containing commands to be issued to the operating system (similar to a
batch file A batch file is a Scripting language, script file in DOS, OS/2 and Microsoft Windows. It consists of a series of Command (computing), commands to be executed by the command-line interpreter, stored in a plain text file. A batch file may contain a ...
). With the introduction of
Digital Research Digital Research, Inc. (DR or DRI) was a privately held American software company created by Gary Kildall to market and develop his CP/M operating system and related 8-bit, 16-bit and 32-bit systems like MP/M, Concurrent DOS, FlexOS, Multiuser ...
's
CP/M CP/M, originally standing for Control Program/Monitor and later Control Program for Microcomputers, is a mass-market operating system created in 1974 for Intel 8080/Intel 8085, 85-based microcomputers by Gary Kildall of Digital Research, Dig ...
(a
microcomputer A microcomputer is a small, relatively inexpensive computer having a central processing unit (CPU) made out of a microprocessor. The computer also includes memory and input/output (I/O) circuitry together mounted on a printed circuit board (P ...
operating system modeled after
TOPS-10 TOPS-10 System (Timesharing / Total Operating System-10) is a discontinued operating system from Digital Equipment Corporation (DEC) for the PDP-10 (or DECsystem-10) mainframe computer family. Launched in 1967, TOPS-10 evolved from the earlier "Mo ...
for the
PDP-10 Digital Equipment Corporation (DEC)'s PDP-10, later marketed as the DECsystem-10, is a mainframe computer family manufactured beginning in 1966 and discontinued in 1983. 1970s models and beyond were marketed under the DECsystem-10 name, especi ...
), the type of files commonly associated with COM extension changed to that of executable files. This convention was later carried over to
DOS DOS (, ) is a family of disk-based operating systems for IBM PC compatible computers. The DOS family primarily consists of IBM PC DOS and a rebranded version, Microsoft's MS-DOS, both of which were introduced in 1981. Later compatible syste ...
. Even when complemented by the more general
EXE Exe or EXE may refer to: * .exe, a file extension * exe., abbreviation for Executive (disambiguation)#Role, title, or function, executive Places * River Exe, in England * Exe Estuary, in England * Exe Island, in Exeter, England Transportation a ...
file format A file format is a Computer standard, standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary format, pr ...
for executables, the compact COM files remained viable and frequently used under DOS. The .COM file name extension has no relation to the .com (for "commercial") top-level Internet domain name. However, this similarity in name has been exploited by
malware Malware (a portmanteau of ''malicious software'')Tahir, R. (2018)A study on malware and malware detection techniques . ''International Journal of Education and Management Engineering'', ''8''(2), 20. is any software intentionally designed to caus ...
writers.


DOS binary format

The COM format is the original binary executable format used in
CP/M CP/M, originally standing for Control Program/Monitor and later Control Program for Microcomputers, is a mass-market operating system created in 1974 for Intel 8080/Intel 8085, 85-based microcomputers by Gary Kildall of Digital Research, Dig ...
(including SCP and
MSX-DOS MSX-DOS is a discontinued disk operating system developed by Microsoft's Japan subsidiary for the 8-bit home computer standard MSX, and is a cross between MS-DOS v1.25 and CP/M-80 v2.2. MSX-DOS MSX-DOS and the extended BASIC with 3½-in ...
) as well as
DOS DOS (, ) is a family of disk-based operating systems for IBM PC compatible computers. The DOS family primarily consists of IBM PC DOS and a rebranded version, Microsoft's MS-DOS, both of which were introduced in 1981. Later compatible syste ...
. It is very simple; it has no header (with the exception of CP/M 3 files), and contains no standard
metadata Metadata (or metainformation) is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive ...
, only code and data. This simplicity exacts a price: the
binary Binary may refer to: Science and technology Mathematics * Binary number, a representation of numbers using only two values (0 and 1) for each digit * Binary function, a function that takes two arguments * Binary operation, a mathematical op ...
has a maximum size of 65,280 (FF00 h) bytes (256 bytes short of 64 KiB) and stores all its
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communicati ...
and
data Data ( , ) are a collection of discrete or continuous values that convey information, describing the quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted for ...
in one segment. Since it lacks relocation information, it is loaded by the operating system at a pre-set address, at offset 0100h immediately following the PSP, where it is executed (hence the limitation of the executable's size): the
entry point In computer programming, an entry point is the place in a program where the execution of a program begins, and where the program has access to command line arguments. To start a program's execution, the loader or operating system passes co ...
is fixed at 0100h. This was not an issue on 8-bit machines since they can only address 64k of memory, but 16-bit machines have a much larger address space, which is why the format fell out of use. In the
Intel 8080 The Intel 8080 is Intel's second 8-bit computing, 8-bit microprocessor. Introduced in April 1974, the 8080 was an enhanced successor to the earlier Intel 8008 microprocessor, although without binary compatibility.'' Electronic News'' was a week ...
CPU architecture, only 65,536 bytes of memory could be addressed (address range 0000h to FFFFh). Under CP/M, the first 256 bytes of this memory, from 0000h to 00FFh were reserved for system use by the
zero page The zero page or base page is the block of memory at the very beginning of a computer's address space; that is, the page whose starting address is zero. The size of a page depends on the context, and the significance of zero page memory versus h ...
, and any user program had to be loaded at exactly 0100h to be executed. COM files fit this model perfectly. Before the introduction of
MP/M MP/M (Multi-Programming Monitor Control Program) is a discontinued multi-user version of the CP/M operating system, created by Digital Research developer Tom Rolander in 1979. It allowed multiple users to connect to a single computer, each u ...
and
Concurrent CP/M MP/M (Multi-Programming Monitor Control Program) is a discontinued multi-user version of the CP/M operating system, created by Digital Research Digital Research, Inc. (DR or DRI) was a privately held American software company created by ...
, there was no possibility of running more than one program or command at a time: the program loaded at 0100h was run, and no other. Although the file format is the same in DOS and CP/M, .COM files for the two operating systems are not compatible; DOS COM files contain
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
instructions and possibly DOS
system call In computing, a system call (syscall) is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive ...
s, while CP/M COM files contain
8080 The Intel 8080 is Intel's second 8-bit microprocessor. Introduced in April 1974, the 8080 was an enhanced successor to the earlier Intel 8008 microprocessor, although without binary compatibility.'' Electronic News'' was a weekly trade newspa ...
instructions and CP/M system calls (programs restricted to certain machines could also contain additional instructions for 8085 or
Z80 The Zilog Z80 is an 8-bit microprocessor designed by Zilog that played an important role in the evolution of early personal computing. Launched in 1976, it was designed to be software-compatible with the Intel 8080, offering a compelling altern ...
). .COM files in DOS set all x86 segment registers to the same value and the SP (stack pointer) register to the offset of the last word available in the first 64 KiB segment (typically FFFEh) or the maximum size of memory available in the block the program is loaded into for both, the program plus at least 256 bytes stack, whatever is smaller, thus the stack begins at the very top of the corresponding memory segment and works down from there. In the original DOS 1.x
API An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build ...
, which was a derivative of the CP/M API, program termination of a .COM file would be performed by calling the INT 20h (Terminate Program) function or else INT 21h Function 0, which served the same purpose, and the programmer also had to ensure that the code and data segment registers contained the same value at program termination to avoid a potential system crash. Although this could be used in any DOS version, Microsoft recommended the use of INT 21h Function 4Ch for program termination from DOS 2.x onward, which did not require the data and code segment to be set to the same value. It is possible to make a .COM file to run under both operating systems in form of a
fat binary A fat binary (or multiarchitecture binary) is a computer executable program or library which has been expanded (or "fattened") with code native to multiple instruction sets which can consequently be run on multiple processor types. This results ...
. There is no true compatibility at the instruction level; the instructions at the
entry point In computer programming, an entry point is the place in a program where the execution of a program begins, and where the program has access to command line arguments. To start a program's execution, the loader or operating system passes co ...
are chosen to be equal in functionality but different in both operating systems, and make program execution jump to the section for the operating system in use. It is basically two different programs with the same functionality in a single file, preceded by code selecting the one to use. Under CP/M 3, if the first byte of a COM file is C9h, there is a 256-byte header; since C9h corresponds to the
8080 The Intel 8080 is Intel's second 8-bit microprocessor. Introduced in April 1974, the 8080 was an enhanced successor to the earlier Intel 8008 microprocessor, although without binary compatibility.'' Electronic News'' was a weekly trade newspa ...
instruction RET, this means that the COM file will immediately terminate if run on an earlier version of CP/M that does not support this extension. (Because the instruction sets of the 8085 and Z80 are supersets of the 8080 instruction set, this works on all three processors.) C9h is an invalid opcode on the 8088/8086, however it is the opcode for LEAVE since the 80188/
80186 The Intel 80186, also known as the iAPX 186, or just 186, is a microprocessor and microcontroller introduced in 1982. It was based on the Intel 8086 and, like it, had a 16-bit external data bus multiplexed with a 20-bit address bus. The 80188 ...
. Albeit possible, LEAVE is unlikely to be used as the first instruction in a valid program. Thus the executable loader in some versions of DOS rejects COM files that start with C9h , avoiding a crash. Files may have names ending in .COM, but not be in the simple format described above; this is indicated by a magic number at the start of the file. For example, the COMMAND.COM file in DR DOS 6.0 is actually in
DOS executable The DOS MZ executable format is the executable file format used for .EXE files in DOS. The file can be identified by the ASCII string "MZ" (hexadecimal: 4D 5A) at the beginning of the file (the "Magic number (programming), magic number"). "MZ" ...
format, indicated by the first two bytes being ''MZ'' (4Dh 5Ah), the initials of
Mark Zbikowski Mark "Zibo" Joseph Zbikowski (born March 21, 1956) is a former Microsoft Architect and an early computer Hacker culture, hacker. He started working at the company only a few years after its inception, leading efforts in MS-DOS, OS/2, Cairo (operat ...
.


Large programs

Under
DOS DOS (, ) is a family of disk-based operating systems for IBM PC compatible computers. The DOS family primarily consists of IBM PC DOS and a rebranded version, Microsoft's MS-DOS, both of which were introduced in 1981. Later compatible syste ...
there is no
memory management Memory management (also dynamic memory management, dynamic storage allocation, or dynamic memory allocation) is a form of Resource management (computing), resource management applied to computer memory. The essential requirement of memory manag ...
provided for COM files by the loader or execution environment. All memory is simply available to the COM file. After execution, the operating system command shell, COMMAND.COM, is reloaded. This leaves the possibilities that the COM file can either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system. An example of a complex program is COMMAND.COM, the DOS shell, which provided a loader to load other COM or
EXE Exe or EXE may refer to: * .exe, a file extension * exe., abbreviation for Executive (disambiguation)#Role, title, or function, executive Places * River Exe, in England * Exe Estuary, in England * Exe Island, in Exeter, England Transportation a ...
programs. In the .COM system, larger programs (up to the available memory size) can be loaded and run, but the system loader assumes that all code and data is in the first segment, and it is up to the .COM program to provide any further organization. Programs larger than available memory, or large
data segment In computing, a data segment (often denoted .data) is a portion of an object file or the corresponding address space of a program that contains initialized static variables, that is, global variables and static local variables. The size of thi ...
s, can be handled by
dynamic linking In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed (at " run time"), by copying the content of libraries from persistent storage to RAM, fill ...
, if the necessary code is included in the .COM program. The advantage of using the .COM rather than .EXE format is that the binary image is usually smaller and easier to program using an assembler. Once
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
s and linkers of sufficient power became available, it was no longer advantageous to use the .COM format for complex programs.


Platform support

The format is still
executable In computer science, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instruction (computer science), in ...
on many modern
Windows NT Windows NT is a Proprietary software, proprietary Graphical user interface, graphical operating system produced by Microsoft as part of its Windows product line, the first version of which, Windows NT 3.1, was released on July 27, 1993. Original ...
-based platforms, but it is run in an
MS-DOS MS-DOS ( ; acronym for Microsoft Disk Operating System, also known as Microsoft DOS) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and a few op ...
-emulating subsystem,
NTVDM Virtual DOS machines (VDM) refer to a technology that allows running 16-bit/32-bit DOS and 16-bit Windows programs when there is already another operating system running and controlling the hardware. Overview Virtual DOS machines can operate e ...
, which is not present in
64-bit In computer architecture, 64-bit integers, memory addresses, or other data units are those that are 64 bits wide. Also, 64-bit central processing units (CPU) and arithmetic logic units (ALU) are those that are based on processor registers, a ...
variants. COM files can be executed also on DOS emulators such as
DOSBox DOSBox is a free and open-source MS-DOS emulator. It supports running programs primarily video games that are otherwise inaccessible since hardware for running a compatible disk operating system (DOS) is obsolete and generally unavailab ...
, on any platform supported by these emulators.


Use for compatibility reasons

Windows NT Windows NT is a Proprietary software, proprietary Graphical user interface, graphical operating system produced by Microsoft as part of its Windows product line, the first version of which, Windows NT 3.1, was released on July 27, 1993. Original ...
-based operating systems use the .com extension for a small number of commands carried over from MS-DOS days although they are in fact presently implemented as .exe files. The operating system will recognize the .exe file header and execute them correctly despite their technically incorrect .com extension. (In fact any .exe file can be renamed .com and still execute correctly.) The use of the original .com extensions for these commands ensures compatibility with older DOS batch files that may refer to them with their full original filenames. These commands are CHCP, DISKCOMP, DISKCOPY, FORMAT,
MODE Mode ( meaning "manner, tune, measure, due measure, rhythm, melody") may refer to: Arts and entertainment * MO''D''E (magazine), a defunct U.S. women's fashion magazine * ''Mode'' magazine, a fictional fashion magazine which is the setting fo ...
, MORE and
TREE In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
.


Execution preference

In DOS, if a directory contains both a COM file and an
EXE Exe or EXE may refer to: * .exe, a file extension * exe., abbreviation for Executive (disambiguation)#Role, title, or function, executive Places * River Exe, in England * Exe Estuary, in England * Exe Island, in Exeter, England Transportation a ...
file with same name, when no extension is specified the COM file is preferentially selected for execution. For example, if a directory in the system path contains two files named foo.com and foo.exe, the following would execute foo.com: C:\>foo A user wishing to run foo.exe can explicitly use the complete filename: C:\>foo.exe Taking advantage of this default behaviour,
virus A virus is a submicroscopic infectious agent that replicates only inside the living Cell (biology), cells of an organism. Viruses infect all life forms, from animals and plants to microorganisms, including bacteria and archaea. Viruses are ...
writers and other malicious programmers have used names like notepad.com for their creations, hoping that if it is placed in the same directory as the corresponding EXE file, a command or batch file may accidentally trigger their program instead of the text editor notepad.exe. Again, these .com files may in fact contain a .exe format executable. On
Windows NT Windows NT is a Proprietary software, proprietary Graphical user interface, graphical operating system produced by Microsoft as part of its Windows product line, the first version of which, Windows NT 3.1, was released on July 27, 1993. Original ...
and derivatives (
Windows 2000 Windows 2000 is a major release of the Windows NT operating system developed by Microsoft, targeting the server and business markets. It is the direct successor to Windows NT 4.0, and was Software release life cycle#Release to manufacturing (RT ...
,
Windows XP Windows XP is a major release of Microsoft's Windows NT operating system. It was released to manufacturing on August 24, 2001, and later to retail on October 25, 2001. It is a direct successor to Windows 2000 for high-end and business users a ...
,
Windows Vista Windows Vista is a major release of the Windows NT operating system developed by Microsoft. It was the direct successor to Windows XP, released five years earlier, which was then the longest time span between successive releases of Microsoft W ...
, and
Windows 7 Windows 7 is a major release of the Windows NT operating system developed by Microsoft. It was Software release life cycle#Release to manufacturing (RTM), released to manufacturing on July 22, 2009, and became generally available on October 22, ...
), the variable is used to override the order of preference (and acceptable extensions) for calling files without specifying the extension from the command line. The default value still places .com files before .exe files. This closely resembles a feature previously found in JP Software's line of extended command line processors
4DOS 4DOS is a command-line interpreter by JP Software, designed to replace the default command interpreter COMMAND.COM in MS-DOS and Windows. It was written by Rex C. Conn and Tom Rawson and first released in 1989. Compared to the default, it has ...
,
4OS2 4OS2 is the OS/2 analogue of 4NT and 4DOS by JP Software, Inc. JP Software discontinued 4OS2, TCMDOS2 and TCMD16, making version 3.0, 2.0, 2.0 the final version of these. The code for 4OS2 has been released, and is maintained, first by Sci ...
, and 4NT.


Malicious usage of the .com extension

Some computer virus writers have hoped to take advantage of modern computer users' likely lack of knowledge of the file extension and associated binary format, along with their more likely familiarity with the .com Internet domain name. E-mails have been sent with attachment names similar to "www.example.com". Unwary
Microsoft Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
users clicking on such an attachment would expect to begin browsing a site named http://www.example.com/, but instead would run the attached binary command file named www.example, giving it full permission to do to their machine whatever its author had in mind. There is nothing malicious about the COM file format itself; this is an exploitation of the coincidental name collision between .com ''com''mand files and .com ''com''mercial web sites.


See also

* DOS API * CMD file (CP/M) *
Comparison of executable file formats This is a comparison of binary executable file formats which, once loaded by a suitable executable loader, can be directly executed by the CPU rather than being interpreted by software. In addition to the binary application code, the executables ...
*
Fat binary A fat binary (or multiarchitecture binary) is a computer executable program or library which has been expanded (or "fattened") with code native to multiple instruction sets which can consequently be run on multiple processor types. This results ...
*
Executable compression Executable compression is any means of compressing an executable file and combining the compressed data with decompression code into a single executable. When this compressed executable is executed, the decompression code recreates the original ...


Notes


References


External links


COM 101 – a DOS executable walkthrough
{{Digital Research DOS files DOS technology CP/M files CP/M technology Executable file formats Filename extensions