A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM was used as a filename extension for text files containing commands to be issued to the operating system (similar to a batch file).
With the introduction of Digital Research's CP/M (a microcomputer operating system), the type of file commonly associated with the COM extension changed to executable files. This convention was later carried over to DOS. Even when complemented by the more general EXE file format for executables, the compact COM files remained viable and frequently used under DOS.
The .COM file name extension has no relation to the .com (for "commercial") top-level Internet domain name. However, this similarity in name has been exploited by malware writers.
DOS binary format
The COM format is the original binary executable format used in CP/M (including SCP and MSX-DOS) as well as DOS. It is very simple; it has no header (with the exception of CP/M 3 files), and contains no standard metadata, only code and data. This simplicity exacts a price: the binary has a maximum size of 65,280 (FF00h) bytes (256 bytes short of 64 KB) and stores all its code and data in one segment.
Since it lacks relocation information, it is loaded by the operating system at a pre-set address, at offset 0100h immediately following the PSP, where it is executed (hence the limitation of the executable's size): the entry point is fixed at 0100h. This was not an issue on 8-bit machines, since they can address at most 64 KB of memory, but 16-bit machines have a much larger address space, which is why the format fell out of use.
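The size limit follows directly from the fixed load offset. As a minimal sketch in Python (the names `SEGMENT_SIZE`, `LOAD_OFFSET`, `MAX_COM_SIZE` and `fits_in_com` are illustrative and not part of any DOS or CP/M interface), the arithmetic can be written as:

```python
SEGMENT_SIZE = 0x10000   # 64 KB: the span reachable with a 16-bit offset
LOAD_OFFSET = 0x0100     # the image is placed right after the 256-byte PSP

# Largest image that still fits between the load offset and the segment end.
MAX_COM_SIZE = SEGMENT_SIZE - LOAD_OFFSET  # 0xFF00 = 65,280 bytes


def fits_in_com(image_size: int) -> bool:
    """Return True if an image of image_size bytes fits in one segment."""
    return 0 <= image_size <= MAX_COM_SIZE
```

This only checks the upper bound stated above. Because the stack lives in the same segment, a program that fills the segment completely would leave no room for it, so the practical limit is somewhat lower.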
In the Intel 8080 CPU architecture, only 65,536 bytes of memory could be addressed (address range 0000h to FFFFh). Under CP/M, the first 256 bytes of this memory, from 0000h to 00FFh, were reserved for system use by the zero page, and any user program had to be loaded at exactly 0100h to be executed. COM files fit this model perfectly. Before the introduction of MP/M and Concurrent CP/M, there was no possibility of running more than one program or command at a time: the program loaded at 0100h was run, and no other.
Although the file format is the same in DOS and CP/M, .COM files for the two operating systems are not compatible; DOS COM files contain x86 instructions and possibly DOS system calls, while CP/M COM files contain 8080 instructions and CP/M system calls (programs restricted to certain machines could also contain additional instructions for 8085 or Z80).
When DOS starts a .COM file, all x86 segment registers are set to the same value. The SP (stack pointer) register is set to the offset of the last word available in the first 64 KB segment (typically FFFEh) or, if the memory block the program is loaded into is smaller than that, to the top of that block, which must hold the program plus at least 256 bytes of stack; whichever value is smaller is used. The stack thus begins at the very top of the corresponding memory segment and works down from there.
In the original DOS 1.x API, which was a derivative of the CP/M API, program termination of a .COM file would be performed by calling the INT 20h (Terminate Program) function or else INT 21h Function 0, which served the same purpose. The programmer also had to ensure that the code and data segment registers contained the same value at program termination to avoid a potential system crash. Although these calls could be used in any DOS version, Microsoft recommended the use of INT 21h Function 4Ch for program termination from DOS 2.x onward, which did not require the data and code segment to be set to the same value.
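To make the layout and the termination call concrete, the following Python sketch assembles a tiny DOS COM image by hand. The helper name `build_hello_com` and the message text are invented for illustration. The image uses INT 21h Function 09h to print a `$`-terminated string and then exits through INT 21h Function 4Ch, the call recommended from DOS 2.x onward. The string address is computed relative to the fixed load offset 0100h, since there is no relocation.

```python
import struct

LOAD_OFFSET = 0x0100  # DOS starts executing a COM image at this offset


def build_hello_com(message: bytes = b"Hello, world!\r\n") -> bytes:
    """Assemble a tiny DOS COM image by hand (hypothetical helper).

    The message must not contain '$', which Function 09h treats as the
    end-of-string marker.
    """
    code_len = 12                      # total size of the five instructions
    msg_addr = LOAD_OFFSET + code_len  # the string sits right after the code
    code = (
        b"\xB4\x09"                              # mov ah, 09h   (print string)
        + b"\xBA" + struct.pack("<H", msg_addr)  # mov dx, msg_addr
        + b"\xCD\x21"                            # int 21h
        + b"\xB8\x00\x4C"                        # mov ax, 4C00h (exit, code 0)
        + b"\xCD\x21"                            # int 21h
    )
    assert len(code) == code_len
    return code + message + b"$"
```

Writing the returned bytes to a file such as `hello.com` should give a program that runs under DOS or a DOS emulator. Note that the file contains nothing but the instructions and the string: no header and no metadata.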
It is possible to make a .COM file that runs under both operating systems, in the form of a fat binary. There is no true compatibility at the instruction level; instead, the bytes at the entry point are chosen so that they are executable under both operating systems but behave differently on each, making program execution jump to the section for the operating system in use. The result is essentially two different programs with the same functionality in a single file, preceded by code that selects the one to use.
Under CP/M 3, if the first byte of a COM file is C9h, there is a 256-byte header; since C9h corresponds to the 8080 instruction RET, this means that the COM file will immediately terminate if run on an earlier version of CP/M that does not support this extension. (Because the instruction sets of the 8085 and Z80 are supersets of the 8080 instruction set, this works on all three processors.) C9h is an invalid opcode on the 8088/8086, and it will cause a processor-generated interrupt 6 exception in v86 mode on the 386 and later x86 chips. Because C9h has been the opcode for LEAVE since the 80188/80186, and is therefore not used as the first instruction in a valid program, the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding a crash.
Files may have names ending in .COM, but not be in the simple format described above; this is indicated by a magic number at the start of the file. For example, the COMMAND.COM file in DR DOS 6.0 is actually in DOS executable format, indicated by the first two bytes being MZ (4Dh 5Ah), the initials of Mark Zbikowski.
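The two markers mentioned above can be checked with a few lines of Python. This is only a sketch: the function name `classify_com_file` and its return labels are invented here, and a leading C9h is only meaningful for files intended for CP/M.

```python
def classify_com_file(data: bytes) -> str:
    """Guess the real layout of a file named *.COM from its first bytes."""
    if data[:2] == b"MZ":      # 4Dh 5Ah: DOS executable (EXE) format
        return "exe"
    if data[:1] == b"\xC9":    # C9h: CP/M 3 file with a 256-byte header
        return "cpm3-header"
    return "plain"             # no recognised marker: raw code and data
```

A plain COM file has no magic number of its own, so "plain" is simply the fallback when neither marker is present.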
Large programs
Under DOS there is no memory management provided for COM files by the loader or execution environment. All memory is simply available to the COM file. After execution, the operating system command shell, COMMAND.COM, is reloaded. This leaves the possibilities that the COM file can either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system. An example of a complex program is COMMAND.COM, the DOS shell, which provided a loader to load other COM or EXE programs. In the .COM system, larger programs (up to the available memory size) can be loaded and run, but the system loader assumes that all code and data is in the first segment, and it is up to the .COM program to provide any further organization. Programs larger than available memory, or large data segments, can be handled by dynamic linking, if the necessary code is included in the .COM program. The advantage of using the .COM rather than .EXE format is that the binary image is usually smaller and easier to program using an assembler.
Once compilers and linkers of sufficient power became available, it was no longer advantageous to use the .COM format for complex programs.
Platform support
The format is still executable on many modern Windows NT-based platforms, but it is run in an MS-DOS-emulating subsystem, NTVDM, which is not present in 64-bit variants. COM files can also be executed on DOS emulators such as DOSBox, on any platform supported by these emulators.
Use for compatibility reasons
Windows NT-based operating systems use the .com extension for a small number of commands carried over from MS-DOS days, although they are in fact presently implemented as .exe files. The operating system will recognize the .exe file header and execute them correctly despite their technically incorrect .com extension. (In fact any .exe file can be renamed .com and still execute correctly.) The use of the original .com extensions for these commands ensures compatibility with older DOS batch files that may refer to them with their full original filenames. These commands are DISKCOMP, DISKCOPY, FORMAT, MODE, MORE and TREE.
Execution preference
In DOS, if a directory contains both a COM file and an EXE file with the same name, when no extension is specified the COM file is preferentially selected for execution. For example, if a directory in the system path contains two files named foo.com and foo.exe, the following would execute foo.com:
C:\>foo
A user wishing to run foo.exe can explicitly use the complete filename:
C:\>foo.exe
Taking advantage of this default behaviour, virus writers and other malicious programmers have used names like notepad.com for their creations, hoping that if it is placed in the same directory as the corresponding EXE file, a command or batch file may accidentally trigger their program instead of the text editor notepad.exe. Again, these .com files may in fact contain a .exe format executable.
On Windows NT and derivatives (Windows 2000, Windows XP, Windows Vista, and Windows 7), the PATHEXT variable is used to override the order of preference (and acceptable extensions) for calling files without specifying the extension from the command line. The default value still places .com files before .exe files. This closely resembles a feature previously found in JP Software's line of extended command line processors 4DOS, 4OS2, and 4NT.
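The lookup order described in this section can be modelled in a few lines of Python. This is a simplified sketch, not the actual shell algorithm: it considers a single directory only, the function name `resolve_command` is invented, and `DEFAULT_ORDER` is an assumed, shortened stand-in for the real default list of extensions.

```python
from typing import Iterable, Optional, Sequence

# Assumed leading entries of the default search order; real systems list more.
DEFAULT_ORDER = (".COM", ".EXE", ".BAT", ".CMD")


def resolve_command(name: str, files: Iterable[str],
                    order: Sequence[str] = DEFAULT_ORDER) -> Optional[str]:
    """Pick the file a command name would run in one directory."""
    by_upper = {f.upper(): f for f in files}
    # A name that already carries a listed extension is matched as typed.
    if any(name.upper().endswith(ext) for ext in order):
        return by_upper.get(name.upper())
    # Otherwise try each extension in order of preference.
    for ext in order:
        candidate = by_upper.get(name.upper() + ext)
        if candidate is not None:
            return candidate
    return None
```

With the files `foo.com` and `foo.exe` present, `resolve_command("foo", ...)` selects `foo.com`, while passing a reordered `order` such as `(".EXE", ".COM")` selects `foo.exe`, which is the kind of override the variable allows.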
Malicious usage of the .com extension
Some computer virus writers have hoped to take advantage of modern computer users' likely lack of knowledge of the file extension and associated binary format, along with their more likely familiarity with the .com Internet domain name. E-mails have been sent with attachment names similar to "www.example.com". Unwary Microsoft Windows users clicking on such an attachment would expect to begin browsing a site named http://www.example.com/, but instead would run the attached binary command file named www.example, giving it full permission to do to their machine whatever its author had in mind.
There is nothing malicious about the COM file format itself; this is an exploitation of the coincidental name collision between .com command files and .com commercial web sites.
See also
* DOS API
* CMD file (CP/M)
* Comparison of executable file formats
* Fat binary
* Executable compression
External links
COM 101 – a DOS executable walkthrough