The comm command in the Unix family of computer operating systems is a utility that is used to compare two files for common and distinct lines. comm is specified in the POSIX standard. It has been widely available on Unix-like operating systems since the mid to late 1980s.
History
Written by Lee E. McMahon, comm first appeared in Version 4 Unix.
The version of comm bundled in GNU coreutils was written by Richard Stallman and David MacKenzie.
Usage
comm reads two files as input, regarded as lines of text. It writes a single output stream that contains three columns. The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. Functionally, this is similar to diff.
Columns are typically distinguished with the tab character. If the input files contain lines beginning with the separator character, the output columns can become ambiguous.
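The ambiguity can be reproduced with a minimal sketch, assuming a POSIX shell, the widely available mktemp utility, and two hypothetical one-line files named left and right. A line that already starts with a tab in the first file looks the same in the output as a plain line that is unique to the second file.

```shell
# Scratch directory for two hypothetical one-line input files.
tmp=$(mktemp -d)

# left contains a line that itself starts with a tab; right contains
# the same word without the tab.
printf '\tbanana\n' > "$tmp/left"
printf 'banana\n' > "$tmp/right"

# Column 1 prints the left line verbatim; column 2 prefixes the right
# line with one tab, so the two output lines are byte-for-byte equal.
out=$(LC_ALL=C comm "$tmp/left" "$tmp/right")
printf '%s\n' "$out"

rm -rf "$tmp"
```

Both output lines consist of one tab followed by banana, so a reader of the output cannot tell which file each line came from.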
For efficiency, standard implementations of comm expect both input files to be sequenced in the same line collation order, sorted lexically. The sort command can be used for this purpose.
The comm algorithm makes use of the collating sequence of the current locale. If the lines in the files are not both collated in accordance with the current locale, the result is undefined.
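One way to satisfy the ordering requirement is sketched below, assuming a POSIX shell, the mktemp utility, and hypothetical file names a.txt and b.txt. Setting LC_ALL=C for both sort and comm is a choice made for this example so that the two commands agree on the collating sequence; any locale works as long as both commands use the same one.

```shell
# Scratch directory so the example leaves nothing behind.
tmp=$(mktemp -d)

# Two unsorted inputs (hypothetical file names).
printf '%s\n' pear apple fig > "$tmp/a.txt"
printf '%s\n' fig banana apple > "$tmp/b.txt"

# Sort both files under the same locale that comm will run in.
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"

# Compare the sorted copies; keep the result in a variable for inspection.
result=$(LC_ALL=C comm "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Here apple and fig land in the third column, banana in the second, and pear in the first.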
Return code
Unlike diff, the return code from comm has no logical significance concerning the relationship of the two files. A return code of 0 indicates success; a return code greater than 0 indicates that an error occurred during processing.
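The difference in exit status can be observed with a short sketch, assuming a POSIX shell, the mktemp utility, and two hypothetical files named one and two that differ in their second line.

```shell
tmp=$(mktemp -d)
printf '%s\n' apple banana > "$tmp/one"
printf '%s\n' apple cherry > "$tmp/two"

# comm exits 0 on success, whether or not the files differ.
comm_status=0
comm "$tmp/one" "$tmp/two" > /dev/null || comm_status=$?

# diff exits 1 when the files differ (0 when identical, >1 on error).
diff_status=0
diff "$tmp/one" "$tmp/two" > /dev/null || diff_status=$?

echo "comm: $comm_status, diff: $diff_status"
rm -rf "$tmp"
```

A script that needs a yes/no answer about whether two files differ therefore has to inspect the output of comm rather than its exit status.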
Example
$ cat foo
apple
banana
eggplant
$ cat bar
apple
banana
banana
zucchini
$ comm foo bar
		apple
		banana
	banana
eggplant
	zucchini
This shows that both files have one banana, but only bar has a second banana.
In more detail, the output has the appearance that follows. Note that the column of each line is determined by the number of leading tab characters. \t represents a tab character and \n represents a newline.
\t\tapple\n
\t\tbanana\n
\tbanana\n
eggplant\n
\tzucchini\n
Comparison to diff
In general terms, diff is a more powerful utility than comm. The simpler comm is best suited for use in scripts.
The primary distinction between comm and diff is that comm discards information about the order of the lines prior to sorting.
A minor difference between comm and diff is that comm will not try to indicate that a line has "changed" between the two files; each line is shown in exactly one of the "from file #1", "from file #2", or "in both" columns. This can be useful if one wishes two lines to be considered different even if they only have subtle differences.
Other options
comm has command-line options (-1, -2 and -3) to suppress any of the three columns. This is useful for scripting.
One file (but not both) can also be read from standard input by giving - as its name.
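The column-suppression options can be sketched as follows, assuming a POSIX shell, the mktemp utility, and the files foo and bar from the Example section recreated in a scratch directory. The variable names are hypothetical.

```shell
tmp=$(mktemp -d)
printf '%s\n' apple banana eggplant > "$tmp/foo"
printf '%s\n' apple banana banana zucchini > "$tmp/bar"

# -1 and -2 suppress both "unique" columns, leaving only common lines.
common=$(comm -12 "$tmp/foo" "$tmp/bar")

# -2 and -3 leave only the lines unique to the first file.
only_foo=$(comm -23 "$tmp/foo" "$tmp/bar")

# "-" as a file operand makes comm read that input from standard input.
only_bar=$(comm -13 - "$tmp/bar" < "$tmp/foo")

printf 'common:\n%s\n' "$common"
printf 'only in foo:\n%s\n' "$only_foo"
printf 'only in bar:\n%s\n' "$only_bar"

rm -rf "$tmp"
```

When columns are suppressed, the leading tabs of the remaining columns are reduced accordingly, so a single remaining column is printed without indentation.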
Limits
Up to a full line must be buffered from each input file during line comparison, before the next output line is written.
Some implementations read lines with a function that grows its buffer as needed, which does not impose any line length limit if system memory suffices.
Other implementations read lines with a function such as fgets(), which requires a fixed buffer. For these implementations, the buffer is often sized according to the POSIX macro LINE_MAX.
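Whether a given input could run into a fixed-size line buffer can be estimated with a rough sketch, assuming a POSIX shell with getconf, awk and mktemp available, and a hypothetical file named input. Note that the POSIX limit is expressed in bytes and includes the trailing newline, while awk's length() counts characters, so the comparison is only approximate for multibyte text.

```shell
# The POSIX line-length limit as reported by getconf on this system.
# Falling back to "unknown" is an assumption made only so that the
# sketch does not fail where getconf is unavailable.
line_max=$(getconf LINE_MAX 2>/dev/null || echo unknown)
echo "LINE_MAX: $line_max"

# Length of the longest line in a hypothetical input file.
tmp=$(mktemp -d)
printf '%s\n' short 'a somewhat longer line' > "$tmp/input"
longest=$(awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$tmp/input")
echo "longest line: $longest characters"

rm -rf "$tmp"
```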
See also
* Comparison of file comparison tools
* List of Unix commands
* cmp (Unix) – character oriented file comparison
* cut (Unix) – splitting column-oriented files