Comm

The comm command in the Unix family of computer operating systems is a utility that is used to compare two files for common and distinct lines. It is specified in the POSIX standard and has been widely available on Unix-like operating systems since the mid to late 1980s.


History

The comm utility was written by Lee E. McMahon and first appeared in Version 4 Unix. The version of comm bundled in GNU coreutils was written by Richard Stallman and David MacKenzie.


Usage

The comm utility reads two files as input, regarded as lines of text, and outputs one file containing three columns. The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. This functionality is similar to diff. Columns are typically distinguished with the tab character. If the input files contain lines beginning with the separator character, the output columns can become ambiguous. For efficiency, standard implementations of comm expect both input files to be sequenced in the same line collation order, sorted lexically. The sort command can be used for this purpose. The algorithm makes use of the collating sequence of the current locale. If the lines in the files are not both collated in accordance with the current locale, the result is undefined.
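The pairing that comm performs on two sorted inputs is essentially the merge step of merge sort: compare the current line of each file, emit the smaller one as unique to its file, and emit equal lines once as common. The following is a minimal Python sketch of that idea, not the code of any actual comm implementation. The names comm_columns and comm_format are hypothetical, and plain Python string comparison (code-point order, roughly the C/POSIX locale) stands in for the current locale's collating sequence.

```python
from typing import Iterable, Iterator, Optional, Tuple


def comm_columns(lines1: Iterable[str], lines2: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (column, line) pairs for two inputs that are already sorted.

    Column 1: line only in the first input; column 2: only in the second;
    column 3: in both. Plain str comparison stands in for locale collation.
    """
    it1, it2 = iter(lines1), iter(lines2)
    a: Optional[str] = next(it1, None)
    b: Optional[str] = next(it2, None)
    while a is not None and b is not None:
        if a < b:  # a cannot appear later in input 2, so it is unique to input 1
            yield 1, a
            a = next(it1, None)
        elif b < a:  # symmetric case: b is unique to input 2
            yield 2, b
            b = next(it2, None)
        else:  # equal lines are paired once and both inputs advance
            yield 3, a
            a = next(it1, None)
            b = next(it2, None)
    # One input is exhausted; whatever remains in the other is unique to it.
    while a is not None:
        yield 1, a
        a = next(it1, None)
    while b is not None:
        yield 2, b
        b = next(it2, None)


def comm_format(lines1: Iterable[str], lines2: Iterable[str]) -> str:
    """Render comm-style text: a line in column N gets N - 1 leading tabs."""
    return "".join("\t" * (col - 1) + line + "\n" for col, line in comm_columns(lines1, lines2))


if __name__ == "__main__":
    foo = ["apple", "banana", "eggplant"]
    bar = ["apple", "banana", "banana", "zucchini"]
    print(comm_format(foo, bar), end="")
```

As with the real utility, unsorted input has to be sorted first, for example comm_format(sorted(x), sorted(y)) in place of piping through sort. Feeding unsorted lists to the sketch produces meaningless pairings rather than an error.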


Return code

Unlike diff, the return code from comm has no logical significance concerning the relationship of the two files. A return code of 0 indicates success; a return code greater than 0 indicates that an error occurred during processing.


Example

$ cat foo
apple
banana
eggplant
$ cat bar
apple
banana
banana
zucchini
$ comm foo bar
		apple
		banana
	banana
eggplant
	zucchini

This shows that both files have one banana, but only bar has a second banana. In more detail, the output has the appearance that follows, where \t represents a tab character and \n represents a newline. Note that the column is interpreted by the number of leading tab characters.

\t\tapple\n
\t\tbanana\n
\tbanana\n
eggplant\n
\tzucchini\n
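Because a line's column is encoded only by its leading tabs, a script that consumes comm's output has to count them to recover the columns. The following is a minimal sketch, assuming all three columns were printed (no suppression options); split_comm_output is a hypothetical helper name. It treats at most two leading tabs as column markers, so an input line that itself begins with a tab can be misclassified, which is the ambiguity described under Usage.

```python
from typing import Dict, List


def split_comm_output(text: str) -> Dict[int, List[str]]:
    """Group comm output lines into columns 1-3 by their leading tabs.

    Assumes no -1/-2/-3 option was used. comm adds at most two tabs, so only
    the first two leading tabs are treated as column markers.
    """
    columns: Dict[int, List[str]] = {1: [], 2: [], 3: []}
    for line in text.splitlines():
        tabs = len(line) - len(line.lstrip("\t"))
        col = min(tabs, 2) + 1
        columns[col].append(line[col - 1:])  # drop the col - 1 marker tabs
    return columns


if __name__ == "__main__":
    output = "\t\tapple\n\t\tbanana\n\tbanana\neggplant\n\tzucchini\n"
    print(split_comm_output(output))
```

For example, a line "\tx" that appeared only in the first file is printed by comm unchanged and is therefore reported here as column 2, with no way to tell the difference from the output alone.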


Comparison to diff

In general terms, diff is a more powerful utility than comm. The simpler comm is best suited for use in scripts. The primary distinction between comm and diff is that comm discards information about the original order of the lines, because its inputs must be sorted first. A minor difference is that comm will not try to indicate that a line has "changed" between the two files; lines are shown in either the "from file #1", "from file #2", or "in both" column. This can be useful if one wishes two lines to be considered different even if they have only subtle differences.
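Since both inputs are sorted before comparison, only how many times each distinct line occurs in each file affects the result, not where it occurs. The sketch below makes that explicit by computing the three columns as multiset operations with Python's collections.Counter. This is a swapped-in technique for illustration, not how comm is implemented, and comm_as_multisets is a hypothetical name; it gives the same line counts per column as comm's merge on consistently sorted input, but unlike comm it holds both files in memory rather than streaming them.

```python
from collections import Counter
from typing import Iterable, Tuple


def comm_as_multisets(lines1: Iterable[str], lines2: Iterable[str]) -> Tuple[Counter, Counter, Counter]:
    """Return (only in 1, only in 2, in both) as line -> count multisets."""
    c1, c2 = Counter(lines1), Counter(lines2)
    only1 = c1 - c2  # occurrences in input 1 beyond those matched in input 2
    only2 = c2 - c1  # occurrences in input 2 beyond those matched in input 1
    both = c1 & c2   # matched occurrences: the minimum count per line
    return only1, only2, both


if __name__ == "__main__":
    foo = ["eggplant", "apple", "banana"]  # order is irrelevant here
    bar = ["zucchini", "banana", "apple", "banana"]
    print(comm_as_multisets(foo, bar))
```

Shuffling either input leaves the result unchanged, and "banana" and "banana " (with a trailing space) count as different lines, since only exact matches are paired.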


Other options

The comm utility has command-line options (-1, -2 and -3) to suppress the corresponding one of the three columns, which is useful for scripting. Either file operand (but not both) may also be given as -, in which case that file is read from standard input.
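The suppression options can be combined; comm -12 file1 file2, for example, prints only the lines the two files share. The sketch below models the options with a hypothetical suppress parameter on a function that is also only illustratively named comm, and it assumes the inputs are already sorted. Its indentation rule, one leading tab for each lower-numbered column that is still printed, is intended to mirror the POSIX description of the output format.

```python
from typing import AbstractSet, List


def comm(lines1: List[str], lines2: List[str], suppress: AbstractSet[int] = frozenset()) -> str:
    """Merge two sorted line lists comm-style, omitting the columns in `suppress`.

    suppress={1} corresponds to -1, {1, 2} to -12, and so on.
    """
    out: List[str] = []

    def emit(col: int, line: str) -> None:
        if col in suppress:
            return
        # One leading tab per lower-numbered column that is still printed.
        indent = sum(1 for c in range(1, col) if c not in suppress)
        out.append("\t" * indent + line + "\n")

    i = j = 0
    while i < len(lines1) and j < len(lines2):
        if lines1[i] < lines2[j]:
            emit(1, lines1[i])
            i += 1
        elif lines2[j] < lines1[i]:
            emit(2, lines2[j])
            j += 1
        else:
            emit(3, lines1[i])
            i += 1
            j += 1
    for line in lines1[i:]:
        emit(1, line)
    for line in lines2[j:]:
        emit(2, line)
    return "".join(out)


if __name__ == "__main__":
    foo = ["apple", "banana", "eggplant"]
    bar = ["apple", "banana", "banana", "zucchini"]
    print(comm(foo, bar, {1, 2}), end="")  # like: comm -12 foo bar
```

Suppressing both unique columns leaves the common lines with no leading tabs at all, which is what makes the -12 form convenient to pipe into other commands.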


Limits

Up to a full line must be buffered from each input file during line comparison, before the next output line is written. Some implementations read lines with the function getline(), which does not impose any line length limit as long as system memory suffices. Other implementations read lines with the function fgets(), which requires a fixed buffer. For these implementations, the buffer is often sized according to the POSIX macro LINE_MAX.
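The difference between the two reading strategies can be simulated in Python, whose io.StringIO.readline accepts a size limit much as fgets() takes a buffer size. This is a hedged illustration, not C code from any implementation: read_with_fixed_buffer and read_unbounded are hypothetical names, and the deliberately tiny DEMO_BUFFER_SIZE stands in for a LINE_MAX-sized buffer, whose real size is system-dependent. With the fixed buffer, a line longer than the buffer comes back in pieces, so an implementation that did not detect this would be comparing fragments rather than whole lines; how any particular comm handles overlong lines is not covered here.

```python
import io
from typing import List

# Hypothetical, deliberately tiny stand-in for a LINE_MAX-sized buffer.
DEMO_BUFFER_SIZE = 8


def read_with_fixed_buffer(text: str, bufsize: int = DEMO_BUFFER_SIZE) -> List[str]:
    """Mimic repeated fgets(buf, bufsize, fp): each read returns at most bufsize - 1 chars."""
    if bufsize < 2:
        raise ValueError("bufsize must be at least 2")
    stream = io.StringIO(text)
    pieces: List[str] = []
    while True:
        piece = stream.readline(bufsize - 1)  # stops at a newline or at the size limit
        if not piece:  # empty string means end of input
            break
        pieces.append(piece)
    return pieces


def read_unbounded(text: str) -> List[str]:
    """Mimic getline(): every read returns one complete line, whatever its length."""
    return io.StringIO(text).readlines()


if __name__ == "__main__":
    sample = "short\nthis line is long\n"
    print(read_with_fixed_buffer(sample))
    print(read_unbounded(sample))
```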


See also

* Comparison of file comparison tools
* List of Unix commands
* cmp (Unix) – character oriented file comparison
* cut (Unix) – splitting column-oriented files

