In computing, sort is a standard command-line program of Unix and Unix-like operating systems that prints the lines of its input, or the concatenation of all files listed in its argument list, in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire line is taken as the sort key. Blank space is the default field separator. The command supports a number of command-line options that can vary by implementation. For instance, the "-r" flag reverses the sort order.
History
A command that invokes a general sort facility was first implemented within Multics. Later, it appeared in Version 1 Unix. This version was originally written by Ken Thompson at AT&T Bell Laboratories. By Version 4, Thompson had modified it to use pipes, but sort retained an option to name the output file because it was used to sort a file in place. In Version 5, Thompson invented "-" to represent standard input.
The version of sort bundled in GNU coreutils was written by Mike Haertel and Paul Eggert. This implementation employs the merge sort algorithm.
Similar commands are available on many other operating systems; for example, a sort command is part of ASCII's MSX-DOS2 Tools for MSX-DOS version 2.
The command has also been ported to the IBM i operating system.
Syntax
sort [OPTION]... [FILE]...
With no FILE, or when FILE is -, the command reads from standard input.
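The two ways of reading standard input can be sketched as follows. This is a minimal illustration assuming a POSIX shell; the scratch file extra.txt and the fruit names are made up, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Hypothetical scratch file, created only for this illustration.
tmpdir=$(mktemp -d)
printf 'pear\napple\n' > "$tmpdir/extra.txt"

# No FILE operand: sort reads standard input.
from_stdin=$(printf 'cherry\nbanana\n' | LC_ALL=C sort)

# "-" stands for standard input, so it can be mixed with named files.
combined=$(printf 'cherry\nbanana\n' | LC_ALL=C sort - "$tmpdir/extra.txt")

printf '%s\n' "$combined"
rm -r "$tmpdir"
```

The second call sorts the concatenation of the piped lines and the file contents as a single input.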
Parameters
Examples
Sort a file in alphabetical order
$ cat phonebook
Smith, Brett 555-4321
Doe, John 555-1234
Doe, Jane 555-3214
Avery, Cory 555-4132
Fogarty, Suzie 555-2314
$ sort phonebook
Avery, Cory 555-4132
Doe, Jane 555-3214
Doe, John 555-1234
Fogarty, Suzie 555-2314
Smith, Brett 555-4321
Sort by number
The -n option makes the program sort according to numerical value. The du command produces output that starts with a number, the file size, so its output can be piped to sort to produce a list of files sorted by (ascending) file size:
$ du /bin/* | sort -n
4 /bin/domainname
24 /bin/ls
102 /bin/sh
304 /bin/csh
The find command with the -ls option prints file sizes in the 7th field, so a list of the files sorted by file size is produced by:
$ find . -name "*.tex" -ls | sort -k 7n
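The effect of -n is easiest to see by sorting the same numbers with and without it. The sketch below uses made-up sizes and sets LC_ALL=C so that the text comparison is a plain byte comparison.

```shell
# Made-up sizes, one per line.
sizes=$(printf '102\n24\n4\n304\n')

# Without -n the lines are compared as text, character by character.
as_text=$(printf '%s\n' "$sizes" | LC_ALL=C sort)

# With -n the leading number on each line is compared by value.
as_numbers=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)

printf 'text order:    %s\n' "$(printf '%s' "$as_text" | tr '\n' ' ')"
printf 'numeric order: %s\n' "$(printf '%s' "$as_numbers" | tr '\n' ' ')"
```

In text order 102 comes before 24 because the character "1" sorts before "2"; in numeric order the lines come out as 4, 24, 102, 304.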
Columns or fields
Use the -k option to sort on a certain column. For example, use "-k 2" to sort on the second column. In old versions of sort, the +1 option made the program sort on the second column of data (+2 for the third, etc.). This usage is deprecated.
$ cat zipcode
Adam 12345
Bob 34567
Joe 56789
Sam 45678
Wendy 23456
$ sort -k 2n zipcode
Adam 12345
Wendy 23456
Bob 34567
Sam 45678
Joe 56789
Sort on multiple fields
The -k m,n option lets you sort on a key that is potentially composed of multiple fields (start at column m, end at column n):
$ cat quota
fred 2000
bob 1000
an 1000
chad 1000
don 1500
eric 500
$ sort -k2,2n -k1,1 quota
eric 500
an 1000
bob 1000
chad 1000
don 1500
fred 2000
Here the first sort is done using column 2. -k2,2n specifies sorting on the key starting and ending with column 2, and sorting numerically. If -k2 is used instead, the sort key would begin at column 2 and extend to the end of the line, spanning all the fields in between. -k1,1 dictates breaking ties using the value in column 1, sorting alphabetically by default. Note that an, bob, and chad have the same quota and are sorted alphabetically in the final output.
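The difference between a key that ends at its own column and one that runs to the end of the line can be sketched with two made-up three-column rows. LC_ALL=C is assumed so that the comparison is byte-wise.

```shell
# Two made-up rows that tie on column 2 but differ in columns 1 and 3.
rows=$(printf 'a 1 z\nb 1 y\n')

# -k2,2 stops at column 2, so the tie falls through to -k1,1: "a" comes first.
single_field=$(printf '%s\n' "$rows" | LC_ALL=C sort -k2,2 -k1,1)

# -k2 runs to the end of the line, so column 3 takes part in the first key:
# "1 y" sorts before "1 z" and the -k1,1 tie-breaker is never consulted.
to_end_of_line=$(printf '%s\n' "$rows" | LC_ALL=C sort -k2 -k1,1)

printf '%s\n' "-k2,2 -k1,1:" "$single_field" "-k2 -k1,1:" "$to_end_of_line"
```

The first command keeps "a 1 z" ahead of "b 1 y"; the second swaps them.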
Sorting a pipe delimited file
$ sort -k2,2 -k1,1 -t'|' zipcode
Adam|12345
Wendy|23456
Bob|34567
Sam|45678
Joe|56789
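A small sketch of sorting on a custom separator, assuming records that use a literal | between name and code. The names and codes are made up. The separator is quoted because | is otherwise interpreted by the shell as a pipe.

```shell
# Made-up records: name|zipcode, with two names sharing a code.
records=$(printf 'Joe|56789\nBob|34567\nAnn|34567\n')

# -t '|' sets the field separator; sort numerically on field 2,
# then break ties alphabetically on field 1.
sorted=$(printf '%s\n' "$records" | LC_ALL=C sort -t '|' -k2,2n -k1,1)

printf '%s\n' "$sorted"
```

Ann and Bob share a code, so the second key decides their relative order.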
Sorting a tab delimited file
Sorting a file with tab separated values requires a tab character to be specified as the column delimiter. This illustration uses the shell's dollar-quote notation to specify the tab as a C escape sequence.
$ sort -k2,2 -t $'\t' phonebook
Doe, John 555-1234
Fogarty, Suzie 555-2314
Doe, Jane 555-3214
Avery, Cory 555-4132
Smith, Brett 555-4321
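Dollar-quote notation is not available in every shell. A portable alternative, sketched here under the assumption of a plain POSIX shell, is to let printf produce the tab. The records are made up for the illustration.

```shell
# printf produces the tab; command substitution keeps it (only trailing newlines are stripped).
tab=$(printf '\t')

# Made-up tab-separated records: name, then phone number.
records=$(printf 'Smith, Brett\t555-4321\nDoe, John\t555-1234\nDoe, Jane\t555-3214\n')

# With the tab as separator, the space inside "Doe, John" no longer splits fields.
by_phone=$(printf '%s\n' "$records" | LC_ALL=C sort -t "$tab" -k2,2)

printf '%s\n' "$by_phone"
```

The lines come out ordered by the phone number in the second field, starting with 555-1234.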
Sort in reverse
The -r option reverses the order of the sort:
$ sort -nrk 2 zipcode
Joe 56789
Sam 45678
Bob 34567
Wendy 23456
Adam 12345
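Reversal can also be attached to a single key by adding r to that key's modifiers, which leaves other keys in ascending order. The sketch below uses made-up quota data and contrasts that with reversing whole lines.

```shell
# Made-up data: user name and quota.
quota=$(printf 'fred 2000\nbob 1000\nan 1000\neric 500\n')

# r on the first key only: largest quota first, names still ascending within a tie.
largest_first=$(printf '%s\n' "$quota" | LC_ALL=C sort -k2,2nr -k1,1)

# Global -r with no keys: whole lines compared as text, in descending order.
lines_reversed=$(printf '%s\n' "$quota" | LC_ALL=C sort -r)

printf '%s\n' "per-key r:" "$largest_first" "global -r:" "$lines_reversed"
```

With the per-key modifier, an is listed before bob even though the quota column is descending.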
Sort in random
The GNU implementation has a -R (--random-sort) option based on hashing; this is not a full random shuffle because it will sort identical lines together. A true random sort is provided by the Unix utility shuf.
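The grouping behaviour can be checked with a short sketch. It assumes GNU coreutils (both sort -R and shuf are GNU tools) and made-up input; the actual line order differs from run to run.

```shell
# Made-up input with repeated lines: three distinct values, six lines.
input=$(printf 'a\nb\na\nc\nb\na\n')

# sort -R orders lines by a random hash of the key, so equal lines stay adjacent.
hashed=$(printf '%s\n' "$input" | LC_ALL=C sort -R)

# uniq collapses only adjacent duplicates, so exactly three groups remain.
groups=$(printf '%s\n' "$hashed" | uniq | wc -l)

# shuf permutes lines regardless of content; duplicates may end up separated.
shuffled=$(printf '%s\n' "$input" | shuf)

printf 'groups after sort -R: %s\n' "$groups"
```

Because uniq only merges neighbouring lines, a group count equal to the number of distinct values shows that sort -R kept identical lines together.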
Sort by version
The GNU implementation has a -V (--version-sort) option which is a natural sort of (version) numbers within text. Two text strings that are to be compared are split into blocks of letters and blocks of digits. Blocks of letters are compared alpha-numerically, and blocks of digits are compared numerically (i.e., skipping leading zeros, more digits means larger, otherwise the leftmost digits that differ determine the result). Blocks are compared left-to-right and the first non-equal block in that loop decides which text is larger. This happens to work for IP addresses, Debian package version strings and similar tasks where numbers of variable length are embedded in strings.
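A brief sketch of the difference between plain text order and version order, assuming GNU sort and made-up package names:

```shell
# Made-up version strings with digit blocks of different lengths.
versions=$(printf 'pkg-1.10\npkg-1.9\npkg-1.2\n')

# Plain text comparison: "1.10" sorts before "1.2" because "1" < "2".
plain=$(printf '%s\n' "$versions" | LC_ALL=C sort)

# Version sort: the digit blocks 2, 9 and 10 are compared by value.
natural=$(printf '%s\n' "$versions" | LC_ALL=C sort -V)

printf '%s\n' "plain:" "$plain" "version:" "$natural"
```

Plain sorting puts pkg-1.10 first, while -V orders the lines as pkg-1.2, pkg-1.9, pkg-1.10.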
See also
* Collation
* List of Unix commands
* uniq
* shuf
External links
* Original Sort manpage: the original BSD Unix program's manpage
* Further details about sort at Softpanorama