
In Unix-like computer operating systems, a pipeline is a mechanism for inter-process communication using message passing. A pipeline is a set of processes chained together by their standard streams, so that the output text of each process (''stdout'') is passed directly as input (''stdin'') to the next one. The second process is started while the first process is still executing, and they are executed concurrently.
The concept of pipelines was championed by Douglas McIlroy at Unix's ancestral home of Bell Labs, during the development of Unix, shaping its toolbox philosophy. It is named by analogy to a physical pipeline. A key feature of these pipelines is their "hiding of internals" (Ritchie & Thompson, 1974). This in turn allows for more clarity and simplicity in the system.
This article is about anonymous pipes, where data written by one process is buffered by the operating system until it is read by the next process, and this uni-directional channel disappears when the processes are completed. This differs from named pipes, where messages are passed to or from a pipe that is named by making it a file, and which remains after the processes are completed. The standard shell syntax for anonymous pipes is to list multiple commands, separated by vertical bars ("pipes" in common Unix verbiage):
command1 | command2 | command3
For example, to list files in the current directory (ls), retain only the lines of output containing the string "key" (grep), and view the result in a scrolling page (less), a user types the following into the command line of a terminal:
ls -l | grep key | less
The command ls -l is executed as a process, the output (stdout) of which is piped to the input (stdin) of the process for grep key; and likewise for the process for less. Each process takes input from the previous process and produces output for the next process via ''standard streams''. Each | tells the shell to connect the standard output of the command on the left to the standard input of the command on the right by an inter-process communication mechanism called an (anonymous) pipe, implemented in the operating system. Pipes are unidirectional; data flows through the pipeline from left to right.
Example
Below is an example of a pipeline that implements a kind of spell checker for the web resource indicated by a URL. An explanation of what it does follows.
curl "https://en.wikipedia.org/wiki/Pipeline_(Unix)" ,
sed 's/ a-zA-Z /g' ,
tr 'A-Z ' 'a-z\n' ,
grep ' -z ,
sort -u ,
comm -23 - <(sort /usr/share/dict/words) ,
less
1. curl obtains the HTML contents of a web page (could use wget on some systems).
2. sed replaces all characters (from the web page's content) that are not spaces or letters, with spaces. (Newlines are preserved.)
3. tr changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line).
4. grep includes only lines that contain at least one lowercase alphabetical character (removing any blank lines).
5. sort sorts the list of 'words' into alphabetical order, and the -u switch removes duplicates.
6. comm finds lines in common between two files; -23 suppresses lines unique to the second file, and those that are common to both, leaving only those that are found only in the first file named. The - in place of a filename causes comm to use its standard input (from the pipeline in this case). sort /usr/share/dict/words sorts the contents of the words file alphabetically, as comm expects, and <( ... ) outputs the results to a temporary file (via process substitution), which comm reads. The result is a list of words (lines) that are not found in /usr/share/dict/words.
7. less allows the user to page through the results.
Pipelines in command line interfaces
All widely used Unix shells have a special syntax construct for the creation of pipelines. In all usage one writes the commands in sequence, separated by the ASCII vertical bar character | (which, for this reason, is often called the "pipe character"). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of buffer storage).
Error stream
By default, the standard error streams ("stderr") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the console. However, many shells have additional syntax for changing this behavior. In the csh shell, for instance, using |& instead of | signifies that the standard error stream should also be merged with the standard output and fed to the next process. The Bash shell can also merge standard error with |& since version 4.0, or using 2>&1, as well as redirect it to a different file.
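As a brief illustration, the following sketch (assuming Bash 4.0 or later; cmd1 and cmd2 are placeholder command names) shows the common ways of routing standard error around or into the pipe:
cmd1 2>&1 | cmd2          # portable Bourne/POSIX form: merge stderr into stdout, then pipe
cmd1 |& cmd2              # Bash 4.0+ (and csh-style) shorthand for the same merge
cmd1 2>errors.log | cmd2  # keep stderr out of the pipe and write it to a file instead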
Pipemill
In the most commonly used simple pipelines the shell connects a series of sub-processes via pipes, and executes external commands within each sub-process. Thus the shell itself is doing no direct processing of the data flowing through the pipeline.
However, it is possible for the shell to perform processing directly, using a so-called mill or pipemill (since a while command is used to "mill" over the results from the initial command). This construct generally looks something like:
command | while read -r var1 var2 ...; do
# process each line, using variables as parsed into var1, var2, etc
# (note that this may be a subshell: var1, var2 etc will not be available
# after the while loop terminates; some shells, such as zsh and newer
# versions of Korn shell, process the commands to the left of the pipe
# operator in a subshell)
done
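For instance, a concrete use of this construct might parse each line of ls -l into fields. This is only a sketch: the exact column layout of ls -l can vary between systems.
ls -l | while read -r perms links owner group size rest; do
    [ "$perms" = "total" ] && continue   # skip the summary line that ls -l prints first
    echo "$size bytes: $rest"            # "rest" holds the modification time and file name
done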
Such a pipemill may not perform as intended if the body of the loop includes commands, such as cat and ssh, that read from stdin: on the loop's first iteration, such a program (let's call it ''the drain'') will read the remaining output from command, and the loop will then terminate (with results depending on the specifics of the drain). There are a couple of possible ways to avoid this behavior. First, some drains support an option to disable reading from stdin (e.g. ssh -n). Alternatively, if the drain does not ''need'' to read any input from stdin to do something useful, it can be given < /dev/null as input.
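A sketch of both workarounds, assuming a hypothetical hosts.txt file containing one host name per line:
cat hosts.txt | while read -r host; do
    ssh -n "$host" uptime            # -n stops ssh from draining the rest of the host list
done
cat hosts.txt | while read -r host; do
    ssh "$host" uptime < /dev/null   # or starve the drain's stdin instead
done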
As all components of a pipe are run in parallel, a shell typically forks a subprocess (a subshell) to handle its contents, making it impossible to propagate variable changes to the outside shell environment. To remedy this issue, the "pipemill" can instead be fed from a here document containing a command substitution, which waits for the pipeline to finish running before milling through the contents. Alternatively, a named pipe or a process substitution can be used for parallel execution.
GNU Bash also has a lastpipe option to disable forking for the last pipe component.
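The sketches below (Bash syntax; command stands for any producer, as above) show how variable changes can be kept visible after the loop:
# Process substitution: the while loop runs in the current shell, not a subshell.
count=0
while read -r line; do
    count=$((count + 1))
done < <(command)
echo "$count lines"

# Here document fed by a command substitution: the pipeline finishes first,
# then the loop mills over its captured output.
count=0
while read -r line; do
    count=$((count + 1))
done <<EOF
$(command)
EOF
echo "$count lines"

# Bash's lastpipe option runs the last pipeline component in the current shell
# (it takes effect when job control is off, e.g. in a non-interactive script).
shopt -s lastpipe
count=0
command | while read -r line; do count=$((count + 1)); done
echo "$count lines"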
Creating pipelines programmatically
Pipelines can be created under program control. The Unix pipe() system call asks the operating system to construct a new anonymous pipe object. This results in two new, opened file descriptors in the process: the read-only end of the pipe, and the write-only end. The pipe ends appear to be normal, anonymous file descriptors, except that they have no ability to seek.
To avoid deadlock and exploit parallelism, the Unix process with one or more new pipes will then, generally, call fork() to create new processes. Each process will then close the end(s) of the pipe that it will not be using before producing or consuming any data. Alternatively, a process might create new threads and use the pipe to communicate between them.
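On Linux, one way to watch this happen is to trace a shell as it builds a two-command pipeline. This is only an illustrative sketch: the exact system call names (pipe vs. pipe2, fork vs. clone) vary by platform and shell, and strace itself is Linux-specific.
strace -f -e trace=pipe,pipe2,clone,dup2,execve sh -c 'ls | wc -l'
The trace shows the shell creating the pipe, forking a child for each command, duplicating the pipe ends onto file descriptors 0 and 1 with dup2, and then executing ls and wc.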
''Named pipes'' may also be created using mkfifo() or mknod() and then presented as the input or output file to programs as they are invoked. They allow multi-path pipes to be created, and are especially effective when combined with standard error redirection, or with tee.
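As an illustrative sketch (the file and command names are arbitrary), a named pipe created from the shell lets tee feed one listing to two consumers at once:
mkfifo mypipe                    # create the named pipe as a filesystem object
wc -l < mypipe &                 # consumer 2 reads from the named pipe in the background
ls -l | tee mypipe | grep '^d'   # tee copies the listing to both grep and the named pipe
wait                             # wait for the background line count to finish
rm mypipe                        # the named pipe persists until explicitly removed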
Implementation
In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the scheduler together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering: for example, a sending program may produce 5000 bytes per second, and a receiving program may only be able to accept 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in the buffer. When the receiving program is ready to read data, the next program in the pipeline reads from the buffer. If the buffer is filled, the sending program is stopped (blocked) until at least some data is removed from the buffer by the receiver. In Linux, the size of the buffer is 65,536 bytes (64 KiB). An open source third-party filter called bfr is available to provide larger buffers if required.
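A small demonstration of this buffering behavior, assuming GNU coreutils: seq produces its output far faster than the consumer starts reading, the writer blocks once the pipe buffer fills, and no data is lost.
seq 1 100000 | (sleep 2; wc -l)   # prints 100000: every line arrives despite the slow reader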
Network pipes
Tools like netcat and socat can connect pipes to TCP/IP sockets.
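For example, a directory can be streamed between two machines by piping tar into netcat on one end and out of it on the other. This is a sketch: option letters such as -l, -p, -N and -q differ between netcat variants, and receiver.example is a placeholder host name.
nc -l 9000 | tar xf -                       # on the receiving host: listen and unpack
tar cf - mydir | nc receiver.example 9000   # on the sending host: pack and transmit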
History
The pipeline concept was invented by Douglas McIlroy and first described in the man pages of Version 3 Unix. McIlroy noticed that much of the time command shells passed the output file from one program as input to another.
His ideas were implemented in 1973 when ("in one feverish night", wrote McIlroy) Ken Thompson added the pipe() system call and pipes to the shell and several utilities in Version 3 Unix. "The next day", McIlroy continued, "saw an unforgettable orgy of one-liners as everybody joined in the excitement of plumbing." McIlroy also credits Thompson with the | notation, which greatly simplified the description of pipe syntax in Version 4.
Although developed independently, Unix pipes are related to, and were preceded by, the 'communication files' developed by Ken Lochner in the 1960s for the Dartmouth Time Sharing System.
McIlroy's pipes were further developed in Tony Hoare's communicating sequential processes (CSP).
[https://swtch.com/~rsc/thread/ Bell Labs and CSP Threads (Russ Cox)]
The robot in the icon for Apple's Automator, which also uses a pipeline concept to chain repetitive commands together, holds a pipe in homage to the original Unix concept.
Other operating systems
This feature of Unix was borrowed by other operating systems, such as MS-DOS and the CMS Pipelines package on VM/CMS and MVS, and eventually came to be designated the pipes and filters design pattern of software engineering.
See also
* Everything is a file – describes one of the defining features of Unix; pipelines act on "files" in the Unix sense
* Anonymous pipe – a FIFO structure used for interprocess communication
* GStreamer – a pipeline-based multimedia framework
* CMS Pipelines
* Iteratee
* Named pipe – persistent pipes used for interprocess communication
* Process substitution – shell syntax for connecting multiple pipes to a process
* GNU parallel
* Pipeline (computing) – other computer-related pipelines
* Redirection (computing)
* Tee (command) – a general command for tapping data from a pipeline
* XML pipeline – for processing of XML files
* xargs
References
* Sal Soghoian on ''MacBreak'' Episode 3, "Enter the Automatrix"
External links
* History of Unix pipe notation
* Doug McIlroy's original 1964 memo proposing the concept of a pipe for the first time
* pipe – create an interprocess channel (Single UNIX Specification man page)
* An introduction to pipes by The Linux Information Project (LINFO)
* Unix Pipes – powerful and elegant programming paradigm (Softpanorama)
* ''Ad Hoc Data Analysis From The Unix Command Line'' at Wikibooks – shows how to use pipelines composed of simple filters to do complex data analysis.
* Use And Abuse Of Pipes With Audio Data – gives an introduction to using and abusing pipes with netcat, nettee and fifos to play audio across a network.
* stackoverflow.com – a Q&A about bash pipeline handling.