sed

sed ("stream editor") is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed was based on the scripting features of the interactive editor ed ("editor", 1971) and the earlier qed ("quick editor", 1965–66). It was one of the earliest tools to support regular expressions, and remains in use for text processing, most notably with the substitution command. Popular alternative tools for plaintext string manipulation and "stream editing" include AWK and Perl.


History

First appearing in Version 7 Unix, sed is one of the early Unix commands built for command-line processing of data files. It evolved as the natural successor to the popular grep command. The original motivation was an analogue of grep (g/re/p) for substitution, hence "g/re/s". Foreseeing that further special-purpose programs for each command would also arise, such as g/re/d, McMahon wrote a general-purpose line-oriented stream editor, which became sed. The syntax for sed, notably the use of / for pattern matching and s/// for substitution, originated with ed, the precursor to sed, which was in common use at the time, and the regular expression syntax has influenced other languages, notably ECMAScript and Perl. Later, the more powerful language AWK was developed, and the two functioned as cousins, allowing powerful text processing to be done by shell scripts. sed and AWK are often cited as progenitors and inspiration for Perl, and influenced Perl's syntax and semantics, notably in the matching and substitution operators.

GNU sed added several new features, including in-place editing of files. ''Super-sed'' is an extended version of sed that includes regular expressions compatible with Perl. Another variant of sed is ''minised'', originally reverse-engineered from 4.1BSD sed by Eric S. Raymond and currently maintained by René Rebe. minised was used by the GNU Project until the GNU Project wrote a new version of sed based on the new GNU regular expression library. The current minised contains some extensions to BSD sed but is not as feature-rich as GNU sed. Its advantage is that it is very fast and uses little memory. It is used on embedded systems and is the version of sed provided with Minix.


Mode of operation

sed is a line-oriented text processing utility: it reads text, line by line, from an input stream or file, into an internal buffer called the ''pattern space''. Each line read starts a ''cycle''. To the pattern space, sed applies one or more operations which have been specified via a ''sed script''. sed implements a programming language with about 25 ''commands'' that specify the operations on the text. For each input line, after running the script, sed ordinarily outputs the pattern space (the line as modified by the script) and begins the cycle again with the next line. Other end-of-script behaviors are available through sed options and script commands, e.g. d to delete the pattern space, q to quit, N to add the next line to the pattern space immediately, and so on. Thus a sed script corresponds to the body of a loop that iterates through the lines of a stream, where the loop itself and the loop variable (the current line number) are implicit and maintained by sed.

The sed script can either be specified on the command line (-e option) or read from a separate file (-f option). Commands in the sed script may take an optional ''address'', in terms of line numbers or regular expressions. The address determines when the command is run. For example, 2d would only run the d (delete) command on the second input line (printing all lines but the second), while /^ /d would delete all lines beginning with a space. A separate special buffer, the ''hold space'', may be used by a few sed commands to hold and accumulate text between cycles. sed's command language has only two variables (the "hold space" and the "pattern space") and GOTO-like branching functionality; nevertheless, the language is Turing-complete, and esoteric sed scripts exist for games such as Sokoban, Arkanoid, chess, and Tetris.

A main loop executes for each line of the input stream, evaluating the sed script on each line of the input. Each line of a sed script is a pattern-action pair, indicating what pattern to match and which action to perform, which can be recast as a conditional statement. Because the main loop, working variables (pattern space and hold space), input and output streams, and default actions (copy line to pattern space, print pattern space) are implicit, it is possible to write terse one-liner programs. For example, the sed program 10q prints the first 10 lines of input and then stops.
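The addressing and hold-space behavior described above can be tried directly. The following is a minimal sketch in POSIX sh; the sample() helper and its input lines are hypothetical, made up for illustration. It uses 3q rather than 10q only because the sample is short, and the line-reversing one-liner 1!G;h;$!d is a well-known idiom that is not taken from the text above.

```shell
#!/bin/sh
# Hypothetical sample input: five lines, two of which begin with a space.
sample() {
    printf '%s\n' 'line 1' ' line 2' 'line 3' ' line 4' 'line 5'
}

# Line-number address: d (delete) runs only on line 2; all other lines print.
sample | sed '2d'

# Regular-expression address: delete every line that begins with a space.
sample | sed '/^ /d'

# 3q: print lines 1 to 3, then quit (same idea as 10q, scaled to this input).
sample | sed '3q'

# Hold space across cycles: on every line but the first, G appends the hold
# space to the pattern space; h copies the result back into the hold space;
# $!d suppresses output until the last line, which then holds all lines reversed.
sample | sed '1!G;h;$!d'
```

Because the loop and the line counter are implicit, each of these programs is only a few characters long.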


Usage


Substitution command

The following example shows the most typical use of sed, substitution, which was also the original motivation for sed:

    sed 's/regexp/replacement/g' inputFileName > outputFileName

In some versions of sed, the expression must be preceded by -e to indicate that an expression follows. The s stands for substitute, while the g stands for global, which means that all matching occurrences in the line are replaced (without g, only the first match on each line is replaced). The regular expression (i.e. pattern) to be searched is placed after the first delimiting symbol (slash here) and the replacement follows the second symbol. Slash (/) is the conventional symbol, originating in the character for "search" in ed, but any other character could be used to make the syntax more readable if it does not occur in the pattern or replacement; this is useful to avoid "leaning toothpick syndrome".

The substitution command, which originates in search-and-replace in ed, implements simple parsing and templating. The regexp provides both pattern matching and saving text via sub-expressions, while the replacement can be either literal text, or a format string containing the character & for "entire match" or the special escape sequences \1 through \9 for the ''n''th saved sub-expression. For example,

    sed -r "s/(cat|dog)s?/\1s/g"

replaces all occurrences of "cat" or "dog" with "cats" or "dogs", without duplicating an existing "s": (cat|dog) is the 1st (and only) saved sub-expression in the regexp, and \1 in the format string substitutes this into the output. (The -r option, which enables extended regular expressions, is specific to GNU sed; GNU and BSD sed both accept -E for the same purpose.)
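The flags and replacement escapes described above can be checked with a few one-liners. This is a sketch assuming a POSIX shell; it writes -E rather than the GNU-specific -r for extended regular expressions, and the sample strings are arbitrary.

```shell
#!/bin/sh
# Without g only the first match on the line is replaced; with g, all are.
echo 'a-b-c' | sed 's/-/+/'                          # a+b-c
echo 'a-b-c' | sed 's/-/+/g'                         # a+b+c

# & in the replacement stands for the entire match.
echo 'cost: 42' | sed 's/[0-9][0-9]*/[&]/'           # cost: [42]

# \1 refers to the first saved sub-expression (extended syntax via -E).
echo 'cat dogs bird' | sed -E 's/(cat|dog)s?/\1s/g'  # cats dogs bird

# Any delimiter may replace /; | avoids escaping the slashes in a path.
echo '/usr/local/bin' | sed 's|/usr/local|/opt|'     # /opt/bin
```

The last line shows the delimiter choice at work: written with /, the same command would need s/\/usr\/local/\/opt/.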


Other sed commands

Besides substitution, other forms of simple processing are possible, using some 25 sed commands. For example, the following uses the ''d'' command to filter out lines that are empty or contain only spaces:

    sed '/^ *$/d' inputFileName

This example uses some of the following regular expression metacharacters (sed uses POSIX basic regular expressions by default; many implementations, including GNU and BSD sed, accept extended regular expressions with -E):

* The caret (^) matches the beginning of the line.
* The dollar sign ($) matches the end of the line.
* The asterisk (*) matches zero or more occurrences of the previous character.
* The plus (+) matches one or more occurrences of the previous character. It is an extended regular expression operator; GNU sed also accepts \+ in basic regular expressions.
* The question mark (?) matches zero or one occurrence of the previous character. It is likewise an extended operator; GNU sed also accepts \? in basic regular expressions.
* The dot (.) matches exactly one character.

Complex sed constructs are possible, allowing it to serve as a simple, but highly specialized, programming language. Flow of control, for example, can be managed by the use of a label (a colon followed by a string) and the branch instruction b, as well as the conditional branch t. An instruction b followed by a valid label name will move processing to the command following that label (without a label, b branches to the end of the script). The t instruction will only do so if there was a successful substitution since the most recent t or the most recent reading of an input line. Additionally, the { command starts a group of commands, up to the matching }; in most cases, the group will be conditioned by an address pattern.
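Labels, branches, and command groups can be combined as in the following sketch. The function names commify and uncomment are hypothetical, chosen for illustration. The scripts use POSIX basic regular expression syntax, and the label and branch are given separate -e arguments because some implementations (BSD sed among them) treat everything up to the end of the line, semicolons included, as part of a label name.

```shell
#!/bin/sh
# commify (hypothetical): insert thousands separators into a leading run of digits.
# :a defines a label; each pass of s moves one comma further left;
# ta jumps back to :a only while the substitution keeps succeeding.
commify() {
    sed -e ':a' -e 's/^\([0-9]*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'
}

# uncomment (hypothetical): the { ... } group applies both substitutions
# only to lines selected by the address /^#/ (lines starting with #).
uncomment() {
    sed '/^#/{
s/^# *//
s/$/ (was a comment)/
}'
}

echo 1234567 | commify                                   # 1,234,567
printf '%s\n' 'key=1' '# note' 'key=2' | uncomment
```

In commify the loop ends on the pass where the substitution fails, since t then has no successful substitution to react to and execution falls through to the end of the script.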


sed used as a filter

Under Unix, sed is often used as a filter in a pipeline:

    $ generateData | sed 's/x/y/g'

That is, a program such as "generateData" generates data, and then sed makes the small change of replacing ''x'' with ''y''. For example:

    $ echo xyz xyz | sed 's/x/y/g'
    yyz yyz

In command-line use, the quotes around the expression are not always required; they are necessary only if the shell would otherwise not interpret the expression as a single word (token). For the script s/x/y/g there is no ambiguity, so generateData | sed s/x/y/g works correctly. However, quotes are usually included for clarity, and are often necessary, notably for whitespace (e.g., 's/x x/y y/'). Most often single quotes are used, to avoid having the shell interpret $ as a shell variable. Double quotes are used, such as "s/$1/$2/g", to allow the shell to substitute a command-line argument or other shell variable.
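The effect of the two quoting styles can be seen in a short sketch. replace_word is a hypothetical helper, not a standard command; it assumes its arguments contain no slashes or regular-expression metacharacters, since the shell pastes them into the sed script verbatim.

```shell
#!/bin/sh
# sed as a filter: standard input in, standard output out.
echo xyz xyz | sed 's/x/y/g'                   # yyz yyz

# Single quotes: $ reaches sed untouched and acts as the end-of-line anchor,
# so only the final x on the line is replaced.
echo 'x x' | sed 's/x$/END/'                   # x END

# Double quotes: the shell expands $1 and $2 before sed sees the script.
# Hypothetical helper; assumes $1 and $2 contain no / or regex metacharacters.
replace_word() {
    sed "s/$1/$2/g"
}
echo 'red green red' | replace_word red blue   # blue green blue
```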


File-based sed scripts

It is often useful to put several sed commands, one command per line, into a script file such as subst.sed, and then use the -f option to run the commands (such as s/x/y/g) from the file:

    sed -f subst.sed inputFileName > outputFileName

Any number of commands may be placed into the script file, and using a script file also avoids problems with shell escaping or substitutions. Such a script file may be made directly executable from the command line by prepending it with a "shebang line" containing the sed command and assigning the executable permission to the file. For example, a file subst.sed can be created with contents:

    #!/bin/sed -f
    s/x/y/g

The interpreter path in the shebang line must match where sed is installed (on some systems it is /usr/bin/sed). The file may then be made executable by the current user with the chmod command:

    chmod u+x subst.sed

The file may then be executed directly from the command line (the ./ prefix is needed when the current directory is not in PATH):

    ./subst.sed inputFileName > outputFileName
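A script file can also be generated and used from a shell script, as in this sketch. The function name apply_subst, the temporary directory, and the second command s/cat/dog/g are assumptions for illustration; mktemp -d is not specified by POSIX but is available on Linux, the BSDs, and macOS.

```shell
#!/bin/sh
# apply_subst (hypothetical): write a two-command script file, one command
# per line, then filter standard input through it with sed -f.
apply_subst() {
    dir=$(mktemp -d) || return 1
    # The quoted 'EOF' stops the shell from expanding anything in the script.
    cat > "$dir/subst.sed" <<'EOF'
s/x/y/g
s/cat/dog/g
EOF
    sed -f "$dir/subst.sed"
    status=$?
    rm -r "$dir"
    return "$status"
}

# Commands run in order on each line: x becomes y first, then cat becomes dog.
echo 'x marks the cat' | apply_subst    # y marks the dog
```

Because the commands live in a file rather than on the command line, no shell quoting of the sed script is needed once the file exists.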


In-place editing

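The -i option, introduced in GNU sed, allows in-place editing of files (actually, a temporary output file is created in the background, and then the original file is replaced by the temporary file). For example:

    sed -i 's/abc/def/' fileName

BSD sed, including the version shipped with macOS, requires an argument to -i giving a backup suffix, which may be empty (sed -i '' 's/abc/def/' fileName).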
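The following sketch exercises in-place editing on a throwaway file. It attaches the backup suffix to the option (-i.bak), a spelling that GNU and BSD sed both accept, so a copy of the original is kept until the function deletes it. The function name inplace_demo and the file contents are hypothetical.

```shell
#!/bin/sh
# inplace_demo (hypothetical): edit a temporary file in place, then print
# the edited file followed by the untouched backup.
inplace_demo() {
    file=$(mktemp) || return 1
    printf '%s\n' 'abc one' 'two abc abc' > "$file"
    # No g flag: only the first "abc" on each line is replaced.
    sed -i.bak 's/abc/def/' "$file"
    cat "$file"        # edited contents
    cat "$file.bak"    # original contents, saved by -i.bak
    rm -f "$file" "$file.bak"
}

inplace_demo
```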


Examples


Hello, world! example

    # convert input text stream to "Hello, world!"
    s/.*/Hello, world!/
    q

This "Hello, world!" script is stored in a file (e.g., script.txt) and invoked with sed -f script.txt inputFileName, where "inputFileName" is the input text file. The script changes line 1 of "inputFileName" to "Hello, world!" and then quits, printing the result before sed exits. Any input lines past line 1 are not processed and not printed, so the sole output is "Hello, world!".

The example emphasizes many key characteristics of sed:

* Typical sed programs are rather short and simple.
* sed scripts can have comments (the line starting with the # symbol).
* The s (substitute) command is the most important sed command.
* sed allows simple programming, with commands such as q (quit).
* sed uses regular expressions, such as .* (zero or more of any character).
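To try the script without creating script.txt by hand, the sketch below writes it to a temporary directory and feeds it two lines of arbitrary input; the function name hello is hypothetical. One POSIX detail matters for comments like the one here: if the first two characters of a script are #n, sed behaves as if -n had been given, so a first-line comment should not begin that way.

```shell
#!/bin/sh
# hello (hypothetical): run the "Hello, world!" script on standard input.
hello() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/script.txt" <<'EOF'
# convert input text stream to "Hello, world!"
s/.*/Hello, world!/
q
EOF
    sed -f "$dir/script.txt"
    status=$?
    rm -r "$dir"
    return "$status"
}

# Only the first line is used; q stops sed after printing it.
printf '%s\n' 'first line' 'second line' | hello    # Hello, world!
```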


Other simple examples

Below follow various sed scripts; these can be executed by passing them as an argument to sed, or put in a separate file and executed via -f or by making the script itself executable.

To replace every instance of a certain word in a file with "REDACTED", such as an IRC password, and save the result:

    sed -i s/yourpassword/REDACTED/g ./status.chat.log

To delete any line containing the word "yourword" (the ''address'' is '/yourword/'):

    /yourword/ d

To delete all instances of the word "yourword":

    s/yourword//g

To delete two words from a file simultaneously:

    s/firstword//g
    s/secondword//g

To express the previous example on one line, such as when entering it at the command line, one may join the two commands with a semicolon:

    sed "s/firstword//g; s/secondword//g" inputFileName
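The delete and multi-command examples above can be checked against some sample text. The sketch assumes a POSIX shell, and the sample() helper with its three lines is made up; the last two commands show that a semicolon-joined script and separate -e arguments are equivalent.

```shell
#!/bin/sh
# Hypothetical sample text.
sample() {
    printf '%s\n' 'keep this' 'drop yourword here' 'say firstword, then secondword.'
}

# Address /yourword/ with d: every line containing "yourword" is removed.
sample | sed '/yourword/d'

# s///g with an empty replacement deletes every occurrence on each line.
sample | sed 's/yourword//g'

# Two substitutions joined by a semicolon ...
sample | sed 's/firstword//g; s/secondword//g'
# ... behave the same as two separate -e expressions.
sample | sed -e 's/firstword//g' -e 's/secondword//g'
```

Note that deleting a word leaves the surrounding spaces in place ("drop yourword here" becomes "drop  here"); removing them too would need a pattern that includes a space.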


Multiline processing example

In the next example, sed, which ordinarily works on one line at a time, removes newlines from sentences where the second line starts with one space. Consider the following text:

    This is my dog,
     whose name is Frank.
    This is my fish,
    whose name is George.
    This is my goat,
     whose name is Adam.

The sed script below will turn the text above into the following text. Note that the script affects only the input lines that start with a space:

    This is my dog, whose name is Frank.
    This is my fish,
    whose name is George.
    This is my goat, whose name is Adam.

The script is:

    N
    s/\n / /
    P
    D

This is explained as:

* (N) add the next line to the pattern space;
* (s/\n / /) find a newline followed by a space, replace with one space;
* (P) print the top line of the pattern space;
* (D) delete the top line from the pattern space and, if any text remains, run the script again without reading a new input line.

This can be expressed on a single line via semicolons:

    sed 'N;s/\n / /;P;D' inputFileName
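The one-line form can be run on the sample text from this section. In the sketch below, join_indented is a hypothetical function name. One portability caveat not covered above: if N is executed on the last line of input, GNU sed prints the pattern space before exiting, whereas POSIX specifies quitting without printing it; this six-line sample never reaches that case.

```shell
#!/bin/sh
# join_indented (hypothetical): join each line that starts with a space
# onto the line before it, using the N;s/\n / /;P;D script from the text.
join_indented() {
    sed 'N;s/\n / /;P;D'
}

printf '%s\n' \
    'This is my dog,' \
    ' whose name is Frank.' \
    'This is my fish,' \
    'whose name is George.' \
    'This is my goat,' \
    ' whose name is Adam.' | join_indented
```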


Limitations and alternatives

While simple and limited, sed is sufficiently powerful for a large number of purposes. For more sophisticated processing, more powerful languages such as AWK or Perl are used instead. These are used particularly when transforming a line in a way more complicated than regex extraction and template replacement, though arbitrarily complicated transforms are in principle possible by using the hold space. Conversely, for simpler operations, specialized Unix utilities such as grep (print lines matching a pattern), head (print the first part of a file), tail (print the last part of a file), and tr (translate or delete characters) are often preferable. For the specific tasks they are designed to carry out, such specialized utilities are usually simpler, clearer, and faster than a more general solution such as sed.

The ed/sed commands and syntax continue to be used in descendant programs, such as the text editors vi and vim. An analog to ed/sed is sam/ssam, where sam is the Plan 9 editor, and ssam is a stream interface to it, yielding functionality similar to sed.
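To make the comparison concrete, the sketch below pairs a sed command with the specialized utility for the same job; each pair should print identical output. The sample words are arbitrary, and sed's y (transliterate) command, used in the last pair, is a standard sed command not otherwise described in this article.

```shell
#!/bin/sh
# Hypothetical sample input.
sample() {
    printf '%s\n' alpha beta gamma delta
}

# Print lines matching a pattern: sed -n with p, versus grep.
sample | sed -n '/ta/p'
sample | grep 'ta'

# Print the first two lines: q on line 2, versus head.
sample | sed 2q
sample | head -n 2

# Print the last line: the $ address (last line) with p, versus tail.
sample | sed -n '$p'
sample | tail -n 1

# Translate characters: sed's y command, versus tr.
sample | sed 'y/abc/ABC/'
sample | tr 'abc' 'ABC'
```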


See also

* List of Unix commands
* Turing tarpit



Further reading


* Bell Labs' Eighth Edition (circa 1985) Unix sed(1) manual page
* GNU sed documentation or the manual page
* The sed FAQ (March 2003)

