A filter is a computer program or subroutine to process a stream, producing another stream. While a single filter can be used individually, filters are frequently strung together to form a pipeline.
Some operating systems, such as Unix, are rich with filter programs. Windows 7 and later are also rich with filters, as they include Windows PowerShell. In comparison, however, few filters are built into cmd.exe (the original command-line interface of Windows), most of which have significant enhancements relative to the similar filter commands that were available in MS-DOS.
OS X (now macOS) includes filters from its underlying Unix base but also has Automator, which allows filters (known as "Actions") to be strung together to form a pipeline.
Unix
In Unix and Unix-like operating systems, a filter is a program that gets most of its data from its standard input (the main input stream) and writes its main results to its standard output (the main output stream). Auxiliary input may come from command-line flags or configuration files, while auxiliary output may go to standard error. To have standard input come from a file or device rather than the terminal, the input redirection operator (<) is used. Similarly, the output redirection operator (>) sends standard output to a file or device, and the append operator (>>) adds output lines to the end of an existing file instead of overwriting it. Filters may be strung together into a pipeline with the pipe operator ("|"). This operator signifies that the main output of the command to the left is passed as main input to the command on the right.
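To make these stream roles concrete, here is a minimal sketch of a filter written in Python (the language is an assumption; the article uses none). The script name number_lines.py, its --start option and the helper names are hypothetical. The filter reads its main input from standard input and writes its main output to standard output. It takes auxiliary input from a command-line flag and sends a diagnostic to standard error, so only the numbered lines travel down a pipeline.

```python
import argparse
import sys
from typing import Callable, Iterable, Iterator


def number_lines(lines: Iterable[str], start: int = 1) -> Iterator[str]:
    """Prefix each line with its number (a simplified take on `cat -n`)."""
    for n, line in enumerate(lines, start=start):
        yield f"{n}\t{line}"


def run(lines: Iterable[str], write_out: Callable[[str], object],
        write_err: Callable[[str], object], argv: list[str]) -> int:
    # Auxiliary input: command-line flags (the --start option is hypothetical).
    parser = argparse.ArgumentParser(description="Number lines from standard input.")
    parser.add_argument("--start", type=int, default=1, help="number of the first line")
    args = parser.parse_args(argv)

    count = 0
    # Main input is `lines` (standard input); main output goes to write_out (standard output).
    for numbered in number_lines(lines, args.start):
        write_out(numbered)
        count += 1
    # Auxiliary output: a diagnostic on standard error, which `|` does not pass along.
    write_err(f"numbered {count} lines\n")
    return 0


if __name__ == "__main__":
    sys.exit(run(sys.stdin, sys.stdout.write, sys.stderr.write, sys.argv[1:]))
```

Saved as number_lines.py, it could be used as `cut -d : -f 1 /etc/passwd | python3 number_lines.py --start 0 > users.txt 2> log.txt` (file names hypothetical). The numbered usernames end up in users.txt and the line count in log.txt.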
The Unix philosophy encourages combining small, discrete tools to accomplish larger tasks. The classic filter in Unix is Ken Thompson's grep, which Doug McIlroy cites as what "ingrained the tools outlook irrevocably" in the operating system, with later tools imitating it. At its simplest, grep prints any lines containing a character string to its output. The following is an example:
cut -d : -f 1 /etc/passwd | grep foo
This finds all registered users that have "foo" as part of their username. The cut command takes the first field (the username) of each line of the Unix system password file, /etc/passwd, and passes them all as input to grep. grep then searches its input for lines containing the character string "foo" and prints them on its output.
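The same pipeline can be modelled in Python as two generator stages, where passing one generator into the next plays the role of the pipe. The functions cut_field, grep and users_matching are hypothetical helpers written for this sketch. They implement only the subset of cut and grep behaviour used here: a single field, and a fixed-string match like `grep -F` rather than a regular expression.

```python
from typing import Iterable, Iterator


def cut_field(lines: Iterable[str], delimiter: str = ":", field: int = 1) -> Iterator[str]:
    """Simplified `cut -d DELIM -f N`: yield the Nth (1-based) field of each line."""
    for line in lines:
        line = line.rstrip("\n")
        if delimiter not in line:
            # Like cut without -s, a line with no delimiter passes through whole.
            yield line
            continue
        parts = line.split(delimiter)
        # A field past the end of the line comes out empty, as with cut.
        yield parts[field - 1] if field <= len(parts) else ""


def grep(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Simplified `grep -F PATTERN`: keep lines containing the fixed string."""
    for line in lines:
        if pattern in line:
            yield line


def users_matching(passwd_lines: Iterable[str], pattern: str) -> list[str]:
    # `cut -d : -f 1 | grep PATTERN`: cut_field's output is grep's input.
    return list(grep(cut_field(passwd_lines, ":", 1), pattern))


if __name__ == "__main__":
    with open("/etc/passwd") as passwd:
        for name in users_matching(passwd, "foo"):
            print(name)
```

Because generators are lazy, each line flows through both stages before the next line is read, much as data streams through a Unix pipe. A password-file line such as `bob:x:1001:1001:foo fan:/home/bob:/bin/sh` is not reported. cut has already discarded the field containing "foo" before grep sees the line.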
Common Unix filter programs are: cat, cut, grep, head, sort, tail, and uniq.
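Several of these filters differ in how much of the stream they need to see. The following Python sketch models head, tail and uniq using hypothetical helper names and simplified behaviour (no options other than a line count). head can stop after n lines. tail has to read to the end while remembering only the last n. uniq removes only adjacent duplicates, which is why it is usually preceded by sort.

```python
from collections import deque
from itertools import groupby, islice
from typing import Iterable, Iterator


def head(lines: Iterable[str], n: int = 10) -> Iterator[str]:
    """Like `head -n N`: pass through the first n lines, then stop reading."""
    return islice(lines, n)


def tail(lines: Iterable[str], n: int = 10) -> list[str]:
    """Like `tail -n N`: consume the whole stream, keeping only the last n lines."""
    return list(deque(lines, maxlen=n))


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Like `uniq`: collapse each run of adjacent identical lines into one line."""
    for line, _run in groupby(lines):
        yield line
```

For example, uniq applied to b, a, b, b, c yields b, a, b, c. Sorting first (the equivalent of `sort | uniq`) yields a, b, c.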
Programs like awk and sed can be used to build quite complex filters because they are fully programmable. Unix filters can also be used by data scientists to get a quick overview of a file-based dataset.
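A typical quick-look pipeline of this kind is `cut -d , -f 2 data.csv | sort | uniq -c | sort -rn | head`. It lists the ten most frequent values in the second column of a comma-separated file (data.csv is a hypothetical file name). A rough Python equivalent is sketched below. It assumes simple delimited text without quoted commas, and column_counts is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def column_counts(lines: Iterable[str], column: int, delimiter: str = ",",
                  top: int = 10) -> list[tuple[str, int]]:
    """Rough equivalent of `cut -d, -fN | sort | uniq -c | sort -rn | head -n TOP`.

    Assumes simple delimited text with no quoted delimiters (not full CSV).
    """
    values = []
    for line in lines:
        parts = line.rstrip("\n").split(delimiter)
        # Missing fields are counted as empty strings (a simplification).
        values.append(parts[column - 1] if column <= len(parts) else "")
    # Counter stands in for `sort | uniq -c`; most_common for `sort -rn | head`.
    return Counter(values).most_common(top)
```

Values with equal counts may come out in a different order than they would with `sort -rn`.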
List of Unix filter programs
* awk
* cat
* comm
* compress
* cut
* expand
* fold
* grep
* head
* nl
* paste
* perl
* pr
* sed
* sh
* sort
* split
* strings
* tac
* tail
* tee
* tr
* uniq
* wc
* zcat
DOS
Two standard filters from the early days of DOS-based computers are find and sort.
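Examples:
find "keyword" < inputfilename > outputfilename
sort < inputfilename > outputfilename
find /v "keyword" < inputfilename | sort > outputfilename
Here find prints the lines that contain keyword. The /v switch inverts the match, so only lines not containing keyword are printed. sort writes its input lines in sorted order.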
Such filters may be used in batch files (*.bat, *.cmd, etc.).
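In the `find /v "keyword" < inputfilename | sort > outputfilename` example, the < and > operators make the command interpreter open the files. find and sort themselves only read standard input and write standard output. The Python sketch below models that split. find_lines and find_v_then_sort are hypothetical helpers for the filter logic, using a case-sensitive substring match as FIND performs without its /I switch. The `__main__` block does the file handling that redirection would do. Python's sorted is only an approximation; the collation rules of the real SORT command may differ.

```python
import sys
from typing import Iterable


def find_lines(lines: Iterable[str], keyword: str, invert: bool = False) -> list[str]:
    """Rough model of FIND "keyword", or FIND /V "keyword" when invert=True.

    As with FIND without /I, the match is case-sensitive.
    """
    return [line for line in lines if (keyword in line) != invert]


def find_v_then_sort(lines: Iterable[str], keyword: str) -> list[str]:
    # Strip line endings so the last line sorts like the others.
    stripped = (line.rstrip("\r\n") for line in lines)
    # The | between FIND /V and SORT becomes plain function composition.
    return sorted(find_lines(stripped, keyword, invert=True))


if __name__ == "__main__":
    # Hypothetical usage: python find_sort.py keyword inputfilename outputfilename
    keyword, in_path, out_path = sys.argv[1:4]
    # Here the script opens the files itself, doing the job of < and >.
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(line + "\n" for line in find_v_then_sort(src, keyword))
```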
For use in the same command shell environment, many more filters are available than those built into Windows. Some of these are freeware, some are shareware and some are commercial programs. A number of these mimic the function and features of the filters in Unix. Some filtering programs have a graphical user interface (GUI) that lets users design a customized filter to suit their particular data processing or data mining requirements.
Windows
Windows Command Prompt inherited MS-DOS commands, improved some and added a few. For example, Windows Server 2003 features six command-line filters for modifying Active Directory that can be chained by piping: DSAdd, DSGet, DSMod, DSMove, DSRm and DSQuery.
Windows PowerShell adds an entire host of filters known as "cmdlets". These can be chained together with a pipe, except for a few simple ones such as Clear-Host. The following example gets a list of the files in the C:\Windows folder, gets the size of each and sorts the sizes in ascending order (Sort-Object sorts in ascending order unless -Descending is given). It shows how three filters (Get-ChildItem, ForEach-Object and Sort-Object) are chained with pipes.
Get-ChildItem C:\Windows -File | ForEach-Object { $_.Length } | Sort-Object
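PowerShell pipelines pass .NET objects rather than lines of text, so ForEach-Object can read each file's Length property directly instead of parsing text. As a rough analogue, the Python sketch below passes pathlib.Path objects between three stages. The helper names are hypothetical. The directory is a parameter rather than C:\Windows so that the sketch runs on any system.

```python
from pathlib import Path
from typing import Iterable, Iterator


def child_files(directory: Path) -> Iterator[Path]:
    """Stage 1, like Get-ChildItem -File: yield the files directly inside `directory`."""
    return (entry for entry in directory.iterdir() if entry.is_file())


def sizes(files: Iterable[Path]) -> Iterator[int]:
    """Stage 2, like ForEach-Object { $_.Length }: read a property of each object."""
    return (path.stat().st_size for path in files)


def file_sizes_ascending(directory: Path) -> list[int]:
    """Stage 3, like Sort-Object: sort the sizes in ascending order (the default)."""
    return sorted(sizes(child_files(directory)))


if __name__ == "__main__":
    # Hypothetical stand-in for C:\Windows; any readable directory works.
    print(file_sizes_ascending(Path(".")))
```

On a Windows machine, `file_sizes_ascending(Path(r"C:\Windows"))` would mirror the PowerShell command above.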
External links
* http://www.webopedia.com/TERM/f/filter.html