HOME

TheInfoList



OR:

agrep (approximate
grep grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command ''g/re/p'' (''globally search for a regular expression and print matching lines''), which has the sam ...
) is an
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
approximate string matching In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching i ...
program, developed by
Udi Manber Udi Manber ( he, אודי מנבר) is an Israeli computer scientist. He is one of the authors of agrep and GLIMPSE. After a career in engineering and management, he worked on medical research. Education He earned both his bachelor's degree in 1 ...
and Sun Wu between 1988 and 1991, for use with the
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
operating system. It was later ported to
OS/2 OS/2 (Operating System/2) is a series of computer operating systems, initially created by Microsoft and IBM under the leadership of IBM software designer Ed Iacobucci. As a result of a feud between the two companies over how to position OS/2 ...
,
DOS DOS is shorthand for the MS-DOS and IBM PC DOS family of operating systems. DOS may also refer to: Computing * Data over signalling (DoS), multiplexing data onto a signalling channel * Denial-of-service attack (DoS), an attack on a communicat ...
, and
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
. It selects the best-suited algorithm for the current query from a variety of the known fastest (built-in)
string searching algorithm In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger stri ...
s, including Manber and Wu's
bitap algorithm The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given patter ...
based on
Levenshtein distance In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-charact ...
s. agrep is also the
search engine A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...
in the indexer program GLIMPSE. agrep is under a free
ISC License The ISC license is a permissive free software license published by the Internet Software Consortium, now called Internet Systems Consortium (ISC). It is functionally equivalent to the simplified BSD and MIT licenses, but without language dee ...
.


Alternative implementations

A more recent agrep is the command-line tool provided with the TRE regular expression library. TRE agrep is more powerful than Wu-Manber agrep since it allows weights and total costs to be assigned separately to individual groups in the pattern. It can also handle Unicode. Unlike Wu-Manber agrep, TRE agrep is licensed under a 2-clause BSD-like license. FREJ (Fuzzy Regular Expressions for Java) open-source library provides command-line interface which could be used in the way similar to agrep. Unlike agrep or TRE it could be used for constructing complex substitutions for matched text. However its syntax and matching abilities differs significantly from ones of ordinary
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
s.


See also

*
Bitap algorithm The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given patter ...
*
TRE (computing) TRE is an open-source library for pattern matching in text, which works like a regular expression engine with the ability to do approximate string matching. It was developed by Ville Laurikari and is distributed under a 2-clause BSD-like license. ...


References

{{Reflist


External links

* Wu-Manber agrep
AGREP home page
** tp://ftp.cs.arizona.edu/agrep/ For Unix (To compile under OSX 10.8, add -Wno-return-type to the CFLAGs = -O line in the Makefile) *See also
TRE regexp matching package

cgrep a defunct command line approximate string matching tool

nrgrep
a command line approximate string matching tool

Information retrieval systems Unix text processing utilities Software using the ISC license