Agrep

	Agrep agrep (approximate grep) is an open-source approximate string matching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows. It selects the best-suited algorithm for the current query from a variety of the known fastest (built-in) string searching algorithms, including Manber and Wu's bitap algorithm based on Levenshtein distances. agrep is also the search engine in the indexer program GLIMPSE. agrep is under a free ISC License. Alternative implementations A more recent agrep is the command-line tool provided with the TRE regular expression library. TRE agrep is more powerful than Wu-Manber agrep since it allows weights and total costs to be assigned separately to individual groups in the pattern. It can also handle Unicode. Unlike Wu-Manber agrep, TRE agrep is licensed under a 2-clause BSD-like license. FREJ (Fuzzy Regular Expressions for Java) open-source library provides com ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Grep grep is a command-line utility for searching plain-text data sets for lines that match a regular expression. Its name comes from the ed command ''g/re/p'' (''globally search for a regular expression and print matching lines''), which has the same effect. grep was originally developed for the Unix operating system, but later available for all Unix-like systems and some others such as OS-9. History Before it was named, grep was a private utility written by Ken Thompson to search files for certain patterns. Doug McIlroy, unaware of its existence, asked Thompson to write such a program. Responding that he would think about such a utility overnight, Thompson actually corrected bugs and made improvements for about an hour on his own program called s (short for "search"). The next day he presented the program to McIlroy, who said it was exactly what he wanted. Thompson's account may explain the belief that grep was written overnight. Thompson wrote the first version in PDP-11 asse ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Bitap Algorithm The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance if the substring and pattern are within a given distance ''k'' of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. Then it is able to do most of the work with bitwise operations, which are extremely fast. The bitap algorithm is perhaps best known as one of the underlying algorithms of the Unix utility agrep, written by Udi Manber, Sun Wu, and Burra Gopal. Manber and Wu's original paper gives extensions of the algorithm to deal with fuzzy matching of general regular expressions. Due to the data structures required by the algorithm, it performs best on pat ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	TRE (computing) TRE is an open-source library for pattern matching in text, which works like a regular expression engine with the ability to do approximate string matching. It was developed by Ville Laurikari and is distributed under a 2-clause BSD-like license. The library is written in C and provides functions which allow using regular expressions for searching over input text lines. The main difference from other regular expression engines is that TRE can match text fragments in an approximate way, that is, supposing that text could have some number of typos. Features TRE uses extended regular expression syntax with the addition of "directions" for matching preceding fragment in approximate way. Each of such directions specifies how many typos are allowed for this fragment. Approximate matching is performed in a way similar to Levenshtein distance, which means that there are three types of typos 'recognized': TRE allows specifying of ''cost'' for each of three typos type independently. T ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Levenshtein Distance In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965. Levenshtein distance may also be referred to as ''edit distance'', although that term may also denote a larger family of distance metrics known collectively as edit distance. It is closely related to pairwise string alignments. Definition The Levenshtein distance between two strings a, b (of length , a, and , b, respectively) is given by \operatorname(a, b) where : \operatorname(a, b) = \begin , a, & \text , b, = 0, \\ , b, & \text , a, = 0, \\ \operatorname\big(\operatorname(a),\operatorname(b)\big) & \text a = b \\ 1 + \min ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Regular Expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory. The concept of regular expressions began in the 1950s, when the American mathematician Stephen Cole Kleene formalized the concept of a regular language. They came into common use with Unix text-processing utilities. Different syntaxes for writing regular expressions have existed since the 1980s, one being the POSIX standard and another, widely used, being the Perl syntax. Regular expressions are used in search engines, in search and replace dialogs of word processors and text editors, in text processing utilities such as sed and AWK, and in lexical ana ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Udi Manber Udi Manber ( he, אודי מנבר) is an Israeli computer scientist. He is one of the authors of agrep and GLIMPSE. After a career in engineering and management, he worked on medical research. Education He earned both his bachelor's degree in 1975 in mathematics and his master's degree in 1978 from the Technion in Israel. At the University of Washington, he earned another master's degree in 1981 and his PhD in computer science in 1982. Career He has won a Presidential Young Investigator Award in 1985, 3 best-paper awards, and the Usenix annual Software Tools User Group Award software award in 1999. Together with Gene Myers he developed the suffix array, a data structure for string matching. He was a professor at the University of Arizona and authored several articles while there, including "Using Induction to Design Algorithms" summarizing his textbook (which remains in print) ''Introduction to Algorithms: A Creative Approach''. He became the chief scientist at Yahoo! in ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Approximate String Matching In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching is typically divided into two sub-problems: finding approximate substring matches inside a given string and finding dictionary strings that match the pattern approximately. Overview The closeness of a match is measured in terms of the number of primitive operations necessary to convert the string into an exact match. This number is called the edit distance between the string and the pattern. The usual primitive operations are: * insertion: ''cot'' → ''coat'' * deletion: ''coat'' → ''cot'' * substitution: ''coat'' → ''cost'' These three operations may be generalized as forms of substitution by adding a NULL character (here symbolized by ) wherever a character has been deleted or inserted: insertion: ''cot'' → ''coat'' de ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others. Initially intended for use inside the Bell System, AT&T licensed Unix to outside parties in the late 1970s, leading to a variety of both academic and commercial Unix variants from vendors including University of California, Berkeley ( BSD), Microsoft ( Xenix), Sun Microsystems (SunOS/ Solaris), HP/ HPE (HP-UX), and IBM ( AIX). In the early 1990s, AT&T sold its rights in Unix to Novell, which then sold the UNIX trademark to The Open Group, an industry consortium founded in 1996. The Open Group allows the use of the mark for certified operating systems that comply with the Single UNIX Specification (SUS). Unix systems are characterized by a modular design that is sometimes called the " Unix philosophy". According to this p ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	String Searching Algorithm In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text. A basic example of string searching is when the pattern and the searched text are arrays of elements of an alphabet (finite set) Σ. Σ may be a human language alphabet, for example, the letters ''A'' through ''Z'' and other applications may use a ''binary alphabet'' (Σ = ) or a ''DNA alphabet'' (Σ = ) in bioinformatics. In practice, the method of feasible string-search algorithm may be affected by the string encoding. In particular, if a variable-width encoding is in use, then it may be slower to find the ''N''th character, perhaps requiring time proportional to ''N''. This may significantly slow some search algorithms. One of many possible solutions is to search for the sequence of code units instead, but doing so may prod ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Unix-like A Unix-like (sometimes referred to as UNX or nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-like application is one that behaves like the corresponding Unix command or shell. Although there are general philosophies for Unix design, there is no technical standard defining the term, and opinions can differ about the degree to which a particular operating system or application is Unix-like. Some well-known examples of Unix-like operating systems include Linux and BSD. These systems are often used on servers, as well as on personal computers and other devices. Many popular applications, such as the Apache web server and the Bash shell, are also designed to be used on Unix-like systems. One of the key features of Unix-like systems is their ability to support multiple users and processes simultaneously. This allows users to run mult ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Open-source Software Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Open-source software may be developed in a collaborative public manner. Open-source software is a prominent example of open collaboration, meaning any capable user is able to participate online in development, making the number of possible contributors indefinite. The ability to examine the code facilitates public trust in the software. Open-source software development can bring in diverse perspectives beyond those of a single company. A 2008 report by the Standish Group stated that adoption of open-source software models has resulted in savings of about $60 billion per year for consumers. Open source code can be used for studying and allows capable end users to adapt software to their personal needs in a similar way user scripts ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]