wildmat is a pattern matching library developed by Rich Salz. Based on the wildcard syntax already used in the Bourne shell, wildmat provides a uniform mechanism for matching patterns across applications with simpler syntax than that typically offered by regular expressions. Patterns are implicitly anchored at the beginning and end of each string when testing for a match. In June 2019, Rich Salz released the original version of the now-defunct library on GitHub under a public domain dedication.


Pattern matching operations

wildmat supports five pattern matching operations beyond a strict one-to-one match between the pattern and the text being checked:

* Asterisk (*) matches any sequence of zero or more characters.
* Question mark (?) matches any single character.
* Set of specified characters, enclosed in square brackets. It is specified as a list of characters, as a range whose beginning and end are separated by a minus (or dash) character, or as any combination of lists and ranges. The dash can itself be included in the set if it is the first or last character in the set. The close square bracket (]) can be included if it is the first character in the set.
* Negation of a set. It is specified the same way as a set, with a caret (^) added as the first character just inside the open square bracket. (NNTP specifies ! as an alternative. The implementation can be configured to accept either.)
* Backslash (\) removes the special meaning of the open square bracket ([), the asterisk, the backslash, or the question mark. Two backslashes in sequence are evaluated as a single backslash character with no special meaning.
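The five operations above can be sketched as a small backtracking matcher. This is a minimal Python illustration, not a port of Salz's C code: the names wildmat, _match and _match_set and the negate parameter are assumptions made for this example, and an unclosed set is simply treated as a failed match.

```python
def wildmat(pattern: str, text: str, negate: str = "^") -> bool:
    """Return True if the whole of `text` matches `pattern` (anchored at both ends)."""
    return _match(pattern, 0, text, 0, negate)


def _match(p, pi, t, ti, negate):
    while pi < len(p):
        c = p[pi]
        if c == "*":
            # Collapse consecutive stars, then try every possible split point.
            # Simple but exponential in the worst case; fine for a sketch.
            while pi < len(p) and p[pi] == "*":
                pi += 1
            if pi == len(p):
                return True  # a trailing star matches the rest of the text
            return any(_match(p, pi, t, k, negate) for k in range(ti, len(t) + 1))
        if ti >= len(t):
            return False  # the pattern needs another character, the text has none
        if c == "?":
            pi += 1
        elif c == "[":
            pi, ok = _match_set(p, pi + 1, t[ti], negate)
            if not ok:
                return False
        else:
            if c == "\\" and pi + 1 < len(p):
                pi += 1  # the next character loses any special meaning
            if p[pi] != t[ti]:
                return False
            pi += 1
        ti += 1
    return ti == len(t)


def _match_set(p, pi, ch, negate):
    """Parse a set starting just after '['; return (index after ']', matched)."""
    negated = pi < len(p) and p[pi] == negate
    if negated:
        pi += 1
    matched = False
    first = True  # a ']' in first position is a literal member of the set
    while pi < len(p) and (first or p[pi] != "]"):
        first = False
        lo = p[pi]
        # A dash between two characters forms a range; elsewhere it is literal.
        if pi + 2 < len(p) and p[pi + 1] == "-" and p[pi + 2] != "]":
            hi = p[pi + 2]
            pi += 3
        else:
            hi = lo
            pi += 1
        if lo <= ch <= hi:
            matched = True
    if pi >= len(p):
        return pi, False  # unclosed set: treated as a failed match (assumption)
    return pi + 1, matched != negated
```

Passing negate="!" switches the sketch to the NNTP-style negation character, mirroring the configurable behaviour described in the list.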


Examples

* *foo* matches any string containing "foo".
* mini* matches anything that begins with "mini" (including the string "mini" itself).
* ???* matches any string of three or more characters.
* [0-9a-zA-Z] matches any single alphanumeric ASCII character.
* [^]-] matches any single character other than a close square bracket or a dash.
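The examples above can be checked by translating each pattern into a regular expression. The following Python sketch assumes two hypothetical helpers, wildmat_to_regex and wildmat_match. It relies on re.fullmatch to reproduce the implicit anchoring at both ends, and it treats an unclosed [ as a literal character, which is a choice made for this example only.

```python
import re


def wildmat_to_regex(pattern: str) -> str:
    """Translate a wildmat-style pattern into an equivalent regex string."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        elif c == "\\" and i + 1 < len(pattern):
            i += 1
            out.append(re.escape(pattern[i]))  # escaped character is literal
        elif c == "[":
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1  # a leading ']' is a literal member of the set
            j = pattern.find("]", j)
            if j == -1:
                out.append(re.escape(c))  # unclosed set: literal '[' (assumption)
            else:
                body = pattern[i + 1 : j]
                negated = body.startswith("^")
                members = body[1:] if negated else body
                # Escape characters that are special inside a regex class,
                # but keep '-' so that ranges still work.
                members = re.sub(r"([\\\[\]^])", r"\\\1", members)
                out.append("[" + ("^" if negated else "") + members + "]")
                i = j
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)


def wildmat_match(pattern: str, text: str) -> bool:
    # fullmatch anchors at both ends; DOTALL lets '?' and '*' cover newlines.
    return re.fullmatch(wildmat_to_regex(pattern), text, re.DOTALL) is not None


if __name__ == "__main__":
    for pattern, text in [
        ("*foo*", "a foo b"),
        ("mini*", "mini"),
        ("???*", "ab"),
        ("[0-9a-zA-Z]", "7"),
        ("[^]-]", "]"),
    ]:
        print(pattern, repr(text), wildmat_match(pattern, text))
```

Translating to a regular expression keeps the matcher short, at the cost of depending on the regex engine's own handling of character classes.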


Usage

wildmat is most commonly seen in NNTP implementations such as Salz's own INN, but it also appears in unrelated software such as GNU tar and Transmission. GNU tar replaced wildmat with the POSIX fnmatch glob matcher in September 1992. The early version contained a potential out-of-bounds access on an unclosed [.

The original byte-oriented wildmat implementation cannot handle multibyte character sets, and it poses problems when the text being searched may contain multiple incompatible character sets. A simplified version of wildmat oriented toward UTF-8 encoding has been developed by the IETF NNTP working group. It is part of section 4 of the 2006 standard for NNTP.

Newer versions of INN, which support UTF-8, add "uwildmat", which supports all the features of wildmat. This 2000 rewrite, performed by Russ Allbery, fixes the out-of-bounds access (OOB) in the original implementation. Tightly wound C loops were written out into smaller statements.

Rsync includes a GPLv3-licensed wildmat descendant known as wildmatch, modified by Wayne Davison. The Git version control system imports and makes use of it. wildmatch does not support UTF-8, but it has the OOB fixed and adds support for character classes and star globs (** for arbitrary depth).
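The limitation of byte-oriented matching on multibyte text can be illustrated with Python's standard fnmatch module. It is used here only as a stand-in glob matcher, not as wildmat itself, and the variable names are chosen for this example. A single ? covers one code point when matching strings, but only one byte when matching the UTF-8 encoding of the same text.

```python
import fnmatch

text = "é"                  # one code point, U+00E9
raw = text.encode("utf-8")  # two bytes in UTF-8: 0xC3 0xA9

# Character-oriented matching: '?' consumes the whole code point.
as_chars = fnmatch.fnmatchcase(text, "?")

# Byte-oriented matching: '?' consumes a single byte, so one '?' is not
# enough for a two-byte character, while two of them are.
as_bytes_one = fnmatch.fnmatchcase(raw, b"?")
as_bytes_two = fnmatch.fnmatchcase(raw, b"??")

print(as_chars, as_bytes_one, as_bytes_two)  # True False True
```

A set such as [à-ÿ] runs into the same problem in a byte-oriented matcher, because the range endpoints are themselves multibyte sequences.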


See also

* glob (programming)
* Kleene star
* Matching wildcards


External links

* Rich Salz (March 9, 1991). "v17i034: wildmat - a /bin/sh-style pattern matcher, Part01/01". Newsgroup: comp.sources.misc. Message-ID: 1991Mar9.044016.2409@sparky.IMD.Sterling.COM. https://groups.google.com/forum/#!topic/comp.sources.misc/ZzYKqnKCpf4