History
Early programming languages with pattern matching constructs include …
Primitive patterns
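The simplest pattern in pattern matching is an explicit value or a variable. For example, consider a simple function definition in Haskell syntax (function parameters are not in parentheses but are separated by spaces, and = is not assignment but definition):
f 0 = 1
f n = n * f (n-1)
Here 0 is a single value pattern: whenever f is given 0 as its argument, the pattern matches and the function returns 1. The n in the second equation is a single variable pattern, which will match absolutely any argument and bind it to the name n to be used in the rest of the definition. In Haskell the equations are tried in order, so the first one still applies when the argument is 0, while for any other argument the function returns n * f (n-1) with n being the argument.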
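A runnable sketch of this definition follows. It assumes the base case f 0 = 1 (a value pattern) alongside the recursive equation, which makes f the factorial function, and it adds an Integer type signature that the text does not give.

```haskell
-- Type signature added for illustration; the text gives none.
f :: Integer -> Integer
f 0 = 1              -- value pattern: matches only the literal 0
f n = n * f (n - 1)  -- variable pattern: matches any other argument and binds it to n

-- Equations are tried from top to bottom. With the two equations swapped,
-- n would also match 0, the recursion would never reach a base case, and
-- GHC would warn that the equation for 0 is redundant.
```

Loaded into GHCi, map f [0 .. 4] evaluates to [1,1,2,6,24]. A negative argument never reaches the literal 0, so f does not terminate on it; a guard such as n < 0 could be used to reject such inputs.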
The wildcard pattern (often written as _) is also simple: like a variable name, it matches any value, but does not bind the value to any name.
Tree patterns
More complex patterns can be built from the primitive ones of the previous section, usually in the same way as values are built by combining other values. The difference then is that with variable and wildcard parts, a pattern doesn't build into a single value, but matches a group of values that are the combination of the concrete elements and the elements that are allowed to vary within the structure of the pattern. A tree pattern describes a part of a tree by starting with a node and specifying some branches and nodes and leaving some unspecified with a variable or wildcard pattern. It may help to think of a Haskell algebraic data type such as Color, which has a single data constructor ColorConstructor that wraps an integer and a string: the constructor is a node in a tree, and the integer and the string are leaves in its branches.
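If we pass a variable that is of type Color, how can we get the data out of this variable? For example, to write a function that gets the integer part of a Color, we can use a simple tree pattern that matches the ColorConstructor node, binds its integer branch to a variable, and leaves its string branch unspecified with a wildcard.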
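A sketch of such a function, assuming the declaration data Color = ColorConstructor Integer String (the text describes the constructor, but the exact field types are an assumption here). The names integerPart, stringPart, theInteger and theString are hypothetical.

```haskell
-- One data constructor wrapping an integer and a string (field types assumed).
data Color = ColorConstructor Integer String
  deriving Show

-- Tree pattern: the ColorConstructor node with two branches. The integer
-- branch is bound to a variable; the string branch is ignored with _.
integerPart :: Color -> Integer
integerPart (ColorConstructor theInteger _) = theInteger

-- The mirror image: keep the string branch and ignore the integer.
stringPart :: Color -> String
stringPart (ColorConstructor _ theString) = theString
```

Because Color has only one constructor, each of these single-equation definitions already covers every possible Color value.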
Filtering data with patterns
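Pattern matching can be used to filter data of a certain structure. For instance, in Haskell a list comprehension can perform this kind of filtering, because an element that does not match the pattern on the left of a generator's <- is skipped rather than causing an error.
Pattern matching in Mathematica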
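In Mathematica, expressions are trees whose nodes are symbols. The levels of a tree are written using square brackets, so that for instance a[b,c] is a tree with a as the parent, and b and c as the children.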
A pattern in Mathematica involves putting "_" at positions in that tree. For instance, the pattern
A[_]
will match elements of the form A[x], where x is any entity. In this case, A is the concrete element, while _ denotes the piece of tree that can be varied. A symbol prepended to _ binds the match to that variable name (x_ matches anything and names it x), while a symbol appended to _ restricts the matches to nodes of that symbol (_h matches only expressions with head h). Note that even blanks themselves are internally represented as Blank[] for _ and Blank[x] for _x.
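The Mathematica function Cases filters elements of the first argument that match the pattern in the second argument. For instance, given the pattern
a[b[_],_]
Cases keeps only those elements that have head a and exactly two children, the first of which has head b and exactly one child of its own.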
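The blank forms and the filtering done by Cases can be modeled in Haskell to show how tree pattern matching with bindings works. This is a simplified sketch, not Mathematica's implementation: the types Expr and Pat and the functions match and cases are hypothetical, a repeated name is not checked for consistency, and cases only looks at the top-level elements (Cases also looks only at level 1 by default).

```haskell
-- Hypothetical model of expressions: a head symbol with children,
-- so a[b, c] is Node "a" [Node "b" [], Node "c" []].
data Expr = Node String [Expr]
  deriving (Eq, Show)

-- Hypothetical model of patterns built from blanks.
data Pat
  = Blank (Maybe String)  -- _ is Blank Nothing; _h is Blank (Just "h")
  | Named String Pat      -- x_ is Named "x" (Blank Nothing)
  | PNode String [Pat]    -- a concrete head with one pattern per child

-- Match a pattern against an expression; on success, return the bindings.
match :: Pat -> Expr -> Maybe [(String, Expr)]
match (Blank Nothing) _ = Just []
match (Blank (Just h)) (Node h' _)
  | h == h'   = Just []
  | otherwise = Nothing
match (Named x p) e = ((x, e) :) <$> match p e
match (PNode h ps) (Node h' es)
  | h == h' && length ps == length es = concat <$> sequence (zipWith match ps es)
  | otherwise = Nothing

-- Counterpart of Cases on a flat list: the generator pattern Just _
-- silently skips every element whose match failed.
cases :: [Expr] -> Pat -> [Expr]
cases es p = [e | e <- es, Just _ <- [match p e]]
```

With sym s = Node s [], the pattern a[b[_],_] becomes PNode "a" [PNode "b" [Blank Nothing], Blank Nothing]. Applied to a[b[c],d], a[b,c] and a[b[c],d,e], cases keeps only a[b[c],d]: a[b,c] fails because b has no child, and a[b[c],d,e] fails because it has three children.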
In Mathematica, it is also possible to extract structures as they are created in the course of computation, regardless of how or where they appear. The function Trace can be used to monitor a computation and return the elements that arise which match a pattern. For example, if fib is defined recursively to compute Fibonacci numbers, then Trace[expr, fib[_]], where expr is an expression that calls fib, returns a structure that represents the occurrences of the pattern fib[_] in the computational structure, that is, the recursive calls to fib made while evaluating expr.
Declarative programming
In symbolic programming languages, it is easy to have patterns as arguments to functions or as elements of data structures. A consequence of this is the ability to use patterns to declaratively make statements about pieces of data and to flexibly instruct functions how to operate.
For instance, the Mathematica function Compile can be used to make more efficient versions of the code. In the following example the details do not particularly matter; what matters is that the subexpression {{com[_], Integer}} instructs Compile that expressions of the form com[_] can be assumed to be integers for the purposes of compilation:
com[i_] := Binomial[2i, i]
Compile[{x, {i, _Integer}}, x^com[i], {{com[_], Integer}}]
Mailboxes in Erlang also work this way.
The Curry–Howard correspondence between proofs and programs relates ML-style pattern matching to case analysis and proof by exhaustion.
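One standard way to make this concrete is a small Haskell sketch in which a value of type Either a b is read as evidence for "a or b", and a function of type a -> c as evidence that a implies c. The name orElim is hypothetical; the Prelude function either expresses the same thing.

```haskell
-- Proof by cases (disjunction elimination) as pattern matching:
-- from "a or b", "a implies c" and "b implies c", conclude c.
orElim :: Either a b -> (a -> c) -> (b -> c) -> c
orElim (Left x)  onLeft _       = onLeft x   -- case 1: a holds
orElim (Right y) _      onRight = onRight y  -- case 2: b holds
```

Dropping either equation leaves one case unhandled, the analogue of an incomplete proof by exhaustion; GHC reports such gaps when -Wincomplete-patterns (part of -Wall) is enabled.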
Pattern matching and strings
By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular syntax of strings is used to represent regular expressions, which are patterns describing string characters.
However, it is possible to perform some string pattern matching within the same framework that has been discussed throughout this article.
Tree patterns for strings
In Mathematica, strings are represented as trees of root StringExpression and all the characters in order as children of the root. Thus, to match "any amount of trailing characters", a new wildcard ___ is needed in contrast to _ that would match only a single character.
In Haskell and functional programming languages in general, strings are represented as functional lists of characters. A functional list is defined as an empty list, or an element constructed on an existing list. In Haskell syntax:
[] -- an empty list
x:xs -- an element x constructed on a list xs
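The two-part definition can be spelled out as an ordinary data type. The sketch below uses hypothetical names (List, Nil, Cons, len, fromList) that mirror the built-in [] and (:), and shows that one equation per constructor covers every list.

```haskell
-- A list is either empty (Nil, like []) or an element constructed on a list
-- (Cons, like :).
data List a = Nil | Cons a (List a)
  deriving Show

-- One equation per constructor; together they cover every list.
len :: List a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + len xs

-- Each built-in list pattern corresponds to one constructor of List.
fromList :: [a] -> List a
fromList []       = Nil
fromList (x : xs) = Cons x (fromList xs)
```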
The structure for a list with some elements is thus element:list. When pattern matching, we assert that a certain piece of data is equal to a certain pattern. For example, in the function:
head (element:list) = element
We assert that the first element of head's argument is called element, and the function returns this. We know that this is the first element because of the way lists are defined: a single element constructed onto a list. This single element must be the first. The empty list would not match the pattern at all, as an empty list does not have a head (the first element that is constructed).
In the example, we have no use for list, so we can disregard it with a wildcard, and thus write the function:
head (element:_) = element
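Since no equation of head matches the empty list, calling it on [] fails at run time (the Prelude head likewise raises an error). A common alternative, sketched here under the hypothetical name safeHead, matches the empty list explicitly and reports the missing element through Maybe.

```haskell
-- Both list constructors are matched, so the function is total.
safeHead :: [a] -> Maybe a
safeHead []          = Nothing       -- the empty list has no first element
safeHead (element:_) = Just element  -- same pattern as head above
```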
The equivalent Mathematica transformation is expressed as
head[{element_, ___}] := element
Example string patterns
In Mathematica, for instance,
StringExpression["a",_]
will match a string that has two characters and begins with "a".
The same pattern in Haskell:
['a', _]
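Used in a function definition, the list pattern ['a', _] fixes the length at exactly two characters. For contrast, the sketch also shows 'a' : _, whose wildcard matches any remaining tail (including an empty one), which plays the role that ___ plays in Mathematica for "any amount of trailing characters". The function names are hypothetical.

```haskell
-- Exactly two characters: 'a' followed by any single character.
twoCharsStartingWithA :: String -> Bool
twoCharsStartingWithA ['a', _] = True
twoCharsStartingWithA _        = False

-- 'a' followed by any tail, of any length.
startsWithA :: String -> Bool
startsWithA ('a' : _) = True
startsWithA _         = False
```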
Symbolic entities can be introduced to represent many different classes of relevant features of a string. For instance,
StringExpression[LetterCharacter, DigitCharacter]
will match a string that consists of a letter first, and then a number.
In Haskell, guards could be used to achieve the same matches:
[letter, digit] | isAlpha letter && isDigit digit
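A complete version of the guarded pattern, under the hypothetical name letterThenDigit, is sketched below. Note that Data.Char's isAlpha accepts any Unicode letter while isDigit accepts only the ASCII digits 0 to 9, so the character classes may not coincide exactly with Mathematica's LetterCharacter and DigitCharacter.

```haskell
import Data.Char (isAlpha, isDigit)

-- The list pattern names both characters; the guard then checks their kinds.
-- If the guard fails, matching falls through to the next equation.
letterThenDigit :: String -> Bool
letterThenDigit [letter, digit]
  | isAlpha letter && isDigit digit = True
letterThenDigit _ = False
```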
The main advantage of symbolic string manipulation is that it can be completely integrated with the rest of the programming language, rather than being a separate, special purpose subunit. The entire power of the language can be leveraged to build up the patterns themselves or analyze and transform the programs that contain them.
SNOBOL
SNOBOL ("StriNg Oriented and symBOlic Language") is a computer programming language developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky.
SNOBOL4 stands apart from most programming languages by having patterns as a first-class data type (i.e. a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern concatenation and alternation. Strings generated during execution can be treated as programs and executed.
SNOBOL was quite widely taught in larger US universities in the late 1960s and early 1970s and was widely used in the 1970s and 1980s as a text manipulation language in the humanities.
Since SNOBOL's creation, newer languages such as Awk and Perl have made string manipulation by means of regular expressions fashionable. SNOBOL4 patterns, however, subsume BNF grammars, which are equivalent to context-free grammars and more powerful than regular expressions. (Gimpel, J. F. 1973. A theory of discrete patterns and their implementation in SNOBOL4. Commun. ACM 16, 2 (Feb. 1973), 91–100. DOI=http://doi.acm.org/10.1145/361952.361960.)
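Both properties described above, patterns as first-class values combined by concatenation and alternation, and patterns able to express context-free grammars, can be illustrated with a small Haskell sketch. It is not SNOBOL4 code and does not reproduce SNOBOL4's semantics: the type Pattern and the functions lit, andThen, orElse and matches are hypothetical, and the "list of successes" representation is simply one common way to obtain backtracking.

```haskell
import Data.List (stripPrefix)

-- A pattern maps the input to every remainder left after one way of matching
-- a prefix of it. Patterns are ordinary values that can be stored and passed.
type Pattern = String -> [String]

-- Match a literal string.
lit :: String -> Pattern
lit s input = maybe [] (: []) (stripPrefix s input)

-- Concatenation: match p, then match q on whatever p left over.
andThen :: Pattern -> Pattern -> Pattern
andThen p q input = concatMap q (p input)

-- Alternation: every way of matching p, then every way of matching q.
orElse :: Pattern -> Pattern -> Pattern
orElse p q input = p input ++ q input

-- Anchored match: succeed if some way of matching consumes the whole input.
matches :: Pattern -> String -> Bool
matches p input = any null (p input)

-- Built from concatenation and alternation only: (cat | dog)(s | empty).
pets :: Pattern
pets = (lit "cat" `orElse` lit "dog") `andThen` (lit "s" `orElse` lit "")

-- A pattern that refers to itself, following the BNF rule
--   <bal> ::= "" | "(" <bal> ")" <bal>
-- Balanced parentheses form a context-free language that no regular
-- expression in the formal sense can describe.
balanced :: Pattern
balanced =
  lit "" `orElse` (lit "(" `andThen` balanced `andThen` lit ")" `andThen` balanced)
```

Here matches pets "cats" and matches balanced "(()())" are True, while matches balanced "(()" is False. The recursion in balanced terminates because each recursive use comes only after "(" has been consumed.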
See also
* AIML for an AI language based on matching patterns in speech
* AWK language
* Coccinelle pattern matches C source code
* Matching wildcards
* glob (programming)
* Pattern calculus
* Pattern recognition for fuzzy patterns
* PCRE (Perl Compatible Regular Expressions), a common modern implementation of string pattern matching ported to many languages
* REBOL parse dialect for pattern matching used to implement language dialects
* Symbolic integration
* Tagged union
* Tom (pattern matching language)
* SNOBOL for a programming language based on one kind of pattern matching
* Pattern language (metaphoric, drawn from architecture)
* Graph matching
External links
* Nikolaas N. Oosterhof, Philip K. F. Hölzenspies, and Jan Kuper, Application patterns. A presentation at Trends in Functional Programming, 2005
* JMatch: the Java programming language extended with pattern matching
* ShowTrend: online pattern matching for stock prices
* An article by Dennis Ritchie that provides the history of regular expressions in computer programs
* The Implementation of Functional Programming Languages, pages 53–103. Simon Peyton Jones, published by Prentice Hall, 1987.
* Nemerle, pattern matching
* PatMat: a C++ pattern matching library based on SNOBOL/SPITBOL
* Temur Kutsia. Flat Matching. Journal of Symbolic Computation 43(12): 858–873. Describes flat matching in Mathematica in detail.
* A pattern matching language for non-programmers