Pattern recognition is the automated recognition of
pattern
A pattern is a regularity in the world, in human-made design, or in abstract ideas. As such, the elements of a pattern repeat in a predictable manner. A geometric pattern is a kind of pattern formed of geometric shapes and typically repeated li ...
s and regularities in
data
In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpret ...
. It has applications in statistical
data analysis
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, en ...
,
signal processing
Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing '' signals'', such as sound, images, and scientific measurements. Signal processing techniques are used to optimize transmissions, ...
,
image analysis
Image analysis or imagery analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar coded tags or as sophi ...
,
information retrieval,
bioinformatics
Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combin ...
,
data compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...
,
computer graphics
Computer graphics deals with generating images with the aid of computers. Today, computer graphics is a core technology in digital photography, film, video games, cell phone and computer displays, and many specialized applications. A great deal ...
and
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
, due to the increased availability of
big data and a new abundance of
processing power. These activities can be viewed as two facets of the same field of application, and they have undergone substantial development over the past few decades.
Pattern recognition systems are commonly trained from labeled "training" data. When no
labeled data are available, other algorithms can be used to discover previously unknown patterns.
KDD and data mining have a larger focus on unsupervised methods and stronger connection to business use. Pattern recognition focuses more on the signal and also takes acquisition and
signal processing
Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing '' signals'', such as sound, images, and scientific measurements. Signal processing techniques are used to optimize transmissions, ...
into consideration. It originated in
engineering
Engineering is the use of scientific method, scientific principles to design and build machines, structures, and other items, including bridges, tunnels, roads, vehicles, and buildings. The discipline of engineering encompasses a broad rang ...
, and the term is popular in the context of
computer vision
Computer vision is an Interdisciplinarity, interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate t ...
: a leading computer vision conference is named
Conference on Computer Vision and Pattern Recognition.
In
machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
, pattern recognition is the assignment of a label to a given input value. In statistics,
discriminant analysis was introduced for this same purpose in 1936. An example of pattern recognition is
classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood.
Classification is the grouping of related facts into classes.
It may also refer to:
Business, organizat ...
, which attempts to assign each input value to one of a given set of ''classes'' (for example, determine whether a given email is "spam"). Pattern recognition is a more general problem that encompasses other types of output as well. Other examples are
regression
Regression or regressions may refer to:
Science
* Marine regression, coastal advance due to falling sea level, the opposite of marine transgression
* Regression (medicine), a characteristic of diseases to express lighter symptoms or less extent ( ...
, which assigns a
real-valued
In mathematics, value may refer to several, strongly related notions.
In general, a mathematical value may be any definite mathematical object. In elementary mathematics, this is most often a number – for example, a real number such as or an ...
output to each input;
sequence labeling, which assigns a class to each member of a sequence of values (for example,
part of speech tagging, which assigns a
part of speech
In grammar, a part of speech or part-of-speech ( abbreviated as POS or PoS, also known as word class or grammatical category) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. Words that are as ...
to each word in an input sentence); and
parsing
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
, which assigns a
parse tree
A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in comp ...
to an input sentence, describing the
syntactic structure of the sentence.
Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to perform "most likely" matching of the inputs, taking into account their statistical variation. This is opposed to ''
pattern matching
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
'' algorithms, which look for exact matches in the input with pre-existing patterns. A common example of a pattern-matching algorithm is
regular expression
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
matching, which looks for patterns of a given sort in textual data and is included in the search capabilities of many
text editor
A text editor is a type of computer program that edits plain text. Such programs are sometimes known as "notepad" software (e.g. Windows Notepad). Text editors are provided with operating systems and software development packages, and can be u ...
s and
word processor
A word processor (WP) is a device or computer program that provides for input, editing, formatting, and output of text, often with some additional features.
Word processor (electronic device), Early word processors were stand-alone devices ded ...
s.
Overview
A modern definition of pattern recognition is:
Pattern recognition is generally categorized according to the type of learning procedure used to generate the output value. ''
Supervised learning'' assumes that a set of training data (the
training set
In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from ...
) has been provided, consisting of a set of instances that have been properly labeled by hand with the correct output. A learning procedure then generates a model that attempts to meet two sometimes conflicting objectives: Perform as well as possible on the training data, and generalize as well as possible to new data (usually, this means being as simple as possible, for some technical definition of "simple", in accordance with
Occam's Razor
Occam's razor, Ockham's razor, or Ocham's razor ( la, novacula Occami), also known as the principle of parsimony or the law of parsimony ( la, lex parsimoniae), is the problem-solving principle that "entities should not be multiplied beyond neces ...
, discussed below).
Unsupervised learning, on the other hand, assumes training data that has not been hand-labeled, and attempts to find inherent patterns in the data that can then be used to determine the correct output value for new data instances. A combination of the two that has been explored is
semi-supervised learning, which uses a combination of labeled and unlabeled data (typically a small set of labeled data combined with a large amount of unlabeled data). In cases of unsupervised learning, there may be no training data at all.
Sometimes different terms are used to describe the corresponding supervised and unsupervised learning procedures for the same type of output. The unsupervised equivalent of classification is normally known as ''
clustering'', based on the common perception of the task as involving no training data to speak of, and of grouping the input data into clusters based on some inherent
similarity measure (e.g. the
distance
Distance is a numerical or occasionally qualitative measurement of how far apart objects or points are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two counties over"). ...
between instances, considered as vectors in a multi-dimensional
vector space
In mathematics and physics, a vector space (also called a linear space) is a set whose elements, often called '' vectors'', may be added together and multiplied ("scaled") by numbers called '' scalars''. Scalars are often real numbers, but ...
), rather than assigning each input instance into one of a set of pre-defined classes. In some fields, the terminology is different. In
community ecology
In ecology, a community is a group or association of populations of two or more different species occupying the same geographical area at the same time, also known as a biocoenosis, biotic community, biological community, ecological communit ...
, the term ''classification'' is used to refer to what is commonly known as "clustering".
The piece of input data for which an output value is generated is formally termed an ''instance''. The instance is formally described by a
vector of features, which together constitute a description of all known characteristics of the instance. These feature vectors can be seen as defining points in an appropriate
multidimensional space, and methods for manipulating vectors in
vector space
In mathematics and physics, a vector space (also called a linear space) is a set whose elements, often called '' vectors'', may be added together and multiplied ("scaled") by numbers called '' scalars''. Scalars are often real numbers, but ...
s can be correspondingly applied to them, such as computing the
dot product
In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an alg ...
or the angle between two vectors. Features typically are either
categorical (also known as
nominal, i.e., consisting of one of a set of unordered items, such as a gender of "male" or "female", or a blood type of "A", "B", "AB" or "O"),
ordinal (consisting of one of a set of ordered items, e.g., "large", "medium" or "small"),
integer-valued (e.g., a count of the number of occurrences of a particular word in an email) or
real-valued
In mathematics, value may refer to several, strongly related notions.
In general, a mathematical value may be any definite mathematical object. In elementary mathematics, this is most often a number – for example, a real number such as or an ...
(e.g., a measurement of blood pressure). Often, categorical and ordinal data are grouped together, and this is also the case for integer-valued and real-valued data. Many algorithms work only in terms of categorical data and require that real-valued or integer-valued data be ''discretized'' into groups (e.g., less than 5, between 5 and 10, or greater than 10).
Probabilistic classifiers
Many common pattern recognition algorithms are ''probabilistic'' in nature, in that they use
statistical inference to find the best label for a given instance. Unlike other algorithms, which simply output a "best" label, often probabilistic algorithms also output a
probability
Probability is the branch of mathematics concerning numerical descriptions of how likely an Event (probability theory), event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and ...
of the instance being described by the given label. In addition, many probabilistic algorithms output a list of the ''N''-best labels with associated probabilities, for some value of ''N'', instead of simply a single best label. When the number of possible labels is fairly small (e.g., in the case of
classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood.
Classification is the grouping of related facts into classes.
It may also refer to:
Business, organizat ...
), ''N'' may be set so that the probability of all possible labels is output. Probabilistic algorithms have many advantages over non-probabilistic algorithms:
*They output a confidence value associated with their choice. (Note that some other algorithms may also output confidence values, but in general, only for probabilistic algorithms is this value mathematically grounded in
probability theory
Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set o ...
. Non-probabilistic confidence values can in general not be given any specific meaning, and only used to compare against other confidence values output by the same algorithm.)
*Correspondingly, they can ''abstain'' when the confidence of choosing any particular output is too low.
*Because of the probabilities output, probabilistic pattern-recognition algorithms can be more effectively incorporated into larger machine-learning tasks, in a way that partially or completely avoids the problem of ''error propagation''.
Number of important feature variables
Feature selection
In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construc ...
algorithms attempt to directly prune out redundant or irrelevant features. A general introduction to
feature selection
In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construc ...
which summarizes approaches and challenges, has been given. The complexity of feature-selection is, because of its non-monotonous character, an
optimization problem
In mathematics, computer science and economics, an optimization problem is the problem of finding the ''best'' solution from all feasible solutions.
Optimization problems can be divided into two categories, depending on whether the variables ...
where given a total of
features the
powerset
In mathematics, the power set (or powerset) of a set is the set of all subsets of , including the empty set and itself. In axiomatic set theory (as developed, for example, in the ZFC axioms), the existence of the power set of any set is p ...
consisting of all
subsets of features need to be explored. The
Branch-and-Bound algorithm
Branch and bound (BB, B&B, or BnB) is an algorithm algorithmic paradigm, design paradigm for discrete optimization, discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a ...
does reduce this complexity but is intractable for medium to large values of the number of available features
Techniques to transform the raw feature vectors (feature extraction) are sometimes used prior to application of the pattern-matching algorithm.
Feature extraction algorithms attempt to reduce a large-dimensionality feature vector into a smaller-dimensionality vector that is easier to work with and encodes less redundancy, using mathematical techniques such as
principal components analysis
Principal component analysis (PCA) is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and ...
(PCA). The distinction between feature selection and feature extraction is that the resulting features after feature extraction has taken place are of a different sort than the original features and may not easily be interpretable, while the features left after feature selection are simply a subset of the original features.
Problem statement
The problem of pattern recognition can be stated as follows: Given an unknown function
(the ''ground truth'') that maps input instances
to output labels
, along with training data
assumed to represent accurate examples of the mapping, produce a function
that approximates as closely as possible the correct mapping
. (For example, if the problem is filtering spam, then
is some representation of an email and
is either "spam" or "non-spam"). In order for this to be a well-defined problem, "approximates as closely as possible" needs to be defined rigorously. In
decision theory
Decision theory (or the theory of choice; not to be confused with choice theory) is a branch of applied probability theory concerned with the theory of making decisions based on assigning probabilities to various factors and assigning numerical ...
, this is defined by specifying a
loss function
In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "co ...
or cost function that assigns a specific value to "loss" resulting from producing an incorrect label. The goal then is to minimize the
expected loss, with the expectation taken over the
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...
of
. In practice, neither the distribution of
nor the ground truth function
are known exactly, but can be computed only empirically by collecting a large number of samples of
and hand-labeling them using the correct value of
(a time-consuming process, which is typically the limiting factor in the amount of data of this sort that can be collected). The particular loss function depends on the type of label being predicted. For example, in the case of
classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood.
Classification is the grouping of related facts into classes.
It may also refer to:
Business, organizat ...
, the simple
zero-one loss function is often sufficient. This corresponds simply to assigning a loss of 1 to any incorrect labeling and implies that the optimal classifier minimizes the
error rate on independent test data (i.e. counting up the fraction of instances that the learned function
labels wrongly, which is equivalent to maximizing the number of correctly classified instances). The goal of the learning procedure is then to minimize the error rate (maximize the
correctness) on a "typical" test set.
For a probabilistic pattern recognizer, the problem is instead to estimate the probability of each possible output label given a particular input instance, i.e., to estimate a function of the form
:
where the
feature vector
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon. Choosing informative, discriminating and independent features is a crucial element of effective algorithms in pattern ...
input is
, and the function ''f'' is typically parameterized by some parameters
. In a
discriminative approach to the problem, ''f'' is estimated directly. In a
generative
Generative may refer to:
* Generative actor, a person who instigates social change
* Generative art, art that has been created using an autonomous system that is frequently, but not necessarily, implemented using a computer
* Generative music, ...
approach, however, the inverse probability
is instead estimated and combined with the
prior probability
In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into ...
using
Bayes' rule
In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule), named after Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For exampl ...
, as follows:
:
When the labels are
continuously distributed (e.g., in
regression analysis
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
), the denominator involves
integration
Integration may refer to:
Biology
* Multisensory integration
* Path integration
* Pre-integration complex, viral genetic material used to insert a viral genome into a host genome
*DNA integration, by means of site-specific recombinase technolo ...
rather than summation:
:
The value of
is typically learned using
maximum a posteriori (MAP) estimation. This finds the best value that simultaneously meets two conflicting objects: To perform as well as possible on the training data (smallest
error-rate) and to find the simplest possible model. Essentially, this combines
maximum likelihood
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed sta ...
estimation with a
regularization procedure that favors simpler models over more complex models. In a
Bayesian context, the regularization procedure can be viewed as placing a
prior probability
In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into ...
on different values of
. Mathematically:
:
where
is the value used for
in the subsequent evaluation procedure, and
, the
posterior probability
The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior p ...
of
, is given by
:
In the
Bayesian approach to this problem, instead of choosing a single parameter vector
, the probability of a given label for a new instance
is computed by integrating over all possible values of
, weighted according to the posterior probability:
:
Frequentist or Bayesian approach to pattern recognition
The first pattern classifier – the linear discriminant presented by
Fisher – was developed in the
frequentist tradition. The frequentist approach entails that the model parameters are considered unknown, but objective. The parameters are then computed (estimated) from the collected data. For the linear discriminant, these parameters are precisely the mean vectors and the
covariance matrix. Also the probability of each class
is estimated from the collected dataset. Note that the usage of '
Bayes rule' in a pattern classifier does not make the classification approach Bayesian.
Bayesian statistics
Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability where probability expresses a ''degree of belief'' in an event. The degree of belief may be based on prior knowledge about the event, ...
has its origin in Greek philosophy where a distinction was already made between the '
a priori
("from the earlier") and ("from the later") are Latin phrases used in philosophy to distinguish types of knowledge, justification, or argument by their reliance on empirical evidence or experience. knowledge is independent from current ex ...
' and the '
a posteriori' knowledge. Later
Kant
Immanuel Kant (, , ; 22 April 1724 – 12 February 1804) was a German philosopher and one of the central Enlightenment thinkers. Born in Königsberg, Kant's comprehensive and systematic works in epistemology, metaphysics, ethics, and aest ...
defined his distinction between what is a priori known – before observation – and the empirical knowledge gained from observations. In a Bayesian pattern classifier, the class probabilities
can be chosen by the user, which are then a priori. Moreover, experience quantified as a priori parameter values can be weighted with empirical observations – using e.g., the
Beta- (
conjugate prior
In Bayesian probability theory, if the posterior distribution p(\theta \mid x) is in the same probability distribution family as the prior probability distribution p(\theta), the prior and posterior are then called conjugate distributions, and t ...
) and
Dirichlet-distributions. The Bayesian approach facilitates a seamless intermixing between expert knowledge in the form of subjective probabilities, and objective observations.
Probabilistic pattern classifiers can be used according to a frequentist or a Bayesian approach.
Uses
Within medical science, pattern recognition is the basis for
computer-aided diagnosis (CAD) systems. CAD describes a procedure that supports the doctor's interpretations and findings. Other typical applications of pattern recognition techniques are automatic
speech recognition
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the ma ...
,
speaker identification,
classification of text into several categories (e.g., spam or non-spam email messages), the
automatic recognition of handwriting on postal envelopes, automatic
recognition of images of human faces, or handwriting image extraction from medical forms. The last two examples form the subtopic
image analysis
Image analysis or imagery analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques. Image analysis tasks can be as simple as reading bar coded tags or as sophi ...
of pattern recognition that deals with digital images as input to pattern recognition systems.
Optical character recognition is an example of the application of a pattern classifier. The method of signing one's name was captured with stylus and overlay starting in 1990. The strokes, speed, relative min, relative max, acceleration and pressure is used to uniquely identify and confirm identity. Banks were first offered this technology, but were content to collect from the FDIC for any bank fraud and did not want to inconvenience customers.
Pattern recognition has many real-world applications in image processing. Some examples include:
* identification and authentication: e.g.,
license plate recognition
Automatic number-plate recognition (ANPR; see also other names below) is a technology that uses optical character recognition on images to read vehicle registration plates to create vehicle location data. It can use existing closed-circuit tele ...
, fingerprint analysis,
face detection
Face detection is a computer technology being used in a variety of applications that identifies human faces in digital images. Face detection also refers to the psychological process by which humans locate and attend to faces in a visual scene. ...
/verification;, and
voice-based authentication
Speaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to ''speaker recognition'' or speech recognition. Speaker verification ...
.
* medical diagnosis: e.g., screening for cervical cancer (Papnet), breast tumors or heart sounds;
* defense: various navigation and guidance systems,
target recognition
Automatic target recognition (ATR) is the ability for an algorithm or device to recognize targets or other objects based on data obtained from sensors.
Target recognition was initially done by using an audible representation of the received signal ...
systems, shape recognition technology etc.
* mobility:
advanced driver assistance systems,
autonomous vehicle technology, etc.
In psychology,
pattern recognition is used to make sense of and identify objects, and is closely related to perception. This explains how the sensory inputs humans receive are made meaningful. Pattern recognition can be thought of in two different ways. The first concerns template matching and the second concerns feature detection. A template is a pattern used to produce items of the same proportions. The template-matching hypothesis suggests that incoming stimuli are compared with templates in the long-term memory. If there is a match, the stimulus is identified. Feature detection models, such as the Pandemonium system for classifying letters (Selfridge, 1959), suggest that the stimuli are broken down into their component parts for identification. One observation is a capital E having three horizontal lines and one vertical line.
Algorithms
Algorithms for pattern recognition depend on the type of label output, on whether learning is supervised or unsupervised, and on whether the algorithm is statistical or non-statistical in nature. Statistical algorithms can further be categorized as
generative
Generative may refer to:
* Generative actor, a person who instigates social change
* Generative art, art that has been created using an autonomous system that is frequently, but not necessarily, implemented using a computer
* Generative music, ...
or
discriminative.
Classification methods (methods predicting categorical labels)
Parametric:
*
Linear discriminant analysis
*
Quadratic discriminant analysis
*
Maximum entropy classifier (aka
logistic regression
In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear function (calculus), linear combination of one or more independent var ...
,
multinomial logistic regression
In statistics, multinomial logistic regression is a statistical classification, classification method that generalizes logistic regression to multiclass classification, multiclass problems, i.e. with more than two possible discrete outcomes. T ...
): Note that logistic regression is an algorithm for classification, despite its name. (The name comes from the fact that logistic regression uses an extension of a linear regression model to model the probability of an input being in a particular class.)
Nonparametric:
[No distributional assumption regarding shape of feature distributions per class.]
*
Decision tree
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains co ...
s,
decision lists
*
Kernel estimation and
K-nearest-neighbor
In statistics, the ''k''-nearest neighbors algorithm (''k''-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regressi ...
algorithms
*
Naive Bayes classifier
*
Neural network
A neural network is a network or neural circuit, circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up ...
s (multi-layer perceptrons)
*
Perceptrons
*
Support vector machines
*
Gene expression programming
Clustering methods (methods for classifying and predicting categorical labels)
*Categorical
mixture models
*
Hierarchical clustering
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into tw ...
(agglomerative or divisive)
*
K-means clustering
*
Correlation clustering Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance.
De ...
*
Kernel principal component analysis (Kernel PCA)
Ensemble learning algorithms (supervised meta-algorithms for combining multiple learning algorithms together)
*
Boosting (meta-algorithm)
In machine learning, boosting is an ensemble meta-algorithm for primarily reducing bias, and also variance in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. Boosting is based on the q ...
*
Bootstrap aggregating
Bootstrap aggregating, also called bagging (from bootstrap aggregating), is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regress ...
("bagging")
*
Ensemble averaging
In machine learning, particularly in the creation of artificial neural networks, ensemble averaging is the process of creating multiple models and combining them to produce a desired output, as opposed to creating just one model. Frequently an ens ...
*
Mixture of experts Mixture of experts (MoE) refers to a machine learning technique where multiple expert networks (learners) are used to divide a problem space into homogeneous regions. It differs from ensemble techniques in that typically only a few, or 1, expert m ...
,
hierarchical mixture of experts Mixture of experts (MoE) refers to a machine learning technique where multiple expert networks (learners) are used to divide a problem space into homogeneous regions. It differs from ensemble techniques in that typically only a few, or 1, expert mo ...
General methods for predicting arbitrarily-structured (sets of) labels
*
Bayesian networks
*
Markov random fields
Multilinear subspace learning algorithms (predicting labels of multidimensional data using tensor representations)
Unsupervised:
*
Multilinear principal component analysis (MPCA)
Real-valued sequence labeling methods (predicting sequences of real-valued labels)
*
Kalman filters
*
Particle filters
Regression methods (predicting real-valued labels)
*
Gaussian process regression
In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging g ...
(kriging)
*
Linear regression
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is ...
and extensions
*
Independent component analysis (ICA)
*
Principal components analysis
Principal component analysis (PCA) is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and ...
(PCA)
Sequence labeling methods (predicting sequences of categorical labels)
*
Conditional random fields (CRFs)
*
Hidden Markov model
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process — call it X — with unobservable ("''hidden''") states. As part of the definition, HMM requires that there be an ob ...
s (HMMs)
*
Maximum entropy Markov models (MEMMs)
*
Recurrent neural networks (RNNs)
*
Dynamic time warping (DTW)
See also
*
Adaptive resonance theory
*
Black box
In science, computing, and engineering, a black box is a system which can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings. Its implementation is "opaque" (black). The te ...
*
Cache language model
A cache language model is a type of statistical language model. These occur in the natural language processing subfield of computer science and assign probabilities to given sequences of words by means of a probability distribution. Statistical lan ...
*
Compound-term processing Compound-term processing, in information-retrieval, is search result matching on the basis of compound terms. Compound terms are built by combining two or more simple terms; for example, "triple" is a single word term, but "triple heart bypass" is ...
*
Computer-aided diagnosis
*
Data mining
*
Deep Learning
*
Information theory
Information theory is the scientific study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. ...
*
List of numerical-analysis software
*
List of numerical libraries
*
Neocognitron
*
Perception
Perception () is the organization, identification, and interpretation of sensory information in order to represent and understand the presented information or environment. All perception involves signals that go through the nervous system, ...
*
Perceptual learning
*
Predictive analytics
Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modeling, and machine learning that analyze current and historical facts to make predictions about future or otherwise unknown events.
In busin ...
*
Prior knowledge for pattern recognition
*
Sequence mining
*
Template matching
*
Contextual image classification Contextual image classification, a topic of pattern recognition in computer vision, is an approach of classification based on contextual information in images. "Contextual" means this approach is focusing on the relationship of the nearby pixels, wh ...
*
List of datasets for machine learning research
References
Further reading
*
*
*
*
*
*
*
An introductory tutorial to classifiers (introducing the basic terms, with numeric example)*Kovalevsky, V. A. (1980). ''Image Pattern Recognition''. New York, NY: Springer New York.
ISBN
The International Standard Book Number (ISBN) is a numeric commercial book identifier that is intended to be unique. Publishers purchase ISBNs from an affiliate of the International ISBN Agency.
An ISBN is assigned to each separate edition an ...
978-1-4612-6033-2.
OCLC
OCLC, Inc., doing business as OCLC, See also: is an American nonprofit cooperative organization "that provides shared technology services, original research, and community programs for its membership and the library community at large". It wa ...
852790446.
External links
The International Association for Pattern RecognitionJournal of Pattern Recognition ResearchPattern Recognition InfoPattern Recognition(Journal of the Pattern Recognition Society)
International Journal of Pattern Recognition and Artificial IntelligenceInternational Journal of Applied Pattern RecognitionOpen Pattern Recognition Project intended to be an open source platform for sharing algorithms of pattern recognition
Improved Fast Pattern MatchingImproved Fast Pattern Matching
{{Authority control
Machine learning
Formal sciences
Computational fields of study