Pattern recognition is a very active field of research intimately bound to machine learning. Also known as classification or statistical classification, pattern recognition aims at building a classifier that can determine the class of an input pattern. This procedure, known as training, corresponds to learning an unknown decision function based only on a set of input-output pairs $(\boldsymbol{x}_i, y_i)$ that form the training data (or training set). Nonetheless, in real-world applications such as character recognition, a certain amount of information on the problem is usually known beforehand. Incorporating this prior knowledge into the training is the key to improving performance in many applications.

Prior Knowledge

Prior knowledge (B. Schölkopf and A. Smola, Learning with Kernels, MIT Press, 2002) refers to all information about the problem available in addition to the training data. However, in this most general form, determining a model from a finite set of samples without prior knowledge is an ill-posed problem, in the sense that a unique model may not exist. Many classifiers incorporate the general smoothness assumption that a test pattern similar to one of the training samples tends to be assigned to the same class.

The importance of prior knowledge in machine learning is suggested by its role in search and optimization. Loosely, the no free lunch theorem states that all search algorithms have the same average performance over all problems, and thus implies that to gain in performance on a certain application one must use a specialized algorithm that includes some prior knowledge about the problem.

The different types of prior knowledge encountered in pattern recognition are grouped here under two main categories: class-invariance and knowledge of the data.
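The smoothness assumption mentioned above can be made concrete with a nearest-neighbour rule, which assigns a test pattern the class of the most similar training sample. The sketch below is only an illustration: the function name `nearest_neighbor_predict`, the Euclidean distance, and the toy data are assumptions, not something prescribed by the text.

```python
import math

def nearest_neighbor_predict(train, x):
    """Return the label of the training sample closest to x (Euclidean distance)."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x))
    return label

# Toy training set of (pattern, label) pairs; the values are made up.
train = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]

print(nearest_neighbor_predict(train, (0.3, -0.2)))  # pattern near (0, 0)
print(nearest_neighbor_predict(train, (4.6, 5.1)))   # pattern near (5, 5)
```

The only prior knowledge used here is that nearby patterns tend to share a class; nothing problem-specific enters the rule.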

Class-invariance

A very common type of prior knowledge in pattern recognition is the invariance of the class (or the output of the classifier) to a transformation of the input pattern. This type of knowledge is referred to as transformation-invariance. The transformations most commonly used in image recognition are:

* translation;
* rotation;
* skewing;
* scaling.

Incorporating the invariance to a transformation $T_{\theta}: \boldsymbol{x} \mapsto T_{\theta}\boldsymbol{x}$, parametrized in $\theta$, into a classifier of output $f(\boldsymbol{x})$ for an input pattern $\boldsymbol{x}$ corresponds to enforcing the equality

$$f(\boldsymbol{x}) = f(T_{\theta}\boldsymbol{x}), \quad \forall \boldsymbol{x}, \theta .$$
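One way to read the equality above is as a property that can be checked empirically, and that can be encouraged by training on transformed copies of the samples. The sketch below assumes 1-D patterns and a cyclic shift as the transformation; the helper names (`translate`, `virtual_samples`, `is_invariant`) and both toy classifiers are hypothetical.

```python
def translate(x, theta):
    """Cyclic translation T_theta: shift the pattern by theta positions."""
    theta %= len(x)
    return list(x[-theta:]) + list(x[:-theta]) if theta else list(x)

def virtual_samples(data, thetas):
    """Augment (x, y) pairs with transformed copies that keep the same label."""
    return [(translate(x, t), y) for x, y in data for t in thetas]

def is_invariant(f, patterns, thetas):
    """Empirically check f(x) == f(T_theta x) on the given patterns and shifts."""
    return all(f(translate(x, t)) == f(x) for x in patterns for t in thetas)

# Two hypothetical classifiers on 1-D patterns.
def f_energy(x):
    # Depends only on the sum of the entries, so a shift cannot change it.
    return int(sum(x) > 2)

def f_first(x):
    # Looks at a fixed position, so a shift can change the output.
    return int(x[0] > 0)

patterns = [[1, 0, 0, 0], [0, 2, 1, 0]]
thetas = range(4)

print(is_invariant(f_energy, patterns, thetas))
print(is_invariant(f_first, patterns, thetas))
print(len(virtual_samples([(patterns[0], 1)], thetas)))
```

The check only covers the patterns and shifts supplied, so it can refute invariance but cannot prove it for all inputs.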

Local invariance can also be considered for a transformation centered at $\theta=0$, so that $T_0\boldsymbol{x} = \boldsymbol{x}$, by using the constraint

$$\left.\frac{\partial}{\partial \theta}\right|_{\theta=0} f(T_{\theta}\boldsymbol{x}) = 0 .$$
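The local constraint can be probed numerically by estimating the derivative of $f(T_{\theta}\boldsymbol{x})$ with respect to $\theta$ at $\theta = 0$. The sketch below assumes a planar rotation as the transformation and a central finite difference as the estimator; the function names and the two example outputs are illustrative assumptions.

```python
import math

def rotate(x, theta):
    """T_theta: rotate the 2-D point x by the angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def local_sensitivity(f, x, h=1e-5):
    """Central-difference estimate of d/dtheta f(T_theta x) at theta = 0."""
    return (f(rotate(x, h)) - f(rotate(x, -h))) / (2 * h)

def f_radial(x):
    # Real-valued output that depends only on the distance to the origin.
    return x[0] ** 2 + x[1] ** 2

def f_first_coordinate(x):
    # Real-valued output that depends on the orientation of x.
    return x[0]

x = (1.0, 2.0)
print(local_sensitivity(f_radial, x))            # close to 0: locally invariant
print(local_sensitivity(f_first_coordinate, x))  # close to -x[1]: not invariant
```

A value near zero indicates that the output is locally invariant at that pattern; the second output changes at rate $-x_2$ under rotation, so the constraint is violated there.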

The function $f$ in these equations can be either the decision function of the classifier or its real-valued output.

Another approach is to consider class-invariance with respect to a "domain of the input space" instead of a transformation. In this case, the problem becomes finding $f$ so that

$$f(\boldsymbol{x}) = y_{\mathcal{P}},\ \forall \boldsymbol{x}\in \mathcal{P} ,$$

where $y_{\mathcal{P}}$ is the membership class of the region $\mathcal{P}$ of the input space.

A different type of class-invariance found in pattern recognition is permutation-invariance, i.e. invariance of the class to a permutation of elements in a structured input. A typical application of this type of prior knowledge is a classifier invariant to permutations of rows of the matrix inputs.
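Permutation-invariance over matrix rows can be obtained, for instance, by mapping every input to a canonical row order before classification. The sketch below takes that route; sorting the rows is one possible choice among others (such as using symmetric functions of the rows), and `canonical_rows`, `make_row_invariant`, and `f_base` are hypothetical names.

```python
from itertools import permutations

def canonical_rows(matrix):
    """Sort the rows so that every row permutation maps to the same matrix."""
    return sorted(tuple(row) for row in matrix)

def make_row_invariant(f):
    """Wrap any classifier f so that its output ignores the order of the rows."""
    return lambda matrix: f(canonical_rows(matrix))

# Hypothetical base classifier whose output depends on the row order.
def f_base(matrix):
    return int(matrix[0][0] > matrix[-1][0])

f_inv = make_row_invariant(f_base)
m = [[3, 1], [0, 2], [5, 4]]

# Collect the outputs over all 3! row orderings of m.
base_outputs = {f_base(list(p)) for p in permutations(m)}
inv_outputs = {f_inv(list(p)) for p in permutations(m)}
print(base_outputs)  # more than one value: order matters
print(inv_outputs)   # a single value: order is ignored
```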

Knowledge of the data

Forms of prior knowledge other than class-invariance concern the data more specifically and are thus of particular interest for real-world applications. The three particular cases that most often occur when gathering data are:

* unlabeled samples are available with supposed class-memberships;
* the training set is imbalanced due to a high proportion of samples of one class;
* the quality of the data may vary from one sample to another.

Prior knowledge of these can enhance the quality of the recognition if included in the learning. Moreover, not taking into account the poor quality of some data or a large imbalance between the classes can mislead the decision of a classifier.
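The effect of class imbalance on the decision of a classifier can be illustrated by comparing a plain error rate with a weighted one. The sketch below assumes a common heuristic in which each class is weighted by n / (k * count), with n samples and k classes; this weighting and the names `balanced_class_weights` and `weighted_error` are assumptions made for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n / (k * count): rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def weighted_error(y_true, y_pred, sample_weights):
    """Fraction of the total weight carried by misclassified samples."""
    wrong = sum(w for t, p, w in zip(y_true, y_pred, sample_weights) if t != p)
    return wrong / sum(sample_weights)

# Made-up imbalanced labels: 8 samples of class 0, 2 samples of class 1.
y_true = [0] * 8 + [1] * 2
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 10

class_w = balanced_class_weights(y_true)
weights = [class_w[t] for t in y_true]

plain = weighted_error(y_true, y_pred, [1.0] * len(y_true))
balanced = weighted_error(y_true, y_pred, weights)
print(plain, balanced)
```

With uniform weights the majority-class predictor looks acceptable, while the balanced weights expose that it misses one class entirely. The same `sample_weights` argument could also carry a per-sample confidence when the quality of the data varies, under the assumption that such confidences are available.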

References

* E. Krupka and N. Tishby, Incorporating Prior Knowledge on Features into Learning, Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 07).