Fisher kernel

In statistical classification, the Fisher kernel, named after Ronald Fisher, is a function that measures the similarity of two objects on the basis of sets of measurements for each object and a statistical model. In a classification procedure, the class for a new object (whose real class is unknown) can be estimated by minimising, across classes, an average of the Fisher kernel distance from the new object to each known member of the given class; a code sketch of this rule is given below.

The Fisher kernel was introduced in 1998. It combines the advantages of generative statistical models (like the hidden Markov model) and those of discriminative methods (like support vector machines):

* generative models can process data of variable length (adding or removing data is well-supported)
* discriminative methods can have flexible criteria and yield better results.
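
A minimal sketch (in Python, not part of the original article) of the minimum-average-distance classification rule described above. The kernel K is passed in as a function, the induced distance d(x, y)² = K(x, x) − 2 K(x, y) + K(y, y) is the standard kernel distance, and all names are illustrative.

    import numpy as np

    def kernel_distance(K, x, y):
        # Distance induced by a kernel: d(x, y)^2 = K(x, x) - 2 K(x, y) + K(y, y).
        return np.sqrt(max(K(x, x) - 2 * K(x, y) + K(y, y), 0.0))

    def classify(x_new, labelled_examples, K):
        # Assign x_new to the class whose known members have the smallest
        # average kernel distance to it; labelled_examples maps class -> list of objects.
        def avg_dist(members):
            return np.mean([kernel_distance(K, x_new, m) for m in members])
        return min(labelled_examples, key=lambda c: avg_dist(labelled_examples[c]))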


Derivation


Fisher score

The Fisher kernel makes use of the Fisher score, defined as

: U_X = \nabla_\theta \log P(X \mid \theta)

with ''θ'' being a set (vector) of parameters. The function taking ''θ'' to log P(''X'' | ''θ'') is the log-likelihood of the probabilistic model.
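
As an illustration, the following sketch (Python, added here and not from the article) evaluates the Fisher score in closed form for an i.i.d. sample under a univariate Gaussian with parameters θ = (μ, σ²). The function name and the choice of model are assumptions made for the example; the Fisher kernel applies to any differentiable generative model (HMMs, GMMs, etc.).

    import numpy as np

    def fisher_score_gaussian(x, mu, var):
        # U_X = grad_theta log P(X | theta) for theta = (mu, var),
        # where X is an i.i.d. sample under N(mu, var).
        x = np.asarray(x, dtype=float)
        d_mu = np.sum((x - mu) / var)                                  # d/d(mu)
        d_var = np.sum(-0.5 / var + (x - mu) ** 2 / (2 * var ** 2))    # d/d(var)
        return np.array([d_mu, d_var])

    # Example: score vector of a short observation sequence under theta = (0, 1)
    U_X = fisher_score_gaussian([0.3, -1.2, 0.5], mu=0.0, var=1.0)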


Fisher kernel

The Fisher kernel is defined as

: K(X_i, X_j) = U_{X_i}^T \mathcal{I}^{-1} U_{X_j}

with \mathcal{I} being the Fisher information matrix.
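
Continuing the Gaussian sketch above, one direct (and again illustrative) way to evaluate the kernel is to form the score vectors and an estimate of the Fisher information matrix. Here the information matrix is approximated by the empirical average of score outer products over a training sample, which is one common choice rather than the only one.

    import numpy as np

    def empirical_fisher_information(scores):
        # One common estimate of the Fisher information matrix: the average
        # outer product of the training score vectors (rows of `scores`).
        scores = np.asarray(scores, dtype=float)
        return scores.T @ scores / scores.shape[0]

    def fisher_kernel(U_i, U_j, I):
        # K(X_i, X_j) = U_{X_i}^T I^{-1} U_{X_j}, assuming I is invertible.
        return U_i @ np.linalg.solve(I, U_j)

    # Usage with the Gaussian score function sketched above:
    # scores = np.stack([fisher_score_gaussian(s, 0.0, 1.0) for s in training_sequences])
    # I = empirical_fisher_information(scores)
    # K12 = fisher_kernel(scores[0], scores[1], I)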


Applications


Information retrieval

The Fisher kernel is the kernel for a generative probabilistic model. As such, it constitutes a bridge between generative and discriminative models of documents. Fisher kernels exist for numerous models, notably tf–idf, Naive Bayes and probabilistic latent semantic analysis.
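
As a concrete, much-simplified example of such a document kernel, consider a unigram multinomial language model with word probabilities θ_w: the score of a document with term counts c_w is c_w / θ_w (ignoring the simplex constraint on θ), so terms that are frequent in the document but rare under the model receive large weight, in the spirit of tf–idf. The sketch below is illustrative only; the names and the choice of model are assumptions made here.

    import numpy as np

    def multinomial_fisher_score(counts, theta):
        # d/d(theta_w) log P(doc | theta) for log P = sum_w c_w log(theta_w);
        # the constraint sum_w theta_w = 1 is ignored for simplicity.
        counts = np.asarray(counts, dtype=float)
        theta = np.asarray(theta, dtype=float)
        return counts / theta

    # Two documents over a 4-word vocabulary and a background word distribution:
    theta = np.array([0.5, 0.3, 0.15, 0.05])
    U_1 = multinomial_fisher_score([3, 1, 0, 2], theta)
    U_2 = multinomial_fisher_score([2, 2, 1, 1], theta)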


Image classification and retrieval

The Fisher kernel can also be applied to image representation for classification or retrieval problems. Currently, the most popular bag-of-visual-words representation suffers from sparsity and high dimensionality. The Fisher kernel can result in a compact and dense representation, which is more desirable for image classification and retrieval problems.

The Fisher Vector (FV), a special, approximate, and improved case of the general Fisher kernel, is an image representation obtained by pooling local image features. The FV encoding stores the mean and the covariance deviation vectors for each component k of the Gaussian mixture model (GMM) and for each element of the local feature descriptors. In a systematic comparison, FV outperformed all compared encoding methods (Bag of Visual Words (BoW), Kernel Codebook encoding (KCB), Locality-Constrained Linear Coding (LLC), Vector of Locally Aggregated Descriptors (VLAD)), showing that the encoding of second-order information (i.e., codeword covariances) indeed benefits classification performance.
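
A rough sketch of this encoding is given below, assuming a diagonal-covariance GMM fitted with scikit-learn's GaussianMixture. The normalisation constants follow the commonly used mean/variance-deviation formulation, and the final power and L2 normalisation steps belong to the "improved" Fisher Vector rather than the basic encoding; this is a sketch under those assumptions, not a reference implementation.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector(descriptors, gmm):
        # Fisher Vector of T local descriptors (T x D) under a fitted
        # diagonal-covariance GMM: per-component mean and covariance
        # deviation vectors, concatenated and normalised.
        X = np.atleast_2d(descriptors)
        T = X.shape[0]
        q = gmm.predict_proba(X)                                   # (T, K) posteriors
        mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_    # (K, D), (K, D), (K,)
        diff = (X[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]
        d_mu = np.einsum('tk,tkd->kd', q, diff) / (T * np.sqrt(w)[:, None])
        d_var = np.einsum('tk,tkd->kd', q, diff ** 2 - 1) / (T * np.sqrt(2 * w)[:, None])
        fv = np.concatenate([d_mu.ravel(), d_var.ravel()])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))                     # power normalisation
        return fv / (np.linalg.norm(fv) + 1e-12)                   # L2 normalisation

    # Usage sketch:
    # gmm = GaussianMixture(n_components=64, covariance_type='diag').fit(train_descriptors)
    # fv = fisher_vector(image_descriptors, gmm)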


See also

* Fisher information metric


Notes and references

* Nello Cristianini and John Shawe-Taylor. ''An Introduction to Support Vector Machines and Other Kernel-based Learning Methods''. Cambridge University Press, 2000. ISBN 0-521-78019-5.
