Fisher kernel

In statistical classification, the Fisher kernel, named after Ronald Fisher, is a function that measures the similarity of two objects on the basis of sets of measurements for each object and a statistical model. In a classification procedure, the class for a new object (whose real class is unknown) can be estimated by minimising, across classes, an average of the Fisher kernel distance from the new object to each known member of the given class; a code sketch of this rule is given below.

The Fisher kernel was introduced in 1998. It combines the advantages of generative statistical models (like the hidden Markov model) and those of discriminative methods (like support vector machines):

* generative models can process data of variable length (adding or removing data is well-supported)
* discriminative methods can have flexible criteria and yield better results.
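
A minimal sketch (in Python, not part of the original article) of the minimum-average-distance classification rule described above. The kernel K is passed in as a function, the induced distance d(x, y)² = K(x, x) − 2 K(x, y) + K(y, y) is the standard kernel distance, and all names are illustrative.

    import numpy as np

    def kernel_distance(K, x, y):
        # Distance induced by a kernel: d(x, y)^2 = K(x, x) - 2 K(x, y) + K(y, y).
        return np.sqrt(max(K(x, x) - 2 * K(x, y) + K(y, y), 0.0))

    def classify(x_new, labelled_examples, K):
        # Assign x_new to the class whose known members have the smallest
        # average kernel distance to it; labelled_examples maps class -> list of objects.
        def avg_dist(members):
            return np.mean([kernel_distance(K, x_new, m) for m in members])
        return min(labelled_examples, key=lambda c: avg_dist(labelled_examples[c]))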


Derivation


Fisher score

The Fisher kernel makes use of the Fisher score, defined as

: U_X = \nabla_\theta \log P(X \mid \theta)

with ''θ'' being a set (vector) of parameters. The function taking ''θ'' to log P(''X'' | ''θ'') is the log-likelihood of the probabilistic model.
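
As an illustration, the following sketch (Python, added here and not from the article) evaluates the Fisher score in closed form for an i.i.d. sample under a univariate Gaussian with parameters θ = (μ, σ²). The function name and the choice of model are assumptions made for the example; the Fisher kernel applies to any differentiable generative model (HMMs, GMMs, etc.).

    import numpy as np

    def fisher_score_gaussian(x, mu, var):
        # U_X = grad_theta log P(X | theta) for theta = (mu, var),
        # where X is an i.i.d. sample under N(mu, var).
        x = np.asarray(x, dtype=float)
        d_mu = np.sum((x - mu) / var)                                  # d/d(mu)
        d_var = np.sum(-0.5 / var + (x - mu) ** 2 / (2 * var ** 2))    # d/d(var)
        return np.array([d_mu, d_var])

    # Example: score vector of a short observation sequence under theta = (0, 1)
    U_X = fisher_score_gaussian([0.3, -1.2, 0.5], mu=0.0, var=1.0)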


Fisher kernel

The Fisher kernel is defined as

: K(X_i, X_j) = U_{X_i}^T \mathcal{I}^{-1} U_{X_j}

with \mathcal{I} being the Fisher information matrix.
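
Continuing the Gaussian sketch above, one direct (and again illustrative) way to evaluate the kernel is to form the score vectors and an estimate of the Fisher information matrix. Here the information matrix is approximated by the empirical average of score outer products over a training sample, which is one common choice rather than the only one.

    import numpy as np

    def empirical_fisher_information(scores):
        # One common estimate of the Fisher information matrix: the average
        # outer product of the training score vectors (rows of `scores`).
        scores = np.asarray(scores, dtype=float)
        return scores.T @ scores / scores.shape[0]

    def fisher_kernel(U_i, U_j, I):
        # K(X_i, X_j) = U_{X_i}^T I^{-1} U_{X_j}, assuming I is invertible.
        return U_i @ np.linalg.solve(I, U_j)

    # Usage with the Gaussian score function sketched above:
    # scores = np.stack([fisher_score_gaussian(s, 0.0, 1.0) for s in training_sequences])
    # I = empirical_fisher_information(scores)
    # K12 = fisher_kernel(scores[0], scores[1], I)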


Applications


Information retrieval

The Fisher kernel is the kernel for a generative probabilistic model. As such, it constitutes a bridge between generative and discriminative models of documents. Fisher kernels exist for numerous models, notably tf–idf, Naive Bayes and probabilistic latent semantic analysis.
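
As a concrete, much-simplified example of such a document kernel, consider a unigram multinomial language model with word probabilities θ_w: the score of a document with term counts c_w is c_w / θ_w (ignoring the simplex constraint on θ), so terms that are frequent in the document but rare under the model receive large weight, in the spirit of tf–idf. The sketch below is illustrative only; the names and the choice of model are assumptions made here.

    import numpy as np

    def multinomial_fisher_score(counts, theta):
        # d/d(theta_w) log P(doc | theta) for log P = sum_w c_w log(theta_w);
        # the constraint sum_w theta_w = 1 is ignored for simplicity.
        counts = np.asarray(counts, dtype=float)
        theta = np.asarray(theta, dtype=float)
        return counts / theta

    # Two documents over a 4-word vocabulary and a background word distribution:
    theta = np.array([0.5, 0.3, 0.15, 0.05])
    U_1 = multinomial_fisher_score([3, 1, 0, 2], theta)
    U_2 = multinomial_fisher_score([2, 2, 1, 1], theta)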


Image classification and retrieval

The Fisher kernel can also be applied to image representation for classification or retrieval problems. Currently, the most popular bag-of-visual-words representation suffers from sparsity and high dimensionality. The Fisher kernel can result in a compact and dense representation, which is more desirable for image classification and retrieval problems.

The Fisher Vector (FV), a special, approximate, and improved case of the general Fisher kernel, is an image representation obtained by pooling local image features. The FV encoding stores the mean and the covariance deviation vectors for each component k of the Gaussian mixture model (GMM) and for each element of the local feature descriptors. In a systematic comparison, FV outperformed all compared encoding methods (Bag of Visual Words (BoW), Kernel Codebook encoding (KCB), Locality-Constrained Linear Coding (LLC), Vector of Locally Aggregated Descriptors (VLAD)), showing that the encoding of second-order information (i.e., codeword covariances) indeed benefits classification performance.
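
A rough sketch of this encoding is given below, assuming a diagonal-covariance GMM fitted with scikit-learn's GaussianMixture. The normalisation constants follow the commonly used mean/variance-deviation formulation, and the final power and L2 normalisation steps belong to the "improved" Fisher Vector rather than the basic encoding; this is a sketch under those assumptions, not a reference implementation.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector(descriptors, gmm):
        # Fisher Vector of T local descriptors (T x D) under a fitted
        # diagonal-covariance GMM: per-component mean and covariance
        # deviation vectors, concatenated and normalised.
        X = np.atleast_2d(descriptors)
        T = X.shape[0]
        q = gmm.predict_proba(X)                                   # (T, K) posteriors
        mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_    # (K, D), (K, D), (K,)
        diff = (X[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]
        d_mu = np.einsum('tk,tkd->kd', q, diff) / (T * np.sqrt(w)[:, None])
        d_var = np.einsum('tk,tkd->kd', q, diff ** 2 - 1) / (T * np.sqrt(2 * w)[:, None])
        fv = np.concatenate([d_mu.ravel(), d_var.ravel()])
        fv = np.sign(fv) * np.sqrt(np.abs(fv))                     # power normalisation
        return fv / (np.linalg.norm(fv) + 1e-12)                   # L2 normalisation

    # Usage sketch:
    # gmm = GaussianMixture(n_components=64, covariance_type='diag').fit(train_descriptors)
    # fv = fisher_vector(image_descriptors, gmm)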


See also

* Fisher information metric


Notes and references

* Nello Cristianini and John Shawe-Taylor. ''An Introduction to Support Vector Machines and Other Kernel-based Learning Methods''. Cambridge University Press, 2000. ISBN 0-521-78019-5.
