mathematics Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...

, a Relevance Vector Machine (RVM) is a

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

technique that uses

Bayesian inference Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, a ...

to obtain

parsimonious Occam's razor, Ockham's razor, or Ocham's razor ( la, novacula Occami), also known as the principle of parsimony or the law of parsimony ( la, lex parsimoniae), is the problem-solving principle that "entities should not be multiplied beyond neces ...

solutions for

regression Regression or regressions may refer to: Science * Marine regression, coastal advance due to falling sea level, the opposite of marine transgression * Regression (medicine), a characteristic of diseases to express lighter symptoms or less extent ( ...

and

probabilistic classification In machine learning, a probabilistic classifier is a classifier that is able to predict, given an observation of an input, a probability distribution over a set of classes, rather than only outputting the most likely class that the observation sho ...

. The RVM has an identical functional form to the

support vector machine In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratorie ...

, but provides probabilistic classification. It is actually equivalent to a

Gaussian process In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution, i.e. e ...

model with

covariance function In probability theory and statistics, the covariance function describes how much two random variables change together (their ''covariance'') with varying spatial or temporal separation. For a random field or stochastic process ''Z''(''x'') on a doma ...

: :

k(\mathbf,\mathbf) = \sum_^N \frac \varphi(\mathbf,\mathbf_j)\varphi(\mathbf',\mathbf_j)

where

\varphi

is the

kernel function In operator theory, a branch of mathematics, a positive-definite kernel is a generalization of a positive-definite function or a positive-definite matrix. It was first introduced by James Mercer in the early 20th century, in the context of solving ...

(usually Gaussian),

\alpha_j

are the variances of the prior on the weight vector

w \sim N(0,\alpha^I)

, and

\mathbf_1,\ldots,\mathbf_N

are the input vectors of the

training set In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from ...

. Compared to that of

s (SVM), the Bayesian formulation of the RVM avoids the set of free parameters of the SVM (that usually require cross-validation-based post-optimizations). However RVMs use an

expectation maximization Expectation or Expectations may refer to: Science * Expectation (epistemic) * Expected value, in mathematical probability theory * Expectation value (quantum mechanics) * Expectation–maximization algorithm, in statistics Music * ''Expectation' ...

(EM)-like learning method and are therefore at risk of local minima. This is unlike the standard

sequential minimal optimization Sequential minimal optimization (SMO) is an algorithm for solving the quadratic programming (QP) problem that arises during the training of support-vector machines (SVM). It was invented by John Platt in 1998 at Microsoft Research. SMO is widely u ...

(SMO)-based algorithms employed by SVMs, which are guaranteed to find a global optimum (of the convex problem). The relevance vector machine was patented in the United States by

Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...

(patent expired September 4, 2019).

References

{{reflist

Software

dlib

C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...

Library
The Kernel-Machine Library

R package for binary classification
scikit-rvm

fast-scikit-rvmrvm tutorial

External links

Tipping's webpage on Sparse Bayesian Models and the RVMA Tutorial on RVM by Tristan FletcherApplied tutorial on RVMComparison of RVM and SVM
Classification algorithms Kernel methods for machine learning Nonparametric Bayesian statistics

See also

References

Software

External links