Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification.
Learning setup
There are four common setups for similarity and metric distance learning.
; ''Regression similarity learning''
: In this setup, pairs of objects are given $(x_i^1, x_i^2)$ together with a measure of their similarity $y_i \in \mathbb{R}$. The goal is to learn a function that approximates $f(x_i^1, x_i^2) \approx y_i$ for every new labeled triplet example $(x_i^1, x_i^2, y_i)$. This is typically achieved by minimizing a regularized loss $\min_w \sum_i \operatorname{loss}(w; x_i^1, x_i^2, y_i) + \operatorname{reg}(w)$.
; ''Classification similarity learning''
: Given are pairs of similar objects $(x_i, x_i^+)$ and non-similar objects $(x_i, x_i^-)$. An equivalent formulation is that every pair $(x_i^1, x_i^2)$ is given together with a binary label $y_i \in \{0, 1\}$ that determines whether the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not.
; ''Ranking similarity learning''
: Given are triplets of objects $(x_i, x_i^+, x_i^-)$ whose relative similarity obeys a predefined order: $x_i$ is known to be more similar to $x_i^+$ than to $x_i^-$. The goal is to learn a function $f$ such that for any new triplet of objects $(x, x^+, x^-)$, it obeys $f(x, x^+) > f(x, x^-)$ (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications; a small code sketch of this setup follows the list below.
; ''Locality sensitive hashing (LSH)''
: Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases.
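The following is a minimal, illustrative sketch of the ranking (triplet) setup: a bilinear similarity score $f_W(x, z) = x^\top W z$ is trained with a triplet hinge loss so that the anchor ends up scored as more similar to its positive example than to its negative one. The toy data, learning rate and margin are assumptions made for the example, not part of any particular method.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch: ranking similarity learning with a bilinear score
# f_W(x, z) = x^T W z and a triplet hinge loss (names and data are toy).
rng = np.random.default_rng(0)
d = 5                       # feature dimension
W = np.eye(d)               # similarity parameters, initialized to identity
lr = 0.01                   # learning rate (assumed)
margin = 1.0                # hinge margin (assumed)

def score(W, x, z):
    """Bilinear similarity f_W(x, z) = x^T W z."""
    return x @ W @ z

# One synthetic triplet (x, x_plus, x_minus): x should end up scored as
# more similar to x_plus than to x_minus.
x, x_plus, x_minus = rng.normal(size=(3, d))

for _ in range(100):
    # Triplet hinge loss: max(0, margin - f(x, x+) + f(x, x-))
    violation = margin - score(W, x, x_plus) + score(W, x, x_minus)
    if violation > 0:
        # Subgradient of the loss w.r.t. W is x x_minus^T - x x_plus^T
        W -= lr * (np.outer(x, x_minus) - np.outer(x, x_plus))

print(score(W, x, x_plus), score(W, x, x_minus))  # first should exceed second
</syntaxhighlight>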
A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix ''W'' that parametrizes the similarity function $f_W(x, z) = x^\top W z$. When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing.
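As a hedged illustration of the siamese idea (not taken from any specific reference), the sketch below uses PyTorch: a single encoder with shared parameters embeds both inputs, and similarity is taken as the dot product of the two embeddings. The layer sizes and the dot-product choice are assumptions for the example.

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Illustrative siamese setup: the *same* encoder (shared parameters)
# embeds both inputs; similarity is the dot product of the embeddings.
encoder = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Linear(64, 16),
)

def similarity(x1, x2):
    z1, z2 = encoder(x1), encoder(x2)   # both branches share `encoder`
    return (z1 * z2).sum(dim=-1)        # batched dot product

x1, x2 = torch.randn(8, 32), torch.randn(8, 32)
print(similarity(x1, x2).shape)         # torch.Size([8])
</syntaxhighlight>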
Metric learning
Similarity learning is closely related to ''distance metric learning''. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric.
When the objects $x_i$ are vectors in $\mathbb{R}^d$, then any matrix $W$ in the symmetric positive semi-definite cone $S_+^d$ defines a distance pseudo-metric on the space of $x$ through the form $D_W(x_1, x_2)^2 = (x_1 - x_2)^\top W (x_1 - x_2)$. When $W$ is a symmetric positive definite matrix, $D_W$ is a metric. Moreover, as any symmetric positive semi-definite matrix $W \in S_+^d$ can be decomposed as $W = L^\top L$ where $L \in \mathbb{R}^{e \times d}$ and $e \geq \operatorname{rank}(W)$, the distance function $D_W$ can be rewritten equivalently as $D_W(x_1, x_2)^2 = \|L(x_1 - x_2)\|_2^2$. The distance $D_W(x_1, x_2)^2 = \|x_1' - x_2'\|_2^2$ thus corresponds to the Euclidean distance between the transformed feature vectors $x_1' = Lx_1$ and $x_2' = Lx_2$.
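The equivalence between the bilinear form and the Euclidean distance after the linear map $L$ can be checked numerically; the sketch below (with arbitrary toy dimensions) is only an illustration of the identity above.

<syntaxhighlight lang="python">
import numpy as np

# Check that (x1 - x2)^T W (x1 - x2) == ||L x1 - L x2||^2 when W = L^T L.
rng = np.random.default_rng(1)
d, e = 4, 3
L = rng.normal(size=(e, d))    # any real L yields a PSD matrix W = L^T L
W = L.T @ L

x1, x2 = rng.normal(size=(2, d))
diff = x1 - x2

d_sq_bilinear = diff @ W @ diff                  # (x1-x2)^T W (x1-x2)
d_sq_euclidean = np.sum((L @ x1 - L @ x2) ** 2)  # ||L x1 - L x2||^2
print(np.isclose(d_sq_bilinear, d_sq_euclidean)) # True
</syntaxhighlight>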
Many formulations for metric learning have been proposed. Some well-known approaches for metric learning include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor (LMNN), and information theoretic metric learning (ITML).
In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance.
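As a small, hedged example (toy data, illustrative names), the classical Mahalanobis distance corresponds to choosing the inverse covariance matrix of the data as the matrix $W$ above:

<syntaxhighlight lang="python">
import numpy as np

# Mahalanobis distance: D(x1, x2)^2 = (x1 - x2)^T S^{-1} (x1 - x2),
# where S is the sample covariance matrix of the data (toy data below).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))                    # 200 samples, 3 features
S_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance

def mahalanobis(x1, x2, S_inv):
    diff = x1 - x2
    return np.sqrt(diff @ S_inv @ diff)

print(mahalanobis(X[0], X[1], S_inv))
</syntaxhighlight>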
Applications
Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like the k-nearest neighbor algorithm, which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches.
Scalability
Metric and similarity learning scale quadratically with the dimension of the input space, as can easily be seen when the learned metric has a bilinear form $f_W(x, z) = x^\top W z$. Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET.
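A back-of-the-envelope sketch of this scaling (with assumed, illustrative dimensions; HDSL and COMET rely on sparsity rather than the simple low-rank factorization shown here):

<syntaxhighlight lang="python">
# A full bilinear model stores a d-by-d matrix W, i.e. d^2 parameters,
# while a factorization W = L^T L with L of shape (e, d) stores only e*d.
d = 10_000     # input dimension (e.g. sparse text features; assumed)
e = 100        # target rank / embedding dimension (assumed)

full_bilinear_params = d * d   # 100,000,000 entries in W
low_rank_params = e * d        # 1,000,000 entries in L
print(full_bilinear_params, low_rank_params)
</syntaxhighlight>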
Software
* metric-learn is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn; a minimal, version-dependent usage sketch follows this list.
* OpenMetricLearning is a Python framework to train and validate models that produce high-quality embeddings.
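The sketch below assumes metric-learn's scikit-learn-style estimator interface (fit on labeled data, then transform into the learned space). Class availability and hyperparameter names vary across versions of the library, so treat this as an assumption rather than the documented example.

<syntaxhighlight lang="python">
from sklearn.datasets import load_iris
from metric_learn import LMNN      # large margin nearest neighbor (assumed available)

X, y = load_iris(return_X_y=True)

lmnn = LMNN()                      # default hyperparameters; names vary by version
lmnn.fit(X, y)                     # learns a Mahalanobis-type metric from labels
X_transformed = lmnn.transform(X)  # data mapped into the learned space
print(X_transformed.shape)
</syntaxhighlight>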
Further information
For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.
See also
* Kernel method
*Latent semantic analysis
*Learning to rank
References
{{reflist}}
Machine learning
Semantic relations