Similarity learning

Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, recommendation systems, visual identity tracking, face verification, and speaker verification.


Learning setup

There are four common setups for similarity and metric distance learning.

Regression similarity learning: Pairs of objects (x_i^1, x_i^2) are given together with a measure of their similarity y_i \in R. The goal is to learn a function that approximates f(x_i^1, x_i^2) \approx y_i for every new labeled triplet (x_i^1, x_i^2, y_i). This is typically achieved by minimizing a regularized loss \min_W \sum_i loss(W; x_i^1, x_i^2, y_i) + reg(W).

Classification similarity learning: Given are pairs of similar objects (x_i, x_i^+) and non-similar objects (x_i, x_i^-). An equivalent formulation is that every pair (x_i^1, x_i^2) is given together with a binary label y_i \in \{0, 1\} that determines whether the two objects are similar. The goal is again to learn a classifier that can decide whether a new pair of objects is similar.

Ranking similarity learning: Given are triplets of objects (x_i, x_i^+, x_i^-) whose relative similarity obeys a predefined order: x_i is known to be more similar to x_i^+ than to x_i^-. The goal is to learn a function f such that for any new triplet (x, x^+, x^-), it obeys f(x, x^+) > f(x, x^-) (contrastive learning). This setup assumes a weaker form of supervision than regression: instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications.

Locality sensitive hashing (LSH): Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases.

A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function f_W(x, z) = x^T W z. When data is abundant, a common approach is to learn a siamese network, a deep network model with parameter sharing.
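As a minimal sketch of the bilinear form above (with a hand-chosen W standing in for a learned one, and made-up vectors), f_W(x, z) = x^T W z can be computed and checked against the ranking objective:

```python
import numpy as np

# Hypothetical parameter matrix W (fixed by hand for illustration; in
# ranking similarity learning it would be fitted from triplets).
W = np.array([[2.0, 0.0],
              [0.0, 1.0]])

def bilinear_similarity(x, z, W):
    """Bilinear similarity f_W(x, z) = x^T W z."""
    return x @ W @ z

x  = np.array([1.0, 0.0])   # query object
zp = np.array([1.0, 0.2])   # object that should be similar to x
zn = np.array([0.0, 1.0])   # object that should be dissimilar to x

# A well-fitted W satisfies f_W(x, z+) > f_W(x, z-) on new triplets.
print(bilinear_similarity(x, zp, W))  # 2.0
print(bilinear_similarity(x, zn, W))  # 0.0
```

Note that a general W need not be symmetric, so f_W is a similarity score rather than a metric.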


Metric learning

Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric.

When the objects x_i are vectors in R^d, any matrix W in the symmetric positive semi-definite cone S_+^d defines a distance pseudo-metric on the space of x through the form D_W(x_1, x_2)^2 = (x_1 - x_2)^T W (x_1 - x_2). When W is a symmetric positive definite matrix, D_W is a metric. Moreover, since any symmetric positive semi-definite matrix W \in S_+^d can be decomposed as W = L^T L, where L \in R^{e \times d} and e \geq rank(W), the distance function D_W can be rewritten equivalently as D_W(x_1, x_2)^2 = (x_1 - x_2)^T L^T L (x_1 - x_2) = \| L (x_1 - x_2) \|_2^2. The distance D_W(x_1, x_2)^2 = \| x_1' - x_2' \|_2^2 thus corresponds to the Euclidean distance between the transformed feature vectors x_1' = L x_1 and x_2' = L x_2.

Many formulations for metric learning have been proposed. Some well-known approaches include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor, and information-theoretic metric learning (ITML). In statistics, the inverse of the covariance matrix of the data is sometimes used to define a distance metric called the Mahalanobis distance.
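The factorization W = L^T L can be checked numerically. In this sketch both L and the data points are hand-chosen (nothing is learned); it only verifies that the quadratic form (x_1 - x_2)^T W (x_1 - x_2) equals the squared Euclidean distance between the transformed vectors L x_1 and L x_2:

```python
import numpy as np

# Hand-chosen linear map L; W = L^T L is symmetric positive semi-definite
# by construction.
L = np.array([[1.0, 2.0],
              [0.0, 1.0]])
W = L.T @ L

def d_W_squared(x1, x2, W):
    """Squared pseudo-metric D_W(x1, x2)^2 = (x1 - x2)^T W (x1 - x2)."""
    diff = x1 - x2
    return diff @ W @ diff

x1 = np.array([1.0, 1.0])
x2 = np.array([0.0, 3.0])

# Equivalent computation: ordinary Euclidean distance after x -> Lx.
euclid_sq = np.sum((L @ x1 - L @ x2) ** 2)

print(d_W_squared(x1, x2, W), euclid_sq)  # 13.0 13.0
```

When W is the inverse covariance matrix of the data, the same quadratic form gives the squared Mahalanobis distance.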


Applications

Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like the k-nearest neighbor algorithm, which relies on the labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches.
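The preprocessing idea can be sketched as follows: run a standard nearest-neighbor classifier, but on the transformed features x' = Lx. Here the transform L and the toy data are hand-chosen for illustration (a learned L would come from a metric learning algorithm); L up-weights the feature that actually separates the classes:

```python
import numpy as np

# Toy training set: the classes differ mainly in the second feature.
X = np.array([[0.0, 0.0], [1.0, 0.1], [0.0, 1.0], [1.0, 1.1]])
y = np.array([0, 0, 1, 1])

# Hypothetical learned transform L: shrink the noisy first feature,
# stretch the discriminative second one.
L = np.array([[0.1, 0.0],
              [0.0, 10.0]])

def nearest_neighbor_label(query, X, y, L):
    """1-NN under the learned pseudo-metric: Euclidean distance after x -> Lx."""
    dists = np.sum((X @ L.T - query @ L.T) ** 2, axis=1)
    return y[np.argmin(dists)]

print(nearest_neighbor_label(np.array([0.9, 0.9]), X, y, L))  # 1
```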


Scalability

Metric and similarity learning naively scale quadratically with the dimension of the input space, as can easily be seen when the learned metric has a bilinear form f_W(x, z) = x^T W z. Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET.
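The quadratic cost is simply the number of entries in W. A back-of-the-envelope sketch with hypothetical dimensions shows how much a low-rank factorization W = L^T L (from the metric learning section) reduces the parameter count:

```python
d = 10_000   # input dimensionality (hypothetical)
e = 100      # chosen rank of the factor L (hypothetical)

full_params    = d * d   # a full bilinear/metric matrix W
lowrank_params = e * d   # a rank-e factor L with W = L^T L

print(full_params, lowrank_params)  # 100000000 1000000
```

Sparse parameterizations such as those of HDSL and COMET reduce the effective parameter count in a different way, by constraining most entries of W to zero.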


Software


metric-learn is a free-software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn.

OpenMetricLearning is a Python framework to train and validate models that produce high-quality embeddings.


See also

* Kernel method
* Learning to rank
* Latent semantic analysis


Further reading

For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.

