A Siamese neural network (sometimes called a twin neural network) is an

artificial neural network Artificial neural networks (ANNs), usually simply called neural networks (NNs) or neural nets, are computing systems inspired by the biological neural networks that constitute animal brains. An ANN is based on a collection of connected units ...

that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors. Often one of the output vectors is precomputed, thus forming a baseline against which the other output vector is compared. This is similar to comparing

fingerprint A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surfa ...

s but can be described more technically as a distance function for

locality-sensitive hashing In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since ...

. It is possible to build an architecture that is functionally similar to a siamese network but implements a slightly different function. This is typically used for comparing similar instances in different type sets. Uses of similarity measures where a twin network might be used are such things as recognizing handwritten checks, automatic detection of faces in camera images, and matching queries with indexed documents. The perhaps most well-known application of twin networks are

face recognition A facial recognition system is a technology capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, an ...

, where known images of people are precomputed and compared to an image from a turnstile or similar. It is not obvious at first, but there are two slightly different problems. One is recognizing a person among a large number of other persons, that is the facial recognition problem.

DeepFace DeepFace is a deep learning facial recognition system created by a research group at Facebook. It identifies human faces in digital images. The program employs a nine-layer neural network with over 120 million connection weights and was trained on ...

is an example of such a system. In its most extreme form this is recognizing a single person at a train station or airport. The other is

face verification A facial recognition system is a technology capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and wor ...

, that is to verify whether the photo in a pass is the same as the person claiming he or she is the same person. The twin network might be the same, but the implementation can be quite different.

Learning

Learning in twin networks can be done with

triplet loss Triplet loss is a loss function for machine learning algorithms where a reference input (called anchor) is compared to a matching input (called positive) and a non-matching input (called negative). The distance from the anchor to the positive is mi ...

contrastive loss Contrastive may refer to one of several concepts in linguistics: * Contrast (linguistics) *Contrastive linguistics * Contrastive distribution *Contrastive analysis *Contrastive rhetoric *Contrastive focus reduplication *Contrastive stress *Contrast ...

. For learning by triplet loss a baseline vector (anchor image) is compared against a positive vector (truthy image) and a negative vector (falsy image). The negative vector will force learning in the network, while the positive vector will act like a regularizer. For learning by contrastive loss there must be a weight decay to regularize the weights, or some similar operation like a normalization. A distance metric for a loss function may have the following properties * Non-negativity:

\delta ( x, y ) \ge 0

* Identity of Non-discernibles:

\delta ( x, y ) = 0 \iff x=y

* Symmetry:

\delta ( x, y ) = \delta ( y, x )

* Triangle inequality:

\delta ( x, z ) \le \delta ( x, y ) + \delta ( y, z )

In particular, the triplet loss algorithm is often defined with squared Euclidean (which unlike Euclidean, does not have triangle inequality) distance at its core.

Predefined metrics, Euclidean distance metric

The common learning goal is to minimize a distance metric for similar objects and maximize for distinct ones. This gives a loss function like :

\begin

\delta(x^, x^)=
\begin 
\min \ \,  \operatorname \left ( x^ \right ) - \operatorname \left ( x^ \right ) \,  \, , i = j  \\
\max \ \,  \operatorname \left ( x^ \right ) - \operatorname \left ( x^ \right ) \,  \, , i \neq j
\end
\end

i,j

are indexes into a set of vectors :

\operatorname(\cdot)

function implemented by the twin network The most common distance metric used is

Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, therefore o ...

, in case of which the loss function can be rewritten in matrix form as :

\operatorname ( \mathbf^, \mathbf^ ) \approx (\mathbf^ - \mathbf^)^(\mathbf^ - \mathbf^)

Learned metrics, nonlinear distance metric

A more general case is where the output vector from the twin network is passed through additional network layers implementing non-linear distance metrics. :

\, \text \end

i,j

are indexes into a set of vectors :

\operatorname(\cdot)

function implemented by the twin network :

\operatorname(\cdot)

function implemented by the network joining outputs from the twin network On a matrix form the previous is often approximated as a

Mahalanobis distance The Mahalanobis distance is a measure of the distance between a point ''P'' and a distribution ''D'', introduced by P. C. Mahalanobis in 1936. Mahalanobis's definition was prompted by the problem of identifying the similarities of skulls based ...

for a linear space as :

\operatorname ( \mathbf^, \mathbf^ ) \approx (\mathbf^ - \mathbf^)^\mathbf(\mathbf^ - \mathbf^)

This can be further subdivided in at least

Unsupervised learning Unsupervised learning is a type of algorithm that learns patterns from untagged data. The hope is that through mimicry, which is an important mode of learning in people, the machine is forced to build a concise representation of its world and t ...

and

Supervised learning Supervised learning (SL) is a machine learning paradigm for problems where the available data consists of labelled examples, meaning that each data point contains features (covariates) and an associated label. The goal of supervised learning alg ...

Learned metrics, half-twin networks

This form also allows the twin network to be more of a half-twin, implementing a slightly different functions :

\, \text \end

i,j

are indexes into a set of vectors :

\operatorname(\cdot), \operatorname(\cdot)

function implemented by the half-twin network :

\operatorname(\cdot)

function implemented by the network joining outputs from the twin network

Twin networks for object tracking

Twin networks have been used in object tracking because of its unique two tandem inputs and similarity measurement. In object tracking, one input of the twin network is user pre-selected exemplar image, the other input is a larger search image, which twin network's job is to locate exemplar inside of search image. By measuring the similarity between exemplar and each part of the search image, a map of similarity score can be given by the twin network. Furthermore, using a Fully Convolutional Network, the process of computing each sector's similarity score can be replaced with only one cross correlation layer. After being first introduced in 2016, Twin fully convolutional network has been used in many High-performance Real-time Object Tracking Neural Networks. Like CFnet, StructSiam, SiamFC-tri, DSiam, SA-Siam, SiamRPN, DaSiamRPN, Cascaded SiamRPN, SiamMask, SiamRPN++, Deeper and Wider SiamRPN.

References

Neural network architectures

Learning

Predefined metrics, Euclidean distance metric

Learned metrics, nonlinear distance metric

Learned metrics, half-twin networks

Twin networks for object tracking

See also

Further reading

References