Sinkhorn's Theorem

Sinkhorn's theorem states that every square matrix with positive entries can be written in a certain standard form.


Theorem

If ''A'' is an ''n'' × ''n'' matrix with strictly positive elements, then there exist diagonal matrices ''D''1 and ''D''2 with strictly positive diagonal elements such that ''D''1''AD''2 is doubly stochastic. The matrices ''D''1 and ''D''2 are unique modulo multiplying the first matrix by a positive number and dividing the second one by the same number. (Sinkhorn, Richard (1964). "A relationship between arbitrary positive matrices and doubly stochastic matrices". ''Ann. Math. Statist.'' 35, 876–879; Marshall, A.W., & Olkin, I. (1967). "Scaling of matrices to achieve specified row and column sums". ''Numerische Mathematik'' 12(1), 83–90.)
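As a concrete illustration (our own worked example, not part of the original statement), the symmetric matrix below is scaled to doubly stochastic form by taking both diagonal factors equal to (1/√3)''I'':

```latex
% Worked 2x2 example: A has strictly positive entries, and the scalings
% D_1 = D_2 = (1/\sqrt{3}) I make D_1 A D_2 doubly stochastic.
A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix},
\qquad
D_1 A D_2
  = \tfrac{1}{3}\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}
  = \begin{pmatrix} 1/3 & 2/3 \\ 2/3 & 1/3 \end{pmatrix},
```

where every row and every column of the result sums to 1.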


Sinkhorn–Knopp algorithm

A simple iterative method to approach the doubly stochastic matrix is to alternately rescale all rows and all columns of ''A'' to sum to 1. Sinkhorn and Knopp presented this algorithm and analyzed its convergence. (Sinkhorn, Richard, & Knopp, Paul (1967). "Concerning nonnegative matrices and doubly stochastic matrices". ''Pacific J. Math.'' 21, 343–348.) This is essentially the same as the iterative proportional fitting algorithm, well known in survey statistics.
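The alternating rescaling can be sketched in a few lines (a minimal illustration assuming NumPy; the function name `sinkhorn_knopp` and the convergence tolerance are our own choices, not from the original papers):

```python
import numpy as np

def sinkhorn_knopp(A, iters=1000, tol=1e-10):
    """Alternately rescale rows and columns of a positive matrix A
    until diag(r) @ A @ diag(c) is (numerically) doubly stochastic.

    Returns the scaled matrix S and the diagonal scaling vectors r, c.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    r = np.ones(n)  # diagonal of D1 (row scaling)
    c = np.ones(n)  # diagonal of D2 (column scaling)
    for _ in range(iters):
        r = 1.0 / (A @ c)        # make every row of diag(r) A diag(c) sum to 1
        c = 1.0 / (A.T @ r)      # then make every column sum to 1
        S = A * r[:, None] * c[None, :]
        if np.allclose(S.sum(axis=0), 1, atol=tol) and \
           np.allclose(S.sum(axis=1), 1, atol=tol):
            break
    return S, r, c

# Example: a 3x3 matrix with strictly positive entries.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
S, r, c = sinkhorn_knopp(A)
# Rows and columns of S now sum to 1 up to floating-point tolerance.
```

Each half-step enforces one of the two stochasticity constraints exactly, and for strictly positive ''A'' the alternation converges to the doubly stochastic limit guaranteed by the theorem.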


Analogues and extensions

The following analogue for unitary matrices is also true: for every unitary matrix ''U'' there exist two diagonal unitary matrices ''L'' and ''R'' such that ''LUR'' has each of its columns and rows summing to 1.

The following extension to maps between matrices is also true (see Theorem 5 and also Theorem 4.7): given a Kraus operator representation of the quantum operation Φ mapping one density matrix into another,

: S \mapsto \Phi(S) = \sum_i B_i S B_i^*,

that is trace preserving,

: \sum_i B_i^* B_i = I,

and, in addition, whose range is in the interior of the positive definite cone (strict positivity), there exist scalings ''x''''j'', for ''j'' in {0, 1}, that are positive definite so that the rescaled Kraus operator

: S \mapsto x_1 \Phi(x_0^{1/2} S x_0^{1/2}) x_1 = \sum_i (x_1 B_i x_0^{1/2}) S (x_1 B_i x_0^{1/2})^*

is doubly stochastic. In other words, it is such that both

: x_1 \Phi(x_0^{1/2} I x_0^{1/2}) x_1 = I,

as well as, for the adjoint,

: x_0^{1/2} \Phi^*(x_1 I x_1) x_0^{1/2} = I,

where ''I'' denotes the identity operator.
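In practice such scalings can be computed by an operator analogue of the row/column iteration: alternately renormalise the Kraus operators so that Φ(''I'') = ''I'' and Φ*(''I'') = ''I''. The sketch below is our own illustration of this alternation, assuming NumPy; the helper names are ours, and convergence is only guaranteed under the strict-positivity hypothesis above:

```python
import numpy as np

def inv_sqrtm(M):
    """Inverse principal square root of a positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.conj().T

def operator_sinkhorn(kraus, iters=200):
    """Absorb the scalings into the Kraus operators B_i -> x1 B_i x0^{1/2}
    by alternately normalising sum_i B_i B_i^* (= Phi(I)) and
    sum_i B_i^* B_i (= Phi*(I)) to the identity."""
    B = [np.array(K, dtype=complex) for K in kraus]
    for _ in range(iters):
        M = sum(K @ K.conj().T for K in B)   # Phi(I)
        L = inv_sqrtm(M)
        B = [L @ K for K in B]               # now Phi(I) = I
        N = sum(K.conj().T @ K for K in B)   # Phi*(I)
        R = inv_sqrtm(N)
        B = [K @ R for K in B]               # now Phi*(I) = I
    return B

# Example: three random Kraus operators on a 2-dimensional system
# (a generic choice, which satisfies strict positivity).
rng = np.random.default_rng(0)
kraus = [rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
         for _ in range(3)]
B = operator_sinkhorn(kraus)
# The rescaled map and its adjoint are now both (approximately) unital.
```

Each half-step makes one of the two unitality conditions hold exactly, mirroring the row/column alternation of the matrix case.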


Applications

In the 2010s Sinkhorn's theorem came to be used to find solutions of entropy-regularised optimal transport problems. This has been of interest in machine learning because such "Sinkhorn distances" can be used to evaluate the difference between data distributions and permutations. This improves the training of machine learning algorithms in situations where maximum likelihood training may not be the best method.
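The connection is direct: the entropy-regularised transport plan is a Sinkhorn scaling of the Gibbs kernel ''K'' = exp(−''C''/ε) built from the cost matrix ''C''. A minimal sketch of this standard iteration (assuming NumPy; the function name and toy data are ours):

```python
import numpy as np

def sinkhorn_transport(a, b, C, eps=0.1, iters=2000):
    """Entropy-regularised optimal transport: alternately rescale the
    Gibbs kernel K = exp(-C/eps) so that the coupling P has row
    marginals a and column marginals b."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)       # fix the row marginals
        v = b / (K.T @ u)     # fix the column marginals
    P = u[:, None] * K * v[None, :]   # regularised optimal coupling
    cost = float((P * C).sum())       # transport cost of P ("Sinkhorn distance")
    return P, cost

# Toy example: move mass between two 3-point distributions
# with absolute-difference ground cost.
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
C = np.abs(np.arange(3)[:, None] - np.arange(3)[None, :]).astype(float)
P, cost = sinkhorn_transport(a, b, C, eps=0.5)
```

This is exactly the Sinkhorn–Knopp alternation applied to ''K'', with the uniform marginals replaced by the prescribed distributions ''a'' and ''b''; the resulting objective is differentiable, which is what makes it attractive as a training loss.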

