Computational Complexity Of Matrix Multiplication
In theoretical computer science, the computational complexity of matrix multiplication dictates how quickly the operation of matrix multiplication can be performed. Matrix multiplication algorithms are a central subroutine in theoretical and numerical algorithms for numerical linear algebra and optimization, so finding the right amount of time it should take is of major practical relevance. Directly applying the mathematical definition of matrix multiplication gives an algorithm that requires n^3 field operations to multiply two n\times n matrices over that field (\Theta(n^3) in big O notation).

Surprisingly, algorithms exist that provide better running times than this straightforward "schoolbook algorithm". The first to be discovered was Strassen's algorithm, devised by Volker Strassen in 1969 and often referred to as "fast matrix multiplication". The optimal number of field operations needed to multiply two square n\times n matrices up to constant factors is still unknown. This is a major open question in theoretical computer science. As of October 2022, the best announced bound on the asymptotic complexity of a matrix multiplication algorithm is O(n^{2.37188}) time, given by Duan, Wu and Zhou in a preprint. This improves on the bound of O(n^{2.3728596}) time, given by Josh Alman and Virginia Vassilevska Williams. However, this and similar improvements to Strassen are not used in practice, because they are galactic algorithms: the constant coefficient hidden by the big O notation is so large that they are only worthwhile for matrices that are too large to handle on present-day computers.


Simple algorithms

If ''A'', ''B'' are n\times n matrices over a field, then their product ''AB'' is also an n\times n matrix over that field, defined entrywise as

:(AB)_{ij} = \sum_{k=1}^n A_{ik} B_{kj}.


Schoolbook algorithm

The simplest approach to computing the product of two n\times n matrices ''A'' and ''B'' is to compute the arithmetic expressions coming from the definition of matrix multiplication. In pseudocode:

 input ''A'' and ''B'', both ''n'' by ''n'' matrices
 initialize ''C'' to be an ''n'' by ''n'' matrix of all zeros
 for ''i'' from 1 to ''n'':
     for ''j'' from 1 to ''n'':
         for ''k'' from 1 to ''n'':
             ''C''[''i''][''j''] = ''C''[''i''][''j''] + ''A''[''i''][''k''] * ''B''[''k''][''j'']
 output ''C'' (as ''A''*''B'')

This algorithm requires, in the worst case, n^3 multiplications of scalars and n^3 - n^2 additions for computing the product of two square n\times n matrices. Its computational complexity is therefore O(n^3), in a model of computation where field operations (addition and multiplication) take constant time (in practice, this is the case for floating point numbers, but not necessarily for integers).
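As a concrete check of this count, the following runnable Python version of the pseudocode above also tallies scalar multiplications; the function name multiply_schoolbook and the counter are ours, added for illustration.

 def multiply_schoolbook(A, B):
     """Multiply two n-by-n matrices (lists of lists), counting multiplications."""
     n = len(A)
     C = [[0] * n for _ in range(n)]
     mults = 0
     for i in range(n):
         for j in range(n):
             for k in range(n):
                 C[i][j] += A[i][k] * B[k][j]
                 mults += 1
     return C, mults  # mults equals n**3, matching the worst-case analysis

For example, multiply_schoolbook applied to two 4×4 matrices returns a count of 64 = 4^3 multiplications.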


Strassen's algorithm

Strassen's algorithm improves on naive matrix multiplication through a divide-and-conquer approach. The key observation is that multiplying two 2\times 2 matrices can be done with only 7 multiplications, instead of the usual 8 (at the expense of 11 additional addition and subtraction operations). This means that, treating the input matrices as 2\times 2 block matrices, the task of multiplying n\times n matrices can be reduced to 7 subproblems of multiplying n/2 \times n/2 matrices. Applying this recursively gives an algorithm needing O(n^{\log_2 7}) \approx O(n^{2.807}) field operations. Unlike algorithms with faster asymptotic complexity, Strassen's algorithm is used in practice. Its numerical stability is reduced compared to the naive algorithm, but it is faster in cases where n > 100 or so and appears in several libraries, such as BLAS. Fast matrix multiplication algorithms cannot achieve ''component-wise stability'', but some can be shown to exhibit ''norm-wise stability''. It is very useful for large matrices over exact domains such as finite fields, where numerical stability is not an issue.
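The following Python sketch shows the recursion concretely, assuming the matrix dimension is a power of two. It uses the classical form of Strassen's identities, which spends 18 block additions and subtractions per level rather than the minimal 15 mentioned later in this article; all helper names are ours.

 def strassen(A, B):
     """Multiply square matrices (lists of lists) whose size is a power of two."""
     n = len(A)
     if n == 1:
         return [[A[0][0] * B[0][0]]]
     h = n // 2
     def quad(M):  # split M into four h-by-h quadrants
         return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                 [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
     def add(X, Y):
         return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
     def sub(X, Y):
         return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
     A11, A12, A21, A22 = quad(A)
     B11, B12, B21, B22 = quad(B)
     # Strassen's seven products: 7 recursive multiplications instead of 8.
     M1 = strassen(add(A11, A22), add(B11, B22))
     M2 = strassen(add(A21, A22), B11)
     M3 = strassen(A11, sub(B12, B22))
     M4 = strassen(A22, sub(B21, B11))
     M5 = strassen(add(A11, A12), B22)
     M6 = strassen(sub(A21, A11), add(B11, B12))
     M7 = strassen(sub(A12, A22), add(B21, B22))
     C11 = add(sub(add(M1, M4), M5), M7)
     C12 = add(M3, M5)
     C21 = add(M2, M4)
     C22 = add(sub(add(M1, M3), M2), M6)
     # Reassemble the four quadrants of the product.
     top = [r1 + r2 for r1, r2 in zip(C11, C12)]
     bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
     return top + bottom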


Matrix multiplication exponent

The ''matrix multiplication exponent'', usually denoted \omega, is the smallest real number for which any two n\times n matrices over a field can be multiplied together using n^{\omega + o(1)} field operations. This notation is commonly used in algorithms research, so that algorithms using matrix multiplication as a subroutine have meaningful bounds on running time regardless of the true value of \omega. Using a naive lower bound and schoolbook matrix multiplication for the upper bound, one can straightforwardly conclude that 2 \le \omega \le 3. Whether \omega = 2 is a major open question in theoretical computer science, and there is a line of research developing matrix multiplication algorithms to get improved bounds on \omega. The previous best bound on \omega is \omega < 2.3728596, by Josh Alman and Virginia Vassilevska Williams. This algorithm, like all other recent algorithms in this line of research, uses the ''laser method'', a generalization of the Coppersmith–Winograd algorithm, which was given by Don Coppersmith and Shmuel Winograd in 1990 and was the best matrix multiplication algorithm until 2010. The conceptual idea of these algorithms is similar to Strassen's algorithm: a way is devised for multiplying two k\times k-matrices with fewer than k^3 multiplications, and this technique is applied recursively. The laser method has limitations to its power, and cannot be used to show that \omega < 2.3725. Duan, Wu and Zhou identify a source of potential optimization in the laser method termed ''combination loss''. They find a way to exploit this to devise a variant of the laser method which they use to show \omega < 2.37188, breaking this barrier for any conventional laser method algorithm. With this newer approach another bound applies according to Duan, Wu and Zhou, and showing \omega < 2.3078 will not be possible by only addressing combination loss in the laser method.
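To see how such a base identity yields a bound on \omega, note the recurrence it induces; the following display is a standard divide-and-conquer sketch, not tied to any particular algorithm above:

:M(n) \le m\,M(n/k) + O(n^2) \quad\Longrightarrow\quad M(n) = O\!\left(n^{\log_k m}\right) \quad\text{when } \log_k m > 2,

so multiplying two k\times k matrices with m scalar multiplications gives \omega \le \log_k m. Strassen's identity (k=2, m=7) yields \omega \le \log_2 7 \approx 2.807, while the schoolbook algorithm (m=k^3) only gives \omega \le 3.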


Group theory reformulation of matrix multiplication algorithms

Henry Cohn, Robert Kleinberg, Balázs Szegedy and Chris Umans put methods such as the Strassen and Coppersmith–Winograd algorithms in an entirely different group-theoretic context, by utilising triples of subsets of finite groups which satisfy a disjointness property called the triple product property (TPP). They also give conjectures that, if true, would imply that there are matrix multiplication algorithms with essentially quadratic complexity. This implies that the optimal exponent of matrix multiplication is 2, which most researchers believe is indeed the case. One such conjecture is that families of wreath products of Abelian groups with symmetric groups realise families of subset triples with a simultaneous version of the TPP. Several of their conjectures have since been disproven by Blasiak, Cohn, Church, Grochow, Naslund, Sawin, and Umans using the slice rank method. Further, Alon, Shpilka and Chris Umans have recently shown that some of these conjectures implying fast matrix multiplication are incompatible with another plausible conjecture, the sunflower conjecture.


Lower bounds for ω

There is a trivial lower bound of \omega \ge 2. Since any algorithm for multiplying two n\times n matrices has to process all 2n^2 entries, there is a trivial asymptotic lower bound of \Omega(n^2) operations for any matrix multiplication algorithm. Thus \omega \ge 2. It is unknown whether \omega > 2. The best known lower bound for matrix-multiplication complexity is \Omega(n^2 \log n), for bounded coefficient arithmetic circuits over the real or complex numbers, and is due to Ran Raz.


Rectangular matrix multiplication

Similar techniques also apply to rectangular matrix multiplication. The central object of study is \omega(k), which is the smallest c such that one can multiply a matrix of size n\times \lceil n^k\rceil with a matrix of size \lceil n^k\rceil \times n with n^{c + o(1)} arithmetic operations. A result in algebraic complexity states that multiplying matrices of size n\times \lceil n^k\rceil and \lceil n^k\rceil \times n requires the same number of arithmetic operations as multiplying matrices of size n\times \lceil n^k\rceil and n \times n and of size n \times n and n\times \lceil n^k\rceil, so this encompasses the complexity of rectangular matrix multiplication. This generalizes the square matrix multiplication exponent, since \omega(1) = \omega. Since the output of the matrix multiplication problem is of size n^2, we have \omega(k) \geq 2 for all values of k. If one can prove for some values of k between 0 and 1 that \omega(k) \leq 2, then such a result shows that \omega(k) = 2 for those k. The largest ''k'' such that \omega(k) = 2 is known as the ''dual matrix multiplication exponent'', usually denoted ''α''. ''α'' is referred to as the "dual" because showing that \alpha = 1 is equivalent to showing that \omega = 2. Like the matrix multiplication exponent, the dual matrix multiplication exponent sometimes appears in the complexity of algorithms in numerical linear algebra and optimization. The first bound on ''α'' is by Coppersmith in 1982, who showed that \alpha > 0.17227. The current best bound on ''α'' is \alpha > 0.31389, given by Le Gall and Urrutia. This paper also contains bounds on \omega(k).
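For intermediate shapes, a convexity/interpolation argument is commonly used to relate \omega(k) to \omega and ''α''; the following display is a sketch of that standard bound, not the sharpest estimate known:

:\omega(k) = 2 \quad \text{for } 0 \le k \le \alpha, \qquad \omega(k) \le 2 + \frac{\omega - 2}{1 - \alpha}\,(k - \alpha) \quad \text{for } \alpha \le k \le 1.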


Related problems

Problems that have the same asymptotic complexity as matrix multiplication include determinant, matrix inversion, and Gaussian elimination (see next section). Problems with complexity that is expressible in terms of \omega include the characteristic polynomial, eigenvalues (but not eigenvectors), Hermite normal form, and Smith normal form.


Matrix inversion, determinant and Gaussian elimination

In his 1969 paper, where he proved the complexity O(n^{\log_2 7}) \approx O(n^{2.807}) for matrix computation, Strassen proved also that matrix inversion, determinant and Gaussian elimination have, up to a multiplicative constant, the same computational complexity as matrix multiplication. The proof does not make any assumptions on the matrix multiplication that is used, except that its complexity is O(n^\omega) for some \omega \ge 2.

The starting point of Strassen's proof is using block matrix multiplication. Specifically, a matrix of even dimension 2n\times 2n may be partitioned in four n\times n blocks

:\begin{pmatrix} A & B \\ C & D \end{pmatrix}.

Under this form, its inverse is

:\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1}+A^{-1}B(D-CA^{-1}B)^{-1}CA^{-1} & -A^{-1}B(D-CA^{-1}B)^{-1} \\ -(D-CA^{-1}B)^{-1}CA^{-1} & (D-CA^{-1}B)^{-1} \end{pmatrix},

provided that A and its Schur complement D-CA^{-1}B are invertible. Thus, the inverse of a 2n\times 2n matrix may be computed with two inversions, six multiplications and four additions or additive inverses of n\times n matrices. It follows that, denoting respectively by I(n), M(n) and A(n) the number of operations needed for inverting, multiplying and adding n\times n matrices, one has

:I(2n) \le 2I(n) + 6M(n) + 4A(n).

If n=2^k, one may apply this formula recursively:

:\begin{align} I(2^k) &\le 2I(2^{k-1}) + 6M(2^{k-1}) + 4A(2^{k-1})\\ &\le 2^2I(2^{k-2}) + 6(M(2^{k-1})+2M(2^{k-2})) + 4(A(2^{k-1}) + 2A(2^{k-2}))\\ &\,\,\,\vdots \end{align}

If M(n)\le cn^\omega and \alpha=2^\omega\ge 4, then, since A(n)=n^2, one gets eventually

:\begin{align} I(2^k) &\le 2^k I(1) + 6c(\alpha^{k-1}+2\alpha^{k-2} + \cdots + 2^{k-1}\alpha^0) + k\,2^{2k+1}\\ &\le 2^k + 6c\frac{\alpha^k-2^k}{\alpha-2} + k\,2^{2k+1}\\ &\le d(2^k)^\omega \end{align}

for some constant d.

For matrices whose dimension is not a power of two, the same complexity is reached by increasing the dimension of the matrix to a power of two, by padding the matrix with rows and columns whose entries are 1 on the diagonal and 0 elsewhere. This proves the asserted complexity for matrices such that all submatrices that have to be inverted are indeed invertible. This complexity is thus proved for almost all matrices, as a matrix with randomly chosen entries is invertible with probability one.

The same argument applies to LU decomposition, as, if the matrix A is invertible, the equality

:\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} I & 0 \\ CA^{-1} & I \end{pmatrix}\, \begin{pmatrix} A & B \\ 0 & D-CA^{-1}B \end{pmatrix}

defines a block LU decomposition that may be applied recursively to A and D-CA^{-1}B, for getting eventually a true LU decomposition of the original matrix.

The argument applies also for the determinant, since it results from the block LU decomposition that

:\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(A)\det(D-CA^{-1}B).
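A minimal numpy sketch of this recursion follows, assuming the dimension is a power of two and that every submatrix that must be inverted is in fact invertible; the function name block_inverse is ours. Each level performs exactly two recursive inversions and six matrix multiplications, matching the count above.

 import numpy as np

 def block_inverse(M):
     """Invert a 2^k-by-2^k matrix via Strassen's block recursion:
     two recursive inversions and six multiplications per level."""
     n = M.shape[0]
     if n == 1:
         return np.array([[1.0 / M[0, 0]]])
     h = n // 2
     A, B = M[:h, :h], M[:h, h:]
     C, D = M[h:, :h], M[h:, h:]
     Ainv = block_inverse(A)          # inversion 1
     T1 = Ainv @ B                    # multiplication 1
     T2 = C @ Ainv                    # multiplication 2
     S = D - C @ T1                   # multiplication 3; Schur complement
     Sinv = block_inverse(S)          # inversion 2
     T3 = T1 @ Sinv                   # multiplication 4
     TL = Ainv + T3 @ T2              # multiplication 5
     BL = -(Sinv @ T2)                # multiplication 6
     return np.block([[TL, -T3], [BL, Sinv]])

 # Example: block_inverse(np.array([[4., 7.], [2., 6.]])) agrees with np.linalg.inv.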


Minimizing number of multiplications

Related to the problem of minimizing the number of arithmetic operations is minimizing the number of multiplications, which is typically a more costly operation than addition. An O(n^\omega) algorithm for matrix multiplication must necessarily use only O(n^\omega) multiplication operations, but these algorithms are impractical. Improving on the naive n^3 multiplications of schoolbook multiplication, multiplying 4\times 4 matrices over \mathbb{Z}/2\mathbb{Z} can be done with 47 multiplications, and 3\times 3 matrix multiplication over a commutative ring can be done in 21 multiplications (23 if non-commutative). A lower bound on the number of multiplications needed is 2''mn''+2''n''−''m''−2 (for multiplication of ''n''×''m''-matrices with ''m''×''n''-matrices using the substitution method, ''m''⩾''n''⩾3), which means that the n=3 case requires at least 19 multiplications and n=4 at least 34. For n=2, 7 multiplications with 15 additions are minimal, compared to only 4 additions for 8 multiplications.
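Plugging the square cases into this bound makes the quoted values explicit:

:2mn+2n-m-2 = 2\cdot 3\cdot 3 + 2\cdot 3 - 3 - 2 = 19 \quad (m=n=3), \qquad 2\cdot 4\cdot 4 + 2\cdot 4 - 4 - 2 = 34 \quad (m=n=4).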


See also

* Computational complexity of mathematical operations
* CYK algorithm, §Valiant's algorithm
* Freivalds' algorithm, a simple Monte Carlo algorithm that, given matrices ''A'', ''B'' and ''C'', verifies in \Theta(n^2) time if AB = C
* Matrix chain multiplication
* Matrix multiplication, for abstract definitions
* Matrix multiplication algorithm, for practical implementation details
* Sparse matrix–vector multiplication


References


External links


Yet another catalogue of fast matrix multiplication algorithms
* Fawzi, A.; Balog, M.; Huang, A.; Hubert, T.; Romera-Paredes, B.; Barekatain, M.; Novikov, A.; Ruiz, F. J. R.; Schrittwieser, J.; Swirszcz, G.; Silver, D.; Hassabis, D.; Kohli, P. (2022). "Discovering faster matrix multiplication algorithms with reinforcement learning". ''Nature''. 610 (7930): 47–53. doi:10.1038/s41586-022-05172-4. PMID 36198780. PMC 9534758.