In number theory, integer factorization is the decomposition of a composite number into a product of smaller integers. If these factors are further restricted to prime numbers, the process is called prime factorization.

When the numbers are sufficiently large, no efficient non-quantum integer factorization algorithm is known. However, it has not been proven that such an algorithm does not exist. The presumed difficulty of this problem is important for the algorithms used in cryptography such as RSA public-key encryption and the RSA digital signature. Many areas of mathematics and computer science have been brought to bear on the problem, including elliptic curves, algebraic number theory, and quantum computing. In 2019, Fabrice Boudot, Pierrick Gaudry, Aurore Guillevic, Nadia Heninger, Emmanuel Thomé and Paul Zimmermann factored a 240-digit (795-bit) number (RSA-240) utilizing approximately 900 core-years of computing power. The researchers estimated that a 1024-bit RSA modulus would take about 500 times as long.

Not all numbers of a given length are equally hard to factor. The hardest instances of these problems (for currently known techniques) are semiprimes, the products of two prime numbers. When they are both large, for instance more than two thousand bits long, randomly chosen, and about the same size (but not too close, for example, to avoid efficient factorization by Fermat's factorization method), even the fastest prime factorization algorithms on the fastest computers can take enough time to make the search impractical; that is, as the number of digits of the integer being factored increases, the number of operations required to perform the factorization on any computer increases drastically.

Many cryptographic protocols are based on the difficulty of factoring large composite integers or a related problem, for example the RSA problem. An algorithm that efficiently factors an arbitrary integer would render RSA-based public-key cryptography insecure.
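As a toy illustration of why factoring the modulus breaks RSA, the following sketch recovers a private exponent from the two prime factors with one modular inversion. The function name `recover_private_exponent` and the tiny parameters are assumptions chosen for readability; they are not taken from any RSA library, and real moduli are thousands of bits long.

```python
from math import gcd

def recover_private_exponent(p, q, e):
    """Given the prime factors p, q of an RSA modulus and the public
    exponent e, return d with e*d = 1 (mod (p-1)(q-1))."""
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)(q-1)")
    return pow(e, -1, phi)  # modular inverse, needs Python 3.8+

# Toy parameters (hypothetical, far too small for real use).
p, q, e = 61, 53, 17
n = p * q
d = recover_private_exponent(p, q, e)

message = 65
ciphertext = pow(message, e, n)          # encrypt with the public key
recovered = pow(ciphertext, d, n)        # decrypt with the recovered key
print(d, recovered)
```

Anyone who can factor `n` can run the same computation, which is why the security of the scheme rests on factoring being hard for the chosen key size.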

Prime decomposition

By the fundamental theorem of arithmetic, every positive integer has a unique prime factorization. (By convention, 1 is the empty product.) Testing whether the integer is prime can be done in polynomial time, for example, by the AKS primality test. If composite, however, the polynomial time tests give no insight into how to obtain the factors.

Given a general algorithm for integer factorization, any integer can be factored into its constituent prime factors by repeated application of this algorithm. The situation is more complicated with special-purpose factorization algorithms, whose benefits may not be realized as well or even at all with the factors produced during decomposition. For example, if ''n'' = 171 × ''p'' × ''q'' where ''p'' < ''q'' are very large primes, trial division will quickly produce the factors 3 and 19 but will take ''p'' divisions to find the next factor. As a contrasting example, if ''n'' is the product of the primes 13729, 1372933, and 18848997161, where 13729 × 1372933 = 18848997157, Fermat's factorization method will begin with

a = \lceil\sqrt{n}\rceil = 18848997159

which immediately yields

b = \sqrt{a^2 - n} = \sqrt{4} = 2

and hence the factors ''a'' − ''b'' = 18848997157 and ''a'' + ''b'' = 18848997161. While these are easily recognized as composite and prime respectively, Fermat's method will take much longer to factor the composite number because the starting value of

\lceil\sqrt{18848997157}\rceil = 137292

for ''a'' is nowhere near 1372933.
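The contrast between the two methods can be tried out with a minimal Python sketch. The function names `trial_division` and `fermat_factor` and the step counter are illustrative assumptions, and the small primes 101 and 103 stand in for the "very large" primes of the first example so that the code finishes instantly.

```python
from math import isqrt

def trial_division(n):
    """Return the prime factors of n >= 2 in increasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2   # after 2, try odd candidates only
    if n > 1:
        factors.append(n)         # whatever remains is prime
    return factors

def fermat_factor(n):
    """Write odd n as a*a - b*b and return (a - b, a + b, steps),
    where steps counts how often a had to be incremented."""
    a = isqrt(n)
    if a * a < n:
        a += 1                    # a = ceil(sqrt(n))
    steps = 0
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b, steps
        a += 1
        steps += 1

# Small factors fall out of trial division quickly.
print(trial_division(171 * 101 * 103))

# Two factors close to sqrt(n): Fermat succeeds on the first try.
n = 13729 * 1372933 * 18848997161
print(fermat_factor(n))

# Factors far apart: Fermat needs many increments of a.
print(fermat_factor(18848997157))
```

The first Fermat call stops at the starting value of ''a'', while the second has to walk ''a'' upward from 137292 until ''a''² − ''n'' becomes a perfect square.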

Current state of the art

Among the ''b''-bit numbers, the most difficult to factor in practice using existing algorithms are those that are products of two primes of similar size. For this reason, these are the integers used in cryptographic applications. The largest such semiprime yet factored was RSA-250, an 829-bit number with 250 decimal digits, in February 2020. The total computation time was roughly 2700 core-years of computing using Intel Xeon Gold 6130 at 2.1 GHz. Like all recent factorization records, this factorization was completed with a highly optimized implementation of the general number field sieve run on hundreds of machines.

Difficulty and complexity

No algorithm has been published that can factor all integers in polynomial time, that is, that can factor a ''b''-bit number ''n'' in time O(''b''^''k'') for some constant ''k''. Neither the existence nor non-existence of such algorithms has been proved, but it is generally suspected that they do not exist and hence that the problem is not in class P. The problem is clearly in class NP, but it is generally suspected that it is not NP-complete, though this has not been proven.

There are published algorithms that are faster than O((1 + ''ε'')^''b'') for all positive ''ε'', that is, sub-exponential. The published algorithm with the best theoretical asymptotic running time is the general number field sieve (GNFS), first published in 1993, running on a ''b''-bit number ''n'' in time:

\exp\left(\left(\sqrt[3]{\frac{64}{9}} + o(1)\right)(\ln n)^{\frac{1}{3}}(\ln \ln n)^{\frac{2}{3}}\right).

For current computers, GNFS is the best published algorithm for large ''n'' (more than about 400 bits). For a quantum computer, however, Peter Shor discovered an algorithm in 1994 that solves it in polynomial time. This will have significant implications for cryptography if quantum computation becomes scalable. Shor's algorithm takes only O(''b''³) time and O(''b'') space on ''b''-bit number inputs. In 2001, Shor's algorithm was implemented for the first time, by using NMR techniques on molecules that provide 7 qubits.

It is not known exactly which complexity classes contain the decision version of the integer factorization problem (that is: does ''n'' have a factor smaller than ''k'' besides 1?). It is known to be in both NP and co-NP, meaning that both "yes" and "no" answers can be verified in polynomial time. An answer of "yes" can be certified by exhibiting a factorization ''n'' = ''d''(''n''/''d'') with ''d'' ≤ ''k''. An answer of "no" can be certified by exhibiting the factorization of ''n'' into distinct primes, all larger than ''k''; one can verify their primality using the AKS primality test, and then multiply them to obtain ''n''. The fundamental theorem of arithmetic guarantees that there is only one possible string of increasing primes that will be accepted, which shows that the problem is in both UP and co-UP. It is known to be in BQP because of Shor's algorithm.

The problem is suspected to be outside all three of the complexity classes P, NP-complete, and co-NP-complete. It is therefore a candidate for the NP-intermediate complexity class. If it could be proved to be either NP-complete or co-NP-complete, this would imply NP = co-NP, a very surprising result, and therefore integer factorization is widely suspected to be outside both these classes.

In contrast, the decision problem "Is ''n'' a composite number?" (or equivalently: "Is ''n'' a prime number?") appears to be much easier than the problem of specifying factors of ''n''. The composite/prime problem can be solved in polynomial time (in the number ''b'' of digits of ''n'') with the AKS primality test. In addition, there are several probabilistic algorithms that can test primality very quickly in practice if one is willing to accept a vanishingly small possibility of error. The ease of primality testing is a crucial part of the RSA algorithm, as it is necessary to find large prime numbers to start with.
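To make the point about fast probabilistic primality testing concrete, here is a minimal sketch of one such test, the Miller–Rabin test, together with a helper that draws random candidates until one passes. The article does not name a specific probabilistic test, so the choice of Miller–Rabin, the names `is_probable_prime` and `random_probable_prime`, and the default of 20 rounds are all assumptions made for illustration.

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin test: primes always pass; a composite passes all
    rounds with probability at most 4**(-rounds)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a is a witness that n is composite
    return True

def random_probable_prime(bits):
    """Draw random odd numbers of exactly `bits` bits until one passes."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

print(is_probable_prime(18848997161), is_probable_prime(18848997157))
print(random_probable_prime(256))
```

Note that the test reports that 18848997157 is composite without revealing either of its factors, which is exactly the gap between primality testing and factoring described above.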

Factoring algorithms

Special-purpose

A special-purpose factoring algorithm's running time depends on the properties of the number to be factored or on one of its unknown factors: size, special form, etc. The parameters which determine the running time vary among algorithms. An important subclass of special-purpose factoring algorithms is the ''Category 1'' or ''First Category'' algorithms, whose running time depends on the size of the smallest prime factor. Given an integer of unknown form, these methods are usually applied before general-purpose methods to remove small factors. For example, naive trial division is a Category 1 algorithm.

* Trial division
* Wheel factorization
* Pollard's rho algorithm, which has two common flavors to identify group cycles: one by Floyd and one by Brent.
* Algebraic-group factorization algorithms, among which are Pollard's ''p'' − 1 algorithm, Williams' ''p'' + 1 algorithm, and Lenstra elliptic curve factorization
* Euler's factorization method
* Special number field sieve
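As an example of a Category 1 method from the list above, the following is a minimal sketch of Pollard's rho algorithm with Floyd's cycle detection. The polynomial ''x''² + ''c'', the defaults `c=1` and `x0=2`, and the function name `pollard_rho` are conventional but assumed choices, not something the text prescribes.

```python
from math import gcd

def pollard_rho(n, c=1, x0=2):
    """Return a nontrivial factor of composite n, or None when this
    choice of c and x0 only finds the trivial factor n."""
    if n % 2 == 0:
        return 2

    def step(x):
        return (x * x + c) % n

    tortoise = hare = x0
    d = 1
    while d == 1:
        tortoise = step(tortoise)       # moves one step
        hare = step(step(hare))         # moves two steps (Floyd)
        d = gcd(abs(tortoise - hare), n)
    return d if d != n else None

print(pollard_rho(8051))
```

When the function returns `None`, a common remedy is to retry with a different `c` or `x0`. The input should already be known to be composite, since on a prime the loop can only ever end with the trivial factor.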

General-purpose

A general-purpose factoring algorithm, also known as a ''Category 2'', ''Second Category'', or ''Kraitchik'' ''family'' algorithm, has a running time which depends solely on the size of the integer to be factored. This is the type of algorithm used to factor RSA numbers. Most general-purpose factoring algorithms are based on the congruence of squares method.

* Dixon's algorithm
* Continued fraction factorization (CFRAC)
* Quadratic sieve
* Rational sieve
* General number field sieve
* Shanks's square forms factorization (SQUFOF)
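The congruence of squares idea shared by these methods can be sketched in a few lines: once ''x''² ≡ ''y''² (mod ''n'') with ''x'' ≢ ±''y'', the gcds of ''x'' ∓ ''y'' with ''n'' are proper factors. The helper name `split_with_congruence` and the small modulus 1649 are assumptions for illustration; the sketch shows only the final gcd step, not how a real sieve finds the congruence.

```python
from math import gcd

def split_with_congruence(x, y, n):
    """Given x*x = y*y (mod n), return (gcd(x - y, n), gcd(x + y, n))."""
    if (x * x - y * y) % n != 0:
        raise ValueError("x^2 and y^2 are not congruent modulo n")
    return gcd(x - y, n), gcd(x + y, n)

n = 1649
# 41^2 = 32 and 43^2 = 200 (mod 1649); neither is a square,
# but their product 32 * 200 = 6400 = 80^2 is.
assert (41 * 41) % n == 32 and (43 * 43) % n == 200
x = (41 * 43) % n
y = 80
print(split_with_congruence(x, y, n))
```

The algorithms in the list differ mainly in how they search for relations like 41² ≡ 32 and how they combine them into a square.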

Other notable algorithms

* Shor's algorithm, for quantum computers

Heuristic running time

In number theory, there are many integer factoring algorithms that heuristically have expected running time

L_n\left[\tfrac12,1+o(1)\right] = e^{(1+o(1))\sqrt{(\log n)(\log \log n)}}

in little-o and L-notation. Some examples of those algorithms are the elliptic curve method and the quadratic sieve. Another such algorithm is the class group relations method proposed by Schnorr, Seysen, and Lenstra, which they proved only assuming the unproved Generalized Riemann Hypothesis (GRH).
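To get a feel for what the L-notation means numerically, the sketch below evaluates L_n[α, c] = exp(c (ln n)^α (ln ln n)^(1−α)) for a few input sizes. Dropping the o(1) term is an assumption, so the outputs are rough growth comparisons rather than operation counts. The function name `log2_cost` is hypothetical, and the constant (64/9)^(1/3) with α = 1/3 is the general number field sieve exponent quoted earlier in the article.

```python
from math import log

def log2_cost(bits, alpha, c):
    """Return log2 of L_n[alpha, c] for n = 2**bits, o(1) term dropped."""
    ln_n = bits * log(2)
    exponent = c * ln_n ** alpha * log(ln_n) ** (1 - alpha)
    return exponent / log(2)

GNFS_C = (64 / 9) ** (1 / 3)

for bits in (512, 1024, 2048):
    half = log2_cost(bits, 0.5, 1.0)          # L_n[1/2, 1]
    gnfs = log2_cost(bits, 1 / 3, GNFS_C)     # L_n[1/3, (64/9)^(1/3)]
    print(f"{bits}-bit n: L[1/2, 1] ~ 2^{half:.0f}, GNFS ~ 2^{gnfs:.0f}")
```

The gap between the two columns widens as the input grows, which is the practical meaning of the smaller first parameter in the L-notation.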

Rigorous running time

The Schnorr–Seysen–Lenstra probabilistic algorithm has been rigorously proven by Lenstra and Pomerance to have expected running time

L_n\left[\tfrac12,1+o(1)\right]

by replacing the GRH assumption with the use of multipliers. The algorithm uses the class group of positive binary quadratic forms of discriminant Δ denoted by ''G''_Δ. ''G''_Δ is the set of triples of integers (''a'', ''b'', ''c'') in which those integers are relatively prime.

Schnorr–Seysen–Lenstra Algorithm

Let ''n'' be the integer to be factored, where ''n'' is an odd positive integer greater than a certain constant. In this factoring algorithm the discriminant Δ is chosen as a multiple of ''n'', Δ = −''dn'', where ''d'' is some positive multiplier. The algorithm expects that for one ''d'' there exist enough smooth forms in ''G''_Δ. Lenstra and Pomerance show that the choice of ''d'' can be restricted to a small set to guarantee the smoothness result.

Denote by ''P''_Δ the set of all primes ''q'' with Kronecker symbol

\left(\tfrac{\Delta}{q}\right)=1.

By constructing a set of generators of ''G''_Δ and prime forms ''f''_q of ''G''_Δ with ''q'' in ''P''_Δ, a sequence of relations between the set of generators and ''f''_q is produced. The size of ''q'' can be bounded by

c_0(\log|\Delta|)^2

for some constant ''c''_0. The relation that will be used is a relation between the product of powers that is equal to the neutral element of ''G''_Δ. These relations will be used to construct a so-called ambiguous form of ''G''_Δ, which is an element of ''G''_Δ of order dividing 2. By calculating the corresponding factorization of Δ and by taking a gcd, this ambiguous form provides the complete prime factorization of ''n''.

This algorithm has these main steps:

Let ''n'' be the number to be factored.

# Let Δ be a negative integer with Δ = −''dn'', where ''d'' is a multiplier and Δ is the negative discriminant of some quadratic form.
# Take the ''t'' first primes p_1=2,p_2=3,p_3=5, \dots ,p_t, for some t\in\mathbb{N}.
# Let f_q be a random prime form of ''G''_Δ with \left(\tfrac{\Delta}{q}\right)=1.
# Find a generating set ''X'' of ''G''_Δ.
# Collect a sequence of relations between the set ''X'' and {''f''_q : ''q'' ∈ ''P''_Δ} satisfying: \left(\prod_{x \in X} x^{r(x)}\right).\left(\prod_{q \in P_\Delta} f^{t(q)}_{q}\right) = 1
# Construct an ambiguous form (''a'', ''b'', ''c'') that is an element ''f'' ∈ ''G''_Δ of order dividing 2 to obtain a coprime factorization of the largest odd divisor of Δ in which \Delta = -4ac \text{ or } a(a - 4c) \text{ or } (b - 2a)(b + 2a)
# If the ambiguous form provides a factorization of ''n'' then stop, otherwise find another ambiguous form until the factorization of ''n'' is found. In order to prevent useless ambiguous forms from generating, build up the 2-Sylow group Sll₂(Δ) of ''G''(Δ).

To obtain an algorithm for factoring any positive integer, it is necessary to add a few steps to this algorithm such as trial division, and the Jacobi sum test.
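The last two steps, turning an ambiguous form into a factor of ''n'', can be illustrated with a small sketch. It assumes the usual convention that a form (''a'', ''b'', ''c'') has discriminant ''b''² − 4''ac'' and that the three factorizations of Δ in the steps above correspond to the shapes ''b'' = 0, ''a'' = ''b'', and ''a'' = ''c''. The function names and the toy discriminants are hypothetical, and the sketch does not attempt the hard part of the algorithm, which is finding such a form from class group relations.

```python
from math import gcd

def discriminant(a, b, c):
    """Discriminant of the binary quadratic form a*x^2 + b*x*y + c*y^2."""
    return b * b - 4 * a * c

def split_from_ambiguous_form(a, b, c, n):
    """Return the proper divisors of n obtained by taking gcds with the
    factorization of the discriminant that the ambiguous form exhibits."""
    if b == 0:
        pair = (a, c)                    # delta = -4ac
    elif a == b:
        pair = (a, a - 4 * c)            # delta = a(a - 4c)
    elif a == c:
        pair = (b - 2 * a, b + 2 * a)    # delta = (b - 2a)(b + 2a)
    else:
        raise ValueError("form is not of one of the three ambiguous shapes")
    candidates = [gcd(abs(u), n) for u in pair]
    return [g for g in candidates if 1 < g < n]

n = 35
# Multiplier d = 4 gives delta = -140; multiplier d = 1 gives delta = -35.
for form in [(5, 0, 7), (5, 5, 3), (3, 1, 3)]:
    print(form, discriminant(*form), split_from_ambiguous_form(*form, n))
```

An empty result corresponds to a "useless" ambiguous form in the sense of the final step, in which case another form has to be tried.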

Expected running time

The algorithm as stated is a probabilistic algorithm as it makes random choices. Its expected running time is at most

L_n\left[\tfrac12,1+o(1)\right].

