computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

, cycle detection or cycle finding is the

algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...

ic problem of finding a cycle in a

sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...

iterated function In mathematics, an iterated function is a function that is obtained by composing another function with itself two or several times. The process of repeatedly applying the same function is called iteration. In this process, starting from some ...

values. For any function that maps a

finite set In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example, is a finite set with five elements. Th ...

to itself, and any initial value in , the sequence of iterated function values :

x_0,\ x_1=f(x_0),\ x_2=f(x_1),\ \dots,\ x_i=f(x_),\ \dots

must eventually use the same value twice: there must be some pair of distinct indices and such that . Once this happens, the sequence must continue periodically, by repeating the same sequence of values from to . Cycle detection is the problem of finding and , given and . Several algorithms are known for finding cycles quickly and with little memory. Robert W. Floyd's tortoise and hare algorithm moves two pointers at different speeds through the sequence of values until they both point to equal values. Alternatively, Brent's algorithm is based on the idea of exponential search. Both Floyd's and Brent's algorithms use only a constant number of memory cells, and take a number of function evaluations that is proportional to the distance from the start of the sequence to the first repetition. Several other algorithms trade off larger amounts of memory for fewer function evaluations. The applications of cycle detection include testing the quality of

pseudorandom number generator A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random number generation, random n ...

s and

cryptographic hash function A cryptographic hash function (CHF) is a hash algorithm (a map (mathematics), map of an arbitrary binary string to a binary string with a fixed size of n bits) that has special properties desirable for a cryptography, cryptographic application: ...

computational number theory In mathematics and computer science, computational number theory, also known as algorithmic number theory, is the study of computational methods for investigating and solving problems in number theory and arithmetic geometry, including algorithm ...

algorithms, detection of

infinite loop In computer programming, an infinite loop (or endless loop) is a sequence of instructions that, as written, will continue endlessly, unless an external intervention occurs, such as turning off power via a switch or pulling a plug. It may be inte ...

s in computer programs and periodic configurations in cellular automata, automated shape analysis of

linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whi ...

data structures, and detection of deadlocks for transactions management in

DBMS In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and ana ...

Example

The figure shows a function that maps the set to itself. If one starts from and repeatedly applies , one sees the sequence of values : The cycle in this value sequence is .

Definitions

Let be any finite set, be any endofunction from to itself, and be any element of . For any , let . Let be the smallest index such that the value reappears infinitely often within the sequence of values , and let (the loop length) be the smallest positive integer such that . The cycle detection problem is the task of finding and .. One can view the same problem graph-theoretically, by constructing a functional graph (that is, a directed graph in which each vertex has a single outgoing edge) the vertices of which are the elements of and the edges of which map an element to the corresponding function value, as shown in the figure. The set of vertices reachable from starting vertex form a subgraph with a shape resembling the Greek letter rho (): a path of length from to a cycle of vertices.. Practical cycle-detection algorithms do not find and exactly. They usually find lower and upper bounds for the start of the cycle, and a more detailed search of the range must be performed if the exact value of is needed. Also, most algorithms do not guarantee to find directly, but may find some multiple . (Continuing the search for an additional steps, where is the smallest prime divisor of , will either find the true or prove that .)

Computer representation

Except in toy examples like the above, will not be specified as a table of values. Such a table implies

space complexity The space complexity of an algorithm or a data structure is the amount of memory space required to solve an instance of the computational problem as a function of characteristics of the input. It is the memory required by an algorithm until it exec ...

, and if that is permissible, an

associative array In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...

mapping to will detect the first repeated value. Rather, a cycle detection algorithm is given a

black box In science, computing, and engineering, a black box is a system which can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings. Its implementation is "opaque" (black). The te ...

for generating the sequence , and the task is to find and using very little memory. The black box might consist of an implementation of the recurrence function , but it might also store additional internal state to make the computation more efficient. Although must be true ''in principle'', this might be expensive to compute directly; the function could be defined in terms of the

discrete logarithm In mathematics, for given real numbers a and b, the logarithm \log_b(a) is a number x such that b^x=a. Analogously, in any group G, powers b^k can be defined for all integers k, and the discrete logarithm \log_b(a) is an integer k such that b^k=a ...

of or some other difficult-to-compute property which can only be practically computed in terms of additional information. In such cases, the number of black boxes required becomes a figure of merit distinguishing the algorithms. A second reason to use one of these algorithms is that they are pointer algorithms which do no operations on elements of other than testing for equality. An associative array implementation requires computing a

hash function A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a ...

on the elements of , or ordering them. But cycle detection can be applied in cases where neither of these are possible. The classic example is Pollard's rho algorithm for

integer factorization In mathematics, integer factorization is the decomposition of a positive integer into a product of integers. Every positive integer greater than 1 is either the product of two or more integer factors greater than 1, in which case it is a comp ...

, which searches for a factor of a given number by looking for values and which are equal modulo ''without knowing in advance.'' This is done by computing the

greatest common divisor In mathematics, the greatest common divisor (GCD), also known as greatest common factor (GCF), of two or more integers, which are not all zero, is the largest positive integer that divides each of the integers. For two integers , , the greatest co ...

of the difference with a known multiple of , namely . If the gcd is non-trivial (neither 1 nor ), then the value is a proper factor of , as desired. If is not prime, it must have at least one factor , and by the

birthday paradox In probability theory, the birthday problem asks for the probability that, in a set of randomly chosen people, at least two will share the same birthday. The birthday paradox is the counterintuitive fact that only 23 people are needed for that ...

, a random function has an expected cycle length (modulo ) of .

Algorithms

If the input is given as a subroutine for calculating , the cycle detection problem may be trivially solved using only function applications, simply by computing the sequence of values and using a

data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...

such as a

hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...

to store these values and test whether each subsequent value has already been stored. However, the space complexity of this algorithm is proportional to , unnecessarily large. Additionally, to implement this method as a pointer algorithm would require applying the equality test to each pair of values, resulting in quadratic time overall. Thus, research in this area has concentrated on two goals: using less space than this naive algorithm, and finding pointer algorithms that use fewer equality tests.

Floyd's tortoise and hare

Floyd's cycle-finding algorithm is a pointer algorithm that uses only two pointers, which move through the sequence at different speeds. It is also called the "tortoise and the hare algorithm", alluding to Aesop's fable of

The Tortoise and the Hare "The Tortoise and the Hare" is one of Aesop's Fables and is numbered 226 in the Perry Index. The account of a race between unequal partners has attracted conflicting interpretations. The fable itself is a variant of a common folktale theme in w ...

. The algorithm is named after Robert W. Floyd, who was credited with its invention by

Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist and mathematician. He is a professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of comp ...

. However, the algorithm does not appear in Floyd's published work, and this may be a misattribution: Floyd describes algorithms for listing all simple cycles in a directed graph in a 1967 paper, but this paper does not describe the cycle-finding problem in functional graphs that is the subject of this article. In fact, Knuth's statement (in 1969), attributing it to Floyd, without citation, is the first known appearance in print, and it thus may be a folk theorem, not attributable to a single individual. The key insight in the algorithm is as follows. If there is a cycle, then, for any integers and , , where is the length of the loop to be found, is the index of the first element of the cycle, and is a whole integer representing the number of loops. Based on this, it can then be shown that for some if and only if (if in the cycle, then there exists some such that , which implies that ; and if there are some and such that , then and ). Thus, the algorithm only needs to check for repeated values of this special form, one twice as far from the start of the sequence as the other, to find a period of a repetition that is a multiple of . Once is found, the algorithm retraces the sequence from its start to find the first repeated value in the sequence, using the fact that divides and therefore that . Finally, once the value of is known it is trivial to find the length of the shortest repeating cycle, by searching for the first position for which . The algorithm thus maintains two pointers into the given sequence, one (the tortoise) at , and the other (the hare) at . At each step of the algorithm, it increases by one, moving the tortoise one step forward and the hare two steps forward in the sequence, and then compares the sequence values at these two pointers. The smallest value of for which the tortoise and hare point to equal values is the desired value . The following Python code shows how this idea may be implemented as an algorithm. def floyd(f, x0) -> (int, int): """Floyd's cycle detection algorithm.""" # Main phase of algorithm: finding a repetition x_i = x_2i. # The hare moves twice as quickly as the tortoise and # the distance between them increases by 1 at each step. # Eventually they will both be inside the cycle and then, # at some point, the distance between them will be # divisible by the period λ. tortoise = f(x0) # f(x0) is the element/node next to x0. hare = f(f(x0)) while tortoise != hare: tortoise = f(tortoise) hare = f(f(hare)) # At this point the tortoise position, ν, which is also equal # to the distance between hare and tortoise, is divisible by # the period λ. So hare moving in cycle one step at a time, # and tortoise (reset to x0) moving towards the cycle, will # intersect at the beginning of the cycle. Because the # distance between them is constant at 2ν, a multiple of λ, # they will agree as soon as the tortoise reaches index μ. # Find the position μ of first repetition. mu = 0 tortoise = x0 while tortoise != hare: tortoise = f(tortoise) hare = f(hare) # Hare and tortoise move at same speed mu += 1 # Find the length of the shortest cycle starting from x_μ # The hare moves one step at a time while tortoise is still. # lam is incremented until λ is found. lam = 1 hare = f(tortoise) while tortoise != hare: hare = f(hare) lam += 1 return lam, mu This code only accesses the sequence by storing and copying pointers, function evaluations, and equality tests; therefore, it qualifies as a pointer algorithm. The algorithm uses operations of these types, and storage space.

Brent's algorithm

Richard P. Brent described an alternative cycle detection algorithm that, like the tortoise and hare algorithm, requires only two pointers into the sequence.. However, it is based on a different principle: searching for the smallest

power of two A power of two is a number of the form where is an integer, that is, the result of exponentiation with number 2, two as the Base (exponentiation), base and integer as the exponent. In the fast-growing hierarchy, is exactly equal to f_1^ ...

that is larger than both and . For , the algorithm compares with each subsequent sequence value up to the next power of two, stopping when it finds a match. It has two advantages compared to the tortoise and hare algorithm: it finds the correct length of the cycle directly, rather than needing to search for it in a subsequent stage, and its steps involve only one evaluation of the function rather than three. The following Python code shows how this technique works in more detail. def brent(f, x0) -> (int, int): """Brent's cycle detection algorithm.""" # main phase: search successive powers of two power = lam = 1 tortoise = x0 hare = f(x0) # f(x0) is the element/node next to x0. # this assumes there is a cycle; otherwise this loop won't terminate while tortoise != hare: if power

lam: # time to start a new power of two? tortoise = hare power *= 2 lam = 0 hare = f(hare) lam += 1 # Find the position of the first repetition of length λ tortoise = hare = x0 for i in range(lam): hare = f(hare) # The distance between the hare and tortoise is now λ. # Next, the hare and tortoise move at same speed until they agree mu = 0 while tortoise != hare: tortoise = f(tortoise) hare = f(hare) mu += 1 return lam, mu Like the tortoise and hare algorithm, this is a pointer algorithm that uses tests and function evaluations and storage space. It is not difficult to show that the number of function evaluations can never be higher than for Floyd's algorithm. Brent claims that, on average, his cycle finding algorithm runs around 36% more quickly than Floyd's and that it speeds up the Pollard rho algorithm by around 24%. He also performs an average case analysis for a randomized version of the algorithm in which the sequence of indices traced by the slower of the two pointers is not the powers of two themselves, but rather a randomized multiple of the powers of two. Although his main intended application was in integer factorization algorithms, Brent also discusses applications in testing pseudorandom number generators.

Gosper's algorithm

R. W. Gosper's algorithm finds the period

\lambda

, and the lower and upper bound of the starting point,

\mu_l

and

\mu_u

, of the first cycle. The difference between the lower and upper bound is of the same order as the period, i.e.

\mu_l + \lambda \approx \mu_h

. The algorithm maintains an array of tortoises

T_j

. For each

x_i

: * For each

0 \le j \le \log_2 i,

compare

x_i

T_j

. * If

x_i = T_j

, a cycle has been detected, of length

\lambda = (i - 2^j) \bmod 2^ + 1.

* If no match is found, set

T_k \leftarrow x_i

, where

k

is the number of trailing zeros in the binary representation of

i+1

. I.e. the greatest power of 2 which divides

i+1

. If it is inconvenient to vary the number of comparisons as

i

increases, you may initialize all of the

T_j = x_0

, but must then return

\lambda = i

x_i = T_j

while

i < 2^j

Advantages

The main features of Gosper's algorithm are that it is economical in space, very economical in evaluations of the generator function, and always finds the exact cycle length (never a multiple). The cost is a large number of equality comparisons. It could be roughly described as a concurrent version of Brent's algorithm. While Brent's algorithm uses a single tortoise, repositioned every time the hare passes a power of two, Gosper's algorithm uses several tortoises (several previous values are saved), which are roughly exponentially spaced. According to the note in

HAKMEM HAKMEM, alternatively known as AI Memo 239, is a February 1972 "memo" ( technical report) of the MIT AI Lab containing a wide variety of hacks, including useful and clever algorithms for mathematical computation, some number theory and schemat ...

item 132, this algorithm will detect repetition before the third occurrence of any value, i.e. the cycle will be iterated at most twice. HAKMEM also states that it is sufficient to store

\lceil\log_2\lambda\rceil

previous values; however, this only offers a saving if we know ''a priori'' that

\lambda

is significantly smaller than

\mu

. The standard implementations store

\lceil\log_2 (\mu + 2\lambda)\rceil

values. For example, assume the function values are 32-bit integers, so

\mu + \lambda \le 2^

and

\mu + 2\lambda \le 2^.

Then Gosper's algorithm will find the cycle after less than

\mu + 2\lambda

function evaluations (in fact, the most possible is

3\cdot 2^ - 1

), while consuming the space of 33 values (each value being a 32-bit integer).

Complexity

Upon the

i

-th evaluation of the generator function, the algorithm compares the generated value with

\log_2 i

previous values; observe that

i

goes up to at least

\mu + \lambda

and at most

\mu + 2\lambda

. Therefore, the time complexity of this algorithm is

O((\mu + \lambda) \cdot \log (\mu + \lambda))

. Since it stores

\log_2 (\mu + 2\lambda)

values, its space complexity is

\Theta(\log (\mu + \lambda))

. This is under the usual transdichotomous model, assumed throughout this article, in which the size of the function values is constant. Without this assumption, we know it requires

\Omega(\log (\mu + \lambda))

space to store

\mu + \lambda

distinct values, so the overall space complexity is

\Omega(\log^2 (\mu + \lambda)).

Time–space tradeoffs

A number of authors have studied techniques for cycle detection that use more memory than Floyd's and Brent's methods, but detect cycles more quickly. In general these methods store several previously-computed sequence values, and test whether each new value equals one of the previously-computed values. In order to do so quickly, they typically use a

or similar data structure for storing the previously-computed values, and therefore are not pointer algorithms: in particular, they usually cannot be applied to Pollard's rho algorithm. Where these methods differ is in how they determine which values to store. Following Nivasch,. we survey these techniques briefly. *Brent already describes variations of his technique in which the indices of saved sequence values are powers of a number other than two. By choosing to be a number close to one, and storing the sequence values at indices that are near a sequence of consecutive powers of , a cycle detection algorithm can use a number of function evaluations that is within an arbitrarily small factor of the optimum .. *Sedgewick, Szymanski, and Yao provide a method that uses memory cells and requires in the worst case only

(\lambda+\mu)(1+cM^)

function evaluations, for some constant , which they show to be optimal. The technique involves maintaining a numerical parameter , storing in a table only those positions in the sequence that are multiples of , and clearing the table and doubling whenever too many values have been stored. *Several authors have described ''distinguished point'' methods that store function values in a table based on a criterion involving the values, rather than (as in the method of Sedgewick et al.) based on their positions. For instance, values equal to zero modulo some value might be stored.. More simply, Nivasch credits D. P. Woodruff with the suggestion of storing a random sample of previously seen values, making an appropriate random choice at each step so that the sample remains random. *Nivasch describes an algorithm that does not use a fixed amount of memory, but for which the expected amount of memory used (under the assumption that the input function is random) is logarithmic in the sequence length. An item is stored in the memory table, with this technique, when no later item has a smaller value. As Nivasch shows, the items with this technique can be maintained using a stack data structure, and each successive sequence value need be compared only to the top of the stack. The algorithm terminates when the repeated sequence element with smallest value is found. Running the same algorithm with multiple stacks, using random permutations of the values to reorder the values within each stack, allows a time–space tradeoff similar to the previous algorithms. However, even the version of this algorithm with a single stack is not a pointer algorithm, due to the comparisons needed to determine which of two values is smaller. Any cycle detection algorithm that stores at most values from the input sequence must perform at least

(\lambda+\mu)\left(1+\frac\right)

function evaluations..

Applications

Cycle detection has been used in many applications. *Determining the cycle length of a

is one measure of its strength. This is the application cited by Knuth in describing Floyd's method. Brent describes the results of testing a

linear congruential generator A linear congruential generator (LCG) is an algorithm that yields a sequence of pseudo-randomized numbers calculated with a discontinuous piecewise linear equation. The method represents one of the oldest and best-known pseudorandom number gener ...

in this fashion; its period turned out to be significantly smaller than advertised. For more complex generators, the sequence of values in which the cycle is to be found may not represent the output of the generator, but rather its internal state. *Several number-theoretic algorithms are based on cycle detection, including Pollard's rho algorithm for integer factorization and his related kangaroo algorithm for the

problem. *In

cryptographic Cryptography, or cryptology (from "hidden, secret"; and ''graphein'', "to write", or '' -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adversarial behavior. More gen ...

applications, the ability to find two distinct values ''x''_''μ''−1 and ''x''_{''λ''+''μ''−1} mapped by some cryptographic function ''ƒ'' to the same value ''x''_''μ'' may indicate a weakness in ''ƒ''. For instance, Quisquater and Delescaille apply cycle detection algorithms in the search for a message and a pair of

Data Encryption Standard The Data Encryption Standard (DES ) is a symmetric-key algorithm for the encryption of digital data. Although its short key length of 56 bits makes it too insecure for modern applications, it has been highly influential in the advancement of cryp ...

keys that map that message to the same encrypted value; Kaliski, Rivest, and Sherman. also use cycle detection algorithms to attack DES. The technique may also be used to find a

collision In physics, a collision is any event in which two or more bodies exert forces on each other in a relatively short time. Although the most common use of the word ''collision'' refers to incidents in which two or more objects collide with great for ...

in a

. *Cycle detection may be helpful as a way of discovering

s in certain types of

computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...

s. * Periodic configurations in

cellular automaton A cellular automaton (pl. cellular automata, abbrev. CA) is a discrete model of computation studied in automata theory. Cellular automata are also called cellular spaces, tessellation automata, homogeneous structures, cellular structures, tessel ...

simulations may be found by applying cycle detection algorithms to the sequence of automaton states. * Shape analysis of

data structures is a technique for verifying the correctness of an algorithm using those structures. If a node in the list incorrectly points to an earlier node in the same list, the structure will form a cycle that can be detected by these algorithms.. In

Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in American National Standards Institute (ANSI) standard document ''ANSI INCITS 226-1994 (S2018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperli ...

, the

S-expression In computer programming, an S-expression (or symbolic expression, abbreviated as sexpr or sexp) is an expression in a like-named notation for nested List (computing), list (Tree (data structure), tree-structured) data. S-expressions were invented ...

printer, under control of the *print-circle* variable, detects circular list structure and prints it compactly. *Teske describes applications in

computational group theory In mathematics, computational group theory is the study of group (mathematics), groups by means of computers. It is concerned with designing and analysing algorithms and data structures to compute information about groups. The subject has attracte ...

: determining the structure of an

Abelian group In mathematics, an abelian group, also called a commutative group, is a group in which the result of applying the group operation to two group elements does not depend on the order in which they are written. That is, the group operation is commu ...

from a set of its generators. The cryptographic algorithms of Kaliski et al. may also be viewed as attempting to infer the structure of an unknown group. * briefly mentions an application to

computer simulation Computer simulation is the running of a mathematical model on a computer, the model being designed to represent the behaviour of, or the outcome of, a real-world or physical system. The reliability of some mathematical models can be determin ...

celestial mechanics Celestial mechanics is the branch of astronomy that deals with the motions of objects in outer space. Historically, celestial mechanics applies principles of physics (classical mechanics) to astronomical objects, such as stars and planets, to ...

, which she attributes to William Kahan. In this application, cycle detection in the

phase space The phase space of a physical system is the set of all possible physical states of the system when described by a given parameterization. Each possible state corresponds uniquely to a point in the phase space. For mechanical systems, the p ...

of an orbital system may be used to determine whether the system is periodic to within the accuracy of the simulation. *In

Mandelbrot Set The Mandelbrot set () is a two-dimensional set (mathematics), set that is defined in the complex plane as the complex numbers c for which the function f_c(z)=z^2+c does not Stability theory, diverge to infinity when Iteration, iterated starting ...

fractal generation some performance techniques are used to speed up the image generation. One of them is called "period checking", which consists of finding the cycles in a point orbit. This article describes the " period checking" technique. You can find another explanatio
here
Some cycle detection algorithms have to be implemented in order to implement this technique.

References

External links

*Gabriel Nivasch
The Cycle Detection Problem and the Stack AlgorithmTortoise and Hare
Portland Pattern Repository
Floyd's Cycle Detection Algorithm (The Tortoise and the Hare)Brent's Cycle Detection Algorithm (The Teleporting Turtle)
{{DEFAULTSORT:Cycle Detection Fixed points (mathematics) Combinatorial algorithms Articles with example Python (programming language) code The Tortoise and the Hare