Maximal Pair
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a maximal pair within a string is a pair of matching substrings that are maximal, where "maximal" means that it is not possible to make a longer matching pair by extending the range of both substrings to the left or right.


Example

For example, in this table, the substrings at indices 2 to 4 (in red) and indices 6 to 8 (in blue) are a maximal pair, because they contain identical characters (abc), and they have different characters to the left (x at index 1 and y at index 5) and different characters to the right (y at index 5 and w at index 9). Similarly, the substrings at indices 6 to 8 (in blue) and indices 10 to 12 (in green) are a maximal pair. However, the substrings at indices 2 to 4 (in red) and indices 10 to 12 (in green) are not a maximal pair, as the character y follows both substrings, and so they can be extended to the right to make a longer pair.


Formal definition

Formally, a maximal pair of substrings with starting positions p_1 and p_2 respectively, and both of length l, is specified by a
triple Triple is used in several contexts to mean "threefold" or a " treble": Sports * Triple (baseball), a three-base hit * A basketball three-point field goal * A figure skating jump with three rotations * In bowling terms, three strikes in a row * ...
(p_1, p_2, l), such that, given a string S of length n, S _1..p_1+l-1S _2..p_2+l-1/math> (meaning that the substrings have identical contents), but S _1-1\neq S _2-1/math> (they have different characters to their left) and S _1+l\neq S _2+l/math> (they also have different characters to their right; together, these two inequalities are the condition for being maximal). Thus, in the example above, the maximal pairs are (2,6,3) (the red and blue substrings) and (6,10,3) (the green and blue substrings), and (2,10,3) is not a maximal pair.


Related concepts and time complexity

A maximal repeat is the string represented by a maximal pair. A supermaximal repeat is a maximal repeat never occurring as a proper substring of another maximal repeat. In the above example, abc and abcy are both maximal repeats, but only abcy is a supermaximal repeat. Maximal pairs, maximal repeats and supermaximal repeats can each be found in \Theta(n+z) time using a
suffix tree In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particu ...
, if there are z such structures.


References


External links


Project for the computation of all maximal repeats in one ore more strings in Python
using
suffix array In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics. Suffix arrays were introduced by as ...
. String (computer science) Formal languages {{comp-sci-stub