Combinatorial Cube
   HOME

TheInfoList



OR:

In the mathematical study of
combinatorics on words Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words ...
, a parameter word is a string over a given
alphabet An alphabet is a standardized set of basic written graphemes (called letters) that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character represents a syll ...
having some number of
wildcard character In software, a wildcard character is a kind of placeholder represented by a single character, such as an asterisk (), which can be interpreted as a number of literal characters or an empty string. It is often used in file searches so the full na ...
s. The set of strings matching a given parameter word is called a parameter set or combinatorial cube. Parameter words can be composed, to produce smaller subcubes of a given combinatorial cube. They have applications in
Ramsey theory Ramsey theory, named after the British mathematician and philosopher Frank P. Ramsey, is a branch of mathematics that focuses on the appearance of order in a substructure given a structure of a known size. Problems in Ramsey theory typically ask a ...
and in computer science in the detection of
duplicate code In computer programming, duplicate code is a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a n ...
.


Definitions and notation

Formally, a word of length n, over a given alphabet A, is a sequence of n characters, some of which may be drawn from A and the others of which are k distinct wildcard characters *_1,*_2,\ldots, *_k. Each wildcard character is required to appear at least once, but may appear multiple times, and the wildcard characters must appear in the order given by their indexes: the first wildcard character in the word must be *_1, the next one that is different from *_1 must be *_2, etc. As a special case, a word over the given alphabet, without any wildcard characters, is said to be a 0-parameter word. For 1-parameter words, the subscripts may be omitted, as there is no ambiguity between different wildcard characters. The set of all words over A, of length n, is A word represents a set of , A, ^k strings (0-parameter words), obtained by substituting a symbol of A for each wildcard character. This set of strings is called a parameter set of combinatorial cube, and k is called its dimension. A one-dimensional combinatorial cube may be called a combinatorial line. In a combinatorial cube, each copy of a particular wildcard character must have the same replacement. A generalization of parameter words allows different copies of the same wildcard character to be replaced by different characters from the alphabet, in a controlled way. If A is an alphabet and G is a
group A group is a number of persons or things that are located, gathered, or classed together. Groups of people * Cultural group, a group whose members share the same cultural identity * Ethnic group, a group whose members share the same ethnic ide ...
with an
action Action may refer to: * Action (narrative), a literary mode * Action fiction, a type of genre fiction * Action game, a genre of video game Film * Action film, a genre of film * ''Action'' (1921 film), a film by John Ford * ''Action'' (1980 fil ...
on A, then a parameter word is a word together with an assignment of a group element to each wildcard character in the word. The first occurrence of each wildcard character must be assigned the
identity element In mathematics, an identity element, or neutral element, of a binary operation operating on a set is an element of the set that leaves unchanged every element of the set when the operation is applied. This concept is used in algebraic structures su ...
of the group. Then, the strings represented by a labeled parameter word are obtained by choosing a character of A for each wildcard character, and substituting the result of combining that character with the group element labeling each copy of that character. The set of all words over A, of length n, is


Example

In the game of
tic-tac-toe Tic-tac-toe (American English), noughts and crosses (Commonwealth English), or Xs and Os (Canadian or Irish English) is a paper-and-pencil game for two players who take turns marking the spaces in a three-by-three grid with ''X'' or ''O''. T ...
, the cells of the game board can be given two integer coordinates (x,y) from the alphabet \. Concatenating these two coordinates produces a string representing each cell, one of the nine strings 11, 12, 13, 21, 22, 23, 31, 32, or 33. There are seven one-parameter words of length two over this alphabet, the words 1*, 2*, 3*, *1, *2, *3, and **. The corresponding combinatorial lines form seven of the eight lines of three cells in a row of the tic-tac-toe board; for instance, the one-parameter word 2* corresponds to the combinatorial line \, and the one-parameter word ** corresponds to the combinatorial However, one of the eight winning lines of the tic-tac-toe game is missing from this set of combinatorial lines: the
antidiagonal In linear algebra, the main diagonal (sometimes principal diagonal, primary diagonal, leading diagonal, major diagonal, or good diagonal) of a matrix A is the list of entries a_ where i = j. All off-diagonal elements are zero in a diagonal matrix. ...
It is possible to obtain this line as a combinatorial line (without including any other combinations of cells that would be invalid for tic-tac-toe) by using a group with two elements, and an action in which the non-identity element swaps the alphabet letters 1 and 3 while leaving the element 2 in place. There are eight labeled one-parameter words of length two for this action, seven of which are obtained from the unlabeled one-parameter words by using the identity label for all wildcards. These seven have the same combinatorial lines as before. The eighth labeled word consists of the word ** labeled by the identity element for its first * and the reversing non-identity element for the second *; its combinatorial line is the final winning line of the tic-tac-toe board,


Composition

For three given integer parameters n\ge m\ge k, it is possible to combine two parameter words, f\in A\tbinom and g\in A\tbinom, to produce another parameter word f\circ g\in A\tbinom. To do so, simply replace each copy of the ith wildcard symbol in f by the ith character in g. This will necessarily produce a word of length n that uses each of the wildcard symbols in g at least once, in ascending order, so it produces a valid word of length n. This notion of composition can also be extended to composition of labeled parameter words (both using the same alphabet and group action), by applying the group action to the non-wildcard substituted characters and composing the group labels for the wildcard substituted characters. A subset of a combinatorial cube is a smaller combinatorial cube if it can be obtained by a composition in this way.


Combinatorial enumeration

The number of parameter words in A\tbinom for an alphabet of size r is an r-
Stirling number of the second kind In mathematics, particularly in combinatorics, a Stirling number of the second kind (or Stirling partition number) is the number of ways to partition a set of ''n'' objects into ''k'' non-empty subsets and is denoted by S(n,k) or \textstyle \lef ...
\textstyle\left\_r. These numbers count the number of partitions of the integers in the range ,r+n/math> into r+k non-empty subsets such that the first r integers belong to distinct subsets. Partitions of this type can be placed into a bijective equivalence with the parameter words, by creating a word with a character for each of the n integers in the range +1,n+r/math>, setting this character value to be either an integer in ,r/math> belonging to the same subset of the partition, or a wildcard character for each subset of the partition that does not contain an integer in ,r/math>. The r-Stirling numbers obey a simple
recurrence relation In mathematics, a recurrence relation is an equation according to which the nth term of a sequence of numbers is equal to some combination of the previous terms. Often, only k previous terms of the sequence appear in the equation, for a parameter ...
by which they may easily be calculated.


Applications

In
Ramsey theory Ramsey theory, named after the British mathematician and philosopher Frank P. Ramsey, is a branch of mathematics that focuses on the appearance of order in a substructure given a structure of a known size. Problems in Ramsey theory typically ask a ...
, parameter words and combinatorial cubes may be used to formulate the
Graham–Rothschild theorem In mathematics, the Graham–Rothschild theorem is a theorem that applies Ramsey theory to combinatorics on words and combinatorial cubes. It is named after Ronald Graham and Bruce Lee Rothschild, who published its proof in 1971. Through the work o ...
, according to which, for every finite alphabet and group action, and every combination of integer values m, k, and r, there exists a sufficiently large number n such that if each combinatorial cube over strings of length n is assigned one of r colors, then there exists a combinatorial cube all of whose subcubes have the same color. This result is a key foundation for structural Ramsey theory, and is used to define Graham's number, an enormous number used to estimate the value of n for a certain combination of values. In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, in the problem of searching for
duplicate code In computer programming, duplicate code is a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a n ...
, the source code for a given routine or module may be transformed into a parameter word by converting it into a sequence of tokens, and for each variable or subroutine name, replacing each copy of the same name with the same wildcard character. If code is duplicated, the resulting parameter words will remain equal even if some of the variables or subroutines have been renamed. More sophisticated searching algorithms can find long duplicate code sections that form substrings of larger source code repositories, by allowing the wildcard characters to be substituted for each other. An important special case of parameter words, well-studied in the combinatorics of words, is given by the partial words. These are strings with wildcard characters that may be substituted independently of each other, without requiring that some of the substituted characters be equal or controlled by a group action. In the language of parameter words, a partial word may be described as a parameter word in which each wildcard symbol appears exactly once. However, because there is no repetition of wildcard symbols, partial words may be written more simply by omitting the subscripts on the wildcard symbols.


See also

*
Regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...


References

{{reflist, refs= {{citation , last = Baker , first = Brenda S. , authorlink = Brenda Baker , doi = 10.1137/S0097539793246707 , issue = 5 , journal =
SIAM Journal on Computing The ''SIAM Journal on Computing'' is a scientific journal focusing on the mathematical and formal aspects of computer science. It is published by the Society for Industrial and Applied Mathematics (SIAM). Although its official ISO abbreviation is ...
, mr = 1471985 , pages = 1343–1362 , title = Parameterized duplication in strings: algorithms and an application to software maintenance , volume = 26 , year = 1997
{{citation , last1 = Benzait , first1 = A. , last2 = Voigt , first2 = B. , department = Proceedings of the Oberwolfach Meeting "Kombinatorik" (1986) , doi = 10.1016/0012-365X(88)90130-6 , issue = 1-2 , journal =
Discrete Mathematics Discrete mathematics is the study of mathematical structures that can be considered "discrete" (in a way analogous to discrete variables, having a bijection with the set of natural numbers) rather than "continuous" (analogously to continuous f ...
, mr = 974810 , pages = 27–35 , title = A combinatorial interpretation of (1/k!)\Delta^kt^n , volume = 73 , year = 1989, doi-access = free
{{citation , last = Broder , first = Andrei Z. , authorlink = Andrei Broder , doi = 10.1016/0012-365X(84)90161-4 , issue = 3 , journal =
Discrete Mathematics Discrete mathematics is the study of mathematical structures that can be considered "discrete" (in a way analogous to discrete variables, having a bijection with the set of natural numbers) rather than "continuous" (analogously to continuous f ...
, mr = 743795 , pages = 241–259 , title = The r-Stirling numbers , volume = 49 , year = 1984, doi-access = free
{{citation , last = Blanchet-Sadri , first = Francine , isbn = 978-1-4200-6092-8 , mr = 2384993 , publisher = Chapman & Hall/CRC , location = Boca Raton, Florida , series = Discrete Mathematics and its Applications , title = Algorithmic Combinatorics on Partial Words , title-link = Algorithmic Combinatorics on Partial Words , year = 2008 {{citation , last = Prömel , first = Hans Jürgen , contribution = Hales–Jewett's Theorem , doi = 10.1007/978-3-319-01315-2_4 , pages = 41–51 , publisher = Springer International Publishing , title = Ramsey Theory for Discrete Structures , year = 2013 {{citation , last = Prömel , first = Hans Jürgen , doi = 10.1023/A:1020879709125 , issue = 1-2 , journal =
Synthese ''Synthese'' () is a scholarly periodical specializing in papers in epistemology, methodology, and philosophy of science, and related issues. Its subject area is divided into four specialties, with a focus on the first three: (1) "epistemology, me ...
, jstor = 20117296 , mr = 1950045 , pages = 87–105 , title = Large numbers, Knuth's arrow notation, and Ramsey theory , volume = 133 , year = 2002
Combinatorics on words