Combinatorics on words is a fairly new field of mathematics, branching from
combinatorics
Combinatorics is an area of mathematics primarily concerned with counting, both as a means and an end in obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many a ...
, which focuses on the study of
words
A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no conse ...
and
formal language
In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.
The alphabet of a formal language consists of s ...
s. The subject looks at letters or
symbol
A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
s, and the
sequence
In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is called ...
s they form. Combinatorics on words affects various areas of mathematical study, including
algebra
Algebra () is one of the areas of mathematics, broad areas of mathematics. Roughly speaking, algebra is the study of mathematical symbols and the rules for manipulating these symbols in formulas; it is a unifying thread of almost all of mathem ...
and
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
. There have been a wide range of contributions to the field. Some of the first work was on
square-free word In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form , where is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern .
Finite ...
s by Axel Thue in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
abstract algebra
In mathematics, more specifically algebra, abstract algebra or modern algebra is the study of algebraic structures. Algebraic structures include groups, rings, fields, modules, vector spaces, lattices, and algebras over a field. The te ...
and answering open questions.
Definition
Combinatorics is an area of
discrete mathematics
Discrete mathematics is the study of mathematical structures that can be considered "discrete" (in a way analogous to discrete variables, having a bijection with the set of natural numbers) rather than "continuous" (analogously to continu ...
. Discrete mathematics is the study of countable structures. These objects have a definite beginning and end. The study of enumerable objects is the opposite of disciplines such as
analysis
Analysis ( : analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle (3 ...
, where
calculus
Calculus, originally called infinitesimal calculus or "the calculus of infinitesimals", is the mathematics, mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizati ...
and infinite structures are studied. Combinatorics studies how to count these objects using various representations. Combinatorics on words is a recent development in this field that focuses on the study of words and formal languages. A formal language is any set of symbols and combinations of symbols that people use to communicate information.
Some terminology relevant to the study of words should first be explained. First and foremost, a word is basically a sequence of symbols, or letters, in a
finite set
In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example,
:\
is a finite set with five elements. ...
. One of these sets is known by the general public as the
alphabet
An alphabet is a standardized set of basic written graphemes (called letters) that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character represents a s ...
. For example, the word "encyclopedia" is a sequence of symbols in the
English alphabet
The alphabet for Modern English is a Latin-script alphabet consisting of 26 Letter (alphabet), letters, each having an Letter case, upper- and lower-case form. The word ''alphabet'' is a Compound (linguistics), compound of the first two lett ...
, a finite set of twenty-six letters. Since a word can be described as a sequence, other basic mathematical descriptions can be applied. The alphabet is a set, so as one would expect, the
empty set
In mathematics, the empty set is the unique set having no elements; its size or cardinality (count of elements in a set) is zero. Some axiomatic set theories ensure that the empty set exists by including an axiom of empty set, while in oth ...
is a
subset
In mathematics, set ''A'' is a subset of a set ''B'' if all elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they are unequal, then ''A'' is a proper subset o ...
. In other words, there exists a unique word of length zero. The length of the word is defined by the number of symbols that make up the sequence, and is denoted by , ''w'', . Again looking at the example "encyclopedia", , ''w'', = 12, since encyclopedia has twelve letters. The idea of factoring of large numbers can be applied to words, where a factor of a word is a block of consecutive symbols. Thus, "cyclop" is a factor of "encyclopedia".
In addition to examining sequences in themselves, another area to consider of combinatorics on words is how they can be represented visually. In mathematics various structures are used to encode data. A common structure used in combinatorics is the
tree structure
A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is genera ...
. A tree structure is a
graph
Graph may refer to:
Mathematics
*Graph (discrete mathematics), a structure made of vertices and edges
**Graph theory, the study of such graphs and their properties
*Graph (topology), a topological space resembling a graph in the sense of discre ...
where the vertices are connected by one line, called a path or edge. Trees may not contain cycles, and may or may not be complete. It is possible to encode a word, since a word is constructed by symbols, and encode the data by using a tree. This gives a visual representation of the object.
Major contributions
The first books on combinatorics on words that summarize the origins of the subject were written by a group of mathematicians that collectively went by the name of M. Lothaire. Their first book was published in 1983, when combinatorics on words became more widespread.
Patterns
Patterns within words
A main contributor to the development of combinatorics on words was Axel Thue (1863–1922); he researched repetition. Thue's main contribution was the proof of the existence of infinite
square-free word In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form , where is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern .
Finite ...
s. Square-free words do not have adjacent repeated factors. To clarify, "summer" is not square-free since m is repeated consecutively, while "encyclopedia" is square-free. Thue proves his conjecture on the existence of infinite square-free words by using substitutions. A substitution is a way to take a symbol and replace it with a word. He uses this technique to describe his other contribution, the Thue–Morse sequence, or Thue–Morse word.
Thue wrote two papers on square-free words, the second of which was on the Thue–Morse word. Marston Morse is included in the name because he discovered the same result as Thue did, yet they worked independently. Thue also proved the existence of an overlap-free word. An overlap-free word is when, for two symbols x and y, the pattern does not exist within the word. He continues in his second paper to prove a relationship between infinite overlap-free words and square-free words. He takes overlap-free words that are created using two different letters, and demonstrates how they can be transformed into square-free words of three letters using substitution.
As was previously described, words are studied by examining the sequences made by the symbols. Patterns are found, and they are able to be described mathematically. Patterns can be either avoidable patterns, or unavoidable. A significant contributor to the work of unavoidable patterns, or regularities, was Frank Ramsey in 1930. His important theorem states that for integers k, m≥2, there exists a least positive integer such that despite how a complete graph is colored with two colors, there will always exist a solid color subgraph of each color.
Other contributors to the study of unavoidable patterns include van der Waerden. His theorem states that if the positive integers are partitioned into k classes, then there exists a class c such that c contains an arithmetic progression of some unknown length. An arithmetic progression is a sequence of numbers in which the difference between adjacent numbers remains constant.
When examining unavoidable patterns sesquipowers are also studied. For some patterns x,y,z, a sesquipower is of the form x, , , .... This is another pattern such as square-free, or unavoidable patterns. Coudrain and
Schützenberger Schützenberger may refer to these people:
* Anne Ancelin Schützenberger (1919–2018) (de)
* Paul Schützenberger, French chemist
* René Schützenberger, French painter
* Marcel-Paul "Marco" Schützenberger, French mathematician and Doctor of M ...
mainly studied these sesquipowers for
group theory
In abstract algebra, group theory studies the algebraic structures known as group (mathematics), groups.
The concept of a group is central to abstract algebra: other well-known algebraic structures, such as ring (mathematics), rings, field ...
applications. In addition, Zimin proved that sesquipowers are all unavoidable. Whether the entire pattern shows up, or only some piece of the sesquipower shows up repetitively, it is not possible to avoid it.
Patterns within alphabets
Necklaces are constructed from words of circular sequences. They are most frequently used in
music
Music is generally defined as the The arts, art of arranging sound to create some combination of Musical form, form, harmony, melody, rhythm or otherwise Musical expression, expressive content. Exact definition of music, definitions of mu ...
and
astronomy
Astronomy () is a natural science that studies astronomical object, celestial objects and phenomena. It uses mathematics, physics, and chemistry in order to explain their origin and chronology of the Universe, evolution. Objects of interest ...
. Flye Sainte-Marie in 1894 proved there are 22''n''−1−''n''binaryde Bruijn necklaces of length 2n. A de Bruijn necklace contains factors made of words of length n over a certain number of letters. The words appear only once in the necklace.
In 1874, Baudot developed the code that would eventually take the place of
Morse code
Morse code is a method used in telecommunication to encode text characters as standardized sequences of two different signal durations, called ''dots'' and ''dashes'', or ''dits'' and ''dahs''. Morse code is named after Samuel Morse, one ...
by applying the theory of binary de Bruijn necklaces. The problem continued from Sainte-Marie to Martin in 1934, who began looking at algorithms to make words of the de Bruijn structure. It was then worked on by Posthumus in 1943.
Language hierarchy
Possibly the most applied result in combinatorics on words is the
Chomsky hierarchy
In formal language theory, computer science and linguistics, the Chomsky hierarchy (also referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars.
This hierarchy of grammars was described ...
, developed by
Noam Chomsky
Avram Noam Chomsky (born December 7, 1928) is an American public intellectual: a linguist, philosopher, cognitive scientist, historian, social critic, and political activist. Sometimes called "the father of modern linguistics", Chomsky is ...
. He studied formal language in the 1950s. His way of looking at language simplified the subject. He disregards the actual meaning of the word, does not consider certain factors such as frequency and context, and applies patterns of short terms to all length terms. The basic idea of Chomsky's work is to divide language into four levels, or the language
hierarchy
A hierarchy (from Greek: , from , 'president of sacred rites') is an arrangement of items (objects, names, values, categories, etc.) that are represented as being "above", "below", or "at the same level as" one another. Hierarchy is an important ...
. The four levels are:
regular
The term regular can mean normal or in accordance with rules. It may refer to:
People
* Moses Regular (born 1971), America football player
Arts, entertainment, and media Music
* "Regular" (Badfinger song)
* Regular tunings of stringed instrum ...
, context-free, context-sensitive, and computably enumerable or unrestricted. Regular is the least complex while computably enumerable is the most complex. While his work grew out of combinatorics on words, it drastically affected other disciplines, especially
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...
.
Word types
Sturmian words
Sturmian word
In mathematics, a Sturmian word (Sturmian sequence or billiard sequence), named after Jacques Charles François Sturm, is a certain kind of infinitely long sequence of characters. Such a sequence can be generated by considering a game of English ...
s, created by François Sturm, have roots in combinatorics on words. There exist several equivalent definitions of Sturmian words. For example, an infinite word is Sturmian if and only if it has n+1 distinct factors of length n, for every non-negative integer n.
Lyndon word
A Lyndon word is a word over a given alphabet that is written in its simplest and most ordered form out of its respective conjugacy class. Lyndon words are important because for any given Lyndon word x, there exists Lyndon words y and z, with ytheorem by Chen, Fox, and Lyndon, that states any word has a unique factorization of Lyndon words, where the factorization words are non-increasing. Due to this property, Lyndon words are used to study
algebra
Algebra () is one of the areas of mathematics, broad areas of mathematics. Roughly speaking, algebra is the study of mathematical symbols and the rules for manipulating these symbols in formulas; it is a unifying thread of almost all of mathem ...
, specifically
group theory
In abstract algebra, group theory studies the algebraic structures known as group (mathematics), groups.
The concept of a group is central to abstract algebra: other well-known algebraic structures, such as ring (mathematics), rings, field ...
. They form the basis for the idea of
commutators
In mathematics, the commutator gives an indication of the extent to which a certain binary operation fails to be commutative. There are different definitions used in group theory and ring theory.
Group theory
The commutator of two elements, a ...
.
Visual representation
Cobham Cobham may refer to:
Geography Towns or districts
* Cobham, Kent, England
* Cobham, Surrey, England
* Cobham, South Australia, a former town in Australia
* Cobham, Albemarle County, Virginia, United States
* Cobham, Surry County, Virginia, U ...
finite automata
A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
. A mathematical graph is made of edges and
node
In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex).
Node may refer to:
In mathematics
* Vertex (graph theory), a vertex in a mathematical graph
* Vertex (geometry), a point where two or more curves, line ...
s. With finite automata, the edges are labeled with a letter in an alphabet. To use the graph, one starts at a node and travels along the edges to reach a final node. The path taken along the graph forms the word. It is a finite graph because there are a countable number of nodes and edges, and only one path connects two distinct nodes.Gauss codes, created by
Carl Friedrich Gauss
Johann Carl Friedrich Gauss (; german: Gauß ; la, Carolus Fridericus Gauss; 30 April 177723 February 1855) was a German mathematician and physicist who made significant contributions to many fields in mathematics and science. Sometimes refe ...
in 1838, are developed from graphs. Specifically, a
closed curve
In mathematics, a curve (also called a curved line in older texts) is an object similar to a line, but that does not have to be straight.
Intuitively, a curve may be thought of as the trace left by a moving point. This is the definition tha ...
on a plane is needed. If the curve only crosses over itself a finite number of times, then one labels the intersections with a letter from the alphabet used. Traveling along the curve, the word is determined by recording each letter as an intersection is passed. Gauss noticed that the distance between when the same symbol shows up in a word is an even integer.
Group theory
Walther Franz Anton von Dyck
Walther Franz Anton von Dyck (6 December 1856 – 5 November 1934), born Dyck () and later ennobled, was a German mathematician. He is credited with being the first to define a mathematical group, in the modern sense in . He laid the foundatio ...
began the work of combinatorics on words in group theory by his published work in 1882 and 1883. He began by using words as group elements.
Lagrange
Joseph-Louis Lagrange (born Giuseppe Luigi Lagrangiapermutation groups.
One aspect of combinatorics on words studied in group theory is reduced words. A group is constructed with words on some alphabet including generators and
inverse element
In mathematics, the concept of an inverse element generalises the concepts of opposite () and reciprocal () of numbers.
Given an operation denoted here , and an identity element denoted , if , one says that is a left inverse of , and that ...
s, excluding factors that appear of the form aā or āa, for some a in the alphabet. Reduced words are formed when the factors aā, āa are used to cancel out elements until a unique word is reached.Nielsen transformations were also developed. For a set of elements of a
free group
In mathematics, the free group ''F'S'' over a given set ''S'' consists of all words that can be built from members of ''S'', considering two words to be different unless their equality follows from the group axioms (e.g. ''st'' = ''suu''− ...
, a Nielsen transformation is achieved by three transformations; replacing an element with its inverse, replacing an element with the product of itself and another element, and eliminating any element equal to 1. By applying these transformations Nielsen reduced sets are formed. A reduced set means no element can be multiplied by other elements to cancel out completely. There are also connections with Nielsen transformations with Sturmian words.
Considered problems
One problem considered in the study of combinatorics on words in group theory is the following: for two elements x,y of a
semigroup
In mathematics, a semigroup is an algebraic structure consisting of a Set (mathematics), set together with an associative internal binary operation on it.
The binary operation of a semigroup is most often denoted multiplication, multiplicatively ...
, does x=y
modulo
In computing, the modulo operation returns the remainder or signed remainder of a division, after one number is divided by another (called the '' modulus'' of the operation).
Given two positive numbers and , modulo (often abbreviated as ) is t ...
Markov Markov ( Bulgarian, russian: Марков), Markova, and Markoff are common surnames used in Russia and Bulgaria. Notable people with the name include:
Academics
* Ivana Markova (born 1938), Czechoslovak-British emeritus professor of psychology at ...
studied this problem and determined it undecidable. Undecidable means the theory cannot be proved.
The Burnside question was proved using the existence of an infinite
cube-free word In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form , where is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern .
Finite ...
. This question asks if a group is finite if the group has a definite number of generators and meets the criteria xn=1, for x in the group.
Many word problems are undecidable based on the Post correspondence problem. Any two
homomorphisms
In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "same" ...
with a common domain and a common codomain form an instance of the Post correspondence problem, which asks whether there exists a word in the domain such that . Post proved that this problem is undecidable; consequently, any word problem that can be reduced to this basic problem is likewise undecidable.
Other applications
Combinatorics on words have applications on
equations
In mathematics, an equation is a formula that expresses the equality of two expressions, by connecting them with the equals sign . The word ''equation'' and its cognates in other languages may have subtly different meanings; for example, i ...
. Makanin proved that it is possible to find a solution for a finite system of equations, when the equations are constructed from words.
Word problem (computability)
In computability theory and computational complexity theory, a decision problem is a computational problem that can be posed as a yes–no question of the input values. An example of a decision problem is deciding by means of an algorithm whethe ...
Algorithmic Combinatorics on Partial Words
''Algorithmic Combinatorics on Partial Words'' is a book in the area of combinatorics on words, and more specifically on partial words. It was written by Francine Blanchet-Sadri, and published in 2008 by Chapman & Hall/CRC in their Discrete Mathem ...
", Francine Blanchet-Sadri, CRC Press, 2008, .
*
*"Combinatorics of Compositions and Words", Silvia Heubach,
Toufik Mansour
Toufik Mansour is an Israeli mathematician working in algebraic combinatorics. He is a member of the Druze community and is the first Israeli Druze to become a professional mathematician.
Mansour obtained his Ph.D. in mathematics from the Unive ...
, CRC Press, 2009, .
*"Word equations and related topics: 1st international workshop, IWWERT '90, Tübingen, Germany, October 1–3, 1990 : proceedings", Editor: Klaus Ulrich Schulz, Springer, 1992,
*" Jewels of stringology: text algorithms", Maxime Crochemore,
Wojciech Rytter Wojciech Rytter is a Polish computer scientist, a professor of computer science in the automata theory group at the University of Warsaw. His research focuses on the design and analysis of algorithms, and in particular on stringology, the study of a ...
, World Scientific, 2003,
*
*
*"Patterns in Permutations and Words", Sergey Kitaev, Springer, 2011,
*"Distribution Modulo One and Diophantine Approximation", Yann Bugeaud, Cambridge University Press, 2012,