mathematics
Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
, branching from
combinatorics
Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many ...
formal language
In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet".
The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
s. The subject looks at letters or
symbol
A symbol is a mark, Sign (semiotics), sign, or word that indicates, signifies, or is understood as representing an idea, physical object, object, or wikt:relationship, relationship. Symbols allow people to go beyond what is known or seen by cr ...
s, and the
sequence
In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
s they form. Combinatorics on words affects various areas of mathematical study, including
algebra
Algebra is a branch of mathematics that deals with abstract systems, known as algebraic structures, and the manipulation of expressions within those systems. It is a generalization of arithmetic that introduces variables and algebraic ope ...
and
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
. There have been a wide range of contributions to the field. Some of the first work was on square-free words by
Axel Thue
Axel Thue (; 19 February 1863 – 7 March 1922) was a Norwegian mathematician, known for his original work in diophantine approximation and combinatorics.
Work
Thue published his first important paper in 1909.
He stated in 1914 the so-called w ...
in the early 1900s. He and colleagues observed patterns within words and tried to explain them. As time went on, combinatorics on words became useful in the study of
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
abstract algebra
In mathematics, more specifically algebra, abstract algebra or modern algebra is the study of algebraic structures, which are set (mathematics), sets with specific operation (mathematics), operations acting on their elements. Algebraic structur ...
and answering open questions.
Definition
Combinatorics is an area of
discrete mathematics
Discrete mathematics is the study of mathematical structures that can be considered "discrete" (in a way analogous to discrete variables, having a bijection with the set of natural numbers) rather than "continuous" (analogously to continuous f ...
. Discrete mathematics is the study of countable structures. These objects have a definite beginning and end. The study of enumerable objects is the opposite of disciplines such as
analysis
Analysis (: analyses) is the process of breaking a complex topic or substance into smaller parts in order to gain a better understanding of it. The technique has been applied in the study of mathematics and logic since before Aristotle (38 ...
, where
calculus
Calculus is the mathematics, mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizations of arithmetic operations.
Originally called infinitesimal calculus or "the ...
and infinite structures are studied. Combinatorics studies how to count these objects using various representations. Combinatorics on words is a recent development in this field that focuses on the study of words and formal languages. A formal language is any set of symbols and combinations of symbols that people use to communicate information.
Some terminology relevant to the study of words should first be explained. First and foremost, a word is basically a sequence of symbols, or letters, in a
finite set
In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example,
is a finite set with five elements. Th ...
. One of these sets is known by the general public as the
alphabet
An alphabet is a standard set of letter (alphabet), letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from a ...
. For example, the word "encyclopedia" is a sequence of symbols in the
English alphabet
Modern English is written with a Latin-script alphabet consisting of 26 Letter (alphabet), letters, with each having both uppercase and lowercase forms. The word ''alphabet'' is a Compound (linguistics), compound of ''alpha'' and ''beta'', t ...
, a finite set of twenty-six letters. Since a word can be described as a sequence, other basic mathematical descriptions can be applied. The alphabet is a
set
Set, The Set, SET or SETS may refer to:
Science, technology, and mathematics Mathematics
*Set (mathematics), a collection of elements
*Category of sets, the category whose objects and morphisms are sets and total functions, respectively
Electro ...
, so as one would expect, the
empty set
In mathematics, the empty set or void set is the unique Set (mathematics), set having no Element (mathematics), elements; its size or cardinality (count of elements in a set) is 0, zero. Some axiomatic set theories ensure that the empty set exi ...
is a
subset
In mathematics, a Set (mathematics), set ''A'' is a subset of a set ''B'' if all Element (mathematics), elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they a ...
. In other words, there exists a unique word of length zero. The length of the word is defined by the number of symbols that make up the sequence, and is denoted by . Again looking at the example "encyclopedia", , since encyclopedia has twelve letters. The idea of factoring of large numbers can be applied to words, where a factor of a word is a block of consecutive symbols. Thus, "cyclop" is a factor of "encyclopedia".
In addition to examining sequences in themselves, another area to consider of combinatorics on words is how they can be represented visually. In mathematics various structures are used to encode data. A common structure used in combinatorics is the
tree structure
A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is gen ...
. A tree structure is a graph where the vertices are connected by one line, called a path or edge. Trees may not contain cycles, and may or may not be complete. It is possible to encode a word, since a word is constructed by symbols, and encode the data by using a tree. This gives a visual representation of the object.
Major contributions
The first books on combinatorics on words that summarize the origins of the subject were written by a group of mathematicians that collectively went by the name of M. Lothaire. Their first book was published in 1983, when combinatorics on words became more widespread.
Patterns
Patterns within words
A main contributor to the development of combinatorics on words was
Axel Thue
Axel Thue (; 19 February 1863 – 7 March 1922) was a Norwegian mathematician, known for his original work in diophantine approximation and combinatorics.
Work
Thue published his first important paper in 1909.
He stated in 1914 the so-called w ...
(1863–1922); he researched repetition. Thue's main contribution was the proof of the existence of infinite square-free words. Square-free words do not have adjacent repeated factors. To clarify, "dining" is not square-free since "in" is repeated consecutively, while "servers" is square-free, its two "er" factors not being adjacent. Thue proves his conjecture on the existence of infinite square-free words by using substitutions. A substitution is a way to take a symbol and replace it with a word. He uses this technique to describe his other contribution, the
Thue–Morse sequence
In mathematics, the Thue–Morse or Prouhet–Thue–Morse sequence is the binary sequence (an infinite sequence of 0s and 1s) that can be obtained by starting with 0 and successively appending the Boolean complement of the sequence obtained thus ...
, or Thue–Morse word.
Thue wrote two papers on square-free words, the second of which was on the Thue–Morse word. Marston Morse is included in the name because he discovered the same result as Thue did, yet they worked independently. Thue also proved the existence of an overlap-free word. An overlap-free word is when, for two symbols and , the pattern does not exist within the word. He continues in his second paper to prove a relationship between infinite overlap-free words and square-free words. He takes overlap-free words that are created using two different letters, and demonstrates how they can be transformed into square-free words of three letters using substitution.
As was previously described, words are studied by examining the sequences made by the symbols. Patterns are found, and they can be described mathematically. Patterns can be either avoidable patterns, or unavoidable. A significant contributor to the work of unavoidable patterns, or regularities, was Frank Ramsey in 1930. His important theorem states that for integers , , there exists a least positive integer such that despite how a complete graph is colored with two colors, there will always exist a solid color subgraph of each color.
Other contributors to the study of unavoidable patterns include van der Waerden. His theorem states that if the positive integers are partitioned into classes, then there exists a class such that contains an arithmetic progression of some unknown length. An
arithmetic progression
An arithmetic progression or arithmetic sequence is a sequence of numbers such that the difference from any succeeding term to its preceding term remains constant throughout the sequence. The constant difference is called common difference of that ...
is a sequence of numbers in which the difference between adjacent numbers remains constant.
When examining unavoidable patterns sesquipowers are also studied. For some patterns ,,, a sesquipower is of the form , , , . This is another pattern such as square-free, or unavoidable patterns. Coudrain and Schützenberger mainly studied these sesquipowers for
group theory
In abstract algebra, group theory studies the algebraic structures known as group (mathematics), groups.
The concept of a group is central to abstract algebra: other well-known algebraic structures, such as ring (mathematics), rings, field ( ...
applications. In addition, Zimin proved that sesquipowers are all unavoidable. Whether the entire pattern shows up, or only some piece of the sesquipower shows up repetitively, it is not possible to avoid it.
Patterns within alphabets
Necklaces are constructed from words of circular sequences. They are most frequently used in
music
Music is the arrangement of sound to create some combination of Musical form, form, harmony, melody, rhythm, or otherwise Musical expression, expressive content. Music is generally agreed to be a cultural universal that is present in all hum ...
and
astronomy
Astronomy is a natural science that studies celestial objects and the phenomena that occur in the cosmos. It uses mathematics, physics, and chemistry in order to explain their origin and their overall evolution. Objects of interest includ ...
. Flye Sainte-Marie in 1894 proved there are binaryde Bruijn necklaces of length . A de Bruijn necklace contains factors made of words of length n over a certain number of letters. The words appear only once in the necklace.
In 1874, Baudot developed the code that would eventually take the place of
Morse code
Morse code is a telecommunications method which Character encoding, encodes Written language, text characters as standardized sequences of two different signal durations, called ''dots'' and ''dashes'', or ''dits'' and ''dahs''. Morse code i ...
by applying the theory of binary de Bruijn necklaces. The problem continued from Sainte-Marie to
Martin Martin may refer to:
Places Antarctica
* Martin Peninsula, Marie Byrd Land
* Port Martin, Adelie Land
* Point Martin, South Orkney Islands
Europe
* Martin, Croatia, a village
* Martin, Slovakia, a city
* Martín del Río, Aragón, Spain
* M ...
in 1934, who began looking at algorithms to make words of the de Bruijn structure. It was then worked on by Klaas Posthumus in 1943.
Language hierarchy
Possibly the most applied result in combinatorics on words is the
Chomsky hierarchy
The Chomsky hierarchy in the fields of formal language theory, computer science, and linguistics, is a containment hierarchy of classes of formal grammars. A formal grammar describes how to form strings from a formal language's alphabet that are v ...
, developed by
Noam Chomsky
Avram Noam Chomsky (born December 7, 1928) is an American professor and public intellectual known for his work in linguistics, political activism, and social criticism. Sometimes called "the father of modern linguistics", Chomsky is also a ...
. He studied formal language in the 1950s. His way of looking at language simplified the subject. He disregards the actual meaning of the word, does not consider certain factors such as frequency and context, and applies patterns of short terms to all length terms. The basic idea of Chomsky's work is to divide language into four levels, or the language
hierarchy
A hierarchy (from Ancient Greek, Greek: , from , 'president of sacred rites') is an arrangement of items (objects, names, values, categories, etc.) that are represented as being "above", "below", or "at the same level as" one another. Hierarchy ...
computably enumerable
In computability theory, a set ''S'' of natural numbers is called computably enumerable (c.e.), recursively enumerable (r.e.), semidecidable, partially decidable, listable, provable or Turing-recognizable if:
*There is an algorithm such that the ...
or unrestricted. Regular is the least complex while computably enumerable is the most complex. While his work grew out of combinatorics on words, it drastically affected other disciplines, especially
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
.
Word types
Sturmian words
Sturmian words, created by François Sturm, have roots in combinatorics on words. There exist several equivalent definitions of Sturmian words. For example, an infinite word is Sturmian if and only if it has distinct factors of length , for every non-negative integer .
Lyndon word
A Lyndon word is a word over a given alphabet that is written in its simplest and most ordered form out of its respective
conjugacy class
In mathematics, especially group theory, two elements a and b of a group are conjugate if there is an element g in the group such that b = gag^. This is an equivalence relation whose equivalence classes are called conjugacy classes. In other ...
. Lyndon words are important because for any given Lyndon word , there exists Lyndon words and , with