In
abstract algebra
In mathematics, more specifically algebra, abstract algebra or modern algebra is the study of algebraic structures, which are set (mathematics), sets with specific operation (mathematics), operations acting on their elements. Algebraic structur ...
, the free monoid on a
set
Set, The Set, SET or SETS may refer to:
Science, technology, and mathematics Mathematics
*Set (mathematics), a collection of elements
*Category of sets, the category whose objects and morphisms are sets and total functions, respectively
Electro ...
is the
monoid
In abstract algebra, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being .
Monoids are semigroups with identity ...
whose elements are all the
finite sequences (or strings) of zero or more elements from that set, with
string concatenation
In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalizations of concatenati ...
as the monoid operation and with the unique sequence of zero elements, often called the
empty string
In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero.
Formal theory
Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
and denoted by ε or λ, as the
identity element
In mathematics, an identity element or neutral element of a binary operation is an element that leaves unchanged every element when the operation is applied. For example, 0 is an identity element of the addition of real numbers. This concept is use ...
. The free monoid on a set ''A'' is usually denoted ''A''
∗. The free semigroup on ''A'' is the sub
semigroup
In mathematics, a semigroup is an algebraic structure consisting of a set together with an associative internal binary operation on it.
The binary operation of a semigroup is most often denoted multiplicatively (just notation, not necessarily th ...
of ''A''
∗ containing all elements except the empty string. It is usually denoted ''A''
+.
/ref>
More generally, an abstract monoid (or semigroup) ''S'' is described as free if it is isomorphic
In mathematics, an isomorphism is a structure-preserving mapping or morphism between two structures of the same type that can be reversed by an inverse mapping. Two mathematical structures are isomorphic if an isomorphism exists between the ...
to the free monoid (or semigroup) on some set.
As the name implies, free monoids and semigroups are those objects which satisfy the usual universal property
In mathematics, more specifically in category theory, a universal property is a property that characterizes up to an isomorphism the result of some constructions. Thus, universal properties can be used for defining some objects independently fro ...
defining free object
In mathematics, the idea of a free object is one of the basic concepts of abstract algebra. Informally, a free object over a set ''A'' can be thought of as being a "generic" algebraic structure over ''A'': the only equations that hold between elem ...
s, in the respective categories
Category, plural categories, may refer to:
General uses
*Classification, the general act of allocating things to classes/categories Philosophy
*Category of being
* ''Categories'' (Aristotle)
*Category (Kant)
*Categories (Peirce)
*Category (Vais ...
of monoids and semigroups. It follows that every monoid (or semigroup) arises as a homomorphic image of a free monoid (or semigroup). The study of semigroups as images of free semigroups is called combinatorial semigroup theory.
Free monoids (and monoids in general) are associative
In mathematics, the associative property is a property of some binary operations that rearranging the parentheses in an expression will not change the result. In propositional logic, associativity is a valid rule of replacement for express ...
, by definition; that is, they are written without any parenthesis to show grouping or order of operation. The non-associative equivalent is the free magma.
Examples
Natural numbers
The monoid (N0,+) of natural numbers
In mathematics, the natural numbers are the numbers 0, 1, 2, 3, and so on, possibly excluding 0. Some start counting with 0, defining the natural numbers as the non-negative integers , while others start with 1, defining them as the positiv ...
(including zero) under addition is a free monoid on a singleton free generator, in this case, the natural number 1.
According to the formal definition, this monoid consists of all sequences like "1", "1+1", "1+1+1", "1+1+1+1", and so on, including the empty sequence.
Mapping each such sequence to its evaluation result
and the empty sequence to zero establishes an isomorphism from the set of such sequences to N0.
This isomorphism is compatible with "+", that is, for any two sequences ''s'' and ''t'', if ''s'' is mapped (i.e. evaluated) to a number ''m'' and ''t'' to ''n'', then their concatenation ''s''+''t'' is mapped to the sum ''m''+''n''.
Kleene star
In formal language
In logic, mathematics, computer science, and linguistics, a formal language is a set of strings whose symbols are taken from a set called "alphabet".
The alphabet of a formal language consists of symbols that concatenate into strings (also c ...
theory, usually a finite set of "symbols" A (sometimes called the alphabet
An alphabet is a standard set of letter (alphabet), letters written to represent particular sounds in a spoken language. Specifically, letters largely correspond to phonemes as the smallest sound segments that can distinguish one word from a ...
) is considered. A finite sequence of symbols is called a "word over ''A''", and the free monoid ''A''∗ is called the "Kleene star
In mathematical logic and theoretical computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation on a Set (mathematics), set to generate a set of all finite-length strings that are composed of zero or more repe ...
of ''A''".
Thus, the abstract study of formal languages can be thought of as the study of subsets of finitely generated free monoids.
For example, assuming an alphabet ''A'' = , its Kleene star ''A''∗ contains all concatenations of ''a'', ''b'', and ''c'':
:.
If ''A'' is any set, the ''word length'' function on ''A''∗ is the unique monoid homomorphism
In abstract algebra, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being .
Monoids are semigroups with identity ...
from ''A''∗ to (N0,+) that maps each element of ''A'' to 1. A free monoid is thus a graded monoid.[Sakarovitch (2009) p.382] (A graded monoid is a monoid that can be written as . Each is a grade; the grading here is just the length of the string. That is, contains those strings of length The symbol here can be taken to mean "set union"; it is used instead of the symbol because, in general, set unions might not be monoids, and so a distinct symbol is used. By convention, gradations are always written with the symbol.)
There are deep connections between the theory of semigroup
In mathematics, a semigroup is an algebraic structure consisting of a set together with an associative internal binary operation on it.
The binary operation of a semigroup is most often denoted multiplicatively (just notation, not necessarily th ...
s and that of automata
An automaton (; : automata or automatons) is a relatively self-operating machine, or control mechanism designed to automatically follow a sequence of operations, or respond to predetermined instructions. Some automata, such as bellstrikers i ...
. For example, every formal language has a syntactic monoid
In mathematics and computer science, the syntactic monoid M(L) of a formal language L is the minimal monoid that recognizes the language L. By the Myhill–Nerode theorem, the syntactic monoid is unique up to unique isomorphism.
Syntactic quot ...
that recognizes that language. For the case of a regular language
In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
, that monoid is isomorphic to the transition monoid In mathematics and theoretical computer science, a semiautomaton is a deterministic finite automaton having inputs but no output. It consists of a set ''Q'' of states, a set Σ called the input alphabet, and a function ''T'': ''Q'' × Σ → ''Q'' c ...
associated to the semiautomaton In mathematics and theoretical computer science, a semiautomaton is a deterministic finite automaton having inputs but no output. It consists of a set ''Q'' of states, a set Σ called the input alphabet, and a function ''T'': ''Q'' × Σ → ''Q'' c ...
of some deterministic finite automaton
In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state auto ...
that recognizes that language. The regular languages over an alphabet A are the closure of the finite subsets of A*, the free monoid over A, under union, product, and generation of submonoid.
For the case of concurrent computation
Concurrent computing is a form of computing in which several computations are executed ''Concurrency (computer science), concurrently''—during overlapping time periods—instead of ''sequentially—''with one completing before the next starts. ...
, that is, systems with locks, mutex
In computer science, a lock or mutex (from mutual exclusion) is a synchronization primitive that prevents state from being modified or accessed by multiple threads of execution at once. Locks enforce mutual exclusion concurrency control policies, ...
es or thread joins, the computation can be described with history monoid In mathematics and computer science, a history monoid is a way of representing the histories of concurrently running computer processes as a collection of strings, each string representing the individual history of a process. The history monoid pr ...
s and trace monoid
In computer science, a trace is an equivalence class of strings, wherein certain letters in the string are allowed to commute, but others are not. Traces generalize the concept of strings by relaxing the requirement for all the letters to have a d ...
s. Roughly speaking, elements of the monoid can commute, (e.g. different threads can execute in any order), but only up to a lock or mutex, which prevent further commutation (e.g. serialize thread access to some object).
Conjugate words
We define a pair of words in ''A''∗ of the form ''uv'' and ''vu'' as conjugate: the conjugates of a word are thus its circular shift
In combinatorial mathematics, a circular shift is the operation of rearranging the entries in a tuple, either by moving the final entry to the first position, while shifting all other entries to the next position, or by performing the inverse ope ...
s.[Sakarovitch (2009) p.27] Two words are conjugate in this sense if they are conjugate in the sense of group theory as elements of the free group
In mathematics, the free group ''F'S'' over a given set ''S'' consists of all words that can be built from members of ''S'', considering two words to be different unless their equality follows from the group axioms (e.g. ''st'' = ''suu''− ...
generated by ''A''.
Equidivisibility
A free monoid is equidivisible: if the equation ''mn'' = ''pq'' holds, then there exists an ''s'' such that either ''m'' = ''ps'', ''sn'' = ''q'' (example see image) or ''ms'' = ''p'', ''n'' = ''sq''.[Sakarovitch (2009) p.26] This result is also known as Levi's lemma
In theoretical computer science and mathematics, especially in the area of combinatorics on words, the Levi lemma states that, for all string (computer science), strings ''u'', ''v'', ''x'' and ''y'', if ''uv'' = ''xy'', then there exist ...
.
A monoid is free if and only if it is graded (in the strong sense that only the identity has gradation 0) and equidivisible.[
]
Free generators and rank
The members of a set ''A'' are called the free generators for ''A''∗ and ''A''+. The superscript * is then commonly understood to be the Kleene star
In mathematical logic and theoretical computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation on a Set (mathematics), set to generate a set of all finite-length strings that are composed of zero or more repe ...
. More generally, if ''S'' is an abstract free monoid (semigroup), then a set of elements which maps onto the set of single-letter words under an isomorphism to a monoid ''A''∗ (semigroup ''A+'') is called a ''set of free generators'' for ''S''.
Each free monoid (or semigroup) ''S'' has exactly one set of free generators, the cardinality
The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...
of which is called the ''rank'' of ''S''.
Two free monoids or semigroups are isomorphic if and only if they have the same rank. In fact, ''every'' set of generators for a free monoid or semigroup ''S'' contains the free generators, since a free generator has word length 1 and hence can only be generated by itself. It follows that a free semigroup or monoid is finitely generated if and only if it has finite rank.
A submonoid
In abstract algebra, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being .
Monoids are semigroups with identity ...
''N'' of ''A''∗ is stable if ''u'', ''v'', ''ux'', ''xv'' in ''N'' together imply ''x'' in ''N''. A submonoid of ''A''∗ is stable if and only if it is free.
For example, using the set of bit
The bit is the most basic unit of information in computing and digital communication. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented as ...
s as ''A'', the set ''N'' of all bit strings containing an even number of "1"s is a stable submonoid because if ''u'' contains an even number of "1"s, and ''ux'' as well, then ''x'' must contain an even number of "1"s, too. While ''N'' cannot be freely generated by any set of single bits, it ''can'' be freely generated by the set of bit strings – the set of strings of the form "10''n''1" for some nonnegative integer ''n'' (along with the string "0").
Codes
A set of free generators for a free monoid ''P'' is referred to as a basis for P: a set of words ''C'' is a code if ''C''* is a free monoid and ''C'' is a basis.[ A set ''X'' of words in ''A''∗ is a prefix, or has the prefix property, if it does not contain a proper (string) prefix of any of its elements. Every prefix in ''A''+ is a code, indeed a ]prefix code
A prefix code is a type of code system distinguished by its possession of the prefix property, which requires that there is no whole Code word (communication), code word in the system that is a prefix (computer science), prefix (initial segment) of ...
.
A submonoid ''N'' of ''A''∗ is right unitary if ''x'', ''xy'' in ''N'' implies ''y'' in ''N''. A submonoid is generated by a prefix if and only if it is right unitary.
Factorization
A factorization of a free monoid is a sequence of subsets of words with the property that every word in the free monoid can be written as a concatenation of elements drawn from the subsets. The Chen–Fox–Lyndon theorem states that the Lyndon word
In mathematics, in the areas of combinatorics and computer science, a Lyndon word is a nonempty string that is strictly smaller in lexicographic order than all of its rotations. Lyndon words are named after mathematician Roger Lyndon, who investi ...
s furnish a factorization. More generally, Hall words provide a factorization; the Lyndon words are a special case of the Hall words.
Free hull
The intersection of free submonoids of a free monoid ''A''∗ is again free. If ''S'' is a subset of a free monoid ''A''* then the intersection of all free submonoids of ''A''* containing ''S'' is well-defined, since ''A''* itself is free, and contains ''S''; it is a free monoid and called the free hull of ''S''. A basis for this intersection is a code.
The defect theorem states that if ''X'' is finite and ''C'' is the basis of the free hull of ''X'', then either ''X'' is a code and ''C'' = ''X'', or
:, ''C'', ≤ , ''X'', − 1 .
Morphisms
A monoid morphism
In abstract algebra, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being .
Monoids are semigroups with identi ...
''f'' from a free monoid ''B''∗ to a monoid ''M'' is a map such that ''f''(''xy'') = ''f''(''x'')⋅''f''(''y'') for words ''x'',''y'' and ''f''(ε) = ι, where ε and ι denote the identity elements of ''B''∗ and ''M'', respectively. The morphism ''f'' is determined by its values on the letters of ''B'' and conversely any map from ''B'' to ''M'' extends to a morphism. A morphism is non-erasing or continuous if no letter of ''B'' maps to ι and trivial if every letter of ''B'' maps to ι.
A morphism ''f'' from a free monoid ''B''∗ to a free monoid ''A''∗ is total if every letter of ''A'' occurs in some word in the image of ''f''; cyclic[ or periodic] if the image of ''f'' is contained in ∗ for some word ''w'' of ''A''∗. A morphism ''f'' is ''k''-uniform if the length , ''f''(''a''), is constant and equal to ''k'' for all ''a'' in ''A''. A 1-uniform morphism is strictly alphabetic[ or a coding.]
A morphism ''f'' from a free monoid ''B''∗ to a free monoid ''A''∗ is simplifiable if there is an alphabet ''C'' of cardinality less than that of ''B'' such the morphism ''f'' factors through ''C''∗, that is, it is the composition of a morphism from ''B''∗ to ''C''∗ and a morphism from that to ''A''∗; otherwise ''f'' is elementary. The morphism ''f'' is called a code if the image of the alphabet ''B'' under ''f'' is a code. Every elementary morphism is a code.
Test sets
For ''L'' a subset of ''B''∗, a finite subset ''T'' of ''L'' is a ''test set'' for ''L'' if morphisms ''f'' and ''g'' on ''B''∗ agree on ''L'' if and only if they agree on ''T''. The Ehrenfeucht conjecture is that any subset ''L'' has a test set: it has been proved independently by Albert and Lawrence; McNaughton; and Guba. The proofs rely on Hilbert's basis theorem
In mathematics Hilbert's basis theorem asserts that every ideal (ring theory), ideal of a polynomial ring over a field (mathematics), field has a finite generating set of an ideal, generating set (a finite ''basis'' in Hilbert's terminology).
In ...
.
Map and fold
The computational embodiment of a monoid morphism is a map
A map is a symbolic depiction of interrelationships, commonly spatial, between things within a space. A map may be annotated with text and graphics. Like any graphic, a map may be fixed to paper or other durable media, or may be displayed on ...
followed by a fold. In this setting, the free monoid on a set ''A'' corresponds to lists of elements from ''A'' with concatenation as the binary operation. A monoid homomorphism from the free monoid to any other monoid (''M'',•) is a function ''f'' such that
* ''f''(''x''1...''x''''n'') = ''f''(''x''1) • ... • ''f''(''x''''n'')
* ''f''() = ''e''
where ''e'' is the identity on ''M''. Computationally, every such homomorphism corresponds to a map
A map is a symbolic depiction of interrelationships, commonly spatial, between things within a space. A map may be annotated with text and graphics. Like any graphic, a map may be fixed to paper or other durable media, or may be displayed on ...
operation applying ''f'' to all the elements of a list, followed by a fold operation which combines the results using the binary operator •. This computational paradigm (which can be generalized to non-associative binary operators) has inspired the MapReduce
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel and distributed algorithm on a cluster.
A MapReduce program is composed of a ''map'' procedure, which performs filte ...
software framework.
Endomorphisms
An endomorphism
In mathematics, an endomorphism is a morphism from a mathematical object to itself. An endomorphism that is also an isomorphism is an automorphism. For example, an endomorphism of a vector space is a linear map , and an endomorphism of a g ...
of ''A''∗ is a morphism from ''A''∗ to itself. The identity map
Graph of the identity function on the real numbers
In mathematics, an identity function, also called an identity relation, identity map or identity transformation, is a function that always returns the value that was used as its argument, unc ...
''I'' is an endomorphism of ''A''∗, and the endomorphisms form a monoid
In abstract algebra, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being .
Monoids are semigroups with identity ...
under composition of functions
In mathematics, the composition operator \circ takes two functions, f and g, and returns a new function h(x) := (g \circ f) (x) = g(f(x)). Thus, the function is applied after applying to . (g \circ f) is pronounced "the composition of an ...
.
An endomorphism ''f'' is prolongable if there is a letter ''a'' such that ''f''(''a'') = ''as'' for a non-empty string ''s''.[Allouche & Shallit (2003) p.10]
String projection
The operation of string projection is an endomorphism. That is, given a letter ''a'' ∈ Σ and a string ''s'' ∈ Σ∗, the string projection ''p''a(''s'') removes every occurrence of ''a'' from ''s''; it is formally defined by
:
Note that string projection is well-defined even if the rank of the monoid is infinite, as the above recursive definition works for all strings of finite length. String projection is a morphism
In mathematics, a morphism is a concept of category theory that generalizes structure-preserving maps such as homomorphism between algebraic structures, functions from a set to another set, and continuous functions between topological spaces. Al ...
in the category of free monoids, so that
:
where is understood to be the free monoid of all finite strings that don't contain the letter ''a''. Projection commutes with the operation of string concatenation, so that for all strings ''s'' and ''t''. There are many right inverses to string projection, and thus it is a split epimorphism.
The identity morphism is defined as for all strings ''s'', and .
String projection is commutative, as clearly
:
For free monoids of finite rank, this follows from the fact that free monoids of the same rank are isomorphic, as projection reduces the rank of the monoid by one.
String projection is idempotent
Idempotence (, ) is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in a number of pl ...
, as
:
for all strings ''s''. Thus, projection is an idempotent, commutative operation, and so it forms a bounded semilattice
In mathematics, a join-semilattice (or upper semilattice) is a partially ordered set that has a join (a least upper bound) for any nonempty finite subset. Dually, a meet-semilattice (or lower semilattice) is a partially ordered set which has ...
or a commutative band.
The free commutative monoid
Given a set ''A'', the free commutative monoid
In abstract algebra, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being .
Monoids are semigroups with identity ...
on ''A'' is the set of all finite multiset
In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the ''multiplicity'' of ...
s with elements drawn from ''A'', with the monoid operation being multiset sum and the monoid unit being the empty multiset.
For example, if ''A'' = , elements of the free commutative monoid on ''A'' are of the form
:.
The fundamental theorem of arithmetic
In mathematics, the fundamental theorem of arithmetic, also called the unique factorization theorem and prime factorization theorem, states that every integer greater than 1 is prime or can be represented uniquely as a product of prime numbers, ...
states that the monoid of positive integers under multiplication is a free commutative monoid on an infinite set of generators, the prime number
A prime number (or a prime) is a natural number greater than 1 that is not a Product (mathematics), product of two smaller natural numbers. A natural number greater than 1 that is not prime is called a composite number. For example, 5 is prime ...
s.
The free commutative semigroup is the subset of the free commutative monoid that contains all multisets with elements drawn from ''A'' except the empty multiset.
The free partially commutative monoid, or ''trace monoid
In computer science, a trace is an equivalence class of strings, wherein certain letters in the string are allowed to commute, but others are not. Traces generalize the concept of strings by relaxing the requirement for all the letters to have a d ...
'', is a generalization that encompasses both the free and free commutative monoids as instances. This generalization finds applications in combinatorics
Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties of finite structures. It is closely related to many other areas of mathematics and has many ...
and in the study of parallelism in computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
.
See also
* String operations In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical ...
Notes
References
*
*
*
*
*
*
*
*
External links
*{{Commons category-inline
Semigroup theory
Formal languages
Free algebraic structures
Combinatorics on words