HOME

TheInfoList



OR:

In
theoretical computer science Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory. It is difficult to circumsc ...
, more precisely in the theory of
formal languages In logic, mathematics, computer science, and linguistics, a formal language consists of string (computer science), words whose symbol (formal), letters are taken from an alphabet (formal languages), alphabet and are well-formedness, well-formed ...
, the star height is a measure for the structural complexity of
regular expressions A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
and
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
s. The star height of a regular ''expression'' equals the maximum nesting depth of stars appearing in that expression. The star height of a regular ''language'' is the least star height of any regular expression for that language. The concept of star height was first defined and studied by Eggan (1963).


Formal definition

More formally, the star height of a
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
''E'' over a finite
alphabet An alphabet is a standardized set of basic written graphemes (called letters) that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character represents a syll ...
''A'' is inductively defined as follows: * \textstyle h\left(\emptyset\right)\,=\,0, \textstyle h\left(\varepsilon\right)\,=\,0, and \textstyle h\left(a\right)\,=\,0 for all alphabet symbols ''a'' in ''A''. * \textstyle h\left(E F\right)\,=\, h\left(E\, \mid\, F\right)\,=\,\max \left(\, h(E), h(F)\,\right) * \textstyle h\left(E^*\right)\,=\,h(E)+1. Here, \scriptstyle \emptyset is the special regular expression denoting the empty set and ε the special one denoting the
empty word In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special case ...
; ''E'' and ''F'' are arbitrary regular expressions. The star height ''h''(''L'') of a regular language ''L'' is defined as the minimum star height among all regular expressions representing ''L''. The intuition is here that if the language ''L'' has large star height, then it is in some sense inherently complex, since it cannot be described by means of an "easy" regular expression, of low star height.


Examples

While computing the star height of a regular expression is easy, determining the star height of a language can be sometimes tricky. For illustration, the regular expression :\textstyle \left(b\, \mid\, a a^*b\right)^*a a^* over the alphabet ''A = '' has star height 2. However, the described language is just the set of all words ending in an ''a'': thus the language can also be described by the expression :\textstyle (a\, \mid\, b)^*a which is only of star height 1. To prove that this language indeed has star height 1, one still needs to rule out that it could be described by a regular expression of lower star height. For our example, this can be done by an indirect proof: One proves that a language of star height 0 contains only finitely many words. Since the language under consideration is infinite, it cannot be of star height 0. The star height of a
group language In mathematics and computer science, the syntactic monoid M(L) of a formal language L is the smallest monoid that recognizes the language L. Syntactic quotient The free monoid on a given set is the monoid whose elements are all the strings of ze ...
is computable: for example, the star height of the language over in which the number of occurrences of ''a'' and ''b'' are congruent modulo 2''n'' is ''n''.Sakarovitch (2009) p.342


Eggan's theorem

In his seminal study of the star height of regular languages, established a relation between the theories of regular expressions, finite automata, and of
directed graph In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pa ...
s. In subsequent years, this relation became known as ''Eggan's theorem'', cf. . We recall a few concepts from
graph theory In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conne ...
and
automata theory Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science. The word ''automata'' comes from the Greek word αὐτόματο ...
. In graph theory, the
cycle rank In graph theory, the cycle rank of a directed graph is a digraph connectivity measure proposed first by Eggan and Büchi . Intuitively, this concept measures how close a digraph is to a directed acyclic graph (DAG), in the sense that a DAG has ...
''r''(''G'') of a
directed graph In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pa ...
(digraph) is inductively defined as follows: * If ''G'' is acyclic, then . This applies in particular if ''G'' is empty. * If ''G'' is
strongly connected In the mathematical theory of directed graphs, a graph is said to be strongly connected if every vertex is reachable from every other vertex. The strongly connected components of an arbitrary directed graph form a partition into subgraphs that a ...
and ''E'' is nonempty, then ::r(G) = 1 + \min_ r(G-v),\,where is the digraph resulting from deletion of vertex and all edges beginning or ending at . * If ''G'' is not strongly connected, then ''r''(''G'') is equal to the maximum cycle rank among all strongly connected components of ''G''. In automata theory, a
nondeterministic finite automaton In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if * each of its transitions is ''uniquely'' determined by its source state and input symbol, and * reading an input symbol is required for each state tr ...
with ε-transitions (ε-NFA) is defined as a 5-tuple, (''Q'', Σ, ''δ'', ''q0'', ''F''), consisting of * a finite
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
of states ''Q'' * a finite set of
input symbol In formal language theory, an alphabet is a non-empty set of symbols/glyphs, typically thought of as representing letters, characters, or digits but among other possibilities the "symbols" could also be a set of phonemes (sound units). Alphabets in ...
s Σ * a set of labeled edges ''δ'', referred to as ''transition relation'': ''Q'' × (Σ ∪) × ''Q''. Here ε denotes the
empty word In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special case ...
. * an ''initial'' state ''q''0 ∈ ''Q'' * a set of states ''F'' distinguished as ''accepting states'' ''F'' ⊆ ''Q''. A word ''w'' ∈ Σ* is accepted by the ε-NFA if there exists a
directed path In graph theory, a path in a graph is a finite or infinite sequence of edges which joins a sequence of vertices which, by most definitions, are all distinct (and since the vertices are distinct, so are the edges). A directed path (sometimes c ...
from the initial state ''q''0 to some final state in ''F'' using edges from ''δ'', such that the
concatenation In formal language, formal language theory and computer programming, string concatenation is the operation of joining character string (computer science), character strings wikt:end-to-end, end-to-end. For example, the concatenation of "sno ...
of all labels visited along the path yields the word ''w''. The set of all words over Σ* accepted by the automaton is the ''language'' accepted by the automaton ''A''. When speaking of digraph properties of a nondeterministic finite automaton ''A'' with state set ''Q'', we naturally address the digraph with vertex set ''Q'' induced by its transition relation. Now the theorem is stated as follows. :Eggan's Theorem: The star height of a regular language ''L'' equals the minimum
cycle rank In graph theory, the cycle rank of a directed graph is a digraph connectivity measure proposed first by Eggan and Büchi . Intuitively, this concept measures how close a digraph is to a directed acyclic graph (DAG), in the sense that a DAG has ...
among all
nondeterministic finite automata In automata theory, a finite-state machine is called a deterministic finite automaton (DFA), if * each of its transitions is ''uniquely'' determined by its source state and input symbol, and * reading an input symbol is required for each state tr ...
with ε-transitions accepting ''L''. Proofs of this theorem are given by , and more recently by .


Generalized star height

The above definition assumes that regular expressions are built from the elements of the alphabet ''A'' using only the standard operators
set union In set theory, the union (denoted by ∪) of a collection of sets is the set of all elements in the collection. It is one of the fundamental operations through which sets can be combined and related to each other. A refers to a union of ze ...
,
concatenation In formal language, formal language theory and computer programming, string concatenation is the operation of joining character string (computer science), character strings wikt:end-to-end, end-to-end. For example, the concatenation of "sno ...
, and
Kleene star In mathematical logic and computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid c ...
. ''Generalized regular expressions'' are defined just as regular expressions, but here also the
set complement In set theory, the complement of a set , often denoted by (or ), is the set of elements not in . When all sets in the universe, i.e. all sets under consideration, are considered to be members of a given set , the absolute complement of is the ...
operator is allowed (the complement is always taken with respect to the set of all words over A). If we alter the definition such that taking complements does not increase the star height, that is, :\textstyle h\left(E^c\right)\,=\,h(E) we can define the generalized star height of a regular language ''L'' as the minimum star height among all ''generalized'' regular expressions representing ''L''. Note that, whereas it is immediate that a language of (ordinary) star height 0 can contain only finitely many words, there exist infinite languages having generalized star height 0. For instance, the regular expression :\textstyle (a\, \mid\, b)^*a, which we saw in the example above, can be equivalently described by the generalized regular expression :\textstyle \emptyset^c a, since the complement of the empty set is precisely the set of all words over ''A''. Thus the set of all words over the alphabet ''A'' ending in the letter ''a'' has star height one, while its generalized star height equals zero. Languages of generalized star height zero are also called
star-free language A regular language is said to be star-free if it can be described by a regular expression constructed from the letters of the alphabet, the empty set symbol, all boolean operators – including complementation – and concatenation but no ...
s. It can be shown that a language ''L'' is star-free if and only if its
syntactic monoid In mathematics and computer science, the syntactic monoid M(L) of a formal language L is the smallest monoid that recognizes the language L. Syntactic quotient The free monoid on a given set is the monoid whose elements are all the strings of zero ...
is
aperiodic A periodic function is a function that repeats its values at regular intervals. For example, the trigonometric functions, which repeat at intervals of 2\pi radians, are periodic functions. Periodic functions are used throughout science to desc ...
().


See also

*
Star height problem The star height problem in formal language theory is the question whether all regular languages can be expressed using regular expressions of limited star height, i.e. with a limited nesting depth of Kleene stars. Specifically, is a nesting depth ...
*
Generalized star height problem The generalized star-height problem in formal language theory is the open question whether all regular languages can be expressed using generalized regular expressions with a limited nesting depth of Kleene stars. Here, generalized regular expres ...
.


References

* * * * * * * {{Citation , last=Schützenberger , first=M.P. , authorlink=Marcel-Paul Schützenberger , title=On finite monoids having only trivial subgroups , journal=
Information and Control ''Information and Computation'' is a closed-access computer science journal published by Elsevier (formerly Academic Press). The journal was founded in 1957 under its former name ''Information and Control'' and given its current title in 1987. , t ...
, year=1965, volume=8 , issue=2 , pages=190–194 , doi=10.1016/S0019-9958(65)90108-7 , zbl=0131.02001 , issn=0019-9958 , doi-access=free Formal languages