Dyck Language
   HOME

TheInfoList



OR:

In the theory of
formal languages In logic, mathematics, computer science, and linguistics, a formal language consists of string (computer science), words whose symbol (formal), letters are taken from an alphabet (formal languages), alphabet and are well-formedness, well-formed ...
of
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
,
mathematics Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...
, and
linguistics Linguistics is the scientific study of human language. It is called a scientific study because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language, particularly its nature and structure. Linguis ...
, a Dyck word is a balanced string of square brackets
and or AND may refer to: Logic, grammar, and computing * Conjunction (grammar), connecting two words, phrases, or clauses * Logical conjunction in mathematical logic, notated as "∧", "⋅", "&", or simple juxtaposition * Bitwise AND, a boole ...
The set of Dyck words forms the Dyck language. Dyck words and language are named after the mathematician
Walther von Dyck Walther Franz Anton von Dyck (6 December 1856 – 5 November 1934), born Dyck () and later ennobled, was a German mathematician. He is credited with being the first to define a mathematical group, in the modern sense in . He laid the foundations ...
. They have applications in the
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from L ...
of expressions that must have a correctly nested sequence of brackets, such as arithmetic or algebraic expressions.


Formal definition

Let \Sigma = \ be the alphabet consisting of the symbols
and or AND may refer to: Logic, grammar, and computing * Conjunction (grammar), connecting two words, phrases, or clauses * Logical conjunction in mathematical logic, notated as "∧", "⋅", "&", or simple juxtaposition * Bitwise AND, a boole ...
Let \Sigma^ denote its
Kleene closure In mathematical logic and computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation, either on sets of strings or on sets of symbols or characters. In mathematics, it is more commonly known as the free monoid ...
. The Dyck language is defined as: : \.


Context-free grammar

It may be helpful to define the Dyck language via a context-free grammar in some situations. The Dyck language is generated by the context-free grammar with a single non-terminal , and the production: : That is, ''S'' is either the
empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special cas ...
() or is " , an element of the Dyck language, the matching ", and an element of the Dyck language. An alternative context-free grammar for the Dyck language is given by the production: : That is, ''S'' is zero or more occurrences of the combination of " , an element of the Dyck language, and a matching ", where multiple elements of the Dyck language on the right side of the production are free to differ from each other.


Alternative definition

In yet other contexts it may instead be helpful to define the Dyck language by splitting \Sigma^ into equivalence classes, as follows. For any element u \in \Sigma^ of length , u , , we define
partial function In mathematics, a partial function from a set to a set is a function from a subset of (possibly itself) to . The subset , that is, the domain of viewed as a function, is called the domain of definition of . If equals , that is, if is de ...
s \operatorname : \Sigma^ \times \mathbb \rightarrow \Sigma^ and \operatorname : \Sigma^ \times \mathbb \rightarrow \Sigma^ by :\operatorname(u, j) is u with "[]" inserted into the jth position :\operatorname(u, j) is u with "[]" deleted from the jth position with the understanding that \operatorname(u, j) is undefined for j > , u, and \operatorname(u, j) is undefined if j > , u, - 2. We define an equivalence relation R on \Sigma^ as follows: for elements a, b \in \Sigma^ we have (a, b) \in R if and only if there exists a sequence of zero or more applications of the \operatorname and \operatorname functions starting with a and ending with b. That the sequence of zero operations is allowed accounts for the reflexivity of R.
Symmetry Symmetry (from grc, συμμετρία "agreement in dimensions, due proportion, arrangement") in everyday language refers to a sense of harmonious and beautiful proportion and balance. In mathematics, "symmetry" has a more precise definit ...
follows from the observation that any finite sequence of applications of \operatorname to a string can be undone with a finite sequence of applications of \operatorname. Transitivity is clear from the definition. The equivalence relation partitions the language \Sigma^ into equivalence classes. If we take \epsilon to denote the empty string, then the language corresponding to the equivalence class \operatorname(\epsilon) is called the Dyck language.


Properties

* The Dyck language is closed under the operation of
concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...
. * By treating \Sigma^ as an algebraic
monoid In abstract algebra, a branch of mathematics, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being 0. Monoids ...
under concatenation we see that the monoid structure transfers onto the
quotient In arithmetic, a quotient (from lat, quotiens 'how many times', pronounced ) is a quantity produced by the division of two numbers. The quotient has widespread use throughout mathematics, and is commonly referred to as the integer part of a ...
\Sigma^ / R, resulting in the
syntactic monoid In mathematics and computer science, the syntactic monoid M(L) of a formal language L is the smallest monoid that recognizes the language L. Syntactic quotient The free monoid on a given set is the monoid whose elements are all the strings of zero ...
of the Dyck language. The class \operatorname(\epsilon) will be denoted 1. * The syntactic monoid of the Dyck language is not
commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Most familiar as the name o ...
: if u = \operatorname( and v = \operatorname( then uv = \operatorname([]) = 1 \ne \operatorname(][) = vu. * With the notation above, uv = 1 but neither u nor v are invertible in \Sigma^ / R. * The syntactic monoid of the Dyck language is isomorphic to the
bicyclic semigroup In mathematics, the bicyclic semigroup is an algebraic object important for the structure theory of semigroups. Although it is in fact a monoid, it is usually referred to as simply a semigroup. It is perhaps most easily understood as the syntactic ...
by virtue of the properties of \operatorname( and \operatorname( described above. * By the
Chomsky–Schützenberger representation theorem In formal language theory, the Chomsky–Schützenberger representation theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger about representing a given context-free language in terms of two simpler languages. These two si ...
, any
context-free language In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by ...
is a homomorphic image of the intersection of some
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
with a Dyck language on one or more kinds of bracket pairs. * The Dyck language with two distinct types of brackets can be recognized in the
complexity class In computational complexity theory, a complexity class is a set (mathematics), set of computational problems of related resource-based computational complexity, complexity. The two most commonly analyzed resources are time complexity, time and spa ...
TC^.Barrington and Corbett, Information Processing Letters 32 (1989) 251-256 * The number of distinct Dyck words with exactly pairs of parentheses and innermost pairs (viz. the substring /math>) is the
Narayana number In combinatorics, the Narayana numbers \operatorname(n, k), n \in \mathbb^+, 1 \le k \le n form a triangular array of natural numbers, called the Narayana triangle, that occur in various counting problems. They are named after Canadian mathemati ...
\operatorname(n, k). * The number of distinct Dyck words with exactly pairs of parentheses is the -th
Catalan number In combinatorial mathematics, the Catalan numbers are a sequence of natural numbers that occur in various counting problems, often involving recursively defined objects. They are named after the French-Belgian mathematician Eugène Charles Ca ...
C_n. Notice that the Dyck language of words with parentheses pairs is equal to the union, over all possible , of the Dyck languages of words of parentheses pairs ''with innermost pairs'', as defined in the previous point. Since can range from 0 to , we obtain the following equality, which indeed holds: ::C_n = \sum_^n \operatorname(n, k)


Examples

We can define an equivalence relation L on the Dyck language \mathcal. For u,v\in\mathcal we have (u,v)\in L if and only if , u, = , v, , i.e. u and v have the same length. This relation partitions the Dyck language: \mathcal / L = \. We have \mathcal = \mathcal_ \cup \mathcal_ \cup \mathcal_ \cup \ldots = \bigcup_^ \mathcal_ where \mathcal_ = \. Note that \mathcal_ is empty for odd n. Having introduced the Dyck words of length n, we can introduce a relationship on them. For every n \in \mathbb we define a relation S_ on \mathcal_; for u,v\in\mathcal_ we have (u,v)\in S_ if and only if v can be reached from u by a series of proper swaps. A proper swap in a word u\in\mathcal_ swaps an occurrence of '] _with_'[. For_each_n\in\mathbb_the_relation_S__makes_\mathcal__into_a_ _with_'[. For_each_n\in\mathbb_the_relation_S__makes_\mathcal__into_a_partially_ordered_set">html"_;"title="_with_'[">_with_'[. For_each_n\in\mathbb_the_relation_S__makes_\mathcal__into_a_partially_ordered_set._The_relation_S__is_ _with_'[. For_each_n\in\mathbb_the_relation_S__makes_\mathcal__into_a_partially_ordered_set">html"_;"title="_with_'[">_with_'[. For_each_n\in\mathbb_the_relation_S__makes_\mathcal__into_a_partially_ordered_set._The_relation_S__is_reflexive_relation">reflexive_because_an_empty_sequence_of_proper_swaps_takes_u_to_u.__Transitivity_follows_because_we_can_extend_a_sequence_of_proper_swaps_that_takes_u_to_v_by_concatenating_it_with_a_sequence_of_proper_swaps_that_takes_v_to_w_forming_a_sequence_that_takes_u_into_w._To_see_that_S__is_also_ _with_'[. For_each_n\in\mathbb_the_relation_S__makes_\mathcal__into_a_partially_ordered_set">html"_;"title="_with_'[">_with_'[. For_each_n\in\mathbb_the_relation_S__makes_\mathcal__into_a_partially_ordered_set._The_relation_S__is_reflexive_relation">reflexive_because_an_empty_sequence_of_proper_swaps_takes_u_to_u.__Transitivity_follows_because_we_can_extend_a_sequence_of_proper_swaps_that_takes_u_to_v_by_concatenating_it_with_a_sequence_of_proper_swaps_that_takes_v_to_w_forming_a_sequence_that_takes_u_into_w._To_see_that_S__is_also_Antisymmetric_relation">antisymmetric_we_introduce_an_auxiliary_function_\sigma_:\mathcal_\rightarrow\mathbb_defined_as_a_sum_over_all_prefixes_v_of_u: :_\sigma_n(u)_=_\sum__\Big(_(\text_v)_-_(\text_v)_\Big) The_following_table_illustrates_that_\sigma__is_Monotonic_function.html" ;"title="Antisymmetric_relation.html" ;"title="reflexive_relation.html" ;"title="partially_ordered_set.html" ;"title="html" ;"title=" with '["> with '[. For each n\in\mathbb the relation S_ makes \mathcal_ into a partially ordered set">html" ;"title=" with '["> with '[. For each n\in\mathbb the relation S_ makes \mathcal_ into a partially ordered set. The relation S_ is reflexive relation">reflexive because an empty sequence of proper swaps takes u to u. Transitivity follows because we can extend a sequence of proper swaps that takes u to v by concatenating it with a sequence of proper swaps that takes v to w forming a sequence that takes u into w. To see that S_ is also Antisymmetric relation">antisymmetric we introduce an auxiliary function \sigma_:\mathcal_\rightarrow\mathbb defined as a sum over all prefixes v of u: : \sigma_n(u) = \sum_ \Big( (\text v) - (\text v) \Big) The following table illustrates that \sigma_ is Monotonic function">strictly monotonic with respect to proper swaps. Hence \sigma_(u') - \sigma_(u) = 2 > 0 so \sigma_(u) < \sigma_(u') when there is a proper swap that takes u into u'. Now if we assume that both (u,v), (v,u)\in S_ and u\ne v, then there are non-empty sequences of proper swaps such u is taken into v and vice versa. But then \sigma_(u) < \sigma_(v) < \sigma_(u) which is nonsensical. Therefore, whenever both (u,v) and (v,u) are in S_, we have u = v, hence S_ is antisymmetric. The partial ordered set D_ is shown in the illustration accompanying the introduction if we interpret a [ as going up and ] as going down.


Generalizations

There exist variants of the Dyck language with multiple delimiters, e.g., on the alphabet "(", ")", "[", and "]". The words of such a language are the ones which are well-parenthesized for all delimiters, i.e., one can read the word from left to right, push every opening delimiter on the stack, and whenever we reach a closing delimiter then we must be able to pop the matching opening delimiter from the top of the stack.


See also

* Dyck congruence * Lattice word


Notes


References

* {{planetmath, dycklanguage
A proof of the Chomsky Schützenberger theorem

An AMS blog entry on Dyck words
Formal languages