HOME

TheInfoList



OR:

In
formal language theory In logic, mathematics, computer science, and linguistics, a formal language is a set of string (computer science), strings whose symbols are taken from a set called "#Definition, alphabet". The alphabet of a formal language consists of symbol ...
, an alphabet, sometimes called a vocabulary, is a non-empty set of indivisible
symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise different concep ...
/ characters/
glyph A glyph ( ) is any kind of purposeful mark. In typography, a glyph is "the specific shape, design, or representation of a character". It is a particular graphical representation, in a particular typeface, of an element of written language. A ...
s, typically thought of as representing letters, characters, digits,
phonemes A phoneme () is any set of similar speech sounds that are perceptually regarded by the speakers of a language as a single basic sound—a smallest possible phonetic unit—that helps distinguish one word from another. All languages con ...
, or even words. The definition is used in a diverse range of fields including logic, mathematics, computer science, and linguistics. An alphabet may have any
cardinality The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...
("size") and, depending on its purpose, may be finite (e.g., the alphabet of letters "a" through "z"),
countable In mathematics, a Set (mathematics), set is countable if either it is finite set, finite or it can be made in one to one correspondence with the set of natural numbers. Equivalently, a set is ''countable'' if there exists an injective function fro ...
(e.g., \), or even uncountable (e.g., \). Strings, also known as "words" or "sentences", over an alphabet are defined as a
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
of the symbols from the alphabet set. For example, the alphabet of lowercase letters "a" through "z" can be used to form English words like "iceberg" while the alphabet of both upper and lower case letters can also be used to form proper names like "Wikipedia". A common alphabet is , the binary alphabet, and a "00101111" is an example of a binary string. Infinite sequences of symbols may be considered as well (see Omega language). It is often necessary for practical purposes to restrict the symbols in an alphabet so that they are unambiguous when interpreted. For instance, if the two-member alphabet is , a string written on paper as "000" is ambiguous because it is unclear if it is a sequence of three "0" symbols, a "00" followed by a "0", or a "0" followed by a "00".


Notation

By
definition A definition is a statement of the meaning of a term (a word, phrase, or other set of symbols). Definitions can be classified into two large categories: intensional definitions (which try to give the sense of a term), and extensional definitio ...
, the ''alphabet'' of a formal language L over \Sigma is the set \Sigma, which can be any non-empty set of symbols from which every string in L is built. For example, the set \Sigma = \ can be the alphabet of the formal language L that means "all variable identifiers in C programming language". Notice that it is not required to use every symbol in the alphabet of L for its strings. Given an alphabet \Sigma, the set of all strings of length n over the alphabet \Sigma is indicated by \Sigma^n. The set \bigcup_ \Sigma^i of all finite strings (regardless of their length) is indicated by the
Kleene star In mathematical logic and theoretical computer science, the Kleene star (or Kleene operator or Kleene closure) is a unary operation on a Set (mathematics), set to generate a set of all finite-length strings that are composed of zero or more repe ...
operator as \Sigma^*, and is also called the Kleene closure of \Sigma. The notation \Sigma^\omega indicates the set of all infinite sequences over the alphabet \Sigma, and \Sigma^\infty indicates the set \Sigma^\ast \cup \Sigma^\omega of all finite or infinite sequences. For example, using the binary alphabet , the strings ε, 0, 1, 00, 01, 10, 11, 000, etc. are all in the Kleene closure of the alphabet (where ε represents the
empty string In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero. Formal theory Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
).


Applications

Alphabets are important in the use of
formal languages In logic, mathematics, computer science, and linguistics, a formal language is a set of string (computer science), strings whose symbols are taken from a set called "#Definition, alphabet". The alphabet of a formal language consists of symbol ...
, automata and semiautomata. In most cases, for defining instances of automata, such as deterministic finite automata (DFAs), it is required to specify an alphabet from which the input strings for the automaton are built. In these applications, an alphabet is usually required to be a
finite set In mathematics, particularly set theory, a finite set is a set that has a finite number of elements. Informally, a finite set is a set which one could in principle count and finish counting. For example, is a finite set with five elements. Th ...
, but is not otherwise restricted. When using automata,
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s, or
formal grammar A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
s as part of string-processing
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
s, the alphabet may be assumed to be the
character set Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical values that make up a c ...
of the text to be processed by these algorithms, or a subset of allowable characters from the character set.


See also

* Combinatorics on words * Terminal and nonterminal symbols


References


Literature

* John E. Hopcroft and Jeffrey D. Ullman, '' Introduction to Automata Theory, Languages, and Computation'', Addison-Wesley Publishing, Reading Massachusetts, 1979. . {{DEFAULTSORT:Alphabet (formal languages) Formal languages Combinatorics on words