HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, in the area of
formal language theory In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
, frequent use is made of a variety of string functions; however, the notation used is different from that used for
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as ana ...
, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.


Strings and languages

A string is a finite sequence of characters. The
empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special cas ...
is denoted by \varepsilon. The concatenation of two string s and t is denoted by s \cdot t, or shorter by s t. Concatenating with the empty string makes no difference: s \cdot \varepsilon = s = \varepsilon \cdot s. Concatenation of strings is
associative In mathematics, the associative property is a property of some binary operations, which means that rearranging the parentheses in an expression will not change the result. In propositional logic, associativity is a valid rule of replacement f ...
: s \cdot (t \cdot u) = (s \cdot t) \cdot u. For example, (\langle b \rangle \cdot \langle l \rangle) \cdot (\varepsilon \cdot \langle ah \rangle) = \langle bl \rangle \cdot \langle ah \rangle = \langle blah \rangle. A
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
is a finite or infinite set of strings. Besides the usual set operations like union, intersection etc., concatenation can be applied to languages: if both S and T are languages, their concatenation S \cdot T is defined as the set of concatenations of any string from S and any string from T, formally S \cdot T = \. Again, the concatenation dot \cdot is often omitted for brevity. The language \ consisting of just the empty string is to be distinguished from the empty language \. Concatenating any language with the former doesn't make any change: S \cdot \ = S = \ \cdot S, while concatenating with the latter always yields the empty language: S \cdot \ = \ = \ \cdot S. Concatenation of languages is associative: S \cdot (T \cdot U) = (S \cdot T) \cdot U. For example, abbreviating D = \, the set of all three-digit decimal numbers is obtained as D \cdot D \cdot D. The set of all decimal numbers of arbitrary length is an example for an infinite language.


Alphabet of a string

The alphabet of a string is the set of all of the characters that occur in a particular string. If ''s'' is a string, its
alphabet An alphabet is a standardized set of basic written graphemes (called letters) that represent the phonemes of certain spoken languages. Not all writing systems represent language in this way; in a syllabary, each character represents a syll ...
is denoted by :\operatorname(s) The alphabet of a language S is the set of all characters that occur in any string of S, formally: \operatorname(S) = \bigcup_ \operatorname(s). For example, the set \ is the alphabet of the string \langle cacao \rangle, and the above D is the alphabet of the above language D \cdot D \cdot D as well as of the language of all decimal numbers.


String substitution

Let ''L'' be a
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
, and let Σ be its alphabet. A string substitution or simply a substitution is a mapping ''f'' that maps characters in Σ to languages (possibly in a different alphabet). Thus, for example, given a character ''a'' ∈ Σ, one has ''f''(''a'')=''L''''a'' where ''L''''a'' ⊆ Δ * is some language whose alphabet is Δ. This mapping may be extended to strings as :''f''(ε)=ε for the
empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special cas ...
ε, and :''f''(''sa'')=''f''(''s'')''f''(''a'') for string ''s'' ∈ ''L'' and character ''a'' ∈ Σ. String substitutions may be extended to entire languages as :f(L)=\bigcup_ f(s)
Regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
s are closed under string substitution. That is, if each character in the alphabet of a regular language is substituted by another regular language, the result is still a regular language. Similarly,
context-free language In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by ...
s are closed under string substitution.Although every regular language is also context-free, the previous theorem is not implied by the current one, since the former yields a shaper result for regular languages. A simple example is the conversion ''f''uc(.) to uppercase, which may be defined e.g. as follows: For the extension of ''f''uc to strings, we have e.g. * ''f''uc(‹Straße›) = ⋅ ⋅ ⋅ ⋅ ⋅ = , * ''f''uc(‹u2›) = ⋅ = , and * ''f''uc(‹Go!›) = ⋅ ⋅ = . For the extension of ''f''uc to languages, we have e.g. * ''f''uc() = ∪ ∪ = .


String homomorphism

A string homomorphism (often referred to simply as a
homomorphism In algebra, a homomorphism is a structure-preserving map between two algebraic structures of the same type (such as two groups, two rings, or two vector spaces). The word ''homomorphism'' comes from the Ancient Greek language: () meaning "same" ...
in
formal language theory In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
) is a string substitution such that each character is replaced by a single string. That is, f(a)=s, where s is a string, for each character a.Strictly formally, a homomorphism yields a language consisting of just one string, i.e. f(a) = . String homomorphisms are
monoid morphism In abstract algebra, a branch of mathematics, a monoid is a set equipped with an associative binary operation and an identity element. For example, the nonnegative integers with addition form a monoid, the identity element being 0. Monoids ...
s on the
free monoid In abstract algebra, the free monoid on a set is the monoid whose elements are all the finite sequences (or strings) of zero or more elements from that set, with string concatenation as the monoid operation and with the unique sequence of zero eleme ...
, preserving the empty string and the
binary operation In mathematics, a binary operation or dyadic operation is a rule for combining two elements (called operands) to produce another element. More formally, a binary operation is an operation of arity two. More specifically, an internal binary op ...
of
string concatenation In formal language theory and computer programming, string concatenation is the operation of joining character strings end-to-end. For example, the concatenation of "snow" and "ball" is "snowball". In certain formalisations of concatenat ...
. Given a language L, the set f(L) is called the homomorphic image of L. The inverse homomorphic image of a string s is defined as f^(s) = \ while the inverse homomorphic image of a language L is defined as f^(L) = \ In general, f(f^(L)) \neq L, while one does have f(f^(L)) \subseteq L and L \subseteq f^(f(L)) for any language L. The class of regular languages is closed under homomorphisms and inverse homomorphisms. Similarly, the context-free languages are closed under homomorphismsThis follows from the above-mentioned closure under arbitrary substitutions. and inverse homomorphisms. A string homomorphism is said to be ε-free (or e-free) if f(a) \neq \varepsilon for all ''a'' in the alphabet \Sigma. Simple single-letter
substitution cipher In cryptography, a substitution cipher is a method of encrypting in which units of plaintext are replaced with the ciphertext, in a defined manner, with the help of a key; the "units" may be single letters (the most common), pairs of letters, trip ...
s are examples of (ε-free) string homomorphisms. An example string homomorphism ''g''uc can also be obtained by defining similar to the above substitution: ''g''uc(‹a›) = ‹A›, ..., ''g''uc(‹0›) = ε, but letting ''g''uc be undefined on punctuation chars. Examples for inverse homomorphic images are * ''g''uc−1() = , since ''g''uc(‹sss›) = ''g''uc(‹sß›) = ''g''uc(‹ßs›) = ‹SSS›, and *''g''uc−1() = , since ''g''uc(‹a›) = ‹A›, while ‹bb› cannot be reached by ''g''uc. For the latter language, ''g''uc(''g''uc−1()) = ''g''uc() = ≠ . The homomorphism ''g''uc is not ε-free, since it maps e.g. ‹0› to ε. A very simple string homomorphism example that maps each character to just a character is the conversion of an
EBCDIC Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding six- ...
-encoded string to
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
.


String projection

If ''s'' is a string, and \Sigma is an alphabet, the string projection of ''s'' is the string that results by removing all characters that are not in \Sigma. It is written as \pi_\Sigma(s)\,. It is formally defined by removal of characters from the right hand side: :\pi_\Sigma(s) = \begin \varepsilon & \mbox s=\varepsilon \mbox \\ \pi_\Sigma(t) & \mbox s=ta \mbox a \notin \Sigma \\ \pi_\Sigma(t)a & \mbox s=ta \mbox a \in \Sigma \end Here \varepsilon denotes the
empty string In formal language theory, the empty string, or empty word, is the unique string of length zero. Formal theory Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special cas ...
. The projection of a string is essentially the same as a
projection in relational algebra In relational algebra, a projection is a unary operation written as \Pi_( R ), where R is a relation and a_1,...,a_n are attribute names. Its result is defined as the set obtained when the components of the tuples in R are restricted to the set \ â ...
. String projection may be promoted to the projection of a language. Given a
formal language In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of symb ...
''L'', its projection is given by :\pi_\Sigma (L)=\


Right quotient

The right quotient of a character ''a'' from a string ''s'' is the truncation of the character ''a'' in the string ''s'', from the right hand side. It is denoted as s/a. If the string does not have ''a'' on the right hand side, the result is the empty string. Thus: :(sa)/ b = \begin s & \mbox a=b \\ \varepsilon & \mbox a \ne b \end The quotient of the empty string may be taken: :\varepsilon / a = \varepsilon Similarly, given a subset S\subset M of a monoid M, one may define the quotient subset as :S/a=\ Left quotients may be defined similarly, with operations taking place on the left of a string. Hopcroft and Ullman (1979) define the quotient ''L''1/''L''2 of the languages ''L''1 and ''L''2 over the same alphabet as ''L''1/''L''2 = . This is not a generalization of the above definition, since, for a string ''s'' and distinct characters ''a'', ''b'', Hopcroft's and Ullman's definition implies / yielding , rather than . The left quotient (when defined similar to Hopcroft and Ullman 1979) of a singleton language ''L''1 and an arbitrary language ''L''2 is known as
Brzozowski derivative In theoretical computer science, in particular in formal language theory, the Brzozowski derivative u^S of a set S of strings and a string u is the set of all strings obtainable from a string in S by cutting off the prefix u, as illustrated in th ...
; if ''L''2 is represented by a
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
, so can be the left quotient.


Syntactic relation

The right quotient of a subset S\subset M of a monoid M defines an
equivalence relation In mathematics, an equivalence relation is a binary relation that is reflexive, symmetric and transitive. The equipollence relation between line segments in geometry is a common example of an equivalence relation. Each equivalence relation ...
, called the right
syntactic relation In mathematics and computer science, the syntactic monoid M(L) of a formal language L is the smallest monoid that recognizes the language L. Syntactic quotient The free monoid on a given set is the monoid whose elements are all the strings of zero ...
of ''S''. It is given by :\sim_S \;\,=\, \ The relation is clearly of finite index (has a finite number of equivalence classes) if and only if the family right quotients is finite; that is, if :\ is finite. In the case that ''M'' is the monoid of words over some alphabet, ''S'' is then a
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
, that is, a language that can be recognized by a
finite state automaton A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
. This is discussed in greater detail in the article on
syntactic monoid In mathematics and computer science, the syntactic monoid M(L) of a formal language L is the smallest monoid that recognizes the language L. Syntactic quotient The free monoid on a given set is the monoid whose elements are all the strings of zero ...
s.


Right cancellation

The right cancellation of a character ''a'' from a string ''s'' is the removal of the first occurrence of the character ''a'' in the string ''s'', starting from the right hand side. It is denoted as s\div a and is recursively defined as :(sa)\div b = \begin s & \mbox a=b \\ (s\div b)a & \mbox a \ne b \end The empty string is always cancellable: :\varepsilon \div a = \varepsilon Clearly, right cancellation and projection
commute Commute, commutation or commutative may refer to: * Commuting, the process of travelling between a place of residence and a place of work Mathematics * Commutative property, a property of a mathematical operation whose result is insensitive to th ...
: :\pi_\Sigma(s)\div a = \pi_\Sigma(s \div a )


Prefixes

The prefixes of a string is the set of all
prefixes A prefix is an affix which is placed before the Word stem, stem of a word. Adding it to the beginning of one word changes it into another word. For example, when the prefix ''un-'' is added to the word ''happy'', it creates the word ''unhappy'' ...
to a string, with respect to a given language: :\operatorname_L(s) = \ where s\in L. The prefix closure of a language is :\operatorname (L) = \bigcup_ \operatorname_L(s) = \left\ Example:
L=\left\\mbox \operatorname(L)=\left\ A language is called prefix closed if \operatorname (L) = L. The prefix closure operator is
idempotent Idempotence (, ) is the property of certain operation (mathematics), operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence ...
: :\operatorname (\operatorname (L)) =\operatorname (L) The prefix relation is a
binary relation In mathematics, a binary relation associates elements of one set, called the ''domain'', with elements of another set, called the ''codomain''. A binary relation over Set (mathematics), sets and is a new set of ordered pairs consisting of ele ...
\sqsubseteq such that s\sqsubseteq t if and only if s \in \operatorname_L(t). This relation is a particular example of a
prefix order In mathematics, especially order theory, a prefix ordered set generalizes the intuitive concept of a tree by introducing the possibility of continuous progress and continuous branching. Natural prefix orders often occur when considering dynamical sy ...
.


See also

* Comparison of programming languages (string functions) *
Levi's lemma In theoretical computer science and mathematics, especially in the area of combinatorics on words, the Levi lemma states that, for all strings ''u'', ''v'', ''x'' and ''y'', if ''uv'' = ''xy'', then there exists a string ''w'' such tha ...
*
String (computer science) In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). ...
— definition and implementation of more basic operations on strings


Notes


References

* ''(See chapter 3.)'' {{reflist Formal languages Relational algebra Operations