![Substring](https://upload.wikimedia.org/wikipedia/commons/3/3f/Substring.png)
In
formal language theory
In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.
The alphabet of a formal language consists of symb ...
and
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a substring is a contiguous sequence of
character
Character or Characters may refer to:
Arts, entertainment, and media Literature
* ''Character'' (novel), a 1936 Dutch novel by Ferdinand Bordewijk
* ''Characters'' (Theophrastus), a classical Greek set of character sketches attributed to The ...
s within a
string
String or strings may refer to:
*String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects
Arts, entertainment, and media Films
* ''Strings'' (1991 film), a Canadian anim ...
. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring.
Prefixes and suffixes are special cases of substrings. A prefix of a string
is a substring of
that occurs at the beginning of
; likewise, a suffix of a string
is a substring that occurs at the end of
.
The substrings of the string "''apple''" would be:
"''a''", "''ap''", "''app''", "''appl''", "''apple''",
"''p''", "''pp''", "''ppl''", "''pple''",
"''pl''", "''ple''",
"''l''", "''le''"
"''e''", ""
(note the
empty string
In formal language theory, the empty string, or empty word, is the unique string of length zero.
Formal theory
Formally, a string is a finite, ordered sequence of characters such as letters, digits or spaces. The empty string is the special cas ...
at the end).
Substring
A string
is a substring (or factor)
[ of a string if there exists two strings and such that . In particular, the empty string is a substring of every string.
Example: The string ]ana
is equal to substrings (and subsequences) of banana
at two different offsets:
banana
, , , , ,
ana, ,
, , ,
ana
The first occurrence is obtained with b
and na
, while the second occurrence is obtained with ban
and being the empty string.
A substring of a string is a prefix
A prefix is an affix which is placed before the Word stem, stem of a word. Adding it to the beginning of one word changes it into another word. For example, when the prefix ''un-'' is added to the word ''happy'', it creates the word ''unhappy'' ...
of a suffix
In linguistics, a suffix is an affix which is placed after the stem of a word. Common examples are case endings, which indicate the grammatical case of nouns, adjectives, and verb endings, which form the conjugation of verbs. Suffixes can carry ...
of the string, and equivalently a suffix of a prefix; for example, nan
is a prefix of nana
, which is in turn a suffix of banana
. If is a substring of , it is also a subsequence
In mathematics, a subsequence of a given sequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence \langle A,B,D \rangle is a ...
, which is a more general concept. The occurrences of a given pattern in a given string can be found with a string searching algorithm
In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string ...
. Finding the longest string which is equal to a substring of two or more strings is known as the longest common substring problem
In computer science, the longest common substring problem is to find a longest string that is a substring of two or more strings. The problem may have multiple solutions.
Applications include data deduplication and plagiarism detection.
Examples ...
.
In the mathematical literature, substrings are also called subwords (in America) or factors (in Europe).
Prefix
A string is a prefix[ of a string if there exists a string such that . A ''proper prefix'' of a string is not equal to the string itself;][ some sources][ in addition restrict a proper prefix to be non-empty. A prefix can be seen as a special case of a substring.
Example: The string ]ban
is equal to a prefix (and substring and subsequence) of the string banana
:
banana
, , ,
ban
The square subset symbol is sometimes used to indicate a prefix, so that denotes that is a prefix of . This defines a binary relation
In mathematics, a binary relation associates elements of one set, called the ''domain'', with elements of another set, called the ''codomain''. A binary relation over Set (mathematics), sets and is a new set of ordered pairs consisting of ele ...
on strings, called the prefix relation In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical ...
, which is a particular kind of prefix order In mathematics, especially order theory, a prefix ordered set generalizes the intuitive concept of a tree by introducing the possibility of continuous progress and continuous branching. Natural prefix orders often occur when considering dynamical sy ...
.
Suffix
A string is a suffix[ of a string if there exists a string such that . A ''proper suffix'' of a string is not equal to the string itself. A more restricted interpretation is that it is also not empty. A suffix can be seen as a special case of a substring.
Example: The string ]nana
is equal to a suffix (and substring and subsequence) of the string banana
:
banana
, , , ,
nana
A suffix tree
In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particu ...
for a string is a trie
In computer science, a trie, also called digital tree or prefix tree, is a type of ''k''-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most often strings, with links between nodes def ...
data structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
that represents all of its suffixes. Suffix trees have large numbers of applications in string algorithms
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed, or it may be fixed (after creation). ...
. The suffix array
In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of bibliometrics.
Suffix arrays were introduced by as ...
is a simplified version of this data structure that lists the start positions of the suffixes in alphabetically sorted order; it has many of the same applications.
Border
A border is suffix and prefix of the same string, e.g. "bab" is a border of "babab" (and also of "baboon eating a kebab").
Superstring
A superstring of a finite set of strings is a single string that contains every string in as a substring. For example, is a superstring of , and is a shorter one. Concatenating all members of , in arbitrary order, always obtains a trivial superstring of . Finding superstrings whose length is as small as possible is a more interesting problem.
A string that contains every possible permutation of a specified character set is called a superpermutation
In combinatorial mathematics, a superpermutation on ''n'' symbols is a string that contains each permutation of ''n'' symbols as a substring. While trivial superpermutations can simply be made up of every permutation listed together, superpermuta ...
.
See also
* Brace notation
In several programming languages, such as Perl, brace notation is a faster way to extract bytes from a string variable.
In pseudocode
An example of brace notation using pseudocode which would extract the 82nd character from the string is:
a_byte ...
* Substring index
In computer science, a substring index is a data structure which gives substring search in a text or text collection in sublinear time. If you have a document S of length n, or a set of documents D=\ of total length n, you can locate all occurrenc ...
* Suffix automaton
In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string which allows the storage, processing, and retrieval of compressed information about all its substrings. The suffix auto ...
References
{{Reflist, refs=
[{{cite book
, last = Gusfield
, first = Dan
, orig-year = 1997
, year = 1999
, title = Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
, publisher = Cambridge University Press
, location = USA
, isbn = 0-521-58519-8
]
[{{cite book
, last = Kelley
, first = Dean
, year = 1995
, title = Automata and Formal Languages: An Introduction
, publisher = Prentice-Hall International
, location = London
, isbn = 0-13-497777-7
]
[{{cite book
, last = Lothaire
, first = M.
, year = 1997
, title = Combinatorics on words
, publisher = Cambridge University Press
, location = Cambridge
, isbn = 0-521-59924-5
]
String (computer science)
Formal languages