HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, the string-to-string correction problem refers to determining the minimum cost sequence of edit operations necessary to change one string into another (i.e., computing the shortest
edit distance In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings (e.g., words) are to one another, that is measured by counting the minimum number of operations required to tr ...
). Each type of edit operation has its own cost value. A single edit operation may be changing a single
symbol A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
of the string into another (cost WC), deleting a symbol (cost WD), or inserting a new symbol (cost WI). If all edit operations have the same unit costs (WC = WD = WI = 1) the problem is the same as computing the
Levenshtein distance In information theory, linguistics, and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-charact ...
of two strings. Several
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s exist to provide an efficient way to determine string distance and specify the minimum number of transformation operations required. Such algorithms are particularly useful for
delta Delta commonly refers to: * Delta (letter) (Δ or δ), a letter of the Greek alphabet * River delta, at a river mouth * D ( NATO phonetic alphabet: "Delta") * Delta Air Lines, US * Delta variant of SARS-CoV-2 that causes COVID-19 Delta may also ...
creation operations where something is stored as a set of differences relative to a base version. This allows several versions of a single object to be stored much more efficiently than storing them separately. This holds true even for single versions of several objects if they do not differ greatly, or anything in between. Notably, such difference algorithms are used in
molecular biology Molecular biology is the branch of biology that seeks to understand the molecular basis of biological activity in and between cells, including biomolecular synthesis, modification, mechanisms, and interactions. The study of chemical and physi ...
to provide some measure of kinship between different kinds of organisms based on the similarities of their
macromolecule A macromolecule is a very large molecule important to biophysical processes, such as a protein or nucleic acid. It is composed of thousands of covalently bonded atoms. Many macromolecules are polymers of smaller molecules called monomers. The ...
s (such as
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
s or DNA).


Extension

The extended variant of the problem includes a new type of edit operation: swapping any two adjacent symbols, with a cost of WS. This version can be solved in polynomial time under certain restrictions on edit operation costs. Robert A. Wagner (1975) showed that the general problem is
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...
. In particular, he proved that when WI < WC = WD = ∞ and 0 < WS < ∞ (or equivalently, changing and deletion are not permitted), the problem is NP-complete.


References

{{reflist Problems on strings