Subsequence

In mathematics, a subsequence of a given sequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence \langle A,B,D \rangle is a subsequence of \langle A,B,C,D,E,F \rangle obtained after removal of elements C, E, and F. The relation of one sequence being the subsequence of another is a partial order.

Subsequences can contain consecutive elements which were not consecutive in the original sequence. A subsequence which consists of a consecutive run of elements from the original sequence, such as \langle B,C,D \rangle from \langle A,B,C,D,E,F \rangle, is a substring. The substring is a refinement of the subsequence.

The distinct subsequences of the word "apple" are "a", "ap", "al", "ae", "app", "apl", "ape", "ale", "appl", "appe", "aple", "apple", "p", "pp", "pl", "pe", "ppl", "ppe", "ple", "pple", "l", "le", "e", and "" (the empty string).
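
The definition above can be checked mechanically. The following is a minimal Python sketch, not a reference implementation; the function names `is_subsequence` and `distinct_subsequences` are chosen here for illustration. The first tests whether one sequence arises from another by deletions only, and the second enumerates the distinct subsequences of a short word by brute force.

```python
from itertools import combinations


def is_subsequence(candidate, sequence):
    """Return True if `candidate` can be obtained from `sequence` by deletions only."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items can only match later positions (order is preserved).
    return all(item in remaining for item in candidate)


def distinct_subsequences(word):
    """Return the set of distinct subsequences of a short string (exponential time)."""
    positions = range(len(word))
    return {
        "".join(word[i] for i in chosen)
        for size in range(len(word) + 1)
        for chosen in combinations(positions, size)
    }


print(is_subsequence("ABD", "ABCDEF"))      # True
print(is_subsequence("DBA", "ABCDEF"))      # False: order would have to change
print(len(distinct_subsequences("apple")))  # 24, including the empty string
```

The brute-force enumeration is only practical for very short inputs, since a sequence of length n has up to 2^n subsequences.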


Common subsequence

Given two sequences X and Y, a sequence Z is said to be a ''common subsequence'' of X and Y if Z is a subsequence of both X and Y. For example, if X = \langle A,C,B,D,E,G,C,E,D,B,G \rangle, Y = \langle B,E,G,J,C,F,E,K,B \rangle, and Z = \langle B,E,E \rangle, then Z is a common subsequence of X and Y. It is not the ''longest common subsequence'', however, since Z has only length 3 while the common subsequence \langle B,E,E,B \rangle has length 4. The longest common subsequence of X and Y is \langle B,E,G,C,E,B \rangle.
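
One common way to find a longest common subsequence is dynamic programming over prefixes of the two sequences. The sketch below uses that standard technique; it is an illustration under that assumption, and the name `longest_common_subsequence` is chosen here rather than taken from the text. It is applied to the X and Y of the example above.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of x and y as a list."""
    m, n = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal solution.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


X = list("ACBDEGCEDBG")
Y = list("BEGJCFEKB")
print("".join(longest_common_subsequence(X, Y)))  # BEGCEB
```

In this example the result is unique, because J, F, and K do not occur in X and the remaining elements of Y already form a subsequence of X. In general, several different longest common subsequences may exist and the sketch returns only one of them.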


Applications

Subsequences have applications to computer science, especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences. (In computer science, ''string'' is often used as a synonym for ''sequence'', but ''substring'' and ''subsequence'' are not synonyms. Substrings are ''consecutive'' parts of a string, while subsequences need not be. A substring of a string is therefore always a subsequence of the string, but a subsequence of a string is not always a substring of it.)

Take two sequences of DNA containing 37 elements, say:
:SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
:SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

The longest common subsequence of sequences 1 and 2 is:
:LCS(SEQ1,SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA

This can be illustrated by highlighting the 27 elements of the longest common subsequence in the initial sequences. The 10 elements of each sequence that are left unhighlighted are:
:SEQ1 = AGGTGAGGAG
:SEQ2 = CTAGTTAGTA

Another way to show this is to ''align'' the two sequences, that is, to position the elements of the longest common subsequence in the same column (indicated by the vertical bar) and to introduce a special character (here, a dash) as padding wherever one sequence has no element in a column:
:SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
:        | || ||| ||||| |  | |  | || |  || | || |  |||
:SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA

Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.
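
The alignment described above can be read back programmatically. The following Python sketch is illustrative only, and the helper name `matched_columns` is an assumption introduced here. It takes two gap-padded rows of equal length, marks the columns where both rows carry the same element, and collects those elements into the common subsequence that the alignment encodes.

```python
def matched_columns(aligned_a, aligned_b, gap="-"):
    """Return (bar_line, common) for two gap-padded rows of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    bars = []
    common = []
    for a, b in zip(aligned_a, aligned_b):
        if a == b and a != gap:
            bars.append("|")    # both rows hold the same element in this column
            common.append(a)
        else:
            bars.append(" ")    # one row is padded here
    return "".join(bars), "".join(common)


ROW1 = "ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-"
ROW2 = "-C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA"

bars, common = matched_columns(ROW1, ROW2)
print(ROW1)
print(bars)
print(ROW2)
print(common, len(common))  # CGTTCGGCTATGCTTCTACTTATTCTA 27
```

Removing the dashes from each row gives back the original SEQ1 and SEQ2, which is a quick consistency check on any hand-written alignment. The sketch only verifies that the alignment encodes a common subsequence of length 27; it does not by itself show that no longer one exists.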


Theorems

* Every infinite sequence of real numbers has an infinite monotone subsequence (this is a lemma used in the proof of the Bolzano–Weierstrass theorem).
* Every infinite bounded sequence in \R^n has a convergent subsequence (this is the Bolzano–Weierstrass theorem).
* For all integers r and s, every finite sequence of length at least (r - 1)(s - 1) + 1 contains a monotonically increasing subsequence of length r or a monotonically decreasing subsequence of length s (this is the Erdős–Szekeres theorem).
* A metric space (X,d) is compact if and only if every sequence in X has a convergent subsequence whose limit is in X.
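
The Erdős–Szekeres bound from the list above can be checked by brute force for small cases. The Python sketch below assumes sequences of distinct numbers and strict monotonicity, and the function names are chosen here for illustration. For r = s = 3 it confirms that every arrangement of 5 = (3 - 1)(3 - 1) + 1 distinct values satisfies the theorem, while a sequence of length 4 can fail.

```python
from itertools import permutations


def longest_monotone(seq, increasing=True):
    """Length of the longest strictly increasing (or decreasing) subsequence."""
    best = []  # best[i] = longest monotone subsequence ending at position i
    for i, value in enumerate(seq):
        candidates = [
            best[j]
            for j in range(i)
            if (seq[j] < value if increasing else seq[j] > value)
        ]
        best.append(1 + max(candidates, default=0))
    return max(best, default=0)


def satisfies_erdos_szekeres(seq, r, s):
    """True if seq has an increasing run of length r or a decreasing one of length s."""
    return (longest_monotone(seq, increasing=True) >= r
            or longest_monotone(seq, increasing=False) >= s)


r, s = 3, 3
n = (r - 1) * (s - 1) + 1  # 5
print(all(satisfies_erdos_szekeres(p, r, s) for p in permutations(range(n))))  # True
print(satisfies_erdos_szekeres([2, 1, 4, 3], r, s))  # False: length 4 is too short
```

The quadratic-time helper is chosen for readability; faster methods for the longest increasing subsequence exist but are not needed for such small inputs.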



Notes

This article incorporates material from "subsequence" on PlanetMath (id 3300).