Subsequence

In mathematics, a subsequence of a given sequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence \langle A,B,D \rangle is a subsequence of \langle A,B,C,D,E,F \rangle obtained after removal of elements C, E, and F. The relation of one sequence being the subsequence of another is a preorder.

Subsequences can contain consecutive elements which were not consecutive in the original sequence. A subsequence which consists of a consecutive run of elements from the original sequence, such as \langle B,C,D \rangle, from \langle A,B,C,D,E,F \rangle, is a substring. The substring is a refinement of the subsequence.

The list of all subsequences for the word "apple" would be "''a''", "''ap''", "''al''", "''ae''", "''app''", "''apl''", "''ape''", "''ale''", "''appl''", "''appe''", "''aple''", "''apple''", "''p''", "''pp''", "''pl''", "''pe''", "''ppl''", "''ppe''", "''ple''", "''pple''", "''l''", "''le''", "''e''", and "" (the empty string).
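The definition above can be checked mechanically. The following is a minimal Python sketch, not a reference implementation; the function names `is_subsequence` and `distinct_subsequences` are illustrative choices and do not come from the article. The first function tests the subsequence relation with a single left-to-right scan, and the second enumerates the distinct subsequences of a short word by brute force.

```python
from itertools import combinations


def is_subsequence(candidate, sequence):
    """Return True if `candidate` can be obtained from `sequence` by deletions only."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items can only match at later positions (order is preserved).
    return all(item in remaining for item in candidate)


def distinct_subsequences(word):
    """Return the set of distinct subsequences of a short string (exponential time)."""
    found = set()
    for length in range(len(word) + 1):
        # Every increasing choice of positions yields one subsequence.
        for positions in combinations(range(len(word)), length):
            found.add("".join(word[i] for i in positions))
    return found


print(is_subsequence("ABD", "ABCDEF"))
print(sorted(distinct_subsequences("apple"), key=lambda s: (len(s), s)))
```

Because "apple" contains the letter p twice, several choices of positions give the same string, which is why the sketch collects results in a set. The brute-force enumeration is only practical for very short words.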


Common subsequence

Given two sequences X and Y, a sequence Z is said to be a ''common subsequence'' of X and Y if Z is a subsequence of both X and Y. For example, if

    X = \langle A,C,B,D,E,G,C,E,D,B,G \rangle,
    Y = \langle B,E,G,J,C,F,E,K,B \rangle, and
    Z = \langle B,E,E \rangle,

then Z is a common subsequence of X and Y. It is ''not'' the ''longest common subsequence'', however, since Z only has length 3, and the common subsequence \langle B,E,E,B \rangle has length 4. The longest common subsequence of X and Y is \langle B,E,G,C,E,B \rangle.
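A longest common subsequence can be computed with the standard dynamic-programming table. The sketch below is one common way to do it in Python, under the assumption that the sequences are short enough for a quadratic-size table; the function name `longest_common_subsequence` is illustrative and not taken from the article.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of x and y as a list."""
    m, n = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


X = "ACBDEGCEDBG"
Y = "BEGJCFEKB"
print("".join(longest_common_subsequence(X, Y)))
```

For the X and Y of the example, the elements J, F and K of Y do not occur in X, so no common subsequence can be longer than the remaining six elements of Y. In general there may be several longest common subsequences, and the sketch returns only one of them.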


Applications

Subsequences have applications to computer science, especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences.

In computer science, ''string'' is often used as a synonym for ''sequence'', but ''substring'' and ''subsequence'' are not synonyms. Substrings are ''consecutive'' parts of a string, while subsequences need not be. A substring of a string is therefore always a subsequence of the string, but a subsequence of a string is not always a substring of it.

Take two sequences of DNA containing 37 elements, say:

    SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
    SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

The longest common subsequence of sequences 1 and 2 is:

    LCS(SEQ1,SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA

This can be illustrated by highlighting the 27 elements of the longest common subsequence in the initial sequences (shown here in upper case, with the 10 remaining elements of each sequence in lower case):

    SEQ1 = aCgGTgTCGtGCTATGCTgaTgCTgACTTATaTgCTA
    SEQ2 = CGTTCGGCTATcGtaCgTTCTAttCTaTgATTtCTAa

Another way to show this is to ''align'' the two sequences, that is, to position the elements of the longest common subsequence in the same column (indicated by the vertical bar) and to introduce a special character (here, a dash) as padding wherever one sequence has no element matching the other:

    SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
            | || ||| ||||| |  | |  | || |  || | || |  |||
    SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA

Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.
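An alignment of this kind can be read off the same dynamic-programming table that is used to compute a longest common subsequence. The following Python sketch is an illustration under that assumption; the names `lcs_table` and `align` are hypothetical, and the sketch is not the procedure of any particular bioinformatics tool. Ties in the table can be broken in more than one way, so the alignment it produces may differ from the one shown above while still having the same number of matched columns.

```python
def lcs_table(x, y):
    """table[i][j] is the LCS length of the prefixes x[:i] and y[:j]."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def align(x, y, gap="-"):
    """Return (top, marks, bottom): padded x, a bar per matched column, padded y."""
    table = lcs_table(x, y)
    top, marks, bottom = [], [], []
    i, j = len(x), len(y)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and x[i - 1] == y[j - 1]:
            # Matched element: both sequences share this column.
            top.append(x[i - 1]); marks.append("|"); bottom.append(y[j - 1])
            i -= 1
            j -= 1
        elif j == 0 or (i > 0 and table[i - 1][j] >= table[i][j - 1]):
            # Element of x with no partner: pad y with a gap.
            top.append(x[i - 1]); marks.append(" "); bottom.append(gap)
            i -= 1
        else:
            # Element of y with no partner: pad x with a gap.
            top.append(gap); marks.append(" "); bottom.append(y[j - 1])
            j -= 1
    # Columns were collected from right to left, so reverse them.
    return tuple("".join(reversed(row)) for row in (top, marks, bottom))


SEQ1 = "ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA"
SEQ2 = "CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA"
top, marks, bottom = align(SEQ1, SEQ2)
print("SEQ1 = " + top)
print("       " + marks)
print("SEQ2 = " + bottom)
```

The number of vertical bars equals the length of a longest common subsequence, so it gives one simple measure of how similar the two sequences are.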


Theorems

* Every infinite sequence of real numbers has an infinite monotone subsequence (this is a lemma used in the proof of the Bolzano–Weierstrass theorem).
* Every infinite bounded sequence in \R^n has a convergent subsequence (this is the Bolzano–Weierstrass theorem).
* For all integers r and s, every finite sequence of real numbers of length at least (r - 1)(s - 1) + 1 contains a monotonically increasing subsequence of length r ''or'' a monotonically decreasing subsequence of length s (this is the Erdős–Szekeres theorem).
* A metric space (X,d) is compact if every sequence in X has a convergent subsequence whose limit is in X.
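The Erdős–Szekeres bound can be checked by brute force for small r and s. The Python sketch below is an illustration only, restricted to sequences of distinct numbers (permutations) so that increasing and decreasing can be taken strictly; the function names `longest_monotone` and `satisfies_erdos_szekeres` are hypothetical.

```python
from itertools import permutations


def longest_monotone(seq, increasing=True):
    """Length of the longest strictly increasing (or decreasing) subsequence, O(n^2)."""
    best = []  # best[k] = longest monotone subsequence ending at position k
    for k, value in enumerate(seq):
        prior = [best[i] for i in range(k)
                 if (seq[i] < value if increasing else seq[i] > value)]
        best.append(1 + max(prior, default=0))
    return max(best, default=0)


def satisfies_erdos_szekeres(seq, r, s):
    """True if seq has an increasing subsequence of length r or a decreasing one of length s."""
    return (longest_monotone(seq, increasing=True) >= r
            or longest_monotone(seq, increasing=False) >= s)


r, s = 3, 3
n = (r - 1) * (s - 1) + 1  # 5 elements are enough according to the theorem
print(all(satisfies_erdos_szekeres(p, r, s) for p in permutations(range(n))))

# With one element fewer, the conclusion can fail:
print(satisfies_erdos_szekeres([2, 1, 4, 3], r, s))
```

The sequence 2, 1, 4, 3 has length (r - 1)(s - 1) = 4 and contains neither an increasing nor a decreasing subsequence of length 3, which shows that the length bound in the theorem cannot be lowered for r = s = 3.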


Notes

This article incorporates material from "subsequence" (id 3300) on PlanetMath.