In
mathematics
Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
, a subsequence of a given
sequence
In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence
is a subsequence of
obtained after removal of elements
and
The relation of one sequence being the subsequence of another is a
partial order
In mathematics, especially order theory, a partial order on a set is an arrangement such that, for certain pairs of elements, one precedes the other. The word ''partial'' is used to indicate that not every pair of elements needs to be comparable ...
.
Subsequences can contain consecutive elements which were not consecutive in the original sequence. A subsequence which consists of a consecutive run of elements from the original sequence, such as
from
is a
substring. The substring is a refinement of the subsequence.
The list of all subsequences for the word "apple" would be "''a''", "''ap''", "''al''", "''ae''", "''app''", "''apl''", "''ape''", "''ale''", "''appl''", "''appe''", "''aple''", "''apple''", "''p''", "''pp''", "''pl''", "''pe''", "''ppl''", "''ppe''", "''ple''", "''pple''", "''l''", "''le''", "''e''", "" (
empty string
In formal language theory, the empty string, or empty word, is the unique String (computer science), string of length zero.
Formal theory
Formally, a string is a finite, ordered sequence of character (symbol), characters such as letters, digits ...
).
Common subsequence
Given two sequences
and
a sequence
is said to be a ''common subsequence'' of
and
if
is a subsequence of both
and
For example, if
then
is said to be a common subsequence of
and
This would be the ''
longest common subsequence'', since
only has length 3, and the common subsequence
has length 4. The longest common subsequence of
and
is
Applications
Subsequences have applications to
computer science
Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
,
[In computer science, '' string'' is often used as a synonym for ''sequence'', but it is important to note that '' substring'' and ''subsequence'' are not synonyms. Substrings are ''consecutive'' parts of a string, while subsequences need not be. This means that a substring of a string is always a subsequence of the string, but a subsequence of a string is not always a substring of the string, see: ] especially in the discipline of
bioinformatics
Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...
, where computers are used to compare, analyze, and store
DNA
Deoxyribonucleic acid (; DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of al ...
,
RNA
Ribonucleic acid (RNA) is a polymeric molecule that is essential for most biological functions, either by performing the function itself (non-coding RNA) or by forming a template for the production of proteins (messenger RNA). RNA and deoxyrib ...
, and
protein
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residue (biochemistry), residues. Proteins perform a vast array of functions within organisms, including Enzyme catalysis, catalysing metab ...
sequences.
Take two sequences of DNA containing 37 elements, say:
:
SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
:
SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA
The longest common subsequence of sequences 1 and 2 is:
:
LCS(SEQ1,SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA
This can be illustrated by highlighting the 27 elements of the longest common subsequence into the initial sequences:
:
SEQ1 = AGGTGAGGAG
:
SEQ2 = CTAGTTAGTA
Another way to show this is to ''align'' the two sequences, that is, to position elements of the longest common subsequence in a same column (indicated by the vertical bar) and to introduce a special character (here, a dash) for padding of arisen empty subsequences:
:
SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
:
, , , , , , , , , , , , , , , , , , , , , , , , , , ,
:
SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA
Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases:
adenine
Adenine (, ) (nucleoside#List of nucleosides and corresponding nucleobases, symbol A or Ade) is a purine nucleotide base that is found in DNA, RNA, and Adenosine triphosphate, ATP. Usually a white crystalline subtance. The shape of adenine is ...
,
guanine
Guanine () (symbol G or Gua) is one of the four main nucleotide bases found in the nucleic acids DNA and RNA, the others being adenine, cytosine, and thymine ( uracil in RNA). In DNA, guanine is paired with cytosine. The guanine nucleoside ...
,
cytosine
Cytosine () (symbol C or Cyt) is one of the four nucleotide bases found in DNA and RNA, along with adenine, guanine, and thymine ( uracil in RNA). It is a pyrimidine derivative, with a heterocyclic aromatic ring and two substituents attac ...
and
thymine
Thymine () (symbol T or Thy) is one of the four nucleotide bases in the nucleic acid of DNA that are represented by the letters G–C–A–T. The others are adenine, guanine, and cytosine. Thymine is also known as 5-methyluracil, a pyrimidine ...
.
Theorems
* Every infinite sequence of
real number
In mathematics, a real number is a number that can be used to measure a continuous one- dimensional quantity such as a duration or temperature. Here, ''continuous'' means that pairs of values can have arbitrarily small differences. Every re ...
s has an infinite
monotone subsequence (This is a lemma used in the
proof of the Bolzano–Weierstrass theorem).
* Every infinite
bounded sequence in
has a
convergent subsequence (This is the
Bolzano–Weierstrass theorem
In mathematics, specifically in real analysis, the Bolzano–Weierstrass theorem, named after Bernard Bolzano and Karl Weierstrass, is a fundamental result about convergence in a finite-dimensional Euclidean space \R^n. The theorem states that ea ...
).
* For all
integer
An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
s
and
every finite sequence of length at least
contains a monotonically increasing subsequence of length
a monotonically decreasing subsequence of length
(This is the
Erdős–Szekeres theorem).
* A metric space
is compact if every sequence in
has a convergent subsequence whose limit is in
.
See also
*
*
*
Notes
{{PlanetMath attribution, id=3300, title=subsequence
Elementary mathematics
Sequences and series