In mathematics, a subsequence of a given sequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence ⟨A, B, D⟩ is a subsequence of ⟨A, B, C, D, E, F⟩ obtained after removal of the elements C, E, and F. The relation of one sequence being the subsequence of another is a preorder.
Subsequences can contain consecutive elements which were not consecutive in the original sequence. A subsequence which consists of a consecutive run of elements from the original sequence, such as ⟨B, C, D⟩ from ⟨A, B, C, D, E, F⟩, is a substring. The substring is a refinement of the subsequence.
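The difference between the two notions can be checked mechanically. The following Python sketch is only an illustration; the function names `is_subsequence` and `is_substring` are hypothetical helpers introduced here, not part of any library.

```python
def is_subsequence(candidate, sequence):
    """Return True if candidate can be obtained from sequence by deleting elements."""
    it = iter(sequence)
    # Each `in` test advances the shared iterator, so matches must occur in order.
    return all(element in it for element in candidate)


def is_substring(candidate, sequence):
    """Return True if candidate occurs as a consecutive run inside sequence."""
    n, m = len(sequence), len(candidate)
    return any(list(sequence[i:i + m]) == list(candidate) for i in range(n - m + 1))


full = ["A", "B", "C", "D", "E", "F"]
print(is_subsequence(["A", "B", "D"], full))  # True
print(is_substring(["A", "B", "D"], full))    # False
print(is_substring(["B", "C", "D"], full))    # True
```

Every substring passes the subsequence test as well, but not the other way round.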
The list of all subsequences for the word "apple" would be "''a''", "''ap''", "''al''", "''ae''", "''app''", "''apl''", "''ape''", "''ale''", "''appl''", "''appe''", "''aple''", "''apple''", "''p''", "''pp''", "''pl''", "''pe''", "''ppl''", "''ppe''", "''ple''", "''pple''", "''l''", "''le''", "''e''", "" (the empty string).
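Such a list can be generated by brute force. The sketch below is one possible approach, assuming the word is short enough that trying every set of positions is affordable; `distinct_subsequences` is a hypothetical helper name.

```python
from itertools import combinations


def distinct_subsequences(word):
    """Return the set of distinct subsequences of word, including the empty string."""
    found = set()
    for length in range(len(word) + 1):
        # combinations() yields index positions in increasing order,
        # so the relative order of the kept characters is preserved.
        for positions in combinations(range(len(word)), length):
            found.add("".join(word[i] for i in positions))
    return found


subs = distinct_subsequences("apple")
print(len(subs))  # 24
print(sorted(subs, key=lambda s: (len(s), s)))
```

There are 2^5 = 32 ways to choose positions in a five-letter word, but only 24 distinct results here because the two occurrences of "p" produce duplicates.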
Common subsequence
Given two sequences X and Y, a sequence Z is said to be a ''common subsequence'' of X and Y if Z is a subsequence of both X and Y. For example, if X = ⟨A, C, B, D, E, G, C, E, D, B, G⟩, Y = ⟨B, E, G, J, C, F, E, K, B⟩, and Z = ⟨B, E, E⟩, then Z is said to be a common subsequence of X and Y. This would ''not'' be the ''longest common subsequence'', since Z only has length 3, and the common subsequence ⟨B, E, E, B⟩ has length 4. The longest common subsequence of X and Y is ⟨B, E, G, C, E, B⟩.
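A longest common subsequence can be computed with the standard dynamic-programming table. The sketch below is a textbook approach rather than anything prescribed by this article; the function name `longest_common_subsequence` and the two input sequences are illustrative assumptions.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of x and y as a list."""
    n, m = len(x), len(y)
    # table[i][j] = length of a longest common subsequence of x[:i] and y[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


x = list("ACBDEGCEDBG")
y = list("BEGJCFEKB")
print(longest_common_subsequence(x, y))  # ['B', 'E', 'G', 'C', 'E', 'B']
```

The table has (len(x) + 1) × (len(y) + 1) entries, so time and memory both grow with the product of the two lengths.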
Applications
Subsequences have applications to computer science, especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences. In computer science, ''string'' is often used as a synonym for ''sequence'', but ''substring'' and ''subsequence'' are not synonyms. Substrings are ''consecutive'' parts of a string, while subsequences need not be. A substring of a string is therefore always a subsequence of the string, but a subsequence of a string is not always a substring of it.
Take two sequences of DNA containing 37 elements, say:
:
SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
:
SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA
The longest common subsequence of sequences 1 and 2 is:
:
LCS(SEQ1,SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA
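This can be illustrated by highlighting the 27 elements of the longest common subsequence in the initial sequences (highlighted elements are shown here in upper case, the remaining elements in lower case):
:
SEQ1 = aCgGTgTCGtGCTATGCTgaTgCTgACTTATaTgCTA
:
SEQ2 = CGTTCGGCTATcGtaCgTTCTAttCTaTgATTtCTAa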
Another way to show this is to ''align'' the two sequences, that is, to position elements of the longest common subsequence in the same column (indicated by the vertical bar) and to introduce a special character (here, a dash) to pad the gaps that arise:
:
SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
:
        | || ||| ||||| |  | |  | || |  || | || |  |||
:
SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA
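The marker row and the common subsequence can both be read off the two aligned rows. The following sketch assumes the rows are equal-length strings that use a dash as the gap character; `marker_row` and `matched_elements` are hypothetical helper names.

```python
row1 = "ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-"
row2 = "-C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA"


def marker_row(a, b, gap="-"):
    """Put a vertical bar under every column where both rows hold the same element."""
    return "".join("|" if p == q and p != gap else " " for p, q in zip(a, b))


def matched_elements(a, b, gap="-"):
    """Collect the elements that sit in matching columns, in order."""
    return "".join(p for p, q in zip(a, b) if p == q and p != gap)


print("SEQ1 = " + row1)
print("       " + marker_row(row1, row2))
print("SEQ2 = " + row2)

common = matched_elements(row1, row2)
print(common, len(common))  # CGTTCGGCTATGCTTCTACTTATTCTA 27
```

Removing the dashes from each row gives back the original sequence, so the alignment only inserts gaps and never reorders elements.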
Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.
Theorems
* Every infinite sequence of real numbers has an infinite monotone subsequence (this is a lemma used in the proof of the Bolzano–Weierstrass theorem).
* Every infinite bounded sequence in ℝⁿ has a convergent subsequence (this is the Bolzano–Weierstrass theorem).
* For all integers r and s, every finite sequence of length at least (r − 1)(s − 1) + 1 contains a monotonically increasing subsequence of length r or a monotonically decreasing subsequence of length s (this is the Erdős–Szekeres theorem).
* A metric space (X, d) is compact if every sequence in X has a convergent subsequence whose limit is in X.
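The Erdős–Szekeres statement can be checked exhaustively for small cases. The sketch below uses the bound (r − 1)(s − 1) + 1 from the standard statement of the theorem and tests every ordering of five distinct numbers for r = s = 3. The function names are hypothetical, and the patience-sorting method is just one convenient way to compute the lengths.

```python
from bisect import bisect_left
from itertools import permutations


def longest_increasing_length(seq):
    """Length of a longest strictly increasing subsequence (patience-sorting method)."""
    # tails[k] = smallest possible last element of an increasing subsequence of length k + 1
    tails = []
    for value in seq:
        k = bisect_left(tails, value)
        if k == len(tails):
            tails.append(value)
        else:
            tails[k] = value
    return len(tails)


def longest_decreasing_length(seq):
    """Length of a longest strictly decreasing subsequence, computed by negating the values."""
    return longest_increasing_length([-v for v in seq])


r, s = 3, 3
n = (r - 1) * (s - 1) + 1  # 5
holds = all(
    longest_increasing_length(p) >= r or longest_decreasing_length(p) >= s
    for p in permutations(range(n))
)
print(holds)  # True

# One element fewer is not enough: this sequence of length 4
# has no monotone subsequence of length 3.
short = [2, 1, 4, 3]
print(longest_increasing_length(short), longest_decreasing_length(short))  # 2 2
```

The second check shows that the bound cannot be lowered for r = s = 3.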
Notes
{{PlanetMath attribution|id=3300|title=subsequence}}