In mathematics, a subsequence of a given sequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence $\langle A,B,D \rangle$ is a subsequence of $\langle A,B,C,D,E,F \rangle$, obtained by removing the elements $C$, $E$, and $F$.
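The definition can be tested with a single left-to-right scan. The following is a minimal sketch, assuming sequences are Python strings or lists whose elements are compared with `==`; the function name `is_subsequence` is illustrative only. Matching each element of the candidate as early as possible is safe, because an earlier match never leaves fewer options for the remaining elements.

```python
from typing import Sequence


def is_subsequence(candidate: Sequence, original: Sequence) -> bool:
    """Return True if `candidate` can be obtained from `original` by deleting
    zero or more elements without reordering the remaining ones."""
    i = 0  # index of the next element of `candidate` still to be matched
    for item in original:
        if i < len(candidate) and candidate[i] == item:
            i += 1  # matched; keep scanning `original` for the next element
    return i == len(candidate)


print(is_subsequence("ABD", "ABCDEF"))  # True
print(is_subsequence("ADB", "ABCDEF"))  # False: the order must be preserved
```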

The relation of one sequence being a subsequence of another is a preorder. Subsequences can contain consecutive elements which were not consecutive in the original sequence. A subsequence which consists of a consecutive run of elements from the original sequence, such as $\langle B,C,D \rangle$ from $\langle A,B,C,D,E,F \rangle$, is a substring. The substring is a refinement of the subsequence. The distinct subsequences of the word "apple" are "a", "ap", "al", "ae", "app", "apl", "ape", "ale", "appl", "appe", "aple", "apple", "p", "pp", "pl", "pe", "ppl", "ppe", "ple", "pple", "l", "le", "e", and "" (the empty string).
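The "apple" list can be reproduced by brute force, which also shows the difference between a subsequence and a substring. This is a sketch, assuming a short input word (it tries all 2^n choices of positions); the helper names `distinct_subsequences` and `is_substring` are hypothetical. `itertools.combinations` yields the chosen characters in their original positional order, so every result is a genuine subsequence, and the set removes duplicates such as the two ways of picking a single "p".

```python
from itertools import combinations


def distinct_subsequences(word: str) -> set[str]:
    """All distinct subsequences of `word`, including the empty string."""
    result = set()
    for k in range(len(word) + 1):
        # combinations() keeps the original left-to-right order of the picks
        for picked in combinations(word, k):
            result.add("".join(picked))
    return result


def is_substring(candidate: str, word: str) -> bool:
    return candidate in word  # `in` on str tests for a contiguous run


subs = distinct_subsequences("apple")
print(len(subs))  # 24 (23 non-empty subsequences plus the empty string)
print("ape" in subs, is_substring("ape", "apple"))  # True False
print("ppl" in subs, is_substring("ppl", "apple"))  # True True
```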

Common subsequence

Given two sequences $X$ and $Y$, a sequence $Z$ is said to be a common subsequence of $X$ and $Y$ if $Z$ is a subsequence of both $X$ and $Y$.

For example, if $X = \langle A,C,B,D,E,G,C,E,D,B,G \rangle$, $Y = \langle B,E,G,J,C,F,E,K,B \rangle$, and $Z = \langle B,E,E \rangle$, then $Z$ is a common subsequence of $X$ and $Y$. It is not the longest common subsequence, since $Z$ only has length 3, while the common subsequence $\langle B,E,E,B \rangle$ has length 4. The longest common subsequence of $X$ and $Y$ is $\langle B,E,G,C,E,B \rangle$.
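A longest common subsequence is usually found by dynamic programming over prefixes of the two sequences. The following is a sketch of that standard technique, assuming sequences are Python strings or lists; the function name `longest_common_subsequence` is illustrative. It fills an $(n+1) \times (m+1)$ table of LCS lengths in $O(nm)$ time and then walks back through the table to recover one longest common subsequence (in general there can be several).

```python
from typing import Sequence


def longest_common_subsequence(x: Sequence, y: Sequence) -> list:
    n, m = len(x), len(y)
    # table[i][j] = length of a longest common subsequence of x[:i] and y[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one LCS.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


X = list("ACBDEGCEDBG")
Y = list("BEGJCFEKB")
print("".join(longest_common_subsequence(X, Y)))  # BEGCEB
```

In this example the answer is unique. $J$, $F$, and $K$ do not occur in $X$, so no common subsequence can be longer than the six remaining elements of $Y$, and those six already form $\langle B,E,G,C,E,B \rangle$.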

Applications

Subsequences have applications to computer science, especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences. (In computer science, string is often used as a synonym for sequence, but substring and subsequence are not synonyms: substrings are consecutive parts of a string, while subsequences need not be. Hence a substring of a string is always a subsequence of that string, but a subsequence of a string is not always a substring of it.)

Take two sequences of DNA, each containing 37 elements, say:

    SEQ₁ = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
    SEQ₂ = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

The longest common subsequence of sequences 1 and 2 is:

    LCS(SEQ₁, SEQ₂) = CGTTCGGCTATGCTTCTACTTATTCTA

This can be illustrated by highlighting the 27 elements of the longest common subsequence in the initial sequences. Below, those elements are kept in upper case and the 10 remaining elements of each sequence are written in lower case:

    SEQ₁ = aCgGTgTCGtGCTATGCTgaTgCTgACTTATaTgCTA
    SEQ₂ = CGTTCGGCTATcGtaCgTTCTAttCTaTgATTtCTAa

Another way to show this is to align the two sequences, that is, to place each element of the longest common subsequence in the same column in both sequences (indicated by a vertical bar) and to insert a special character (here, a dash) as padding wherever one sequence has no corresponding element:

    SEQ₁ = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
            | || ||| ||||| |  | |  | || |  || | || |  |||
    SEQ₂ = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA

Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine, and thymine.
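An alignment of this kind can be produced from the same dynamic-programming table used to compute a longest common subsequence. The following is a sketch under stated assumptions. The name `lcs_alignment` is hypothetical. The gap character `-` is assumed not to occur in the input. Ties are broken in favour of skipping an element of the first sequence, so the output may be a different but equally long alignment than the one shown above, because longest common subsequences and their alignments are not unique in general.

```python
def lcs_alignment(a: str, b: str, gap: str = "-") -> tuple[str, str, str]:
    """Align `a` and `b` so that one longest common subsequence lines up
    column by column; returns (top row, bar row, bottom row)."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            # Both sequences contribute this element: a matched column.
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif j == 0 or (i > 0 and table[i - 1][j] >= table[i][j - 1]):
            # a[i - 1] is not in the chosen LCS: pad the bottom row.
            top.append(a[i - 1])
            bottom.append(gap)
            i -= 1
        else:
            # b[j - 1] is not in the chosen LCS: pad the top row.
            top.append(gap)
            bottom.append(b[j - 1])
            j -= 1
    top.reverse()
    bottom.reverse()
    # A bar marks each column where both rows hold the same element.
    bars = "".join("|" if p == q else " " for p, q in zip(top, bottom))
    return "".join(top), bars, "".join(bottom)


SEQ1 = "ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA"
SEQ2 = "CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA"
for row in lcs_alignment(SEQ1, SEQ2):
    print(row)
```

The number of bars equals the length of the longest common subsequence found by the table. Removing the dashes from either row gives back the original sequence.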

Theorems

* Every infinite sequence of real numbers has an infinite monotone subsequence (this is a lemma used in the proof of the Bolzano–Weierstrass theorem).
* Every infinite bounded sequence in $\mathbb{R}^n$ has a convergent subsequence (this is the Bolzano–Weierstrass theorem).
* For all positive integers $r$ and $s$, every finite sequence of length at least $(r - 1)(s - 1) + 1$ contains a monotonically increasing subsequence of length $r$ or a monotonically decreasing subsequence of length $s$ (this is the Erdős–Szekeres theorem).
* A metric space $(X,d)$ is compact if and only if every sequence in $X$ has a convergent subsequence whose limit is in $X$.
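The Erdős–Szekeres bound can be checked on small cases by computing longest monotone subsequences. The following is a sketch, assuming sequences of distinct numbers and strict monotonicity; the helper names are hypothetical. Exhausting all permutations only illustrates the theorem for particular $r$ and $s$; it is not a proof. The longest increasing subsequence is computed with the patience-sorting method in $O(n \log n)$ time, and a longest decreasing subsequence is a longest increasing subsequence of the negated values.

```python
from bisect import bisect_left
from itertools import permutations
from typing import Sequence


def longest_increasing_length(seq: Sequence[float]) -> int:
    """Length of a longest strictly increasing subsequence."""
    # tails[k] = smallest possible last value of an increasing subsequence of length k + 1
    tails: list[float] = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def longest_decreasing_length(seq: Sequence[float]) -> int:
    # A decreasing subsequence of seq is an increasing subsequence of its negation.
    return longest_increasing_length([-x for x in seq])


def has_monotone_subsequence(seq: Sequence[float], r: int, s: int) -> bool:
    """True if seq has an increasing subsequence of length r
    or a decreasing subsequence of length s."""
    return longest_increasing_length(seq) >= r or longest_decreasing_length(seq) >= s


# For r = s = 3 the bound is (3 - 1)(3 - 1) + 1 = 5: every ordering of 5 distinct numbers qualifies.
print(all(has_monotone_subsequence(p, 3, 3) for p in permutations(range(5))))  # True
# Length 4 is not enough: this sequence has no monotone subsequence of length 3.
print(has_monotone_subsequence([2, 1, 4, 3], 3, 3))  # False
```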

Notes

This article incorporates material from subsequence on PlanetMath.
