Probalign
   HOME

TheInfoList



OR:

Probalign is a sequence alignment tool that calculates a maximum
expected accuracy Expected may refer to: *Expectation (epistemic) * Expected value *Expected shortfall *Expected utility hypothesis *Expected return *Expected loss Expected loss is the sum of the values of all possible losses, each multiplied by the probability of t ...
alignment using partition function posterior probabilities. Base pair probabilities are estimated using an estimate similar to
Boltzmann distribution In statistical mechanics and mathematics, a Boltzmann distribution (also called Gibbs distribution Translated by J.B. Sykes and M.J. Kearsley. See section 28) is a probability distribution or probability measure that gives the probability t ...
. The partition function is calculated using a
dynamic programming Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. I ...
approach.


Algorithm

The following describes the algorithm used by probalign to determine the base pair probabilities.Lecture "Bioinformatics II" at University of Freiburg
/ref>


Alignment score

To score an alignment of two sequences two things are needed: * a similarity function \sigma(x,y) (e.g. PAM,
BLOSUM In bioinformatics, the BLOSUM (BLOcks SUbstitution Matrix) matrix is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences. They are based o ...
,...) * affine gap penalty: g(k) = \alpha + \beta k The score S(a) of an alignment a is defined as: S(a) = \sum_ \sigma(x_i,y_j) + \text Now the boltzmann weighted score of an alignment a is: e^ = e^ = \left( \prod_ e^ \right) \cdot e^ Where T is a scaling factor. The probability of an alignment assuming boltzmann distribution is given by Pr x,y= \frac Where Z is the partition function, i.e. the sum of the boltzmann weights of all alignments.


Dynamic Programming

Let Z_ denote the partition function of the prefixes x_0,x_1,...,x_i and y_0,y_1,...,y_j. Three different cases are considered: # Z^_: the partition function of all alignments of the two prefixes that end in a match. # Z^_: the partition function of all alignments of the two prefixes that end in an insertion (-,y_j). # Z^_: the partition function of all alignments of the two prefixes that end in a deletion (x_i,-). Then we have: Z_ = Z^_ + Z^_ + Z^_


Initialization

The matrixes are initialized as follows: * Z^_ = Z^_ = 0 * Z^_ = 1 * Z^_ = 0 * Z^_ = 0


Recursion

The partition function for the alignments of two sequences x and y is given by Z_, which can be recursively computed: * Z^_ = Z_ \cdot e^ * Z^_ = Z^_ \cdot e^ + Z^_ \cdot e^ + Z^_ \cdot e^ * Z^_ analogously


Base pair probability

Finally the probability that positions x_i and y_j form a base pair is given by: P(x_i - y_j, x,y) = \frac Z', i', j' are the respective values for the recalculated Z with inversed base pair strings.


See also

*
ProbCons ProbCons is an open source probabilistic consistency-based multiple alignment of amino acid sequences. It is one of the most efficient protein multiple sequence alignment programs, since it has repeatedly demonstrated a statistically significant adv ...
*
Multiple Sequence Alignment Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutio ...


References

{{Reflist


External links


Probalign Webservice
Sequence alignment algorithms