ProbCons is an open source probabilistic consistency-based multiple alignment program for amino acid sequences. It is one of the most accurate protein multiple sequence alignment programs: it has repeatedly demonstrated a statistically significant advantage in accuracy over similar tools, including Clustal and MAFFT.


Algorithm

The following describes the basic outline of the ProbCons algorithm (Lecture "Bioinformatics II" at University of Freiburg).


Step 1: Reliability of an alignment edge

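For every pair of sequences $x, y$, compute the probability that letters $x_i$ and $y_j$ are paired in an alignment $a^*$ generated by the model:

$$\begin{aligned} P(x_i \sim y_j \mid x, y) &\stackrel{\text{def}}{=} \Pr[x_i \sim y_j \text{ in some } a^* \mid x, y] \\ &= \sum_{a \,:\, x_i \sim y_j \in a} \Pr[a \mid x, y] \\ &= \sum_{a \in \mathcal{A}} \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a \mid x, y] \end{aligned}$$

Here $\mathcal{A}$ is the set of all alignments of $x$ and $y$, and $\mathbf{1}\{x_i \sim y_j \in a\}$ equals 1 if $x_i$ and $y_j$ are aligned to each other in $a$ and 0 otherwise.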
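The sum over all alignments does not have to be enumerated: the forward-backward algorithm computes every $P(x_i \sim y_j \mid x, y)$ in $O(|x|\,|y|)$ time. The sketch below is a minimal illustration that assumes a toy model in which $\Pr[a \mid x, y]$ is proportional to $\exp(\text{score}(a))$, using hypothetical match, mismatch and linear gap scores. ProbCons itself uses a pair hidden Markov model with its own parameters, which this does not reproduce; the function name `posterior_matrix` and its parameters are illustrative only.

```python
import math
from typing import List


def posterior_matrix(x: str, y: str, match: float = 1.0,
                     mismatch: float = -1.0, gap: float = -1.0) -> List[List[float]]:
    """Return P[i][j] = P(x_i ~ y_j | x, y) (0-based) under a toy model.

    Assumption: Pr[a | x, y] is proportional to exp(score(a)), where score(a)
    sums hypothetical match/mismatch scores and a linear gap score. This is
    not ProbCons' pair HMM; it only illustrates the posterior computation.
    """
    n, m = len(x), len(y)
    g = math.exp(gap)  # weight of one gap step (insertion or deletion)

    def w(i: int, j: int) -> float:
        # Weight of aligning x[i-1] with y[j-1] (1-based DP indices).
        return math.exp(match if x[i - 1] == y[j - 1] else mismatch)

    # Forward: F[i][j] = total weight of all alignments of x[:i] and y[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                F[i][j] += F[i - 1][j - 1] * w(i, j)
            if i > 0:
                F[i][j] += F[i - 1][j] * g
            if j > 0:
                F[i][j] += F[i][j - 1] * g

    # Backward: B[i][j] = total weight of all alignments of x[i:] and y[j:].
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                B[i][j] += B[i + 1][j + 1] * w(i + 1, j + 1)
            if i < n:
                B[i][j] += B[i + 1][j] * g
            if j < m:
                B[i][j] += B[i][j + 1] * g

    Z = F[n][m]  # sum of weights over all alignments (normalizer)
    # x[i] ~ y[j] exactly when the path takes the diagonal step into cell (i+1, j+1).
    return [[F[i][j] * w(i + 1, j + 1) * B[i + 1][j + 1] / Z for j in range(m)]
            for i in range(n)]
```

Each entry is the total weight of the alignments that pass through the diagonal step $(i, j)$, divided by the total weight of all alignments, which is exactly the indicator-weighted sum in the last line of the definition. Since a letter is aligned to at most one partner per alignment, each row of the result sums to at most 1.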


Step 2: Maximum expected accuracy

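The accuracy of an alignment $a^*$ with respect to another alignment $a$ is defined as the number of aligned pairs the two alignments share, divided by the length of the shorter sequence. The expected accuracy of a candidate alignment $a^*$ of $x$ and $y$, taken over alignments $a$ drawn from the model, is

$$\begin{aligned} E_a\big(\mathrm{acc}(a^*, a)\big) &= \sum_{a \in \mathcal{A}} \Pr[a \mid x, y]\, \mathrm{acc}(a^*, a) \\ &= \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} \sum_{a \in \mathcal{A}} \mathbf{1}\{x_i \sim y_j \in a\} \Pr[a \mid x, y] \\ &= \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a^*} P(x_i \sim y_j \mid x, y) \end{aligned}$$

Maximizing this over $a^*$ yields the maximum expected accuracy (MEA) alignment, $\arg\max_{a^*} E_a\big(\mathrm{acc}(a^*, a)\big)$, whose expected accuracy is the MEA score

$$E(x, y) = \max_{a^*} E_a\big(\mathrm{acc}(a^*, a)\big)$$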
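Because the expected accuracy is a sum of posteriors over the pairs aligned in $a^*$, the MEA alignment can be found with a Needleman-Wunsch style dynamic program in which aligning $x_i$ with $y_j$ scores $P(x_i \sim y_j \mid x, y)$ and gaps score 0. The following is a minimal sketch, assuming the posteriors are already available as a dense matrix `P` (for example from a forward-backward pass); the function name `mea_alignment` is hypothetical.

```python
from typing import List, Tuple


def mea_alignment(P: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """Return (aligned pairs, E(x, y)) for a posterior matrix P.

    P[i][j] is assumed to hold P(x_i ~ y_j | x, y) (0-based indices).
    Aligning x_i with y_j scores P[i][j]; gaps score 0.
    """
    n = len(P)
    m = len(P[0]) if n else 0
    # S[i][j] = largest sum of posteriors over alignments of x[:i] and y[:j].
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + P[i - 1][j - 1],  # align x_i with y_j
                          S[i - 1][j],                      # x_i against a gap
                          S[i][j - 1])                      # y_j against a gap
    # Trace back, keeping only pairs that contribute a positive posterior.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if P[i - 1][j - 1] > 0 and S[i][j] == S[i - 1][j - 1] + P[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    shorter = min(n, m)
    return pairs, (S[n][m] / shorter if shorter else 0.0)
```

For example, `mea_alignment([[0.9, 0.05], [0.05, 0.8]])` aligns both diagonal pairs and reports an expected accuracy of about 0.85. The second return value is the MEA score $E(x, y)$ that Step 4 uses as a similarity.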


Step 3: Probabilistic Consistency Transformation

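All pairs of sequences $x, y$ from the set of all sequences $\mathcal{S}$ are now re-estimated using all intermediate sequences $z \in \mathcal{S}$:

$$P'(x_i \sim y_j \mid x, y) = \frac{1}{|\mathcal{S}|} \sum_{z \in \mathcal{S}} \sum_{k} P(x_i \sim z_k \mid x, z) \cdot P(z_k \sim y_j \mid z, y)$$

Here $z$ also ranges over $x$ and $y$ themselves, with $P(x_i \sim x_k \mid x, x)$ equal to 1 if $i = k$ and 0 otherwise. This step can be iterated.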
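If the posteriors for each pair are stored as a matrix $P_{xy}$ with entries $P(x_i \sim y_j \mid x, y)$, the inner sum over $k$ is a matrix product, so the transformation reads $P'_{xy} = \frac{1}{|\mathcal{S}|} \sum_{z} P_{xz} P_{zy}$. A minimal pure-Python sketch of one round, assuming a dictionary keyed by pairs of sequence names; the names `consistency_transform` and `matmul` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def consistency_transform(lengths: Dict[str, int],
                          post: Dict[Tuple[str, str], Matrix]) -> Dict[Tuple[str, str], Matrix]:
    """One round of P'_xy = 1/|S| * sum_z P_xz @ P_zy.

    `lengths` maps each sequence name to its length; `post[(a, b)]` holds the
    |a| x |b| posterior matrix. Only one orientation per pair is required.
    """
    names = list(lengths)

    def get(a: str, b: str) -> Matrix:
        if a == b:
            # Convention: P(x_i ~ x_k | x, x) = 1 if i == k, else 0.
            n = lengths[a]
            return [[1.0 if i == k else 0.0 for k in range(n)] for i in range(n)]
        if (a, b) in post:
            return post[(a, b)]
        return [list(col) for col in zip(*post[(b, a)])]  # transpose of (b, a)

    result: Dict[Tuple[str, str], Matrix] = {}
    for idx, x in enumerate(names):
        for y in names[idx + 1:]:
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in names:  # z ranges over every sequence, including x and y
                prod = matmul(get(x, z), get(z, y))
                for r in range(lengths[x]):
                    for c in range(lengths[y]):
                        acc[r][c] += prod[r][c]
            result[(x, y)] = [[v / len(names) for v in row] for row in acc]
    return result
```

The output uses the same format as the input, so iterating the transformation amounts to feeding the result back in. With only two sequences the identity convention makes the transformation a no-op, which is a quick sanity check.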


Step 4: Computation of guide tree

Construct a guide tree by hierarchical clustering, using the MEA score $E(x, y)$ as the sequence similarity. The similarity between clusters is defined as a weighted average of the pairwise sequence similarities.
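A minimal sketch of the clustering step, assuming that the two most similar clusters are merged first and that the weighted average uses cluster sizes as weights (average linkage, UPGMA-style); the exact weighting used by ProbCons may differ. The name `guide_tree` and the nested-tuple tree representation are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def guide_tree(names: List[str], sim: Dict[FrozenSet[str], float]) -> Tree:
    """Agglomerative clustering on similarities; returns nested tuples.

    `sim[frozenset({a, b})]` is the pairwise similarity E(a, b). A merged
    cluster's similarity to another cluster is the size-weighted average of
    its two parts' similarities (an assumed weighting).
    """
    trees: Dict[int, Tree] = dict(enumerate(names))
    sizes: Dict[int, int] = {k: 1 for k in trees}
    S: Dict[FrozenSet[int], float] = {
        frozenset((a, b)): sim[frozenset((names[a], names[b]))]
        for a, b in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        best = max(S, key=S.__getitem__)  # most similar pair of clusters
        a, b = sorted(best)
        for c in trees:
            if c not in (a, b):
                sa = S.pop(frozenset((a, c)))
                sb = S.pop(frozenset((b, c)))
                S[frozenset((next_id, c))] = (sizes[a] * sa + sizes[b] * sb) / (sizes[a] + sizes[b])
        del S[best]
        trees[next_id] = (trees.pop(a), trees.pop(b))
        sizes[next_id] = sizes.pop(a) + sizes.pop(b)
        next_id += 1
    return next(iter(trees.values()))
```

Weighting by cluster size makes the similarity between two clusters equal to the mean similarity over all cross-cluster sequence pairs, so large clusters are not dominated by whichever pair happened to be merged last.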


Step 5: Compute MSA

Finally, compute the MSA by progressive alignment along the guide tree, optionally followed by iterative refinement.
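A minimal sketch of the progressive part, assuming that two groups of already-aligned sequences (profiles) are joined with the same gap-free MEA dynamic program as in Step 2, where matching column $c$ of one profile with column $d$ of the other scores the sum of $P(x_i \sim y_j \mid x, y)$ over every pair of residues in those two columns. Iterative refinement is omitted. The names `residue_index`, `posterior`, `align_profiles` and `progressive` are hypothetical, and posteriors are looked up from a dictionary keyed by pairs of sequence names.

```python
from typing import Dict, List, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]
Matrix = List[List[float]]
Profile = Dict[str, str]  # sequence name -> gapped row ('-' marks a gap)


def residue_index(row: str) -> List[int]:
    """Map each column of a gapped row to its residue index, or -1 for a gap."""
    out, k = [], 0
    for ch in row:
        if ch == "-":
            out.append(-1)
        else:
            out.append(k)
            k += 1
    return out


def posterior(post: Dict[Tuple[str, str], Matrix], a: str, b: str, i: int, j: int) -> float:
    """Look up P(a_i ~ b_j), accepting either orientation of the pair key."""
    if (a, b) in post:
        return post[(a, b)][i][j]
    return post[(b, a)][j][i]


def align_profiles(A: Profile, B: Profile, post: Dict[Tuple[str, str], Matrix]) -> Profile:
    """MEA-style alignment of two profiles; gaps score 0."""
    la, lb = len(next(iter(A.values()))), len(next(iter(B.values())))
    ia = {name: residue_index(row) for name, row in A.items()}
    ib = {name: residue_index(row) for name, row in B.items()}
    # M[c][d] = sum of posteriors over all residue pairs in column c of A and d of B.
    M = [[sum(posterior(post, x, y, xi[c], yi[d])
              for x, xi in ia.items() if xi[c] >= 0
              for y, yi in ib.items() if yi[d] >= 0)
          for d in range(lb)] for c in range(la)]
    S = [[0.0] * (lb + 1) for _ in range(la + 1)]
    for c in range(1, la + 1):
        for d in range(1, lb + 1):
            S[c][d] = max(S[c - 1][d - 1] + M[c - 1][d - 1], S[c - 1][d], S[c][d - 1])
    # Trace back into (column of A or None, column of B or None) pairs.
    cols: List[Tuple[Optional[int], Optional[int]]] = []
    c, d = la, lb
    while c > 0 or d > 0:
        if c > 0 and d > 0 and M[c - 1][d - 1] > 0 and S[c][d] == S[c - 1][d - 1] + M[c - 1][d - 1]:
            cols.append((c - 1, d - 1))
            c, d = c - 1, d - 1
        elif c > 0 and (d == 0 or S[c][d] == S[c - 1][d]):
            cols.append((c - 1, None))
            c -= 1
        else:
            cols.append((None, d - 1))
            d -= 1
    cols.reverse()
    merged: Profile = {}
    for name, row in A.items():
        merged[name] = "".join(row[p] if p is not None else "-" for p, _ in cols)
    for name, row in B.items():
        merged[name] = "".join(row[q] if q is not None else "-" for _, q in cols)
    return merged


def progressive(tree: Tree, seqs: Dict[str, str], post: Dict[Tuple[str, str], Matrix]) -> Profile:
    """Align sequences bottom-up along a guide tree given as nested tuples."""
    if isinstance(tree, str):
        return {tree: seqs[tree]}
    left, right = tree
    return align_profiles(progressive(left, seqs, post), progressive(right, seqs, post), post)
```

Columns are never reopened once two profiles are merged ("once a gap, always a gap"), which is why an optional refinement pass can still improve the result afterwards.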


See also

* Sequence alignment software
* Clustal
* MUSCLE
* AMAP
* T-Coffee
* Probalign


References


External links

* Official website: http://probcons.stanford.edu/