ProbCons is an open source program for probabilistic consistency-based multiple alignment of amino acid sequences. It ranks among the most accurate protein multiple sequence alignment programs, having repeatedly demonstrated a statistically significant advantage over similar tools, including Clustal and MAFFT.
Algorithm
The following describes the basic outline of the ProbCons algorithm, as presented in the lecture "Bioinformatics II" at the University of Freiburg.
Step 1: Reliability of an alignment edge
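For every pair of sequences $x, y$, compute the probability that letters $x_i$ and $y_j$ are paired in $a^*$, an alignment of $x$ and $y$ that is generated by the model:

$$P(x_i \sim y_j \mid x, y) := \Pr[x_i \sim y_j \in a^* \mid x, y] = \sum_{a} \Pr[a \mid x, y] \, \mathbf{1}\{x_i \sim y_j \in a\}$$

(The sum runs over all alignments $a$ of $x$ and $y$, where $\mathbf{1}\{x_i \sim y_j \in a\}$ is equal to 1 if $x_i$ and $y_j$ are paired in the alignment $a$ and 0 otherwise.)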
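The definition above can be illustrated with a forward-backward computation. The following is a minimal sketch under strong simplifying assumptions: it uses a single-state toy model in which every alignment column carries a fixed multiplicative weight (`match_w`, `mismatch_w` and `gap_w` are hypothetical values chosen for illustration), not the trained pair hidden Markov model with separate gap states that ProbCons itself uses. The function name `posterior_match_probs` is likewise an assumption.

```python
def posterior_match_probs(x, y, match_w=1.0, mismatch_w=0.1, gap_w=0.2):
    """Posterior probability that x[i] is paired with y[j] under a toy model.

    Each alignment is a path of match/gap columns; its weight is the product
    of the column weights. All weights are hypothetical illustration values.
    """
    n, m = len(x), len(y)

    def w(i, j):
        # Weight of pairing the i-th letter of x with the j-th of y (1-based).
        return match_w if x[i - 1] == y[j - 1] else mismatch_w

    # Forward: F[i][j] = total weight of all partial alignments of
    # x[:i] and y[:j].
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    F[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                F[i][j] += F[i - 1][j - 1] * w(i, j)
            if i > 0:
                F[i][j] += F[i - 1][j] * gap_w
            if j > 0:
                F[i][j] += F[i][j - 1] * gap_w

    # Backward: B[i][j] = total weight of all alignments of x[i:] and y[j:].
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    B[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                B[i][j] += w(i + 1, j + 1) * B[i + 1][j + 1]
            if i < n:
                B[i][j] += gap_w * B[i + 1][j]
            if j < m:
                B[i][j] += gap_w * B[i][j + 1]

    Z = F[n][m]  # total weight of all complete alignments
    # Weight of all alignments that pair x[i] with y[j], divided by Z.
    return [[F[i][j] * w(i + 1, j + 1) * B[i + 1][j + 1] / Z
             for j in range(m)] for i in range(n)]
```

Calling `posterior_match_probs("ACD", "AD")` returns a 3 x 2 matrix whose entry `[i][j]` plays the role of $P(x_i \sim y_j \mid x, y)$ in this toy model. Because each letter is paired at most once in any alignment, every row and column sums to at most 1.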
Step 2: Maximum expected accuracy
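The accuracy of an alignment $a$ with respect to another alignment $a^*$ is defined as the number of common aligned pairs divided by the length of the shorter sequence:

$$\mathrm{accuracy}(a, a^*) = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a} \mathbf{1}\{x_i \sim y_j \in a^*\}$$

Calculate the expected accuracy of each candidate alignment $a$ of $x$ and $y$, where $P(x_i \sim y_j \mid x, y)$ is the posterior pairing probability from Step 1:

$$E_{a^*}[\mathrm{accuracy}(a, a^*) \mid x, y] = \frac{1}{\min(|x|, |y|)} \sum_{x_i \sim y_j \in a} P(x_i \sim y_j \mid x, y)$$

This yields a maximum expected accuracy (MEA) alignment:

$$a_{\mathrm{MEA}}(x, y) = \arg\max_{a} \; E_{a^*}[\mathrm{accuracy}(a, a^*) \mid x, y]$$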
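Since the expected accuracy is a sum of posterior pairing probabilities over the aligned pairs, the maximizing alignment can be found with a Needleman-Wunsch style dynamic program in which a pair scores its posterior probability and gaps score zero. The sketch below assumes the posterior matrix is already available as a nested list `P` (rows indexed by positions of $x$, columns by positions of $y$). The function name `mea_alignment` is an illustrative assumption, not part of ProbCons.

```python
def mea_alignment(P):
    """Return (pairs, expected_accuracy) for a posterior matrix P.

    P[i][j] is the posterior probability that position i of x is paired
    with position j of y (0-based). Gaps contribute nothing to the score.
    """
    n = len(P)
    m = len(P[0]) if n else 0
    if n == 0 or m == 0:
        return [], 0.0

    # S[i][j] = best achievable sum of posteriors for x[:i] versus y[:j].
    S = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + P[i - 1][j - 1],
                          S[i - 1][j],
                          S[i][j - 1])

    # Traceback: recover one optimal set of aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if S[i][j] == S[i - 1][j]:
            i -= 1          # position i of x is left unpaired
        elif S[i][j] == S[i][j - 1]:
            j -= 1          # position j of y is left unpaired
        else:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    pairs.reverse()

    # Normalize by the length of the shorter sequence, as in the definition.
    return pairs, S[n][m] / min(n, m)
```

For a matrix such as `[[0.9, 0.1], [0.1, 0.8]]` the sketch pairs position 0 with 0 and position 1 with 1. When several alignments tie, the traceback prefers leaving positions unpaired, which is one arbitrary but valid tie-breaking choice.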
Step 3: Probabilistic Consistency Transformation
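All pairs of sequences $x, y$ from the set of all sequences $S$ are now re-estimated using all intermediate sequences $z$:

$$P'(x_i \sim y_j \mid x, y) = \frac{1}{|S|} \sum_{z \in S} \sum_{1 \le k \le |z|} P(x_i \sim z_k \mid x, z) \, P(z_k \sim y_j \mid z, y)$$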
This step can be iterated.
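In matrix form, the re-estimation for a pair $x, y$ averages the products of the posterior matrices for $(x, z)$ and $(z, y)$ over all sequences $z$. The sketch below is a plain-Python illustration under two assumptions: the posterior matrix of a sequence with itself is taken to be the identity, so the terms $z = x$ and $z = y$ each contribute the original matrix, and the input dictionary `post` holds every unordered pair exactly once. The names `consistency_transform`, `post` and `lengths` are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]


def consistency_transform(post, lengths):
    """One round of the probabilistic consistency transformation.

    post:    dict mapping (x, y) name pairs to posterior matrices
             (rows: positions of x, columns: positions of y).
    lengths: dict mapping each sequence name to its length.
    """
    names = sorted(lengths)

    def pair_matrix(a, b):
        # Assumption: a sequence aligned with itself gives the identity.
        if a == b:
            return identity(lengths[a])
        if (a, b) in post:
            return post[(a, b)]
        return transpose(post[(b, a)])

    new_post = {}
    for (x, y) in post:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in names:
            # Sum over k of P(x_i ~ z_k) * P(z_k ~ y_j) is a matrix product.
            prod = matmul(pair_matrix(x, z), pair_matrix(z, y))
            for i, row in enumerate(prod):
                for j, v in enumerate(row):
                    acc[i][j] += v
        new_post[(x, y)] = [[v / len(names) for v in row] for row in acc]
    return new_post
```

Iterating the step amounts to feeding the returned dictionary back into `consistency_transform`. With only two sequences the transformation leaves the matrix unchanged, since both terms of the average equal the original matrix.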
Step 4: Computation of guide tree
Construct a guide tree by hierarchical clustering, using the MEA score as the sequence similarity score. The similarity between clusters is defined as a weighted average over the pairwise sequence similarities.
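The clustering described above can be sketched as a greedy agglomerative procedure: repeatedly merge the two most similar clusters and assign the merged cluster a similarity to every remaining cluster. The averaging rule used here (the plain mean of the two merged clusters' similarities, as in WPGMA) is an assumption standing in for the weighted average mentioned in the text, and the function name `guide_tree` is hypothetical.

```python
from itertools import combinations


def guide_tree(names, sim):
    """Build a guide tree as nested tuples by greedy clustering.

    names: list of unique sequence names.
    sim:   dict mapping frozenset({a, b}) to a similarity score
           (for example the MEA score of the pair).
    """
    clusters = list(names)
    s = dict(sim)  # working copy, extended with scores for merged clusters
    while len(clusters) > 1:
        # Pick the most similar pair of current clusters.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: s[frozenset(pair)])
        merged = (a, b)
        rest = [c for c in clusters if c not in (a, b)]
        for c in rest:
            # Assumed rule: mean of the two merged clusters' similarities.
            s[frozenset((merged, c))] = (s[frozenset((a, c))] +
                                         s[frozenset((b, c))]) / 2
        clusters = rest + [merged]
    return clusters[0]
```

For three sequences where A and B are the most similar pair, the sketch first joins A with B and then attaches C, giving the nested tuple `('C', ('A', 'B'))`. The nesting records the order in which a progressive aligner would combine the sequences.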
Step 5: Compute MSA
Finally, compute the MSA using progressive alignment or iterative alignment.
See also
* Sequence alignment software
* Clustal
* MUSCLE
* AMAP
* T-Coffee
* Probalign
References
External links
* Official website: http://probcons.stanford.edu/