Probalign
Probalign is a sequence alignment tool that calculates a maximum expected accuracy alignment using partition function posterior probabilities. Base pair probabilities are estimated using an estimate similar to the Boltzmann distribution. The partition function is calculated using a dynamic programming approach.

Algorithm

The following describes the algorithm used by Probalign to determine the base pair probabilities.

Alignment score

To score an alignment of two sequences, two things are needed:
* a similarity function $\sigma(x,y)$ (e.g. PAM, BLOSUM, ...)
* an affine gap penalty: $g(k) = \alpha + \beta k$

The score $S(a)$ of an alignment $a$ is defined as:

$$S(a) = \sum_{x_i - y_j \in a} \sigma(x_i,y_j) + \text{gap cost}$$

Now the Boltzmann-weighted score of an alignment $a$ is:

$$e^{S(a)/T} = e^{\left(\sum_{x_i - y_j \in a} \sigma(x_i,y_j) + \text{gap cost}\right)/T} = \left( \prod_{x_i - y_j \in a} e^{\sigma(x_i,y_j)/T} \right) \cdot e^{\text{gap cost}/T}$$

where $T$ is a scaling factor. The probability of an alignment, assuming a Boltzmann distribution, is given by

$$Pr[a \mid x,y] = \frac{e^{S(a)/T}}{Z}$$

where $Z$ is the partition function, i.e. the sum of the Boltzmann weights ...
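The dynamic programming computation of the partition function can be sketched as follows. This is a minimal illustration, not Probalign's actual code: the function names, the three-table layout (match, gap in x, gap in y), and the convention that alpha and beta are non-positive contributions added to the score are all assumptions made for the example. A run of gaps in x immediately followed by a run of gaps in y is counted once, so each distinct set of aligned pairs contributes a single Boltzmann weight.

```python
import math


def partition_function(x, y, sigma, alpha, beta, T):
    """Sum of Boltzmann weights e^{S(a)/T} over all global alignments of x and y.

    sigma(a, b) is the similarity function; a gap of length k contributes
    g(k) = alpha + beta * k to the score (alpha, beta <= 0 by assumption).
    """
    n, m = len(x), len(y)
    # ZM: alignments ending in an aligned pair x_i - y_j
    # ZE: alignments ending with y_j opposite a gap
    # ZF: alignments ending with x_i opposite a gap
    ZM = [[0.0] * (m + 1) for _ in range(n + 1)]
    ZE = [[0.0] * (m + 1) for _ in range(n + 1)]
    ZF = [[0.0] * (m + 1) for _ in range(n + 1)]
    ZM[0][0] = 1.0  # the empty alignment has weight 1

    open_w = math.exp((alpha + beta) / T)  # first position of a gap
    ext_w = math.exp(beta / T)             # each further gap position

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                prev = ZM[i - 1][j - 1] + ZE[i - 1][j - 1] + ZF[i - 1][j - 1]
                ZM[i][j] = prev * math.exp(sigma(x[i - 1], y[j - 1]) / T)
            if j > 0:
                ZE[i][j] = ZM[i][j - 1] * open_w + ZE[i][j - 1] * ext_w
            if i > 0:
                ZF[i][j] = (ZM[i - 1][j] + ZE[i - 1][j]) * open_w + ZF[i - 1][j] * ext_w
    return ZM[n][m] + ZE[n][m] + ZF[n][m]


def gapless_probability(x, y, sigma, alpha, beta, T):
    """Pr[a | x, y] for the gap-free alignment of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("the gap-free alignment needs equal-length sequences")
    score = sum(sigma(a, b) for a, b in zip(x, y))
    return math.exp(score / T) / partition_function(x, y, sigma, alpha, beta, T)


def match_score(a, b):
    """Toy similarity function (hypothetical values, not PAM or BLOSUM)."""
    return 1.0 if a == b else -1.0


if __name__ == "__main__":
    print(partition_function("ACGT", "AGT", match_score, -1.0, -0.5, 1.0))
    print(gapless_probability("ACG", "ACG", match_score, -1.0, -0.5, 1.0))
```

With all scores set to zero every alignment has weight 1, so the function simply counts alignments, which gives a quick sanity check on the recurrences.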


ProbCons
ProbCons is open-source software for probabilistic consistency-based multiple alignment of amino acid sequences. It is one of the most efficient protein multiple sequence alignment programs, since it has repeatedly demonstrated a statistically significant advantage in accuracy over similar tools, including Clustal and MAFFT.

Algorithm

The following describes the basic outline of the ProbCons algorithm.

Step 1: Reliability of an alignment edge

For every pair of sequences, compute the probability that letters $x_i$ and $y_j$ are paired in $a^*$, an alignment that is generated by the model.

$$\begin{aligned} P(x_i \sim y_j \mid x,y) & \stackrel{\text{def}}{=} Pr[x_i \sim y_j \text{ in some } a \mid x,y] \\ & = \sum_{\text{alignment } a \text{ with } x_i - y_j} Pr[a \mid x,y] \\ & = \sum_{\text{alignment } a} \mathbf{1}\{x_i - y_j \in a\} \, Pr[a \mid x,y] \end{aligned}$$

(where $\mathbf{1}\{x_i - y_j \in a\}$ is equal to 1 if $x_i$ and $y_j$ are paired in the alignment and 0 otherwise.)

Step 2: Maximum expected accuracy

The accuracy of an alignment $a^*$ with respect to another alignment $a$ is defined as the number of common aligned pairs divided by the length of the shorter sequence. Calculate expected accuracy of each ...
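As an illustration of Step 2, the sketch below finds an alignment that maximizes expected accuracy, given a matrix of pairing probabilities from Step 1. It assumes the expected accuracy of a candidate alignment is the sum of the posterior probabilities of its aligned pairs divided by the length of the shorter sequence, and uses a gap-penalty-free dynamic program to maximize it. The function name and the example matrices are hypothetical and not taken from the ProbCons source.

```python
def mea_alignment(P):
    """Return (expected_accuracy, pairs) for a posterior matrix P.

    P[i][j] is the probability that letter i of x is paired with letter j of y
    (0-based). pairs is a list of (i, j) index pairs in increasing order.
    """
    n = len(P)
    m = len(P[0]) if n else 0
    if n == 0 or m == 0:
        return 0.0, []

    # A[i][j]: best total posterior mass using the first i letters of x
    # and the first j letters of y.
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(
                A[i - 1][j - 1] + P[i - 1][j - 1],  # pair x_i with y_j
                A[i - 1][j],                        # leave x_i unpaired
                A[i][j - 1],                        # leave y_j unpaired
            )

    # Trace back to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if A[i][j] == A[i - 1][j - 1] + P[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif A[i][j] == A[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return A[n][m] / min(n, m), pairs


if __name__ == "__main__":
    posterior = [[0.9, 0.1],
                 [0.2, 0.8]]
    print(mea_alignment(posterior))
```

Because the recurrence carries no gap penalties, the choice of pairs depends only on the posterior probabilities, which is the point of the maximum expected accuracy criterion.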


Expected Accuracy
Expected may refer to:
* Expectation (epistemic)
* Expected value
* Expected shortfall
* Expected utility hypothesis
* Expected return
* Expected loss: the sum of the values of all possible losses, each multiplied by the probability of that loss occurring. In bank lending (homes, autos, credit cards, commercial lending, etc.) the expected loss on a loan varies over time for a num ...

See also
* Unexpected (other)
* Expected value (other) ...


Boltzmann Distribution
In statistical mechanics and mathematics, a Boltzmann distribution (also called Gibbs distribution) is a probability distribution or probability measure that gives the probability that a system will be in a certain state as a function of that state's energy and the temperature of the system. The distribution is expressed in the form:

$$p_i \propto e^{-\varepsilon_i/(kT)}$$

where $p_i$ is the probability of the system being in state $i$, $\varepsilon_i$ is the energy of that state, and the constant $kT$ of the distribution is the product of the Boltzmann constant $k$ and thermodynamic temperature $T$. The symbol $\propto$ denotes proportionality. The term "system" here has a very wide meaning; it can range from a collection of 'sufficient number' of atoms or a single atom to a macroscopic system such as a natural gas storage tank. Therefore the Boltzmann distribution can be used to solve a very wide variety of problems. The distribu ...
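A small numerical sketch may help make the proportionality concrete: dividing each weight by the sum of all weights turns it into a probability. The function name and the example energies are illustrative assumptions, and the energies are taken to be in the same units as kT.

```python
import math


def boltzmann_probabilities(energies, kT):
    """Normalize the weights exp(-E_i / kT) into probabilities.

    energies: list of state energies, in the same units as kT.
    """
    if kT <= 0:
        raise ValueError("kT must be positive")
    if not energies:
        return []
    # Shifting by the minimum energy leaves the probabilities unchanged
    # but keeps the exponentials from overflowing or underflowing.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    total = sum(weights)  # the normalizing constant for the shifted weights
    return [w / total for w in weights]


if __name__ == "__main__":
    # Three hypothetical states; lower energy means higher probability.
    print(boltzmann_probabilities([0.0, 1.0, 2.0], kT=1.0))
```

Raising kT flattens the distribution toward equal probabilities, while lowering it concentrates probability on the lowest-energy state.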


Dynamic Programming
Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively. Likewise, in computer science, if a problem can be solved optimally by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems, then it is said to have "optimal substructure". If sub-problems can be nested recursively inside larger problems, so that dynamic programming methods are applicable, then there is a relation between the value of the larger problem and the values of the sub-problems (Cormen, T. H.; Leiserson, C. E.; Rives ...).
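To illustrate optimal substructure, here is a short sketch using the longest common subsequence of two strings, a problem closely related to sequence alignment. The choice of problem and the function name are for illustration only; the value for each pair of prefixes is built from the values of shorter prefixes.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # table[i][j]: LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # Matching last characters extend the best solution
                # of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise reuse the better of the two sub-problems
                # that drop one trailing character.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"))
```

Each table entry is computed once from at most three smaller entries, so the running time is proportional to the product of the two string lengths.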


PAM Matrix
A point accepted mutation — also known as a PAM — is the replacement of a single amino acid in the primary structure of a protein with another single amino acid, which is accepted by the processes of natural selection. This definition does not include all point mutations in the DNA of an organism. In particular, silent mutations are not point accepted mutations, nor are mutations that are lethal or that are rejected by natural selection in other ways. A PAM matrix is a matrix where each column and row represents one of the twenty standard amino acids. In bioinformatics, PAM matrices are sometimes used as substitution matrices to score sequence alignments for proteins. Each entry in a PAM matrix indicates the likelihood of the amino acid of that row being replaced with the amino acid of that column through a series of one or more point accepted mutations during a specified evolutionary interval, rather than these two amino acids being aligned due to chance. Different PAM matri ...


BLOSUM
In bioinformatics, the BLOSUM (BLOcks SUbstitution Matrix) matrix is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences. They are based on local alignments. BLOSUM matrices were first introduced in a paper by Steven Henikoff and Jorja Henikoff. They scanned the BLOCKS database for very conserved regions of protein families (that do not have gaps in the sequence alignment) and then counted the relative frequencies of amino acids and their substitution probabilities. Then, they calculated a log-odds score for each of the 210 possible substitution pairs of the 20 standard amino acids. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins like the PAM Matrices. Biological background The genetic instructions of every replicating cell in a living organism are contained within its DNA. Throughout the cell's ...
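The log-odds scoring mentioned above can be sketched on a toy two-letter alphabet. This is an illustration under stated assumptions, not the published BLOSUM procedure in full: the pair frequencies are invented, the half-bit scaling (twice the base-2 logarithm, rounded) is assumed here, and sequence clustering is omitted. The helper names are hypothetical.

```python
import math


def unordered_pair_count(alphabet_size):
    """Number of unordered residue pairs, including identical pairs."""
    return alphabet_size * (alphabet_size + 1) // 2


def log_odds_scores(pair_freq):
    """Rounded log-odds scores from observed unordered-pair frequencies.

    pair_freq maps a tuple (a, b), stored once per unordered pair, to its
    observed relative frequency; all frequencies must be positive.
    """
    # Background frequency of each residue: identical pairs count fully,
    # mixed pairs contribute half to each member.
    background = {}
    for (a, b), q in pair_freq.items():
        if a == b:
            background[a] = background.get(a, 0.0) + q
        else:
            background[a] = background.get(a, 0.0) + q / 2
            background[b] = background.get(b, 0.0) + q / 2

    scores = {}
    for (a, b), q in pair_freq.items():
        # Frequency expected by chance; mixed pairs can occur in two orders.
        expected = background[a] * background[b] * (1 if a == b else 2)
        scores[(a, b)] = round(2 * math.log2(q / expected))
    return scores


if __name__ == "__main__":
    print(unordered_pair_count(20))  # pairs among the 20 standard amino acids
    toy = {("A", "A"): 0.4, ("A", "B"): 0.2, ("B", "B"): 0.4}
    print(log_odds_scores(toy))
```

A positive score means the pair is observed more often than chance would predict, and a negative score means it is observed less often.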


Multiple Sequence Alignment
Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences is assumed to have an evolutionary relationship by which they share a linkage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment illustrate mutation events such as point mutations (single amino acid or nucleotide changes) that appear as differing characters in a single alignment column, and insertion or deletion mutations (indels or gaps) that appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acid ...