ProbCons
   HOME
*





ProbCons
ProbCons is an open source probabilistic consistency-based multiple alignment of amino acid sequences. It is one of the most efficient protein multiple sequence alignment programs, since it has repeatedly demonstrated a statistically significant advantage in accuracy over similar tools, including Clustal and MAFFT. Algorithm The following describes the basic outline of the ProbCons algorithm. Step 1: Reliability of an alignment edge For every pair of sequences compute the probability that letters x_i and y_i are paired in a^* an alignment that is generated by the model. \begin P(x_i \sim y_i, x,y) & \stackrel Pr x,y\\ & = \sum_ Pr x,y\ & = \sum_ \mathbf\ Pr x,y\end (Where \mathbf\ is equal to 1 if x_i and y_i are in the alignment and 0 otherwise.) Step 2: Maximum expected accuracy The accuracy of an alignment a^* with respect to another alignment a is defined as the number of common aligned pairs divided by the length of the shorter sequence. Calculate expected accuracy of each ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Clustal
Clustal is a series of widely used computer programs used in bioinformatics for multiple sequence alignment. There have been many versions of Clustal over the development of the algorithm that are listed below. The analysis of each tool and its algorithm are also detailed in their respective categories. Available operating systems listed in the sidebar are a combination of the software availability and may not be supported for every current version of the Clustal tools. Clustal Omega has the widest variety of operating systems out of all the Clustal tools. History There have been many variations of the Clustal software, all of which are listed below: * Clustal: The original software for multiple sequence alignments, created by Des Higgins in 1988, was based on deriving phylogenetic trees from pairwise sequences of amino acids or nucleotides. * ClustalV: The second generation of the Clustal software was released in 1992 and was a rewrite of the original Clustal package. It intr ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


T-Coffee
T-Coffee (Tree-based Consistency Objective Function for Alignment Evaluation) is a multiple sequence alignment software using a progressive approach. It generates a library of pairwise alignments to guide the multiple sequence alignment. It can also combine multiple sequences alignments obtained previously and in the latest versions can use structural information from PDB files (3D-Coffee). It has advanced features to evaluate the quality of the alignments and some capacity for identifying occurrence of motifs (Mocca). It produces alignment in the aln format (Clustal) by default, but can also produce PIR, MSF, and FASTA format. The most common input formats are supported (FASTA, PIR). Algorithm T-Coffee algorithm consist of two main features, the first by utilizing heterogeneous data sources it is able to provide simple and flexible means of generating multiple alignments. T-coffee can compute multiple alignments using a library that was generated using a mixture of local and glo ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Sequence Alignment Software
This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins. Database search only *Sequence type: protein or nucleotide Pairwise alignment *Sequence type: protein or nucleotide **Alignment type: local or global Multiple sequence alignment *Sequence type: protein or nucleotide. **Alignment type: local or global Genomics analysis *Sequence type: protein or nucleotide Motif finding *Sequence type: protein or nucleotide Benchmarking Alignment viewers, editors Please see List of alignment visualization software. Short-read sequence alignment See also * List of open source bioinformatics software References {{Reflist Sequence Sequence alignment software This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple seque ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

MUSCLE (alignment Software)
MUltiple Sequence Comparison by Log-Expectation (MUSCLE) is computer software for multiple sequence alignment of protein and nucleotide sequences. It is licensed as public domain. The method was published by Robert C. Edgar in two papers in 2004. The first paper, published in ''Nucleic Acids Research'', introduced the sequence alignment algorithm. The second paper, published in ''BMC Bioinformatics'', presented more technical details. Algorithm The MUSCLE algorithm proceeds in three stages: the ''draft progressive'', ''improved progressive'', and ''refinement'' stage. Stage 1: Draft Progressive In this first stage, the algorithm produces a multiple alignment, emphasizing speed over accuracy. This step begins by computing the k-mer distance for every pair of input sequences to create a distance matrix. UPGMA clusters the distance matrix to produce a binary tree. From this tree a progressive alignment is constructed, beginning with the creation of profiles for each leaf of the tree ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Probalign
Probalign is a sequence alignment tool that calculates a maximum expected accuracy alignment using partition function posterior probabilities. Base pair probabilities are estimated using an estimate similar to Boltzmann distribution. The partition function is calculated using a dynamic programming approach. Algorithm The following describes the algorithm used by probalign to determine the base pair probabilities. Alignment score To score an alignment of two sequences two things are needed: * a similarity function \sigma(x,y) (e.g. PAM, BLOSUM,...) * affine gap penalty: g(k) = \alpha + \beta k The score S(a) of an alignment a is defined as: S(a) = \sum_ \sigma(x_i,y_j) + \text Now the boltzmann weighted score of an alignment a is: e^ = e^ = \left( \prod_ e^ \right) \cdot e^ Where T is a scaling factor. The probability of an alignment assuming boltzmann distribution is given by Pr x,y= \frac Where Z is the partition function, i.e. the sum of the boltzmann weights ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Amino Acid
Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although hundreds of amino acids exist in nature, by far the most important are the alpha-amino acids, which comprise proteins. Only 22 alpha amino acids appear in the genetic code. Amino acids can be classified according to the locations of the core structural functional groups, as Alpha and beta carbon, alpha- , beta- , gamma- or delta- amino acids; other categories relate to Chemical polarity, polarity, ionization, and side chain group type (aliphatic, Open-chain compound, acyclic, aromatic, containing hydroxyl or sulfur, etc.). In the form of proteins, amino acid '' residues'' form the second-largest component (water being the largest) of human muscles and other tissues. Beyond their role as residues in proteins, amino acids participate in a number of processes such as neurotransmitter transport and biosynthesis. It is thought that they played a key role in enabling life ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Multiple Sequence Alignment
Multiple sequence alignment (MSA) may refer to the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. In many cases, the input set of query sequences are assumed to have an evolutionary relationship by which they share a linkage and are descended from a common ancestor. From the resulting MSA, sequence homology can be inferred and phylogenetic analysis can be conducted to assess the sequences' shared evolutionary origins. Visual depictions of the alignment as in the image at right illustrate mutation events such as point mutations (single amino acid or nucleotide changes) that appear as differing characters in a single alignment column, and insertion or deletion mutations (indels or gaps) that appear as hyphens in one or more of the sequences in the alignment. Multiple sequence alignment is often used to assess sequence conservation of protein domains, tertiary and secondary structures, and even individual amino acid ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

MAFFT
In bioinformatics, MAFFT (for multiple alignment using fast Fourier transform) is a program used to create multiple sequence alignments of amino acid or nucleotide sequences. Published in 2002, the first version of MAFFT used an algorithm based on progressive alignment, in which the sequences were clustered with the help of the Fast Fourier Transform. Subsequent versions of MAFFT have added other algorithms and modes of operation, including options for faster alignment of large numbers of sequences, higher accuracy alignments, alignment of non-coding RNA sequences, and the addition of new sequences to existing alignments. See also * Sequence alignment software * Clustal Clustal is a series of widely used computer programs used in bioinformatics for multiple sequence alignment. There have been many versions of Clustal over the development of the algorithm that are listed below. The analysis of each tool and its a ... References External links * MAFFT Online ServerMAFFT ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




AMAP
AMAP is a multiple sequence alignment program based on sequence annealing. This approach consists of building up the multiple alignment one match at a time, thereby circumventing many of the problems of progressive alignment. The AMAP parameters can be used to tune the sensivitiy-specificity tradeoff. The program can be used through the AMAP web server or as a standalone program which can be installed with the source code. Input/Output This program accepts sequences in FASTA format. The output format includes: FASTA format, Clustal Clustal is a series of widely used computer programs used in bioinformatics for multiple sequence alignment. There have been many versions of Clustal over the development of the algorithm that are listed below. The analysis of each tool and its a .... References {{reflist External links AMAP web server Bioinformatics software ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]