Profile HMMs
A profile HMM is a variant of an HMM relating specifically to biological sequences. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system, which can be used to align sequences and search databases for remotely homologous sequences. They capitalise on the fact that certain positions in a sequence alignment tend to have biases in which residues are most likely to occur, and are likely to differ in their probability of containing an insertion or a deletion. Capturing this information gives them a better ability to detect true homologs than traditional BLAST-based approaches, which penalise substitutions, insertions and deletions equally, regardless of where in an alignment they occur.
Profile HMMs center around a linear set of match (M) states, with one state corresponding to each consensus column in a sequence alignment. Each M state emits a single residue (amino acid or nucleotide). The probability of emitting a particular residue is determined largely by the frequency at which that residue has been observed in that column of the alignment, but also incorporates prior information on patterns of residues that tend to co-occur in the same columns of sequence alignments. This string of match states, each emitting residues at particular frequencies, is analogous to a position-specific scoring matrix or weight matrix.
A profile HMM takes this modelling of sequence alignments further by modelling insertions and deletions, using I and D states, respectively. I states emit a residue, while D states do not. Multiple I states can occur consecutively, corresponding to multiple residues between consensus columns in an alignment. M, I and D states are connected by state transition probabilities, which also vary by position in the sequence alignment, to reflect the different frequencies of insertions and deletions across sequence alignments.
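As a rough illustration of how match-state emission probabilities can be derived from alignment columns, the following Python sketch counts residues per column of a toy nucleotide alignment and scores an ungapped sequence against the resulting match states. The alignment, the function names and the add-one pseudocount are assumptions made for this example; the pseudocount is a simple stand-in for the prior information mentioned above and is not the prior that HMMER itself uses. Insert and delete states are left out entirely.
```python
import math
from collections import Counter

# Hypothetical toy alignment: one row per sequence, one column per consensus position.
ALIGNMENT = ["ACGT", "ACGA", "ATGT", "ACGT"]
ALPHABET = "ACGT"
# Assumption: a flat add-one pseudocount stands in for a real prior.
PSEUDOCOUNT = 1.0
# Assumption: uniform background residue frequencies.
BACKGROUND = {residue: 1.0 / len(ALPHABET) for residue in ALPHABET}


def match_emissions(alignment, alphabet=ALPHABET, pseudocount=PSEUDOCOUNT):
    """Return one emission distribution (dict residue -> probability) per column."""
    emissions = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        emissions.append(
            {residue: (counts[residue] + pseudocount) / total for residue in alphabet}
        )
    return emissions


def log_odds_score(sequence, emissions, background=BACKGROUND):
    """Score an ungapped sequence against the match states, in bits."""
    if len(sequence) != len(emissions):
        raise ValueError("sequence length must equal the number of match states")
    return sum(
        math.log2(column[residue] / background[residue])
        for residue, column in zip(sequence, emissions)
    )


if __name__ == "__main__":
    profile = match_emissions(ALIGNMENT)
    for query in ("ACGT", "TTTT"):
        print(query, round(log_odds_score(query, profile), 3))
```
A sequence that follows the column preferences (such as `ACGT` here) receives a higher log-odds score than one that does not (such as `TTTT`), which is the position-specific behaviour that a single substitution matrix cannot express.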
The HMMER2 and HMMER3 releases used an architecture for building profile HMMs called the Plan 7 architecture, named after the seven transitions allowed per position in the core model (M→M, M→I, M→D, I→M, I→I, D→M and D→D). In addition to the three major states (M, I and D), six additional states capture non-homologous flanking sequence in the alignment. These six states collectively are important for controlling how sequences are aligned to the model, e.g. whether a sequence can have multiple consecutive hits to the same model (in the case of sequences with multiple instances of the same domain).
Programs in the HMMER package
The HMMER package consists of a collection of programs for performing functions using profile hidden Markov models. The programs include:
Profile HMM building
*hmmbuild - construct profile HMMs from multiple sequence alignments
Homology searching
*hmmscan - search protein sequences against a profile HMM database
*hmmsearch - search profile HMMs against a sequence database
*jackhmmer - iteratively search sequences against a protein database
*nhmmer - search DNA/RNA queries against a DNA/RNA sequence database
*nhmmscan - search nucleotide sequences against a nucleotide profile
*phmmer - search protein sequences against a protein database
Other functions
*hmmalign - align sequences to a profile HMM
*hmmemit - produce sample sequences from a profile HMM
*hmmlogo - produce data for an HMM logo from an HMM file
The package contains numerous other specialised functions.
The HMMER web server
In addition to the software package, the HMMER search function is available in the form of a web server. The service facilitates searches across a range of databases, including sequence databases such as UniProt.
The HMMER3 release
The latest stable release of HMMER is version 3.0. HMMER3 is a complete rewrite of the earlier HMMER2 package, with the aim of improving the speed of profile-HMM searches. Major changes are outlined below:
Improvements in speed
A major aim of the HMMER3 project, started in 2004, was to improve the speed of HMMER searches. While profile HMM-based homology searches were more accurate than BLAST-based approaches, their slower speed limited their applicability. The main performance gain is due to a heuristic filter that finds high-scoring ungapped matches within database sequences to a query profile. This heuristic results in a computation time comparable to BLAST with little impact on accuracy. Further gains in performance are due to a log-likelihood model that requires no calibration.
Improvements in remote homology searching
The major advance in speed was made possible by the development of an approach for calculating the significance of results integrated over a range of possible alignments. In discovering remote homologs, alignments between query and hit proteins are often very uncertain. While most sequence alignment tools calculate match scores using only the best scoring alignment, HMMER3 calculates match scores by integrating across all possible alignments, to account for uncertainty in which alignment is best. HMMER sequence alignments are accompanied by posterior probability annotations, indicating which portions of the alignment have been assigned high confidence and which are more uncertain.
DNA sequence comparison
A major improvement in HMMER3 was the inclusion of DNA/DNA comparison tools. HMMER2 only had functionality to compare protein sequences.
Restriction to local alignments
While HMMER2 could perform local alignment (align a complete model to a subsequence of the target) and global alignment (align a complete model to a complete target sequence), HMMER3 only performs local alignment. This restriction is due to the difficulty in calculating the significance of hits when performing local/global alignments using the new algorithm.
See also
* Hidden Markov model
* Sequence alignment software