PatternHunter is a commercially available homology search tool based on sequence alignment techniques. It was developed in 2002 by three scientists: Bin Ma, John Tromp and Ming Li. Their aim was to solve a problem that many investigators face in genomics and proteomics studies. Such studies rely heavily on homology searches that find short seed matches and then extend them into longer alignments. Identifying homologous genes is an essential part of most evolutionary studies and is crucial to understanding the evolution of gene families and the relationship between domains and families. Homologous genes can be studied effectively only with search tools that find similar regions, or local alignments, between two protein or nucleic acid sequences. Homology is quantified by alignment scores computed from match, mismatch and gap scores.
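As an illustration of how match, mismatch and gap scores combine into a local alignment score, the following is a minimal sketch of Smith-Waterman scoring in Python. The function name, the score values (match = 2, mismatch = -1, gap = -1) and the linear gap penalty are assumptions chosen for the example; they are not PatternHunter's actual parameters.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b under a linear gap penalty.

    The score values are illustrative assumptions, not PatternHunter defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Four matches and one gap: 2 + 2 - 1 + 2 + 2
print(smith_waterman_score("ACGTT", "ACTT"))
```

Every cell of the table is filled in, so the running time grows with the product of the two sequence lengths. This is why exhaustive alignment becomes impractical for genome-scale comparisons.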


Development

In comparative genomics, for example, it is necessary to compare huge chromosomes such as those found in the human genome. However, the immense growth of genomic data poses a dilemma for existing homology search methods: enlarging the seed size lowers sensitivity, while reducing the seed size slows computation. Several sequence alignment programs have been developed to determine homology between genes. These include FASTA, the BLAST family, QUASAR, MUMmer, SENSEI, SIM, and REPuter. The Smith-Waterman alignment technique, which compares each base of one sequence against each base of the other, is too slow for this purpose. BLAST improves on it by finding short exact seed matches, which it then extends and joins into longer alignments. However, when dealing with long sequences, these programs are extremely slow and require considerable memory. SENSEI is more efficient than the other methods, but it handles only ungapped alignments. The output of MegaBLAST, on the other hand, is of poor quality, and the program does not scale well to large sequences. MUMmer and QUASAR employ suffix trees, which are designed to find exact matches, so these methods apply only to comparisons of highly similar sequences. These problems called for a fast, reliable tool that can handle all types of sequences efficiently without consuming excessive computing resources.
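The seed-size trade-off mentioned above can be made concrete with a little arithmetic. The sketch below assumes a simplified model that is not part of the original description: aligned positions match independently with a fixed probability, and unrelated sequences are random with uniform letter frequencies. The function names and the example values are illustrative.

```python
def window_match_probability(identity, k):
    """Probability that k consecutive aligned positions all match,
    assuming each position matches independently with probability `identity`."""
    return identity ** k


def expected_random_hits(m, n, k, alphabet_size=4):
    """Expected number of chance k-mer matches between two unrelated random
    sequences of lengths m and n with uniform letter frequencies."""
    return (m - k + 1) * (n - k + 1) / alphabet_size ** k


# Longer seeds: fewer chance hits (faster), but a lower chance of matching
# inside a truly homologous region (less sensitive).
for k in (8, 11, 14):
    print(k, window_match_probability(0.7, k), expected_random_hits(10**6, 10**6, k))
```

Under this model both quantities fall as the seed length grows, which is the tension between speed and sensitivity that a seed design has to balance.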


Approach

PatternHunter uses multiple seeds (short search strings) with optimized spacing within them. Seed-based searches are extremely fast because they examine homology only where seed hits occur. The sensitivity of a seed is strongly influenced by the spacing between its matching positions. Large seeds miss isolated homologies, whereas small ones generate numerous random hits that slow computation. PatternHunter balances the two by optimizing the spacing between the matching positions of a seed. It uses ''k'' (''k'' = 11) nonconsecutive letters as a seed, in contrast with BLAST, which uses ''k'' consecutive letters. The first stage of a PatternHunter analysis is a filtering phase, in which the program looks for matches at the ''k'' nonconsecutive positions specified by the optimal pattern. The second stage is the alignment phase, which is identical to that of BLAST. In addition, PatternHunter can use more than one seed at a time, which raises the sensitivity of the tool without sacrificing speed.
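The filtering phase described above can be sketched as follows. This is a simplified illustration, not PatternHunter's implementation: the function names are hypothetical, and the weight-11, span-18 pattern is the one reported in the original PatternHunter paper (Ma, Tromp and Li, 2002) rather than anything stated in the text above. A `1` marks a position that must match and a `0` marks a position that is ignored.

```python
from collections import defaultdict

# Weight-11, span-18 spaced seed (assumed from the 2002 paper, see lead-in)
SEED = "111010010100110111"


def seed_key(seq, start, pattern):
    """Letters of seq at the '1' positions of pattern, beginning at start."""
    return "".join(seq[start + i] for i, c in enumerate(pattern) if c == "1")


def build_index(seq, pattern):
    """Map every seed key occurring in seq to its start positions."""
    index = defaultdict(list)
    for start in range(len(seq) - len(pattern) + 1):
        index[seed_key(seq, start, pattern)].append(start)
    return index


def find_hits(query, target, pattern=SEED):
    """Return (query_pos, target_pos) pairs where the seed pattern matches."""
    index = build_index(target, pattern)
    hits = []
    for q in range(len(query) - len(pattern) + 1):
        for t in index.get(seed_key(query, q, pattern), ()):
            hits.append((q, t))
    return hits


def find_hits_multi(query, target, patterns):
    """Union of the hits found by several seed patterns."""
    hits = set()
    for pattern in patterns:
        hits.update(find_hits(query, target, pattern))
    return sorted(hits)


target = "ACGTACGTTGCAACGTGA"
query = "ACGAACGTTGGAACGTGA"  # differs from target at positions 3 and 10
print(find_hits(query, target))            # spaced seed
print(find_hits(query, target, "1" * 11))  # 11 consecutive letters
```

In this example the two mismatches fall on ignored positions of the spaced seed, so it still produces a hit, whereas no run of 11 consecutive letters is shared between the two sequences. Each hit would then be passed to the alignment phase for extension.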


Speed

PatternHunter analyzes all types of sequences quickly. On a modern computer, it takes a few seconds to handle prokaryotic genomes, minutes to process ''Arabidopsis thaliana'' sequences and several hours to process a human chromosome. Compared with other tools, PatternHunter is approximately a hundred times faster than BLAST and MegaBLAST, and about 3000 times faster than a Smith-Waterman algorithm. In addition, the program has a user-friendly interface that allows the user to customize the search parameters.


Sensitivity

PatternHunter can attain optimal sensitivity while retaining the same speed as a conventional BLAST search.
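One way to compare the sensitivity of two seeds of equal weight is to estimate how often each one hits a homologous region. The sketch below is a Monte Carlo estimate under assumptions that are mine, not the article's: a region of 64 aligned positions, each matching independently with probability 0.7, and the weight-11 spaced pattern reported in the 2002 PatternHunter paper. The function names are hypothetical.

```python
import random

SPACED = "111010010100110111"  # weight 11, span 18 (assumed from the 2002 paper)
CONTIGUOUS = "1" * 11          # weight 11, consecutive positions


def seed_hits(matches, pattern):
    """True if the pattern fits somewhere in `matches` with every '1'
    position of the pattern falling on a matching position."""
    ones = [i for i, c in enumerate(pattern) if c == "1"]
    return any(
        all(matches[start + i] for i in ones)
        for start in range(len(matches) - len(pattern) + 1)
    )


def estimate_sensitivity(pattern, length=64, identity=0.7, trials=2000, rng_seed=0):
    """Fraction of simulated regions in which the seed produces a hit."""
    rng = random.Random(rng_seed)  # fixed seed keeps the estimate reproducible
    hits = 0
    for _ in range(trials):
        matches = [rng.random() < identity for _ in range(length)]
        hits += seed_hits(matches, pattern)
    return hits / trials


print("spaced:    ", estimate_sensitivity(SPACED))
print("contiguous:", estimate_sensitivity(CONTIGUOUS))
```

Both seeds require 11 matching letters, so they produce a similar number of random hits, but in this simulation the spaced seed hits a larger fraction of the regions. The exact figures depend on the assumed region length and identity.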


Specifications

PatternHunter is implemented in Java. Consequently, the program runs in any Java 1.4 environment.


Future advances

Homology search is a time-consuming procedure. Challenges remain in handling DNA-DNA searches as well as translated DNA-protein searches because of the vast size of the databases and the small size of the queries. PatternHunter has been upgraded to PatternHunter II, which speeds up DNA-protein searches a hundredfold without reducing sensitivity. There are also plans to improve PatternHunter so that it attains the high sensitivity of the Smith-Waterman algorithm at the speed of BLAST. A translated version of PatternHunter intended to speed up tBLASTx is also under development.


References

Joseph, Jacob M. (2012). On the identification and investigation of homologous gene families, with particular emphasis on the accuracy of multidomain families (PhD thesis). Carnegie Mellon University. http://reports-archive.adm.cs.cmu.edu/anon/lane/CMU-CB-12-103.pdf
Li, M.; Ma, B.; Kisman, D.; Tromp, J. (2003). "PatternHunter II: Highly sensitive and fast homology search". Genome Informatics. International Conference on Genome Informatics. 14: 164–175. PMID 15706531.
Zhang, Louxin. "Sequence Database Search Techniques I: Blast and PatternHunter tools". http://www.bii.a-star.edu.sg/docs/education/lsm5192_04/Sequence%20Database%20Search%20Techniques.pdf Accessed 6 December 2013.
Ma, Bin; Tromp, John; Li, Ming (2002). "PatternHunter: Faster and More Sensitive Homology Search". Bioinformatics. 18 (3): 440–445. doi:10.1093/bioinformatics/18.3.440. PMID 11934743.
"PatternHunter Brochure". http://www.bioinfor.com/images/stories/pdf/patternhunterbrochure.pdf Archived 11 December 2013 at https://web.archive.org/web/20131211041309/http://www.bioinfor.com/images/stories/pdf/patternhunterbrochure.pdf Accessed 30 November 2013.
Pearson, W. R. (1991). "Searching protein sequence libraries: Comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms". Genomics. 11 (3): 635–650. doi:10.1016/0888-7543(91)90071-L. PMID 1774068.
Pevsner, Jonathan (2009). Bioinformatics and Functional Genomics (2nd ed.). New Jersey: Wiley Blackwell. ISBN 9780470451489. https://archive.org/details/bioinformaticsfu00pevs_0