
In biochemistry, a hypothetical protein is a protein whose existence has been predicted, but for which there is a lack of experimental evidence that it is expressed in vivo. Sequencing of several genomes has resulted in numerous predicted open reading frames to which functions cannot be readily assigned. These proteins, either orphan or conserved hypothetical proteins, make up roughly 20% to 40% of the proteins encoded in each newly sequenced genome. Evidence that a hypothetical protein functions in the metabolism of the organism can be inferred by comparing its sequence or structure with those of known proteins, for example through conserved domain analysis. Even when techniques such as microarray analysis and mass spectrometry provide enough evidence that the gene product is expressed, it is difficult to assign a function to it, given its lack of identity to protein sequences with annotated biochemical function.

Nowadays, most protein sequences are inferred from computational analysis of genomic DNA sequence. Hypothetical proteins are created by gene prediction software during genome analysis. When the bioinformatic tool used for gene identification finds a large open reading frame without a characterised homologue in the protein database, it returns "hypothetical protein" as an annotation remark.

The function of a hypothetical protein can be predicted by domain homology searches, with various confidence levels. Hypothetical proteins often contain conserved domains; comparing these with known family domains allows a hypothetical protein to be classified into a particular protein family even though it has not been investigated in vivo. Function can also be predicted by homology modelling, in which the hypothetical protein is aligned with a known protein sequence whose three-dimensional structure is known; if a structure can be modelled from that alignment, the capability of the hypothetical protein to function can be assessed computationally. Further approaches to annotating the function of hypothetical proteins include determining their three-dimensional structure through structural genomics initiatives, understanding the nature and mode of prosthetic group or metal ion binding, identifying fold similarity with other proteins of known function, and annotating possible catalytic and regulatory sites. Structure prediction combined with biochemical function assessment, by screening against various substrates, is another promising approach to function annotation.
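
The annotation step described above, in which a large open reading frame with no characterised homologue is labelled "hypothetical protein", can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not how any particular gene prediction tool works: the ORF scan is a naive ATG-to-stop search over six reading frames using the standard genetic code, the exact-match dictionary lookup only stands in for a real homology search against a protein database, and the names `find_orfs`, `annotate`, `min_residues` and the "example kinase" entry are all hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_residues=30):
    """Translate every ATG-to-stop ORF in all six reading frames.

    Only ORFs that end in a stop codon and encode at least
    `min_residues` amino acids (start methionine included) are kept.
    """
    dna = dna.upper()
    proteins = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            current = None  # residues collected since the last in-frame ATG
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                aa = CODON_TABLE.get(codon, "X")  # X for ambiguous bases
                if current is None:
                    if codon == "ATG":
                        current = ["M"]
                elif aa == "*":
                    if len(current) >= min_residues:
                        proteins.append("".join(current))
                    current = None
                else:
                    current.append(aa)
    return proteins


def annotate(protein, characterised):
    """Toy stand-in for a homology search: exact match or 'hypothetical protein'."""
    return characterised.get(protein, "hypothetical protein")


# Hypothetical toy data: one ORF with a "characterised" match, one without.
characterised = {"MAKW": "example kinase"}
dna = "ATGGCTAAATGGTAA" + "ATGCCCGGGTTTCATTGA"

for protein in find_orfs(dna, min_residues=4):
    print(protein, "->", annotate(protein, characterised))
```

In practice the lookup would be replaced by a similarity search with a score or E-value threshold, which is why the source speaks of predictions made "with various confidence levels".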


See also

* Domain of unknown function


External links


ExPASy