TopFIND is the Termini oriented protein Function Inferred Database (TopFIND) is an integrated
knowledgebase
A knowledge base (KB) is a technology used to store complex structured and unstructured information used by a computer system. The initial use of the term was in connection with expert systems, which were the first knowledge-based systems.
Ori ...
focused on
protein
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
termini, their formation by
protease
A protease (also called a peptidase, proteinase, or proteolytic enzyme) is an enzyme that catalyzes (increases reaction rate or "speeds up") proteolysis, breaking down proteins into smaller polypeptides or single amino acids, and spurring the ...
s and functional implications. It contains information about the processing and the processing state of proteins and functional implications thereof derived from research literature, contributions by the scientific community and biological
database
In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
s.
Background
Among the most fundamental characteristics of a
protein
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
are the N- and C-termini defining the start and end of the
polypeptide chain
Peptides (, ) are short chains of amino acids linked by peptide bonds. Long chains of amino acids are called proteins. Chains of fewer than twenty amino acids are called oligopeptides, and include dipeptides, tripeptides, and tetrapeptides.
...
. While genetically encoded, protein termini isoforms are also often generated during
translation
Translation is the communication of the Meaning (linguistic), meaning of a #Source and target languages, source-language text by means of an Dynamic and formal equivalence, equivalent #Source and target languages, target-language text. The ...
, following which, termini are highly dynamic, being frequently trimmed at their ends by a large array of exopeptidases. Neo-termini can also be generated by endopeptidases after precise and limited proteolysis, termed processing. Necessary for the maturation of many proteins, processing can also occur afterwards, often resulting in dramatic functional consequences. Aberrant proteolysis can cause wide range of diseases like arthritis
or cancer.
Hence, proteolytic generation of pleiotrophic stable forms of proteins, the universal susceptibility of proteins to proteolysis, and its irreversibility, distinguishes proteolysis from many highly studied posttranslational modifications. Proteases are tightly interconnected in the protease web and their aberrant activity in disease can lead to diagnostic fragment profiles with characteristic protein termini. Following proteolysis, the newly formed protein termini can be further modified, a process that affects protein function and stability.
Knowledgebase content
TopFIND is a resource for comprehensive coverage of protein N- and C-termini discovered by all available in silico, in vitro as well as in vivo methodologies. It makes use of existing knowledge by seamless integration of data from
UniProt
UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from ...
and
MEROPS
MEROPS is an online database for peptidases (also known as proteases, proteinases and proteolytic enzymes) and their inhibitors. The classification scheme for peptidases was published by Rawlings & Barrett in 1993, and that for protein inhibitor ...
and provides access to new data from community submission and manual literature curating. It renders modifications of
protein
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
termini, such as acetylation and citrullination, easily accessible and searchable and provides the means to identify and analyse extend and distribution of terminal modifications across a protein. Since its inception TopFIND has been expanded to further species.
Data access
The data is presented to the user with a strong emphasis on the relation to curated background information and underlying evidence that led to the observation of a terminus, its modification or proteolytic cleavage. In brief the
protein
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
information, its domain structure, protein termini, terminus modifications and proteolytic processing of and by other proteins is listed. All information is accompanied by
metadata
Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including:
* Descriptive metadata – the descriptive ...
like its original source, method of identification, confidence measurement or related publication. A positional cross correlation evaluation matches termini and cleavage sites with protein features (such as amino acid variants) and domains to highlight potential effects and dependencies in a unique way. Also, a network view of all proteins showing their functional dependency as
protease
A protease (also called a peptidase, proteinase, or proteolytic enzyme) is an enzyme that catalyzes (increases reaction rate or "speeds up") proteolysis, breaking down proteins into smaller polypeptides or single amino acids, and spurring the ...
,
substrate or
protease inhibitor tied in with
protein interactions
Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respond ...
is provided for the easy evaluation of network wide effects. A powerful yet user friendly filtering mechanism allows the presented data to be filtered based on parameters like methodology used, in vivo relevance, confidence or data source (e.g. limited to a single laboratory or publication). This provides means to assess physiological relevant data and to deduce functional information and hypotheses relevant to the bench scientist. In a later release analysis tools for the evaluation of proteolytic pathways in experimental data have been added.
See also
*
MEROPS
MEROPS is an online database for peptidases (also known as proteases, proteinases and proteolytic enzymes) and their inhibitors. The classification scheme for peptidases was published by Rawlings & Barrett in 1993, and that for protein inhibitor ...
*
UniProt
UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from ...
*
Cytoscape
Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and integrating with gene expression profiles and other state data. Additional features are available as plugins. Plugins are availabl ...
*
Computational genomics Computational genomics refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data, including both DNA and RNA sequence as well as other "post-genomic" data (i.e., experimental data obtai ...
*
Metabolic network modelling
Metabolic network modelling, also known as metabolic network reconstruction or metabolic pathway analysis, allows for an in-depth insight into the molecular mechanisms of a particular organism. In particular, these models correlate the genome wi ...
*
Protein-protein interaction prediction
References
External links
TopFIND - main website and web interfaceHost institution websiteResearch group of Philipp Lange - inventor & core developer
Research Group of Christopher Overall - home of TopFIND
Merops - the peptidase databaseUniProt*
{{Proteases
Molecular biology
Biological databases
Bioinformatics software
Systems biology
Mathematical and theoretical biology
Protein domains
Protein families
Post-translational modification