Superfamily (proteins)

SUPERFAMILY is a database and search platform of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP superfamilies. Domains are functional, structural, and evolutionary units that form proteins. Domains of common ancestry are grouped into superfamilies. The domains and domain superfamilies are defined and described in SCOP. Superfamilies are groups of proteins which have structural evidence to support a common evolutionary ancestor but may not have detectable sequence homology.


Annotations

The SUPERFAMILY annotation is based on a collection of hidden Markov models (HMMs), which represent structural protein domains at the SCOP superfamily level. A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from completely sequenced genomes against the hidden Markov models.

For each protein you can:
* Submit sequences for SCOP classification
* View domain organisation, sequence alignments and protein sequence details

For each genome you can:
* Examine superfamily assignments, phylogenetic trees, domain organisation lists and networks
* Check for over- and under-represented superfamilies within a genome

For each superfamily you can:
* Inspect SCOP classification, functional annotation, Gene Ontology annotation, InterPro abstract and genome assignments
* Explore the taxonomic distribution of a superfamily across the tree of life

All annotation, models and the database dump are freely available for download to everyone.
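The scanning step described above scores each sequence against every superfamily model. The sketch below illustrates the underlying idea with the forward algorithm: it computes a log-odds score (in bits) against a background model and keeps the best-scoring model that clears a cutoff. It is a toy, not SUPERFAMILY's pipeline. Real SUPERFAMILY models are profile HMMs over the 20 amino acids, whereas the two-letter alphabet here (H for hydrophobic, P for polar), the two-state models, the names `sf_A`/`sf_B`, and the 2-bit threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def _logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


class ToyHMM:
    """Discrete HMM with start, transition and emission probabilities (all > 0)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[Dict[str, float]]) -> None:
        self.start, self.trans, self.emit = start, trans, emit

    def log_likelihood(self, seq: str) -> float:
        """Forward algorithm in log space: ln P(seq | model). seq must be non-empty."""
        n = len(self.start)
        # alpha[j] = ln P(symbols so far, current state == j)
        alpha = [math.log(self.start[j] * self.emit[j][seq[0]]) for j in range(n)]
        for symbol in seq[1:]:
            alpha = [
                _logsumexp([alpha[i] + math.log(self.trans[i][j]) for i in range(n)])
                + math.log(self.emit[j][symbol])
                for j in range(n)
            ]
        return _logsumexp(alpha)


# Hypothetical background (null) model: both symbols equally likely.
BACKGROUND = {"H": 0.5, "P": 0.5}

# Hypothetical "superfamily" models: sf_A favours H, sf_B favours P.
MODELS = {
    "sf_A": ToyHMM([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                   [{"H": 0.9, "P": 0.1}, {"H": 0.7, "P": 0.3}]),
    "sf_B": ToyHMM([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
                   [{"H": 0.1, "P": 0.9}, {"H": 0.3, "P": 0.7}]),
}


def log_odds_bits(model: ToyHMM, seq: str) -> float:
    """Score in bits: log2 P(seq | model) - log2 P(seq | background)."""
    null = sum(math.log(BACKGROUND[s]) for s in seq)
    return (model.log_likelihood(seq) - null) / math.log(2)


def assign_superfamily(seq: str, models: Dict[str, ToyHMM],
                       threshold_bits: float) -> Optional[str]:
    """Return the best-scoring model name, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold_bits
    for name, model in models.items():
        score = log_odds_bits(model, seq)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

With these toy models, `assign_superfamily("HHHHHHHH", MODELS, 2.0)` returns `"sf_A"`, while an alternating sequence such as `"HPHPHPHP"` stays under 2 bits for both models and returns `None`, the analogue of a sequence left without a superfamily assignment.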

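Per-genome views such as domain organisation lists and networks start from each protein's domain architecture, here taken to be the N- to C-terminal order of its assigned superfamilies. The sketch below derives architectures, adjacent domain pairs, and co-occurrence network edges from per-protein assignments. The input layout, the example labels (`Kinase`, `SH2`, `SH3`), and the choice to treat consecutive domains as adjacent regardless of any gap between them are assumptions for illustration, not SUPERFAMILY's documented definitions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical input: protein ID -> list of (superfamily label, start residue).
Assignments = Dict[str, List[Tuple[str, int]]]


def architecture(domains: List[Tuple[str, int]]) -> Tuple[str, ...]:
    """Superfamily labels ordered from N- to C-terminus by start residue."""
    return tuple(label for label, _ in sorted(domains, key=lambda d: d[1]))


def unique_architectures(assignments: Assignments) -> Set[Tuple[str, ...]]:
    """Distinct architectures among proteins with at least one assigned domain."""
    return {architecture(d) for d in assignments.values() if d}


def adjacent_pairs(assignments: Assignments) -> Counter:
    """Count ordered (N-terminal, C-terminal) pairs of consecutive domains."""
    counts: Counter = Counter()
    for domains in assignments.values():
        arch = architecture(domains)
        counts.update(zip(arch, arch[1:]))
    return counts


def cooccurrence_edges(assignments: Assignments) -> Counter:
    """Undirected edges: number of proteins in which two superfamilies co-occur."""
    edges: Counter = Counter()
    for domains in assignments.values():
        labels = sorted({label for label, _ in domains})
        edges.update(combinations(labels, 2))
    return edges


# Hypothetical example genome with three proteins.
EXAMPLE: Assignments = {
    "p1": [("Kinase", 10), ("SH2", 120)],
    "p2": [("SH2", 80), ("SH3", 5), ("Kinase", 200)],
    "p3": [("SH2", 1)],
}
```

With `EXAMPLE`, `cooccurrence_edges` counts the Kinase/SH2 edge twice (proteins p1 and p2), while `adjacent_pairs` keeps `("Kinase", "SH2")` and `("SH2", "Kinase")` apart, because order along the chain matters for a domain pair but not for a network edge.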

Features

Sequence Search

Submit a protein or DNA sequence for SCOP superfamily and family level classification using the SUPERFAMILY HMMs. Sequences can be submitted either as raw input or by uploading a file, but all must be in FASTA format. Sequences can be amino acids, a fixed-frame nucleotide sequence, or all frames of a submitted nucleotide sequence. Up to 1000 sequences can be run at a time.

Keyword Search

Search the database by superfamily, family, or species name, or by sequence, SCOP, PDB, or HMM ID. A successful search yields the classes, folds, superfamilies, families, and individual proteins matching the query.

Domain Assignments

The database has domain assignments, alignments, and architectures for completely sequenced eukaryotic and prokaryotic organisms, plus sequence collections.

Comparative Genomics Tools

Browse unusual (over- and under-represented) superfamilies and families, adjacent domain pair lists and graphs, unique domain pairs, domain combinations, domain architecture co-occurrence networks, and domain distribution across taxonomic kingdoms for each organism.

Genome Statistics

For each genome: number of sequences, number of sequences with assignment, percentage of sequences with assignment, percentage total sequence coverage, number of domains assigned, number of superfamilies assigned, number of families assigned, average superfamily size, percentage produced by duplication, average sequence length, average length matched, number of domain pairs, and number of unique domain architectures.

Gene Ontology

Domain-centric Gene Ontology (GO) annotations, generated automatically. Because of the growing gap between the number of sequenced proteins and the number with known functions, automated methods for functionally annotating proteins are becoming increasingly important, especially for proteins with known domains. SUPERFAMILY uses protein-level GO annotations taken from the Gene Ontology Annotation (GOA) project, which offers high-quality GO annotations directly associated with proteins in UniProtKB across a wide spectrum of species. SUPERFAMILY has generated GO annotations for evolutionarily close domains (at the SCOP family level) and distant domains (at the SCOP superfamily level).

Phenotype Ontology

Domain-centric phenotype/anatomy ontology annotations, including Disease Ontology, Human Phenotype, Mouse Phenotype, Worm Phenotype, Yeast Phenotype, Fly Phenotype, Fly Anatomy, Zebrafish Anatomy, Xenopus Anatomy, and Arabidopsis Plant.

Superfamily Annotation

InterPro abstracts for over 1,000 superfamilies, and Gene Ontology (GO) annotation for over 700 superfamilies. This feature allows for the direct annotation of key features, functions, and structures of a superfamily.

Functional Annotation

Functional annotation of SCOP 1.73 superfamilies. The SUPERFAMILY database uses a scheme of 50 detailed function categories which map to 7 general function categories, similar to the scheme used in the COG database. The general function assigned to a superfamily reflects the major function of that superfamily. The general categories of function are:
# Information: storage and maintenance of the genetic code; DNA replication and repair; general transcription and translation.
# Regulation: regulation of gene expression and protein activity; information processing in response to environmental input; signal transduction; general regulatory or receptor activity.
# Metabolism: anabolic and catabolic processes; cell maintenance and homeostasis; secondary metabolism.
# Intra-cellular processes: cell motility and division; cell death; intra-cellular transport; secretion.
# Extra-cellular processes: inter- and extra-cellular processes such as cell adhesion; organismal processes such as blood clotting or the immune system.
# General: general and multiple functions; interactions with proteins, lipids, small molecules, and ions.
# Other/Unknown: an unknown function, viral proteins, or toxins.

Each domain superfamily in SCOP classes a to g was manually annotated using this scheme, with information provided by SCOP, InterPro, Pfam, Swiss-Prot, and various literature sources.

Phylogenetic Trees

Create custom phylogenetic trees by selecting three or more available genomes on the SUPERFAMILY site. Trees are generated using heuristic parsimony methods and are based on protein domain architecture data for all genomes in SUPERFAMILY. Genome combinations, or specific clades, can be displayed as individual trees.

Similar Domain Architectures

This feature allows the user to find the 10 domain architectures which are most similar to the domain architecture of interest.

Hidden Markov Models

Produce SCOP domain assignments for a sequence using the SUPERFAMILY hidden Markov models.

Profile Comparison

Find remote domain matches when the HMM search fails to find a significant match. Profile comparison (PRC), which aligns and scores two profile HMMs against each other, is used for this.

Web Services

A Distributed Annotation Server, and linking to SUPERFAMILY.

Downloads

Sequences, assignments, models, MySQL database, and scripts, updated weekly.
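For nucleotide input, the Sequence Search option that uses all frames of a submitted sequence implies translating the DNA in all six reading frames before scanning. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and plain FASTA text; the function names, the `+1`..`-3` frame labels, and the use of `X` for codons containing ambiguous bases are illustrative choices, not SUPERFAMILY's documented behaviour. The 1000-sequence cap mirrors the limit stated for Sequence Search.

```python
from typing import Dict, List, Tuple

MAX_SEQUENCES = 1000  # per-submission limit stated for Sequence Search

# Standard genetic code (NCBI table 1); codon index = 16*b1 + 4*b2 + b3 over "TCAG".
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) records."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"at most {MAX_SEQUENCES} sequences per submission")
    return records


def translate(dna: str) -> str:
    """Translate complete codons; any codon with a non-ACGT base becomes 'X'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(base in _BASES for base in codon):
            idx = (16 * _BASES.index(codon[0]) + 4 * _BASES.index(codon[1])
                   + _BASES.index(codon[2]))
            protein.append(_AMINO[idx])
        else:
            protein.append("X")
    return "".join(protein)


def six_frame_translations(dna: str) -> Dict[str, str]:
    """Translations of the three forward and three reverse-complement frames."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(_COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames
```

For example, `six_frame_translations("ATGGCC")` gives `"MA"` for frame `+1` and `"GH"` for frame `-1` (the reverse complement `GGCCAT`); each of the six protein strings would then be scanned like an ordinary protein query.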

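The Phylogenetic Trees feature builds trees by heuristic parsimony over domain architecture data. The inner step of any such search is scoring a candidate topology. The sketch below does this with Fitch's small-parsimony algorithm, treating the presence or absence of each domain architecture in a genome as a binary character. The genome names and architecture sets are hypothetical, and a real heuristic search would also rearrange topologies (for example by branch swapping) rather than compare a fixed list of trees as done here.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Rooted binary tree: a leaf is a genome name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, present_in: Set[str]) -> Tuple[FrozenSet[int], int]:
    """Fitch small parsimony for one binary character (1 = architecture present).

    Returns the candidate state set for this node and the minimum number of
    state changes (gains or losses) required in the subtree below it.
    """
    if isinstance(tree, str):
        return frozenset({1 if tree in present_in else 0}), 0
    left_states, left_cost = fitch(tree[0], present_in)
    right_states, right_cost = fitch(tree[1], present_in)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets: take the union and count one change.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree, architectures: Dict[str, Set[str]]) -> int:
    """Sum of Fitch costs over every architecture seen in any genome."""
    all_architectures = set().union(*architectures.values())
    total = 0
    for arch in sorted(all_architectures):
        present_in = {g for g, archs in architectures.items() if arch in archs}
        total += fitch(tree, present_in)[1]
    return total


# Hypothetical data: genome -> set of domain architectures it contains.
EXAMPLE_ARCHS: Dict[str, Set[str]] = {
    "A": {"x", "y"}, "B": {"x", "y"}, "C": {"z"}, "D": {"z"},
}

# The three unrooted topologies for four genomes, each written as a rooted pair.
CANDIDATES: List[Tree] = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

best = min(CANDIDATES, key=lambda t: parsimony_score(t, EXAMPLE_ARCHS))
```

Here `best` is `(("A", "B"), ("C", "D"))`, which needs 3 changes against 6 for each alternative. The Fitch score does not depend on where the root is placed, which is why rooting each unrooted topology arbitrarily is harmless.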

Use in Research

The SUPERFAMILY database has numerous research applications and has been used by many research groups for various studies. It can serve either as a database of proteins that the user wishes to examine with other methods, or as a way to assign a function and structure to a novel or uncharacterized protein. One study found SUPERFAMILY to be very adept at correctly assigning an appropriate function and structure to a large number of domains of unknown function by comparing them to the database's hidden Markov models. Another study used SUPERFAMILY to generate a data set of 1,733 fold superfamily (FSF) domains for a comparison of proteomes and functionomes aimed at identifying the origin of cellular diversification.


External links


SUPERFAMILY database

SCOP: Structural Classification of Proteins