protein superfamily

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if not apparent (due to low sequence similarity). Superfamilies typically contain several protein families which show sequence similarity within each family. The term "protein clan" is commonly used for protease and glycosyl hydrolase superfamilies based on the MEROPS and CAZy classification systems.


Identification

Superfamilies of proteins are identified using a number of methods. Closely related members can be identified by different methods to those needed to group the most evolutionarily divergent members.


Sequence similarity

Historically, the similarity of different amino acid sequences has been the most common method of inferring homology. Sequence similarity is considered a good predictor of relatedness, since similar sequences are more likely the result of gene duplication and divergent evolution, rather than the result of convergent evolution. Amino acid sequence is typically more conserved than DNA sequence (due to the degenerate genetic code), so it is a more sensitive detection method. Since some of the amino acids have similar properties (e.g., charge, hydrophobicity, size), conservative mutations that interchange them are often neutral to function. The most conserved sequence regions of a protein often correspond to functionally important regions like catalytic sites and binding sites, since these regions are less tolerant to sequence changes.

Using sequence similarity to infer homology has several limitations. There is no minimum level of sequence similarity guaranteed to produce identical structures. Over long periods of evolution, related proteins may show no detectable sequence similarity to one another. Sequences with many insertions and deletions can also sometimes be difficult to align, which makes it hard to identify the homologous sequence regions. In the PA clan of proteases, for example, not a single residue is conserved through the superfamily, not even those in the catalytic triad. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan.

Nevertheless, sequence similarity is the most commonly used form of evidence to infer relatedness, since the number of known sequences vastly outnumbers the number of known tertiary structures. In the absence of structural information, sequence similarity constrains the limits of which proteins can be assigned to a superfamily.
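
As a rough illustration of the difference between identical and conservatively substituted residues, the following Python sketch scores two sequences that have already been aligned (gaps written as "-"). The property groups in `PROPERTY_GROUPS` and the function name `identity_and_similarity` are assumptions made for this example; real analyses normally use a substitution matrix rather than hard groups.

```python
# Illustrative grouping of amino acids by broad physicochemical property.
# These groups are an assumption for this sketch, not a standard scoring scheme.
PROPERTY_GROUPS = [
    set("AVLIMC"),  # small or aliphatic, hydrophobic
    set("FWY"),     # aromatic
    set("KRH"),     # positively charged
    set("DE"),      # negatively charged
    set("STNQ"),    # polar, uncharged
    set("G"),
    set("P"),
]


def _group_of(residue):
    """Return the index of the property group containing residue, or None."""
    for index, group in enumerate(PROPERTY_GROUPS):
        if residue in group:
            return index
    return None


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (fraction identical, fraction identical or conservative).

    Only alignment columns where neither sequence has a gap are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # insertions and deletions are skipped in this sketch
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif _group_of(a) is not None and _group_of(a) == _group_of(b):
            similar += 1  # conservative substitution within one group
    if compared == 0:
        return 0.0, 0.0
    return identical / compared, similar / compared


identity, similarity = identity_and_similarity("ACDE-K", "ACEEFR")
print(f"identity={identity:.2f} similarity={similarity:.2f}")
```

In this toy alignment three of the five compared columns are identical and the other two (D/E and K/R) are conservative, so similarity is higher than identity. This is the kind of signal that can persist after strict identity has decayed.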


Structural similarity

Structure is much more evolutionarily conserved than sequence, such that proteins with highly similar structures can have entirely different sequences. Over very long evolutionary timescales, very few residues show detectable amino acid sequence conservation; however, secondary structural elements and tertiary structural motifs are highly conserved. Some protein dynamics and conformational changes of the protein structure may also be conserved, as is seen in the serpin superfamily. Consequently, protein tertiary structure can be used to detect homology between proteins even when no evidence of relatedness remains in their sequences. Structural alignment programs, such as DALI, use the 3D structure of a protein of interest to find proteins with similar folds. However, on rare occasions, related proteins may evolve to be structurally dissimilar and relatedness can only be inferred by other methods.
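
One way to see why structures can be compared without first superimposing them is to compare internal distance matrices, which do not change when a structure is rotated or translated. The Python sketch below is only a toy illustration of that idea, not the DALI algorithm: it assumes the two proteins are already reduced to equal-length lists of matched (x, y, z) coordinates (for example, one point per residue), and the function names are hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between all points in one structure."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mean_distance_difference(coords_a, coords_b):
    """Average absolute difference between corresponding internal distances.

    Assumes position i in coords_a corresponds to position i in coords_b.
    A value near zero means the two point sets have the same internal geometry.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    n = len(coords_a)
    if n < 2:
        return 0.0
    da = distance_matrix(coords_a)
    db = distance_matrix(coords_b)
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(da[i][j] - db[i][j])
            pairs += 1
    return total / pairs


# Made-up coordinates: `moved` is `bent` rotated and shifted; `straight` differs.
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
moved = [(10.0, 5.0, 1.0), (10.0, 8.8, 1.0), (6.2, 8.8, 1.0)]
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

print(mean_distance_difference(bent, moved))     # effectively zero
print(mean_distance_difference(bent, straight))  # clearly non-zero
```

Real structural alignment tools must also discover which residues correspond to each other, which is the hard part and is not attempted here.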


Mechanistic similarity

The catalytic mechanism of enzymes within a superfamily is commonly conserved, although substrate specificity may be significantly different. Catalytic residues also tend to occur in the same order in the protein sequence. For the families within the PA clan of proteases, although there has been divergent evolution of the catalytic triad residues used to perform catalysis, all members use a similar mechanism to perform covalent, nucleophilic catalysis on proteins, peptides or amino acids. However, mechanism alone is not sufficient to infer relatedness. Some catalytic mechanisms have been convergently evolved multiple times independently, and so form separate superfamilies, and some superfamilies display a range of different (though often chemically similar) mechanisms.
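
The observation that catalytic residues tend to occur in the same sequence order can be checked mechanically once the residues are annotated. The following Python sketch assumes each enzyme is described by a hypothetical mapping from catalytic role to sequence position; the enzyme names and positions are invented for illustration. Matching order is supporting evidence only, since mechanism alone is not sufficient to infer relatedness.

```python
def catalytic_order(residues_by_role):
    """Return the catalytic roles sorted by their position in the sequence.

    residues_by_role maps a role name (e.g. "base") to a 1-based position.
    """
    ordered = sorted(residues_by_role.items(), key=lambda item: item[1])
    return tuple(role for role, _position in ordered)


def same_catalytic_order(enzyme_a, enzyme_b):
    """True if both enzymes place the same roles in the same sequence order."""
    return catalytic_order(enzyme_a) == catalytic_order(enzyme_b)


# Hypothetical annotations: positions are made up for this example.
enzyme_a = {"base": 57, "acid": 102, "nucleophile": 195}
enzyme_b = {"base": 46, "acid": 81, "nucleophile": 151}
enzyme_c = {"acid": 32, "base": 64, "nucleophile": 221}

print(same_catalytic_order(enzyme_a, enzyme_b))  # same order of roles
print(same_catalytic_order(enzyme_a, enzyme_c))  # acid and base swapped
```

Here `enzyme_a` and `enzyme_b` share the order base, acid, nucleophile despite different positions, while `enzyme_c` uses the same three roles in a different order, the pattern one might expect from an independently evolved mechanism.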


Evolutionary significance

Protein superfamilies represent the current limits of our ability to identify common ancestry. They are the largest evolutionary grouping based on direct evidence that is currently possible, and their origins are therefore amongst the most ancient evolutionary events currently studied. Some superfamilies have members present in all kingdoms of life, indicating that the last common ancestor of that superfamily was in the last universal common ancestor of all life (LUCA). Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).


Diversification

A majority of proteins contain multiple domains. Between 66% and 80% of eukaryotic proteins have multiple domains, while about 40% to 60% of prokaryotic proteins do. Over time, many of the superfamilies of domains have mixed together. In fact, it is very rare to find "consistently isolated superfamilies". When domains do combine, the N- to C-terminal domain order (the "domain architecture") is typically well conserved. Additionally, the number of domain combinations seen in nature is small compared to the number of possibilities, suggesting that selection acts on all combinations.
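
The gap between observed and possible domain combinations can be made concrete with a small count. The Python sketch below treats each protein as a tuple of domain superfamily names in N- to C-terminal order and compares the number of distinct adjacent ordered pairs that occur with the number that could occur. The domain names and architectures are invented for illustration, and counting only adjacent pairs is a simplifying assumption.

```python
def adjacent_pairs(architecture):
    """Ordered (N-terminal, C-terminal) pairs of neighbouring domains."""
    return list(zip(architecture, architecture[1:]))


def pair_coverage(architectures):
    """Return (observed, possible) counts of ordered adjacent domain pairs.

    possible is the number of ordered pairs that can be formed from the
    distinct domains seen, allowing a domain to be followed by itself.
    """
    domains = {domain for arch in architectures for domain in arch}
    observed = {pair for arch in architectures for pair in adjacent_pairs(arch)}
    return len(observed), len(domains) ** 2


# Hypothetical proteins, each written N- to C-terminus.
proteins = [
    ("kinase", "SH2"),
    ("kinase", "SH2", "SH3"),
    ("globin",),
    ("kinase", "SH2"),
]

observed, possible = pair_coverage(proteins)
print(f"{observed} of {possible} possible ordered pairs observed")
```

With four distinct domains there are 16 possible ordered pairs, but only two occur in this toy set, and always in the same N- to C-terminal order.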


Examples

* α/β hydrolase superfamily - Members share an α/β sheet, containing 8 strands connected by helices, with catalytic triad residues in the same order. Activities include proteases, lipases, peroxidases, esterases, epoxide hydrolases and dehalogenases.
* Alkaline phosphatase superfamily - Members share an αβα sandwich structure and perform common promiscuous reactions by a common mechanism.
* Globin superfamily - Members share an 8-alpha helix globular globin fold.
* Immunoglobulin superfamily - Members share a sandwich-like structure of two sheets of antiparallel β strands (Ig-fold), and are involved in recognition, binding, and adhesion.
* PA clan - Members share a chymotrypsin-like double β-barrel fold and similar proteolysis mechanisms but sequence identity of <10%. The clan contains both cysteine and serine proteases (different nucleophiles).
* Ras superfamily - Members share a common catalytic G domain of a 6-strand β sheet surrounded by 5 α-helices.
* Serpin superfamily - Members share a high-energy, stressed fold which can undergo a large conformational change, which is typically used to inhibit serine and cysteine proteases by disrupting their structure.
* TIM barrel superfamily - Members share a large α8β8 barrel structure. It is one of the most common protein folds, and the monophyly of this superfamily is still contested.


Protein superfamily resources

Several biological databases document protein superfamilies and protein folds, for example:
* Pfam - Protein families database of alignments and HMMs
* PROSITE - Database of protein domains, families and functional sites
* PIRSF - SuperFamily Classification System
* PASS2 - Protein Alignment as Structural Superfamilies v2
* SUPERFAMILY - Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
* SCOP and CATH - Classifications of protein structures into superfamilies, families and domains

Similarly, there are algorithms that search the Protein Data Bank (PDB) for proteins with structural homology to a target structure, for example:
* DALI - Structural alignment based on a distance alignment matrix method


See also

* Structural alignment
* Protein domains
* Protein family
* Protein mimetic
* Protein structure
* Homology (biology)
* Interolog
* List of gene families
* SUPERFAMILY
* CATH

