Structural Classification of Proteins database

The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine the evolutionary relationships between proteins. Proteins with the same shape but little sequence or functional similarity are placed in different superfamilies and are assumed to have only a very distant common ancestor. Proteins with the same shape and some similarity of sequence and/or function are placed in "families" and are assumed to have a closer common ancestor.

Like the CATH and Pfam databases, SCOP classifies individual structural domains of proteins rather than entire proteins, which may include a significant number of different domains.

The SCOP database is freely accessible on the internet. SCOP was created in 1994 at the Centre for Protein Engineering and the Laboratory of Molecular Biology. It was maintained by Alexey G. Murzin and his colleagues at the Centre for Protein Engineering until its closure in 2010 and subsequently at the Laboratory of Molecular Biology in Cambridge, England.

Work on SCOP 1.75 was discontinued in 2014. Since then the SCOPe team from UC Berkeley has been responsible for updating the database in a compatible manner, using a combination of automated and manual methods. The latest SCOPe release cited here is SCOPe 2.07 (March 2018).

The new Structural Classification of Proteins version 2 (SCOP2) database was released at the beginning of 2020. The update featured an improved database schema, a new API and a modernised web interface. It was the most significant update by the Cambridge group since SCOP 1.75 and builds on the schema advances of the SCOP 2 prototype.


Hierarchical organisation

The source of protein structures is the Protein Data Bank. The unit of classification of structure in SCOP is the protein domain. What the SCOP authors mean by "domain" is suggested by their statement that small proteins and most medium-sized ones have just one domain, and by the observation that human hemoglobin, which has an α2β2 structure, is assigned two SCOP domains, one for the α and one for the β subunit.

The shapes of domains are called "folds" in SCOP. Domains belonging to the same fold have the same major secondary structures in the same arrangement with the same topological connections. SCOP version 1.75 lists 1195 folds, each with a short description. For example, the "globin-like" fold is described as "core: 6 helices; folded leaf, partly opened". The fold to which a domain belongs is determined by inspection, rather than by software.

The levels of SCOP version 1.75 are as follows.

1. Class: Types of folds, e.g., beta sheets.
2. Fold: The different shapes of domains within a class.
3. Superfamily: The domains in a fold are grouped into superfamilies, which have at least a distant common ancestor.
4. Family: The domains in a superfamily are grouped into families, which have a more recent common ancestor.
5. Protein domain: The domains in families are grouped into protein domains, which are essentially the same protein.
6. Species: The domains in "protein domains" are grouped according to species.
7. Domain: Part of a protein. For simple proteins, it can be the entire protein.
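The ordering of the levels listed above can be sketched as a small data structure. This is an illustrative model only: the names `Level`, `Node` and `lineage` are hypothetical and do not correspond to any SCOP software or file format.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Level(IntEnum):
    """SCOP 1.75 levels, ordered from broadest to most specific."""
    CLASS = 1
    FOLD = 2
    SUPERFAMILY = 3
    FAMILY = 4
    PROTEIN = 5
    SPECIES = 6
    DOMAIN = 7


@dataclass
class Node:
    """One entry in the hierarchy (hypothetical model, not a SCOP format)."""
    level: Level
    name: str
    parent: Optional["Node"] = None

    def __post_init__(self) -> None:
        # A node without a parent must be a class; otherwise it must sit
        # exactly one level below its parent.
        if self.parent is None:
            if self.level is not Level.CLASS:
                raise ValueError("only a class may have no parent")
        elif self.level != self.parent.level + 1:
            raise ValueError("a node must be one level below its parent")


def lineage(node: Node) -> List[Node]:
    """Return the path from the class down to the given node."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return list(reversed(path))


# Names taken from the globin examples in this article.
alpha = Node(Level.CLASS, "All alpha proteins")
fold = Node(Level.FOLD, "Globin-like", alpha)
superfamily = Node(Level.SUPERFAMILY, "Globin-like", fold)
family = Node(Level.FAMILY, "Globins", superfamily)

for entry in lineage(family):
    print(f"{entry.level.name.title()}: {entry.name}")
```

Storing only a parent pointer reflects the strict tree of SCOP 1.75, in which every node has exactly one parent.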


Classes

The broadest groups in SCOP version 1.75 are the protein fold classes. These classes group structures with similar secondary structure composition but different overall tertiary structures and evolutionary origins. This is the top level "root" of the SCOP hierarchical classification.

1. All alpha proteins [46456] (284): Domains consisting of α-helices
2. All beta proteins [48724] (174): Domains consisting of β-sheets
3. Alpha and beta proteins (a/b) [51349] (147): Mainly parallel beta sheets (beta-alpha-beta units)
4. Alpha and beta proteins (a+b) [53931] (376): Mainly antiparallel beta sheets (segregated alpha and beta regions)
5. Multi-domain proteins (alpha and beta) [56572] (66): Folds consisting of two or more domains belonging to different classes
6. Membrane and cell surface proteins and peptides [56835] (58): Does not include proteins in the immune system
7. Small proteins [56992] (90): Usually dominated by metal ligand, cofactor, and/or disulfide bridges
8. Coiled-coil proteins [57942] (7): Not a true class
9. Low resolution protein structures [58117] (26): Peptides and fragments. Not a true class
10. Peptides [58231] (121): Peptides and fragments. Not a true class
11. Designed proteins [58788] (44): Experimental structures of proteins with essentially non-natural sequences. Not a true class

The number in brackets, called a "sunid", is a SCOP unique integer identifier for each node in the SCOP hierarchy. The number in parentheses indicates how many elements are in each category. For example, there are 284 folds in the "All alpha proteins" class. Each member of the hierarchy is a link to the next level of the hierarchy.
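As a quick consistency check on the listing above, the fold counts in parentheses can be summed. The sketch below copies those counts into a dictionary; the variable names are illustrative only. Summing the seven true classes reproduces the 1195 folds quoted for SCOP version 1.75.

```python
# Fold counts per class, copied from the SCOP 1.75 class listing above.
FOLDS_PER_CLASS = {
    "All alpha proteins": 284,
    "All beta proteins": 174,
    "Alpha and beta proteins (a/b)": 147,
    "Alpha and beta proteins (a+b)": 376,
    "Multi-domain proteins (alpha and beta)": 66,
    "Membrane and cell surface proteins and peptides": 58,
    "Small proteins": 90,
    "Coiled-coil proteins": 7,
    "Low resolution protein structures": 26,
    "Peptides": 121,
    "Designed proteins": 44,
}

# Entries that the listing marks as "Not a true class".
NOT_TRUE_CLASSES = {
    "Coiled-coil proteins",
    "Low resolution protein structures",
    "Peptides",
    "Designed proteins",
}

true_class_folds = sum(
    count for name, count in FOLDS_PER_CLASS.items()
    if name not in NOT_TRUE_CLASSES
)
all_folds = sum(FOLDS_PER_CLASS.values())

print(true_class_folds)  # 1195
print(all_folds)         # 1393
```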


Folds

Each class contains a number of distinct folds. This classification level indicates similar tertiary structure, but not necessarily evolutionary relatedness. For example, the "All-α proteins" class contains >280 distinct folds, including: Globin-like (core: 6 helices; folded leaf, partly opened), long alpha-hairpin (2 helices; antiparallel hairpin, left-handed twist) and Type I dockerin domains (tandem repeat of two calcium-binding loop-helix motifs, distinct from the EF-hand).


Superfamilies

Domains within a fold are further classified into superfamilies. A superfamily is the largest grouping of proteins for which structural similarity is sufficient to indicate evolutionary relatedness, so its members are taken to share a common ancestor. However, this ancestor is presumed to be distant, because the different members of a superfamily have low sequence identities. For example, the two superfamilies of the "Globin-like" fold are the Globin superfamily and the alpha-helical ferredoxin superfamily (contains two Fe4-S4 clusters).


Families

Protein families are more closely related than superfamilies. Domains are placed in the same family if they have either:

1. >30% sequence identity, or
2. some sequence identity (e.g., 15%) and perform the same function.

The similarity in sequence and structure is evidence that these proteins have a closer evolutionary relationship than do proteins in the same superfamily. Sequence tools, such as BLAST, are used to assist in placing domains into superfamilies and families.

For example, the four families in the "globin-like" superfamily of the "globin-like" fold are truncated hemoglobin (lack the first helix), nerve tissue mini-hemoglobin (lack the first helix but otherwise is more similar to conventional globins than the truncated ones), globins (heme-binding protein), and phycocyanin-like phycobilisome proteins (oligomers of two different types of globin-like subunits containing two extra helices at the N-terminus; binds a bilin chromophore).

Families in SCOP are each assigned a concise classification string, sccs, where the letter identifies the class to which the domain belongs; the following integers identify the fold, superfamily, and family, respectively (e.g., a.1.1.2 for the "Globin" family).
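The structure of an sccs string, as described above, can be illustrated with a small parser. This is a minimal sketch that assumes the four-field form given in the text (one class letter followed by three integers); `Sccs` and `parse_sccs` are hypothetical names, not part of any SCOP tool.

```python
from typing import NamedTuple


class Sccs(NamedTuple):
    """The four fields of a family-level sccs string."""
    class_letter: str
    fold: int
    superfamily: int
    family: int

    def superfamily_prefix(self) -> str:
        """Return the class.fold.superfamily part, e.g. 'a.1.1'."""
        return f"{self.class_letter}.{self.fold}.{self.superfamily}"


def parse_sccs(text: str) -> Sccs:
    """Split a string such as 'a.1.1.2' into its four fields."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {len(parts)}")
    letter, *numbers = parts
    if len(letter) != 1 or not letter.isascii() or not letter.islower():
        raise ValueError(f"invalid class letter: {letter!r}")
    if not all(n.isascii() and n.isdigit() for n in numbers):
        raise ValueError(f"fold, superfamily and family must be integers: {text!r}")
    fold, superfamily, family = (int(n) for n in numbers)
    return Sccs(letter, fold, superfamily, family)


globin = parse_sccs("a.1.1.2")
print(globin)                       # Sccs(class_letter='a', fold=1, superfamily=1, family=2)
print(globin.superfamily_prefix())  # a.1.1
```

Two families whose strings share the same `superfamily_prefix()` belong to the same superfamily, which makes the string convenient for grouping without walking the hierarchy.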


PDB entry domains

A "TaxId" is the taxonomy ID number and links to the NCBI taxonomy browser, which provides more information about the species to which the protein belongs. Clicking on a species or isoform brings up a list of domains. For example, the "Hemoglobin, alpha-chain from Human (Homo sapiens)" protein has >190 solved protein structures, such as 2dn3 (complexed with cmo), and 2dn1 (complexed with hem, mbn, oxy). Clicking on the PDB numbers is supposed to display the structure of the molecule, but the links are currently broken (links work in pre-SCOP).


Example

Most pages in SCOP contain a search box. Entering "trypsin +human" retrieves several proteins, including the protein trypsinogen from humans. Selecting that entry displays a page that includes the "lineage", which is at the top of most SCOP pages.

Human trypsinogen lineage

1. Root: scop
2. Class: All beta proteins [48724]
3. Fold: Trypsin-like serine proteases [50493]
   barrel, closed; n=6, S=8; greek-key
   duplication: consists of two domains of the same fold
4. Superfamily: Trypsin-like serine proteases [50494]
5. Family: Eukaryotic proteases [50514]
6. Protein: Trypsin(ogen) [50515]
7. Species: Human (Homo sapiens) [TaxId: 9606] [50519]

Searching for "Subtilisin" returns the protein "Subtilisin from Bacillus subtilis, carlsberg", with the following lineage.

Subtilisin from Bacillus subtilis, carlsberg lineage

1. Root: scop
2. Class: Alpha and beta proteins (a/b) [51349]
   Mainly parallel beta sheets (beta-alpha-beta units)
3. Fold: Subtilisin-like [52742]
   3 layers: a/b/a, parallel beta-sheet of 7 strands, order 2314567; left-handed crossover connection between strands 2 & 3
4. Superfamily: Subtilisin-like [52743]
5. Family: Subtilases [52744]
6. Protein: Subtilisin [52745]
7. Species: Bacillus subtilis, carlsberg [TaxId: 1423] [52746]

Although both of these proteins are proteases, they do not even belong to the same fold, which is consistent with them being an example of convergent evolution.
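The comparison of the two lineages above can be sketched in code by walking both from the root and stopping at the first level where they differ. The function name `deepest_shared_level` is hypothetical, and comparing nodes by name is a simplification made for this sketch; a real comparison would more likely use the unique sunid of each node.

```python
from typing import List, Optional

LEVELS = ["Root", "Class", "Fold", "Superfamily", "Family", "Protein", "Species"]

# Node names copied from the two lineages in this section.
TRYPSINOGEN = [
    "scop",
    "All beta proteins",
    "Trypsin-like serine proteases",
    "Trypsin-like serine proteases",
    "Eukaryotic proteases",
    "Trypsin(ogen)",
    "Human (Homo sapiens)",
]
SUBTILISIN = [
    "scop",
    "Alpha and beta proteins (a/b)",
    "Subtilisin-like",
    "Subtilisin-like",
    "Subtilases",
    "Subtilisin",
    "Bacillus subtilis, carlsberg",
]


def deepest_shared_level(a: List[str], b: List[str]) -> Optional[str]:
    """Return the name of the last level at which both lineages agree."""
    shared = None
    for level, node_a, node_b in zip(LEVELS, a, b):
        if node_a != node_b:
            break
        shared = level
    return shared


print(deepest_shared_level(TRYPSINOGEN, SUBTILISIN))  # Root
```

The two proteases diverge immediately below the root, which is the observation the text uses to argue for convergent evolution.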


Comparison to other classification systems

SCOP classification is more dependent on manual decisions than the semi-automatic classification by CATH, its chief rival. Human expertise is used to decide whether certain proteins are evolutionarily related and should therefore be assigned to the same superfamily, or whether their similarity is a result of structural constraints and they therefore belong only to the same fold. Another database, FSSP, is generated purely automatically (including regular automatic updates) but offers no classification, allowing users to draw their own conclusions as to the significance of structural relationships based on pairwise comparisons of individual protein structures.


SCOP successors

By 2009, the original SCOP database had manually classified 38,000 PDB entries into a strictly hierarchical structure. With the accelerating pace of protein structure publications, the limited automation of classification could not keep up, leading to a non-comprehensive dataset. The Structural Classification of Proteins extended (SCOPe) database was released in 2012 with far greater automation of the same hierarchical system and is fully backwards compatible with SCOP version 1.75. In 2014, manual curation was reintroduced into SCOPe to maintain accurate structure assignment. As of February 2015, SCOPe 2.05 classified 71,000 of the 110,000 total PDB entries.

The SCOP2 prototype was a beta version of a structural classification of proteins that aimed to better represent the evolutionary complexity inherent in protein structure. It is therefore not a simple hierarchy but a directed acyclic graph network connecting protein superfamilies, representing structural and evolutionary relationships such as circular permutations, domain fusion and domain decay. Consequently, domains are not separated by strict fixed boundaries, but rather are defined by their relationships to the most similar other structures. The prototype was used for the development of the SCOP version 2 database. SCOP version 2, released in January 2020, contains 5134 families and 2485 superfamilies, compared to 3902 families and 1962 superfamilies in SCOP 1.75. The classification levels organise more than 41,000 non-redundant domains that represent more than 504,000 protein structures.

The Evolutionary Classification of Protein Domains (ECOD) database, released in 2014, is, like SCOPe, an expansion of SCOP version 1.75. Unlike the compatible SCOPe, it renames the class-fold-superfamily-family hierarchy into an architecture-X-homology-topology-family (A-XHTF) grouping, with the last level mostly defined by Pfam and supplemented by HHsearch clustering for uncategorized sequences. ECOD has the best PDB coverage of all three successors: it covers every PDB structure and is updated biweekly. The direct mapping to Pfam has proven useful to Pfam curators, who use the homology-level category to supplement their "clan" grouping.
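The difference between the strict hierarchy of SCOP 1.75 and the directed acyclic graph used by the SCOP2 prototype can be illustrated with a toy example. Everything in this sketch is hypothetical: the node names are placeholders, not real SCOP2 entries, and the representation (a dictionary from node to its set of parents) is an assumption chosen for brevity. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to confirm that the graph has no cycles.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Map each node to the set of its parents (placeholder names only).
# "fusion_AB" stands for a domain related to two superfamilies, for
# example through domain fusion; a strict tree cannot express this.
PARENTS: Dict[str, Set[str]] = {
    "fold_X": set(),
    "superfamily_A": {"fold_X"},
    "superfamily_B": {"fold_X"},
    "fusion_AB": {"superfamily_A", "superfamily_B"},
}


def is_dag(graph: Dict[str, Set[str]]) -> bool:
    """Return True if the parent graph contains no directed cycle."""
    try:
        # static_order() is lazy, so consume it to trigger cycle detection.
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


def multi_parent_nodes(graph: Dict[str, Set[str]]) -> List[str]:
    """Return nodes with more than one parent, which a tree would forbid."""
    return sorted(node for node, parents in graph.items() if len(parents) > 1)


print(is_dag(PARENTS))              # True
print(multi_parent_nodes(PARENTS))  # ['fusion_AB']
```

In a tree every node has at most one parent, so `multi_parent_nodes` would always return an empty list; allowing several parents while still forbidding cycles is what distinguishes the graph model.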


See also

* Structural alignment
* CATH
* FSSP
* SUPERFAMILY
* Pfam




External links


* Structural Classification of Proteins (SCOP 2) - Manual classification of representative domains, regularly updated by the SCOP authors
* Structural Classification of Proteins (SCOP 1.75) - Legacy SCOP 1.75 site, no longer updated
* Structural Classification of Proteins extended (SCOPe) - The more automated successor of SCOP version 1.75
* Evolutionary Classification of Protein Domains (ECOD) - Evolutionary classification based on SCOP version 1.75 and Pfam
* Structural Classification of Proteins 2 (SCOP2 prototype) - Legacy site of the SCOP 2 prototype, no longer updated
* SUPERFAMILY - Library of HMMs representing SCOP superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
* Protein Structure Classification - A book chapter that discusses different protein classifications in detail