HOME

TheInfoList



OR:

A protein contact map represents the distance between all possible
amino acid residue In chemistry, amines (, ) are organic compounds that contain carbon-nitrogen bonds. Amines are formed when one or more hydrogen atoms in ammonia are replaced by alkyl or aryl groups. The nitrogen atom in an amine possesses a lone pair of elec ...
pairs of a three-dimensional
protein structure Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers specifically polypeptides formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid ...
using a binary two-dimensional
matrix Matrix (: matrices or matrixes) or MATRIX may refer to: Science and mathematics * Matrix (mathematics), a rectangular array of numbers, symbols or expressions * Matrix (logic), part of a formula in prenex normal form * Matrix (biology), the m ...
. For two residues i and j, the ij element of the matrix is 1 if the two residues are closer than a predetermined threshold, and 0 otherwise. Various contact definitions have been proposed: The distance between the Cα-Cα atom with threshold 6-12 Å; distance between Cβ-Cβ atoms with threshold 6-12 Å (Cα is used for
Glycine Glycine (symbol Gly or G; ) is an amino acid that has a single hydrogen atom as its side chain. It is the simplest stable amino acid. Glycine is one of the proteinogenic amino acids. It is encoded by all the codons starting with GG (G ...
); and distance between the side-chain centers of mass.


Overview

Contact maps provide a more reduced representation of a protein structure than its full 3D atomic coordinates. The advantage is that contact maps are invariant to rotations and translations. They are more easily predicted by
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
methods. It has also been shown that under certain circumstances (e.g. low content of erroneously predicted contacts) it is possible to reconstruct the 3D coordinates of a protein using its contact map. Contact maps are also used for protein
superimposition Superimposition is the placement of one thing over another, typically so that both are still evident. Superimpositions are often related to the mathematical procedure of superposition. Audio Superimposition (SI) during sound recording and repro ...
and to describe similarity between protein structures. They are either predicted from
protein sequence Protein primary structure is the linear sequence of amino acids in a peptide or protein. By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end. Protein biosynthe ...
or calculated from a given structure.


Contact map prediction

With the availability of high numbers of genomic sequences it becomes feasible to analyze such sequences for ''coevolving residues''. The effectiveness of this approach results from the fact that a mutation in position ''i'' of a protein is more likely to be associated with a mutation in position ''j'' than with a back-mutation in ''i'' if both positions are functionally coupled (e.g. by taking part in an enzymatic domain, or by being adjacent in a folded protein, or even by being adjacent in an oligomer of that protein). Several statistical methods exist to extract from a
multiple sequence alignment Multiple sequence alignment (MSA) is the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis an ...
such coupled residue pairs: observed versus expected frequencies of residue pairs (OMES); the McLachlan Based Substitution correlation (McBASC); statistical coupling analysis;
Mutual Information In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual Statistical dependence, dependence between the two variables. More specifically, it quantifies the "Information conten ...
(MI) based methods; and recently direct coupling analysis (DCA).
Machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
algorithms have been able to enhance MSA analysis methods, especially for non-homologous proteins (ie. shallow MSA's). Predicted contact maps have been used in the prediction of
membrane proteins Membrane proteins are common proteins that are part of, or interact with, biological membranes. Membrane proteins fall into several broad categories depending on their location. Integral membrane proteins are a permanent part of a cell membrane ...
where helix-helix interactions are targeted.


HB Plot

Knowledge of the relationship between a
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residue (biochemistry), residues. Proteins perform a vast array of functions within organisms, including Enzyme catalysis, catalysing metab ...
's structure and its dynamic behavior is essential for understanding protein function. The description of a protein three dimensional structure as a network of
hydrogen bonding In chemistry, a hydrogen bond (H-bond) is a specific type of molecular interaction that exhibits partial covalent character and cannot be described as a purely electrostatic force. It occurs when a hydrogen (H) atom, Covalent bond, covalently b ...
interactions (HB plot) was introduced as a tool for exploring protein structure and function. By analyzing the network of tertiary interactions, the possible spread of information within a protein can be investigated. HB plot offers a simple way of analyzing protein
secondary structure Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common Protein structure#Secondary structure, secondary structural elements are alpha helix, alpha helices and beta ...
and
tertiary structure Protein tertiary structure is the three-dimensional shape of a protein. The tertiary structure will have a single polypeptide chain "backbone" with one or more protein secondary structures, the protein domains. Amino acid side chains and the ...
.
Hydrogen bonds In chemistry, a hydrogen bond (H-bond) is a specific type of molecular interaction that exhibits partial covalent character and cannot be described as a purely electrostatic force. It occurs when a hydrogen (H) atom, covalently bonded to a mo ...
stabilizing secondary structural elements (
secondary hydrogen bond In chemistry, a hydrogen bond (H-bond) is a specific type of molecular interaction that exhibits partial covalent character and cannot be described as a purely electrostatic force. It occurs when a hydrogen (H) atom, covalently bonded to a mo ...
s) and those formed between distant
amino acid Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although over 500 amino acids exist in nature, by far the most important are the 22 α-amino acids incorporated into proteins. Only these 22 a ...
residues - defined as
tertiary hydrogen bond In chemistry, a hydrogen bond (H-bond) is a specific type of molecular interaction that exhibits partial covalent character and cannot be described as a purely electrostatic force. It occurs when a hydrogen (H) atom, covalently bonded to a mo ...
s - can be easily distinguished in HB plot, thus, amino acid residues involved in stabilizing
protein structure Protein structure is the three-dimensional arrangement of atoms in an amino acid-chain molecule. Proteins are polymers specifically polypeptides formed from sequences of amino acids, which are the monomers of the polymer. A single amino acid ...
and function can be identified.


Features

The plot distinguishes between main chain-main chain, main chain-
side chain In organic chemistry and biochemistry, a side chain is a substituent, chemical group that is attached to a core part of the molecule called the "main chain" or backbone chain, backbone. The side chain is a hydrocarbon branching element of a mo ...
and side chain-side chain
hydrogen bonding In chemistry, a hydrogen bond (H-bond) is a specific type of molecular interaction that exhibits partial covalent character and cannot be described as a purely electrostatic force. It occurs when a hydrogen (H) atom, Covalent bond, covalently b ...
interactions. Bifurcated hydrogen bonds and multiple hydrogen bonds between
amino acid Amino acids are organic compounds that contain both amino and carboxylic acid functional groups. Although over 500 amino acids exist in nature, by far the most important are the 22 α-amino acids incorporated into proteins. Only these 22 a ...
residues; and intra- and interchain
hydrogen bonds In chemistry, a hydrogen bond (H-bond) is a specific type of molecular interaction that exhibits partial covalent character and cannot be described as a purely electrostatic force. It occurs when a hydrogen (H) atom, covalently bonded to a mo ...
are also indicated on the plots. Three classes of hydrogen bondings are distinguished by color-coding; short (distance smaller than 2.5 Ã… between donor and acceptor), intermediate (between 2.5 Ã… and 3.2 Ã…) and long hydrogen bonds (greater than 3.2 Ã…).


Secondary structure elements in HB plot

In representations of the HB plot, characteristic patterns of
secondary structure Protein secondary structure is the local spatial conformation of the polypeptide backbone excluding the side chains. The two most common Protein structure#Secondary structure, secondary structural elements are alpha helix, alpha helices and beta ...
elements can be recognised easily, as follows: #
Helices A helix (; ) is a shape like a cylindrical coil spring or the thread of a machine screw. It is a type of smoothness (mathematics), smooth space curve with tangent lines at a constant angle to a fixed axis. Helices are important in biology, as ...
can be identified as strips directly adjacent to the diagonal. #Antiparallel
beta sheet The beta sheet (β-sheet, also β-pleated sheet) is a common motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone hydrogen bonds, forming a gene ...
s appear in HB plot as cross-diagonal. #Parallel
beta sheet The beta sheet (β-sheet, also β-pleated sheet) is a common motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone hydrogen bonds, forming a gene ...
s appears in the HB plot as parallel to the diagonal. # Loops appear as breaks in the diagonal between the cross-diagonal
beta-sheet The beta sheet (β-sheet, also β-pleated sheet) is a common structural motif, motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone chain, backbon ...
motifs.


Examples of usage


Cytochrome P450s

The
cytochrome P450 Cytochromes P450 (P450s or CYPs) are a Protein superfamily, superfamily of enzymes containing heme as a cofactor (biochemistry), cofactor that mostly, but not exclusively, function as monooxygenases. However, they are not omnipresent; for examp ...
s (P450s) are
xenobiotic A xenobiotic is a chemical substance found within an organism that is not naturally produced or expected to be present within the organism. It can also cover substances that are present in much higher concentrations than are usual. Natural compo ...
-metabolizing
membrane A membrane is a selective barrier; it allows some things to pass through but stops others. Such things may be molecules, ions, or other small particles. Membranes can be generally classified into synthetic membranes and biological membranes. Bi ...
-bound
heme Heme (American English), or haem (Commonwealth English, both pronounced /Help:IPA/English, hi:m/ ), is a ring-shaped iron-containing molecule that commonly serves as a Ligand (biochemistry), ligand of various proteins, more notably as a Prostheti ...
-containing enzymes that use molecular
oxygen Oxygen is a chemical element; it has chemical symbol, symbol O and atomic number 8. It is a member of the chalcogen group (periodic table), group in the periodic table, a highly reactivity (chemistry), reactive nonmetal (chemistry), non ...
and electrons from NADPH cytochrome P450 reductase to oxidize their
substrate Substrate may refer to: Physical layers *Substrate (biology), the natural environment in which an organism lives, or the surface or medium on which an organism grows or is attached ** Substrate (aquatic environment), the earthy material that exi ...
s. CYP2B4, a member of the cytochrome P450 family is the only protein within this family, whose X-ray structure in both open 11 and closed form 12 is published. The comparison of the open and closed structures of CYP2B4 structures reveals large-scale conformational rearrangement between the two states, with the greatest conformational change around the residues 215-225, which is widely open in ligand-free state and shut after ligand binding; and the region around loop C near the heme. Examining the HB plot of the closed and open state of CYP2B4 revealed that the rearrangement of tertiary hydrogen bonds was in excellent agreement with the current knowledge of the cytochrome P450
catalytic cycle In chemistry, a catalytic cycle is a multistep reaction mechanism that involves a catalyst. The catalytic cycle is the main method for describing the role of catalysts in biochemistry, organometallic chemistry, bioinorganic chemistry, materials s ...
. The first step in
P450 Cytochromes P450 (P450s or CYPs) are a superfamily of enzymes containing heme as a cofactor that mostly, but not exclusively, function as monooxygenases. However, they are not omnipresent; for example, they have not been found in ''Escherichi ...
catalytic cycle is identified as substrate binding. Preliminary binding of a ligand near to the entrance breaks hydrogen bonds S212-E474, S207-H172 in the open form of CYP2B4 and hydrogen bonds E218-A102, Q215-L51 are formed that fix the entrance in the closed form as the HB plot reveals. The second step is the transfer of the first electron from
NADPH Nicotinamide adenine dinucleotide phosphate, abbreviated NADP or, in older notation, TPN (triphosphopyridine nucleotide), is a cofactor used in anabolic reactions, such as the Calvin cycle and lipid and nucleic acid syntheses, which require N ...
via an electron transfer chain. For the electron transfer a conformational change occurs that triggers interaction of the P450 with the NADPH cytochrome P450 reductase. Breaking of hydrogen bonds between S128-N287, S128-T291, L124-N287 and forming S96-R434, A116-R434, R125-I435, D82-R400 at the NADPH cytochrome P450 reductase
binding site In biochemistry and molecular biology, a binding site is a region on a macromolecule such as a protein that binds to another molecule with specificity. The binding partner of the macromolecule is often referred to as a ligand. Ligands may includ ...
—as seen in HB plot—transform CYP2B4 to a conformation state, where binding of NADPH cytochrome P450 reductase occurs. In the third step, oxygen enters CYP2B4 in the closed state - the state where newly formed hydrogen bonds S176-T300, H172-S304, N167-R308 open a tunnel which is exactly the size and shape of an
oxygen Oxygen is a chemical element; it has chemical symbol, symbol O and atomic number 8. It is a member of the chalcogen group (periodic table), group in the periodic table, a highly reactivity (chemistry), reactive nonmetal (chemistry), non ...
molecule.


Lipocalin family

The
lipocalin The lipocalins are a family of proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids, and lipids, and most lipocalins are also able to bind to complexed iron (via siderophores or flavonoids) as well as heme. Th ...
family is a large and diverse family of proteins with functions as small
hydrophobic In chemistry, hydrophobicity is the chemical property of a molecule (called a hydrophobe) that is seemingly repelled from a mass of water. In contrast, hydrophiles are attracted to water. Hydrophobic molecules tend to be nonpolar and, thu ...
molecule transporters. Beta-lactoglobulin is a typical member of the lipocalin family. Beta-lactoglobulin was found to have a role in the transport of hydrophobic ligands such as
retinol Retinol, also called vitamin A1, is a fat-soluble vitamin in the vitamin A family that is found in food and used as a dietary supplement. Retinol or other forms of vitamin A are needed for vision, cellular development, maintenance of skin and ...
or
fatty acid In chemistry, in particular in biochemistry, a fatty acid is a carboxylic acid with an aliphatic chain, which is either saturated and unsaturated compounds#Organic chemistry, saturated or unsaturated. Most naturally occurring fatty acids have an ...
s. Its
crystal structure In crystallography, crystal structure is a description of ordered arrangement of atoms, ions, or molecules in a crystalline material. Ordered structures occur from intrinsic nature of constituent particles to form symmetric patterns that repeat ...
were determined .g. Qin, 1998!-- This is not a proper reference format --> with different ligands and in ligand-free form as well. The crystal structures determined so far reveal that the typical lipocalin contains eight-stranded antiparallel-barrel arranged to form a conical central cavity in which the hydrophobic ligand is bound. The structure of beta-lactoglobulin reveals that the barrel-form structure with the central cavity of the protein has an "entrance" surrounded by five beta-loops with centers around 26, 35, 63, 87, and 111, which undergo a conformational change during the ligand binding and close the cavity. The overall shape of beta-lactoglobulin is characteristic of the lipocalin family. In the absence of
alpha-helices An alpha helix (or α-helix) is a sequence of amino acids in a protein that are twisted into a coil (a helix). The alpha helix is the most common structural arrangement in the secondary structure of proteins. It is also the most extreme type of l ...
, the main diagonal almost disappears and the cross-diagonals representing the
beta-sheet The beta sheet (β-sheet, also β-pleated sheet) is a common structural motif, motif of the regular protein secondary structure. Beta sheets consist of beta strands (β-strands) connected laterally by at least two or three backbone chain, backbon ...
s dominate the plot. Relatively low number of tertiary hydrogen bonds can be found in the plot, with three high-density regions, one of which is connected to a loop at the residues around 63, a second is connected to the loop around 87, and a third region which is connected to the regions 26 and 35. The fifth loop around 111 is represented only one tertiary hydrogen bond in the HB plot. In the three-dimensional structure, tertiary hydrogen bonds are formed (1) near to the entrance, directly involved in conformational rearrangement during ligand binding; and (2) at the bottom of the "barrel". HB plots of the open and closed forms of beta-lactoglobulin are very similar, all unique motifs can be recognized in both forms. Difference in HB plots of open and ligand-bound form show few important individual changes in tertiary hydrogen bonding pattern. Especially, the formation of hydrogen bonds between Y20-E157 and S21-H161 in closed form might be crucial in conformational rearrangement. These hydrogen bonds lie at the bottom of the cavity, which suggests that the closure of the entrance of a lipocalin starts when a ligand reached the bottom of the cavity and broke hydrogen bonds R123-Y99, R123-T18, and V41-Q120. Lipocalins are known to have very low sequence similarity with high structural similarity. The only conserved regions are exactly the region around 20 and 160 with an unknown role.


See also

*
Ramachandran plot In biochemistry, a Ramachandran plot (also known as a Rama plot, a Ramachandran diagram or a �,ψplot), originally developed in 1963 by G. N. Ramachandran, C. Ramakrishnan, and V. Sasisekharan, is a way to visualize energetically allowed regio ...
*
Circuit topology The circuit topology of a folded linear polymer refers to the arrangement of its intra-molecular contacts. Examples of linear polymers with intra-molecular contacts are nucleic acids and proteins. Proteins fold via the formation of contacts of v ...
*
Structural classification of proteins The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein structural domains based on similarities of their structures and amino acid sequences. A motivation for this classification is to determine t ...
*
CATH Cath may refer to: __NOTOC__ People * Cath Bishop (born 1971), British former rower and 2003 world champion * Cath Carroll (born 1960), British musician and music journalist * Cath Coffey (), one of the earliest members of British rap band Stereo ...
* HB plot *
Dot plot (bioinformatics) In bioinformatics a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity after sequence alignment. It is a type of recurrence plot. History One way to visualize the similarity betwee ...
*
Self-similarity matrix In data analysis, the self-similarity matrix is a graphical representation of similar sequences in a data series. Similarity can be explained by different measures, like spatial distance ( distance matrix), correlation, or comparison of local hist ...


References


External links


DISTILL
mdash; prediction of protein structural features (including protein residue contact maps)
Structural Proteomics Tools
— includes amino acid contact maps
ProfCon
— prediction of inter-residue contacts
TMHcon
— prediction of helix-helix contacts specifically within the transmembrane parts of membrane proteins
TMhit
— A new transmembrane helix-helix interaction prediction method based on residue contacts{{dead link, date=July 2017

— A protein contact map prediction server

—A Tool for Protein Contact-Map Visualization in jerseysforcheapshop Proteomics Protein structure