Circular Permutation In Proteins

A circular permutation is a relationship between proteins whereby the proteins have a changed order of amino acids in their peptide sequence. The result is a protein structure with different connectivity, but overall similar three-dimensional (3D) shape. In 1979, the first pair of circularly permuted proteins, concanavalin A and the lectin favin, was discovered; over 2000 such proteins are now known.

Circular permutation can occur as the result of evolutionary events, post-translational modifications, or artificially engineered mutations. The two main models proposed to explain the evolution of circularly permuted proteins are ''permutation by duplication'' and ''fission and fusion''. Permutation by duplication occurs when a gene undergoes duplication to form a tandem repeat, before redundant sections of the protein are removed; this relationship is found between saposin and swaposin. Fission and fusion occurs when partial proteins fuse to form a single polypeptide, such as in nicotinamide nucleotide transhydrogenases.

Circular permutations are routinely engineered in the laboratory to improve a protein's catalytic activity or thermostability, or to investigate properties of the original protein.

Traditional algorithms for sequence alignment and structure alignment are not able to detect circular permutations between proteins. New non-linear approaches have been developed that overcome this and are able to detect topology-independent similarities.


History

In 1979, Bruce Cunningham and his colleagues discovered the first instance of a circularly permuted protein in nature. After determining the peptide sequence of the lectin protein favin, they noticed its similarity to a known protein, concanavalin A, except that the ends were circularly permuted. Later work confirmed the circular permutation between the pair and showed that concanavalin A is permuted post-translationally through cleavage and an unusual protein ligation.

After the discovery of a natural circularly permuted protein, researchers looked for a way to emulate this process. In 1983, David Goldenberg and Thomas Creighton were able to create a circularly permuted version of a protein by chemically ligating the termini to create a cyclic protein, then introducing new termini elsewhere using trypsin. In 1989, Karolin Luger and her colleagues introduced a genetic method for making circular permutations by carefully fragmenting and ligating DNA. This method allowed for permutations to be introduced at arbitrary sites.

Despite the early discovery of post-translational circular permutations and the suggestion of a possible genetic mechanism for evolving circular permutants, it was not until 1995 that the first circularly permuted pair of genes was discovered. Saposins are a class of proteins involved in sphingolipid catabolism and antigen presentation of lipids in humans. Chris Ponting and Robert Russell identified a circularly permuted version of a saposin inserted into plant aspartic proteinase, which they nicknamed swaposin. Saposin and swaposin were the first known case of two natural genes related by a circular permutation.

Hundreds of examples of protein pairs related by a circular permutation were subsequently discovered in nature or produced in the laboratory. As of February 2012, the Circular Permutation Database contained 2,238 circularly permuted protein pairs with known structures, and many more are known without structures. The CyBase database collects proteins that are cyclic, some of which are permuted variants of cyclic wild-type proteins. SISYPHUS is a database of hand-curated alignments of proteins with non-trivial relationships, several of which have circular permutations.


Evolution

Two main models are currently used to explain the evolution of circularly permuted proteins: ''permutation by duplication'' and ''fission and fusion''. Both models have compelling examples supporting them, but the relative contribution of each model in evolution is still under debate. Other, less common, mechanisms have been proposed, such as "cut and paste" or "exon shuffling".


Permutation by duplication

The earliest model proposed for the evolution of circular permutations is the permutation by duplication mechanism. In this model, a precursor gene first undergoes a duplication and fusion to form a large tandem repeat. Next, start and stop codons are introduced at corresponding locations in the duplicated gene, removing redundant sections of the protein. One surprising prediction of the permutation by duplication mechanism is that intermediate permutations can occur. For instance, the duplicated version of the protein should still be functional, since otherwise evolution would quickly select against such proteins. Likewise, partially duplicated intermediates where only one terminus was truncated should be functional. Such intermediates have been extensively documented in protein families such as DNA methyltransferases.


Saposin and swaposin

An example of permutation by duplication is the relationship between saposin and swaposin. Saposins are highly conserved glycoproteins, approximately 80 amino acid residues long, that form a four-alpha-helical structure. They have a nearly identical placement of cysteine residues and glycosylation sites. The cDNA sequence that codes for saposin is called prosaposin. It is a precursor for four cleavage products, the saposins A, B, C, and D. The four saposin domains most likely arose from two tandem duplications of an ancestral gene. This repeat suggests a mechanism for the evolution of the relationship with the plant-specific insert (PSI). The PSI is a domain exclusively found in plants, consisting of approximately 100 residues and found in plant aspartic proteases. It belongs to the saposin-like protein family (SAPLIP) and has the N- and C-termini "swapped", such that the order of helices is 3-4-1-2 compared with saposin, thus leading to the name "swaposin".
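The duplication-then-truncation idea can be illustrated with a minimal Python sketch. It treats a protein as an ordered list of units (here, helix numbers) and is only a toy model of the mechanism; the function names `permute_by_duplication` and `one_sided_intermediate` are hypothetical and introduced purely for illustration.

```python
def permute_by_duplication(units, start):
    """Toy model of permutation by duplication (hypothetical helper).

    The ordered units are duplicated into a tandem repeat, then a window of
    the original length is kept, mimicking a new start and a new stop
    introduced at corresponding positions in the two copies.
    """
    n = len(units)
    if not 0 <= start < n:
        raise ValueError("start must fall inside the first copy")
    tandem = units + units            # duplication and fusion
    return tandem[start:start + n]    # both redundant ends removed


def one_sided_intermediate(units, start):
    """Intermediate in which only the N-terminal side has been truncated."""
    if not 0 <= start < len(units):
        raise ValueError("start must fall inside the first copy")
    return (units + units)[start:]


saposin_helices = [1, 2, 3, 4]
print(permute_by_duplication(saposin_helices, 2))   # [3, 4, 1, 2]
print(one_sided_intermediate(saposin_helices, 2))   # [3, 4, 1, 2, 3, 4]
```

With a start position at the third helix, the window reproduces the 3-4-1-2 helix order described for swaposin, and the one-sided variant corresponds to the partially truncated intermediates the model predicts.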


Fission and fusion

Another model for the evolution of circular permutations is the fission and fusion model. The process starts with two partial proteins. These may represent two independent polypeptides (such as two parts of a heterodimer), or may have originally been halves of a single protein that underwent a fission event to become two polypeptides. The two proteins can later fuse together to form a single polypeptide. Regardless of which protein comes first, this fusion protein may show similar function. Thus, if a fusion between two proteins occurs twice in evolution (either between paralogues within the same species or between orthologues in different species) but in a different order, the resulting fusion proteins will be related by a circular permutation. Evidence for a particular protein having evolved by a fission and fusion mechanism can be provided by observing the halves of the permutation as independent polypeptides in related species, or by demonstrating experimentally that the two halves can function as separate polypeptides.
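The claim that two fusions in opposite order are related by a circular permutation can be checked with a short Python sketch. The fragment sequences below are made up for illustration, and `is_circular_permutation` is a hypothetical helper that tests for an exact rotation only; real homologues would differ by mutations as well.

```python
def is_circular_permutation(a, b):
    """Return True if sequence b is an exact rotation of sequence a."""
    # Every rotation of a appears as a substring of a written twice.
    return len(a) == len(b) and b in a + a


# Hypothetical partial proteins (arbitrary residue strings).
fragment_x = "MKTAYIAK"
fragment_y = "QRQISFVK"

fusion_xy = fragment_x + fragment_y   # fusion in one order
fusion_yx = fragment_y + fragment_x   # independent fusion in the other order

print(is_circular_permutation(fusion_xy, fusion_yx))   # True
```

The doubling trick works because cutting a circle at any point and reading around it once yields a substring of the sequence repeated twice.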


Transhydrogenases

An example of the fission and fusion mechanism can be found in nicotinamide nucleotide transhydrogenases. These are membrane-bound enzymes that catalyze the transfer of a hydride ion between NAD(H) and NADP(H) in a reaction that is coupled to transmembrane proton translocation. They consist of three major functional units (I, II, and III) that can be found in different arrangements in bacteria, protozoa, and higher eukaryotes. Phylogenetic analysis suggests that the three groups of domain arrangements were acquired and fused independently.


Other processes that can lead to circular permutations


Post-translational modification

The two evolutionary models mentioned above describe ways in which genes may be circularly permuted, resulting in a circularly permuted mRNA after transcription. Proteins can also be circularly permuted via post-translational modification, without permuting the underlying gene. Circular permutations can happen spontaneously through autocatalysis, as in the case of concanavalin A. Alternatively, permutation may require restriction enzymes and ligases.


Role in protein engineering

Many proteins have their termini located close together in 3D space. Because of this, it is often possible to design circular permutations of proteins. Today, circular permutations are generated routinely in the lab using standard genetics techniques. Although some permutation sites prevent the protein from folding correctly, many permutants have been created with nearly identical structure and function to the original protein.

The motivation for creating a circular permutant of a protein can vary. Scientists may want to improve some property of the protein, such as:

* Reduce proteolytic susceptibility. The rate at which proteins are broken down can have a large impact on their activity in cells. Since termini are often accessible to proteases, designing a circularly permuted protein with less-accessible termini can increase the lifespan of that protein in the cell.
* Improve catalytic activity. Circularly permuting a protein can sometimes increase the rate at which it catalyzes a chemical reaction, leading to more efficient proteins.
* Alter substrate or ligand binding. Circularly permuting a protein can result in the loss of substrate binding, but can occasionally lead to novel ligand binding activity or altered substrate specificity.
* Improve thermostability. Making proteins active over a wider range of temperatures and conditions can improve their utility.

Alternatively, scientists may be interested in properties of the original protein, such as:

* Fold order. Determining the order in which different parts of a protein fold is challenging due to the extremely fast time scales involved. Circularly permuted versions of proteins will often fold in a different order, providing information about the folding of the original protein.
* Essential structural elements. Artificial circularly permuted proteins can allow parts of a protein to be selectively deleted. This gives insight into which structural elements are essential or not.
* Modify quaternary structure. Circularly permuted proteins have been shown to take on different quaternary structure than wild-type proteins.
* Find insertion sites for other proteins. Inserting one protein as a domain into another protein can be useful. For instance, inserting calmodulin into green fluorescent protein (GFP) allowed researchers to measure the activity of calmodulin via the fluorescence of the split-GFP. Regions of GFP that tolerate the introduction of circular permutation are more likely to accept the addition of another protein while retaining the function of both proteins.
* Design of novel biocatalysts and biosensors. Introducing circular permutations can be used to design proteins to catalyze specific chemical reactions, or to detect the presence of certain molecules using proteins. For instance, the GFP-calmodulin fusion described above can be used to detect the level of calcium ions in a sample.
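As a rough illustration of how a permutant sequence might be designed in silico, the Python sketch below measures the distance between the original termini and then builds a permuted sequence. Everything here is an assumption made for illustration: the helper names, the toy coordinates, the example sequence, and the optional `linker` argument (joining the old termini with a short linker is a common design choice, but the text above does not prescribe one).

```python
import math


def termini_distance(ca_coords):
    """Distance between the first and last C-alpha positions (toy input)."""
    return math.dist(ca_coords[0], ca_coords[-1])


def design_permutant(sequence, new_start, linker=""):
    """Build a circularly permuted sequence (hypothetical helper).

    The new N-terminus is the residue at index new_start; the original
    C- and N-termini are joined, optionally through a linker.
    """
    if not 0 < new_start < len(sequence):
        raise ValueError("new_start must lie strictly inside the sequence")
    return sequence[new_start:] + linker + sequence[:new_start]


# Toy coordinates, in angstroms, for a four-residue chain.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(termini_distance(coords))

# Arbitrary example sequence with an assumed Gly-Gly-Ser linker.
print(design_permutant("ACDEFGHIK", 4, linker="GGS"))   # FGHIKGGSACDE
```

A small termini distance suggests the old ends can be joined without distorting the fold, which is why the proximity of termini matters for this kind of design. Whether a given permutation site still folds has to be tested experimentally.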


Algorithmic detection

Many sequence alignment and protein structure alignment algorithms have been developed assuming linear data representations and as such are not able to detect circular permutations between proteins. Two examples of frequently used methods that have problems correctly aligning proteins related by circular permutation are dynamic programming and many hidden Markov models. As an alternative to these, a number of algorithms are built on top of non-linear approaches and are able to detect topology-independent similarities, or employ modifications allowing them to circumvent the limitations of dynamic programming.

The table below is a collection of such methods. The algorithms are classified according to the type of input they require. ''Sequence''-based algorithms require only the sequence of two proteins in order to create an alignment. Sequence methods are generally fast and suitable for searching whole genomes for circularly permuted pairs of proteins. ''Structure''-based methods require 3D structures of both proteins being considered. They are often slower than sequence-based methods, but are able to detect circular permutations between distantly related proteins with low sequence similarity. Some structural methods are ''topology independent'', meaning that they are also able to detect more complex rearrangements than circular permutation.
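One simple modification that lets ordinary dynamic programming see a circular permutation is to align the query against the target written twice. The Python sketch below shows that idea with a basic Smith-Waterman style local alignment score. It is not the method of any particular published tool; the scoring values, the function names, and the example sequences are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def linear_vs_doubled(query, target):
    """Score the query against the target and against the doubled target."""
    linear = local_alignment_score(query, target)
    doubled = local_alignment_score(query, target + target)
    return linear, doubled


target = "ACDEFGHIKL"
query = "GHIKLACDEF"    # the target rotated by five residues
print(linear_vs_doubled(query, target))   # (10, 20)
```

The linear alignment can only recover one half of the permuted sequence, while the doubled target contains the full rotation, so the score doubles. A large gain from doubling is a hint of circular permutation, although repeats within a protein can produce a similar signal and need to be ruled out separately.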


Further reading

* David Goodsell (April 2010). ''Concanavalin A and Circular Permutation''. Protein Data Bank (PDB) ''Molecule of the Month''.


External links

* Overview of the structural information available in the PDB for UniProt P02866 (Concanavalin-A) at PDBe-KB