Protein primary structure is the linear sequence of amino acids in a peptide or protein. By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end. Protein biosynthesis is most commonly performed by ribosomes in cells. Peptides can also be synthesized in the laboratory. Protein primary structures can be directly sequenced, or inferred from DNA sequences.
Formation
Biological
Amino acids are polymerised via peptide bonds to form a long backbone, with the different amino acid side chains protruding along it. In biological systems, proteins are produced during translation by a cell's ribosomes. Some organisms can also make short peptides by non-ribosomal peptide synthesis; these peptides often use amino acids other than the standard 20, and may be cyclised, modified and cross-linked.
Chemical
Peptides can be synthesised chemically via a range of laboratory methods. Chemical methods typically synthesise peptides in the opposite order (starting at the C-terminus) to biological protein synthesis (starting at the N-terminus).
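The difference in direction can be illustrated with a minimal sketch. The function name `coupling_order` and its `method` argument are hypothetical, chosen only for this example; the sequence is assumed to be written N-terminus to C-terminus in one-letter code.

```python
def coupling_order(sequence: str, method: str = "chemical") -> list[str]:
    """Return residues in the order they join the growing chain.

    `sequence` is written N-terminus to C-terminus, as is conventional.
    """
    if method == "chemical":
        # Chemical synthesis typically starts at the C-terminus.
        return list(reversed(sequence))
    if method == "ribosomal":
        # Ribosomal synthesis starts at the N-terminus.
        return list(sequence)
    raise ValueError(f"unknown method: {method!r}")


coupling_order("MAK")               # ['K', 'A', 'M']
coupling_order("MAK", "ribosomal")  # ['M', 'A', 'K']
```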
Notation
Protein sequence is typically notated as a string of letters, listing the amino acids starting at the amino-terminal end through to the carboxyl-terminal end. Either a three-letter code or a single-letter code can be used to represent the 20 naturally occurring amino acids, as well as mixtures or ambiguous amino acids (similar to nucleic acid notation).
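As an illustration of the two notations, the following sketch converts between three-letter and single-letter codes for the 20 standard amino acids plus three common ambiguity codes. The function names and the hyphen separator are assumptions made for this example, not part of any standard.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    # Ambiguity codes
    "Asx": "B",  # Asn or Asp
    "Glx": "Z",  # Gln or Glu
    "Xaa": "X",  # any or unknown residue
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Ala-Lys' (N- to C-terminus) to 'MAK'."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split(sep))


def to_three_letter(one_letter: str, sep: str = "-") -> str:
    """Convert e.g. 'MAK' (N- to C-terminus) to 'Met-Ala-Lys'."""
    return sep.join(ONE_TO_THREE[code] for code in one_letter.upper())
```

Both functions raise `KeyError` on an unrecognised code, which is usually preferable to silently dropping a residue.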
Peptides can be directly sequenced, or inferred from DNA sequences. Large sequence databases now exist that collate known protein sequences.
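Inferring a protein sequence from DNA can be sketched as follows. The example assumes the standard genetic code, a coding-strand DNA sequence, and a reading frame that starts at the first base; the names `CODON_TABLE` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence up to the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


translate("ATGGCCAAATGA")  # 'MAK'
```

A sequence inferred this way does not capture any modification made to the protein after translation.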
Modification
In general, polypeptides are unbranched polymers, so their primary structure can often be specified by the sequence of amino acids along their backbone. However, proteins can become cross-linked, most commonly by disulfide bonds, and the primary structure also requires specifying the cross-linking atoms, e.g., specifying the cysteines involved in the protein's disulfide bonds. Other crosslinks include desmosine.
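One way to record a primary structure that includes disulfide cross-links is sketched below. The `PrimaryStructure` class, its field names, and the 1-based position convention are assumptions made for illustration, and the sequence in the usage example is a toy, not a real protein.

```python
from dataclasses import dataclass, field


@dataclass
class PrimaryStructure:
    """Amino acid sequence plus the cysteine pairs joined by disulfide bonds."""

    sequence: str  # one-letter codes, N- to C-terminus
    # Pairs of 1-based residue positions that form disulfide bonds.
    disulfides: list[tuple[int, int]] = field(default_factory=list)

    def validate(self) -> None:
        """Check that every bonded position is a distinct cysteine."""
        used = set()
        for pair in self.disulfides:
            for pos in pair:
                if not 1 <= pos <= len(self.sequence):
                    raise ValueError(f"position {pos} is outside the sequence")
                if self.sequence[pos - 1] != "C":
                    raise ValueError(f"position {pos} is not a cysteine")
                if pos in used:
                    raise ValueError(f"position {pos} is bonded more than once")
                used.add(pos)


# Toy example: four cysteines paired 2-6 and 4-8.
toy = PrimaryStructure("ACDCGCKC", [(2, 6), (4, 8)])
toy.validate()
```

Two proteins with the same sequence but different cysteine pairings would compare as different under this representation, which matches the point that the cross-linked atoms are part of the primary structure.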
Isomerisation
The chiral centers of a polypeptide chain can undergo racemization. Although it does not change the sequence, it does affect the chemical properties of the sequence. In particular, the L-amino acids normally found in proteins can spontaneously isomerize at the Cα atom to form D-amino acids, which cannot be cleaved by most proteases. Additionally, the peptide bond preceding a proline can form a stable cis-isomer as well as the usual trans-isomer.
Posttranslational modification
Finally, the protein can undergo a variety of posttranslational modifications, which are briefly summarized here.
The N-terminal amino group of a polypeptide can be modified covalently, e.g.,
* acetylation
:The positive charge on the N-terminal amino group may be eliminated by changing it to an acetyl group (N-terminal blocking).
* formylation
:The N-terminal methionine usually found after translation has an N-terminus blocked with a formyl group. This formyl group (and sometimes the methionine residue itself, if followed by Gly or Ser) is removed by the enzyme deformylase.
* pyroglutamate
:An N-terminal glutamine can attack itself, forming a cyclic pyroglutamate group.
* myristoylation
:Similar to acetylation. Instead of a simple methyl group, the myristoyl group has a tail of 14 hydrophobic carbons, which make it ideal for anchoring proteins to cellular membranes.
The C-terminal carboxylate group of a polypeptide can also be modified, e.g.,
* amination (see Figure)
:The C-terminus can also be blocked (thus, neutralizing its negative charge) by amination.
* glycosyl phosphatidylinositol (GPI) attachment
:Glycosyl phosphatidylinositol (GPI) is a large, hydrophobic phospholipid prosthetic group that anchors proteins to cellular membranes. It is attached to the polypeptide C-terminus through an amide linkage that then connects to ethanolamine, thence to sundry sugars and finally to the phosphatidylinositol lipid moiety.
Finally, the peptide side chains can also be modified covalently, e.g.,
* phosphorylation
:Aside from cleavage, phosphorylation is perhaps the most important chemical modification of proteins. A phosphate group can be attached to the sidechain hydroxyl group of serine, threonine and tyrosine residues, adding a negative charge at that site and producing an unnatural amino acid. Such reactions are catalyzed by kinases and the reverse reaction is catalyzed by phosphatases. The phosphorylated tyrosines are often used as "handles" by which proteins can bind to one another, whereas phosphorylation of Ser/Thr often induces conformational changes, presumably because of the introduced negative charge. The effects of phosphorylating Ser/Thr can sometimes be simulated by mutating the Ser/Thr residue to glutamate.
* glycosylation
:A catch-all name for a set of very common and very heterogeneous chemical modifications. Sugar moieties can be attached to the sidechain hydroxyl groups of Ser/Thr or to the sidechain amide groups of Asn. Such attachments can serve many functions, ranging from increasing solubility to complex recognition. N-linked glycosylation (attachment to Asn) can be blocked with certain inhibitors, such as tunicamycin.
* deamidation (succinimide formation)
:In this modification, an asparagine or aspartate side chain attacks the following peptide bond, forming a symmetrical succinimide intermediate. Hydrolysis of the intermediate produces either aspartate or the β-amino acid, iso(Asp). For asparagine, either product results in the loss of the amide group, hence "deamidation".
* hydroxylation
: Proline residues may be hydroxylated at either of two atoms, as can lysine (at one atom). Hydroxyproline is a critical component of collagen, which becomes unstable upon its loss. The hydroxylation reaction is catalyzed by an enzyme that requires ascorbic acid (vitamin C), deficiencies in which lead to many connective-tissue diseases such as scurvy.
* methylation
: Several protein residues can be methylated, most notably the positive groups of lysine and arginine. Arginine residues interact with the nucleic acid phosphate backbone and commonly form hydrogen bonds with the base residues, particularly guanine, in protein–DNA complexes. Lysine residues can be singly, doubly and even triply methylated. Methylation does ''not'' alter the positive charge on the side chain, however.
* acetylation
: Acetylation of the lysine amino groups is chemically analogous to the acetylation of the N-terminus. Functionally, however, the acetylation of lysine residues is used to regulate the binding of proteins to nucleic acids. The cancellation of the positive charge on the lysine weakens the electrostatic attraction for the (negatively charged) nucleic acids.
* sulfation
: Tyrosines may become sulfated on their side-chain hydroxyl oxygen (Oη) atom. Somewhat unusually, this modification occurs in the Golgi apparatus, not in the endoplasmic reticulum. Similar to phosphorylated tyrosines, sulfated tyrosines are used for specific recognition, e.g., in chemokine receptors on the cell surface. As with phosphorylation, sulfation adds a negative charge to a previously neutral site.
* prenylation and palmitoylation
: Hydrophobic isoprenoid groups (e.g., farnesyl, geranyl, and geranylgeranyl groups) and palmitoyl groups may be added to the sulfur (Sγ) atom of cysteine residues to anchor proteins to cellular membranes. Unlike the GPI and myristoyl anchors, these groups are not necessarily added at the termini.
* carboxylation
: A relatively rare modification that adds an extra carboxylate group (and, hence, a double negative charge) to a glutamate side chain, producing a Gla residue. This is used to strengthen the binding to "hard" metal ions such as calcium.
* ADP-ribosylation
: The large ADP-ribosyl group can be transferred to several types of side chains within proteins, with heterogeneous effects. This modification is a target for the powerful toxins of disparate bacteria, e.g., ''Vibrio cholerae'', ''Corynebacterium diphtheriae'' and ''Bordetella pertussis''.
* ubiquitination and SUMOylation
: Various full-length, folded proteins can be attached at their C-termini to the sidechain ammonium groups of lysines of other proteins. Ubiquitin is the most common of these, and usually signals that the ubiquitin-tagged protein should be degraded.
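The phosphorylation entry in the list above notes that phosphate groups attach to serine, threonine and tyrosine side chains, and that Ser/Thr phosphorylation can sometimes be mimicked by mutation to glutamate. A minimal sketch of both ideas follows; the function names are hypothetical, positions are 1-based from the N-terminus, and the first function lists every residue with a suitable hydroxyl group without predicting which sites a kinase would actually modify.

```python
def phosphorylatable_sites(sequence: str) -> list[tuple[int, str]]:
    """List (position, residue) for every Ser, Thr and Tyr in the sequence."""
    return [
        (pos, aa)
        for pos, aa in enumerate(sequence.upper(), start=1)
        if aa in "STY"
    ]


def phosphomimetic(sequence: str, position: int) -> str:
    """Replace the Ser/Thr at a 1-based position with glutamate (E)."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    residue = sequence[position - 1]
    if residue not in ("S", "T"):
        raise ValueError(f"residue {residue!r} at {position} is not Ser or Thr")
    return sequence[:position - 1] + "E" + sequence[position:]


phosphorylatable_sites("MSTAY")  # [(2, 'S'), (3, 'T'), (5, 'Y')]
phosphomimetic("MSTAY", 2)       # 'METAY'
```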
Most of the polypeptide modifications listed above occur ''post-translationally'', i.e., after the protein has been synthesized on the ribosome, typically occurring in the endoplasmic reticulum, a subcellular organelle of the eukaryotic cell.
Many other chemical reactions (e.g., cyanylation) have been applied to proteins by chemists, although they are not found in biological systems.
Cleavage and ligation
In addition to those listed above, the most important modification of primary structure is peptide cleavage (by chemical hydrolysis or by proteases). Proteins are often synthesized in an inactive precursor form; typically, an N-terminal or C-terminal segment blocks the active site of the protein, inhibiting its function. The protein is activated by cleaving off the inhibitory peptide.
Some proteins even have the power to cleave themselves. Typically, the hydroxyl group of a serine (rarely, threonine) or the thiol group of a cysteine residue will attack the carbonyl carbon of the preceding peptide bond, forming a tetrahedrally bonded intermediate (classified as a hydroxyoxazolidine (Ser/Thr) or hydroxythiazolidine (Cys) intermediate). This intermediate tends to revert to the amide form, expelling the attacking group, since the amide form is usually favored by free energy (presumably due to the strong resonance stabilization of the peptide group). However, additional molecular interactions may render the amide form less stable; the amino group is expelled instead, resulting in an ester (Ser/Thr) or thioester (Cys) bond in place of the peptide bond. This chemical reaction is called an N-O acyl shift.
The ester/thioester bond can be resolved in several ways:
* Simple hydrolysis will split the polypeptide chain, where the displaced amino group becomes the new N-terminus. This is seen in the maturation of glycosylasparaginase.
* A β-elimination reaction also splits the chain, but results in a pyruvoyl group at the new N-terminus. This pyruvoyl group may be used as a covalently attached catalytic cofactor in some enzymes, especially decarboxylases such as S-adenosylmethionine decarboxylase (SAMDC) that exploit the electron-withdrawing power of the pyruvoyl group.
* Intramolecular transesterification, resulting in a ''branched'' polypeptide. In inteins, the new ester bond is broken by an intramolecular attack by the soon-to-be C-terminal asparagine.
* Intermolecular transesterification can transfer a whole segment from one polypeptide to another, as is seen in the Hedgehog protein autoprocessing.
Sequence compression
The compression of amino acid sequences is a comparatively challenging task. The compression ratios achieved by existing specialized amino acid sequence compressors are low compared with those of DNA sequence compressors, mainly because of the characteristics of the data. For example, modeling inversions is harder because of the reverse information loss (from amino acids to DNA sequence). The lossless data compressor currently reported to provide the best compression is AC2. AC2 mixes various context models using neural networks and encodes the data using arithmetic coding.
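The idea of a context model feeding an arithmetic coder can be illustrated with a much simpler sketch than AC2 itself. The code below is not AC2 and uses no neural network or model mixing: it uses a single adaptive order-k context model with add-one smoothing over the 20 standard residues, and sums the ideal code length (the negative base-2 logarithm of each predicted probability) that an arithmetic coder driven by that model would approach. All names are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues only


def estimated_code_length_bits(sequence: str, order: int = 2) -> float:
    """Ideal code length in bits under one adaptive order-`order` model."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, symbol in enumerate(sequence):
        if symbol not in ALPHABET:
            raise ValueError(f"unsupported symbol {symbol!r}")
        # The context is the (up to) `order` residues preceding position i.
        context = sequence[max(0, i - order):i]
        # Add-one smoothing keeps unseen symbols at a nonzero probability.
        p = (counts[context][symbol] + 1) / (totals[context] + len(ALPHABET))
        bits += -math.log2(p)
        # Update the model only after coding, as a decoder would.
        counts[context][symbol] += 1
        totals[context] += 1
    return bits
```

Dividing the result by the sequence length gives bits per residue, which can be compared with the log2(20) ≈ 4.32 bits per residue needed when every residue is treated as equally likely.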
History
The proposal that proteins were linear chains of α-amino acids was made nearly simultaneously by two scientists at the same conference in 1902, the 74th meeting of the Society of German Scientists and Physicians, held in Karlsbad. Franz Hofmeister made the proposal in the morning, based on his observations of the biuret reaction in proteins. Hofmeister was followed a few hours later by Emil Fischer, who had amassed a wealth of chemical details supporting the peptide-bond model. For completeness, the proposal that proteins contained amide linkages was made as early as 1882 by the French chemist E. Grimaux.
Despite these data and later evidence that proteolytically digested proteins yielded only oligopeptides, the idea that proteins were linear, unbranched polymers of amino acids was not accepted immediately. Some well-respected scientists such as William Astbury doubted that covalent bonds were strong enough to hold such long molecules together; they feared that thermal agitations would shake such long molecules asunder. Hermann Staudinger faced similar prejudices in the 1920s when he argued that rubber was composed of macromolecules.
Thus, several alternative hypotheses arose. The colloidal protein hypothesis stated that proteins were colloidal assemblies of smaller molecules. This hypothesis was disproved in the 1920s by ultracentrifugation measurements by Theodor Svedberg that showed that proteins had a well-defined, reproducible molecular weight and by electrophoretic measurements by Arne Tiselius that indicated that proteins were single molecules. A second hypothesis, the cyclol hypothesis advanced by Dorothy Wrinch, proposed that the linear polypeptide underwent a chemical cyclol rearrangement C=O + HN → C(OH)-N that crosslinked its backbone amide groups, forming a two-dimensional ''fabric''. Other primary structures of proteins were proposed by various researchers, such as the diketopiperazine model of Emil Abderhalden and the pyrrol/piperidine model of Troensegaard in 1942. Although never given much credence, these alternative models were finally disproved when Frederick Sanger successfully sequenced insulin and by the crystallographic determination of myoglobin and hemoglobin by Max Perutz and John Kendrew.
Primary structure in other molecules
Any linear-chain heteropolymer can be said to have a "primary structure" by analogy to the usage of the term for proteins, but this usage is rare compared to the extremely common usage in reference to proteins. In RNA, which also has extensive secondary structure, the linear chain of bases is generally just referred to as the "sequence" as it is in DNA (which usually forms a linear double helix with little secondary structure). Other biological polymers such as polysaccharides can also be considered to have a primary structure, although the usage is not standard.
Relation to secondary and tertiary structure
The primary structure of a biological polymer to a large extent determines the three-dimensional shape (tertiary structure). Protein sequence can be used to predict local features, such as segments of secondary structure, or trans-membrane regions. However, the complexity of protein folding currently prohibits predicting the tertiary structure of a protein from its sequence alone. Knowing the structure of a similar homologous sequence (for example a member of the same protein family) allows highly accurate prediction of the tertiary structure by homology modeling. If the full-length protein sequence is available, it is possible to estimate its general biophysical properties, such as its isoelectric point.
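Estimating the isoelectric point from sequence alone can be sketched as follows. The approach assumes each ionisable group titrates independently according to the Henderson–Hasselbalch equation, and the pKa values are approximate, illustrative numbers chosen for this example (published scales differ, and real pKa values shift with the local environment in a folded protein). The function names are hypothetical.

```python
# Approximate, illustrative pKa values (assumption; published scales differ).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}            # +1 when protonated
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}   # -1 when deprotonated
N_TERM_PKA = 8.0
C_TERM_PKA = 3.1


def net_charge(sequence: str, ph: float) -> float:
    """Net charge at a given pH, treating every group as independent."""
    # Terminal amino group (positive) and terminal carboxyl group (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in sequence.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    lo, hi = 0.0, 14.0
    # Net charge falls monotonically as pH rises, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With these values a lysine-rich sequence yields a higher estimate than an aspartate-rich one, as expected; the absolute numbers should be read as rough estimates only.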
Sequence families are often determined by sequence clustering, and structural genomics projects aim to produce a set of representative structures to cover the sequence space of possible non-redundant sequences.
See also
* Protein sequencing
* Nucleic acid primary structure
* Translation
* Pseudo amino acid composition