AT-content

In molecular biology and genetics, GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). This measure indicates the proportion of G and C bases out of an implied four total bases, also including adenine and thymine in DNA and adenine and uracil in RNA. GC-content may be given for a certain fragment of DNA or RNA or for an entire genome. When it refers to a fragment, it may denote the GC-content of an individual gene or section of a gene (domain), a group of genes or gene clusters, a non-coding region, or a synthetic oligonucleotide such as a primer.


Structure

Qualitatively, guanine (G) and cytosine (C) pair specifically with each other through hydrogen bonding, whereas adenine (A) pairs specifically with thymine (T) in DNA and with uracil (U) in RNA. Quantitatively, each GC base pair is held together by three hydrogen bonds, while AT and AU base pairs are held together by two hydrogen bonds. To emphasize this difference, the base pairings are often represented as "G≡C" versus "A=T" or "A=U". DNA with low GC-content is less stable than DNA with high GC-content; however, the hydrogen bonds themselves do not have a particularly significant impact on molecular stability, which is instead caused mainly by base-stacking interactions. In spite of the higher thermostability conferred on a nucleic acid by high GC-content, it has been observed that at least some species of bacteria with DNA of high GC-content undergo autolysis more readily, thereby reducing the longevity of the cell ''per se''. Because of the thermostability of GC pairs, it was once presumed that high GC-content was a necessary adaptation to high temperatures, but this hypothesis was refuted in 2001. Even so, it has been shown that there is a strong correlation between the optimal growth of prokaryotes at higher temperatures and the GC-content of structural RNAs such as ribosomal RNA, transfer RNA, and many other non-coding RNAs. The AU base pairs are less stable than the GC base pairs, making high-GC-content RNA structures more resistant to the effects of high temperatures. More recently, it has been demonstrated that the most important factor contributing to the thermal stability of double-stranded nucleic acids is the stacking of adjacent bases rather than the number of hydrogen bonds between the bases. There is more favorable stacking energy for GC pairs than for AT or AU pairs because of the relative positions of exocyclic groups. Additionally, there is a correlation between the order in which the bases stack and the thermal stability of the molecule as a whole.
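The three-versus-two hydrogen bond counts described above can be tallied for a strand paired with its perfect complement. The following is a minimal sketch; the function name `count_hydrogen_bonds` is hypothetical, and the count says nothing about base stacking, which the text identifies as the dominant contribution to stability.

```python
def count_hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds in a duplex formed by `strand` and its
    perfect complement: 3 per G/C position, 2 per A/T/U position."""
    bonds_per_base = {"G": 3, "C": 3, "A": 2, "T": 2, "U": 2}
    total = 0
    for base in strand.upper():
        if base not in bonds_per_base:
            raise ValueError(f"unexpected base: {base!r}")
        total += bonds_per_base[base]
    return total


if __name__ == "__main__":
    # Two strands of equal length but different GC-content.
    print(count_hydrogen_bonds("GGCC"))  # GC-rich
    print(count_hydrogen_bonds("AATT"))  # AT-rich
```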


Determination

GC-content is usually expressed as a percentage value, but sometimes as a ratio (called G+C ratio or GC-ratio). GC-content percentage is calculated as

:\cfrac{G+C}{A+T+G+C}\times 100\%

whereas the AT/GC ratio is calculated as

:\cfrac{A+T}{G+C}

The GC-content percentage as well as the GC-ratio can be measured by several means, but one of the simplest methods is to measure the melting temperature of the DNA double helix using spectrophotometry. The absorbance of DNA at a wavelength of 260 nm increases fairly sharply when the double-stranded DNA molecule is heated sufficiently to separate into two single strands. For large numbers of samples, the most commonly used protocol for determining GC-ratios uses flow cytometry. Alternatively, if the DNA or RNA molecule under investigation has been reliably sequenced, then GC-content can be accurately calculated by simple arithmetic or by using a variety of publicly available software tools, such as the free online GC calculator.
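For a sequenced molecule, the "simple arithmetic" amounts to counting bases and applying the two formulas given in this section. A minimal sketch follows; the function names are hypothetical, and ignoring characters other than A, C, G, T and U (for example the ambiguity code N) is an assumption of this sketch rather than a rule from the text.

```python
from collections import Counter


def _count_gc_at(seq: str) -> tuple[int, int]:
    """Return (G+C, A+T/U) counts, ignoring any other characters."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"] + counts["U"]
    return gc, at


def gc_content(seq: str) -> float:
    """GC-content as a percentage: (G+C) / (A+T+G+C) * 100."""
    gc, at = _count_gc_at(seq)
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return 100.0 * gc / (gc + at)


def at_gc_ratio(seq: str) -> float:
    """AT/GC ratio: (A+T) / (G+C)."""
    gc, at = _count_gc_at(seq)
    if gc == 0:
        raise ValueError("ratio undefined: sequence contains no G or C")
    return at / gc


if __name__ == "__main__":
    example = "ATGCGCTA"  # made-up sequence for illustration
    print(gc_content(example), at_gc_ratio(example))
```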


Genomic content


Within-genome variation

The GC-ratio within a genome is found to be markedly variable. These variations in GC-ratio within the genomes of more complex organisms result in a mosaic-like formation with islet regions called isochores. This results in the variations in staining intensity in chromosomes. GC-rich isochores typically include many protein-coding genes, and thus determining the GC-ratios of these specific regions contributes to mapping gene-rich regions of the genome.
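Regional variation of this kind is commonly examined by computing GC-content in windows along a sequence. The sketch below is one simple way to do that, not a method taken from the text; the function name, window size and step are hypothetical choices, and each window's percentage is taken over the full window length, so any non-ACGT characters count as non-GC.

```python
def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, GC percent) for each full window along `seq`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    results = []
    # Only full-length windows are reported; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc / window))
    return results


if __name__ == "__main__":
    # Made-up sequence with a GC-rich half followed by an AT-rich half.
    for start, pct in windowed_gc("GGGCGCCGATATTAAT", window=8, step=4):
        print(start, pct)
```

On real genomes the window would be far larger (the text elsewhere mentions 100-kb fragments), but the arithmetic is the same.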


Coding sequences

Within a long region of genomic sequence, genes are often characterised by having a higher GC-content in contrast to the background GC-content for the entire genome. Comparisons of GC-ratio against the length of the coding region of a gene have shown that the length of the coding sequence is directly proportional to G+C content. This has been attributed to the fact that the stop codon has a bias towards A and T nucleotides, and, thus, the shorter the sequence, the higher the AT bias. Comparison of more than 1,000 orthologous genes in mammals showed marked within-genome variations of the third-codon position GC content, with a range from less than 30% to more than 80%.
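The third-codon-position GC content mentioned above can be computed by sampling every third base of an in-frame coding sequence. This is a minimal sketch under the assumption that the input starts at the first base of the first codon; the function name `gc3_content` is hypothetical, and any trailing incomplete codon is dropped.

```python
def gc3_content(cds: str) -> float:
    """GC percentage at the third position of each complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop a trailing partial codon
    third_positions = cds[2:usable:3]  # bases at index 2, 5, 8, ...
    if not third_positions:
        raise ValueError("sequence contains no complete codon")
    gc = third_positions.count("G") + third_positions.count("C")
    return 100.0 * gc / len(third_positions)


if __name__ == "__main__":
    # Made-up reading frame: ATG GCC TAA -> third positions G, C, A.
    print(gc3_content("ATGGCCTAA"))
```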


Among-genome variation

GC-content varies between organisms; this variation is thought to result from differences in selection, mutational bias, and biased recombination-associated DNA repair. In human genomes, GC-content ranges from 35% to 60% across 100-kb fragments, with a mean of 41%. (page 876) The GC-content of yeast (''Saccharomyces cerevisiae'') is 38%, and that of another common model organism, thale cress (''Arabidopsis thaliana''), is 36%. Because of the nature of the genetic code, it is virtually impossible for an organism to have a genome with a GC-content approaching either 0% or 100%. However, ''Plasmodium falciparum'' is a species with an extremely low GC-content (GC% = ~20%), and such examples are usually described as AT-rich rather than GC-poor. Several mammalian species (e.g., shrew, microbat, tenrec, rabbit) have independently undergone a marked increase in the GC-content of their genes. These GC-content changes are correlated with species life-history traits (e.g., body mass or longevity) and genome size, and might be linked to a molecular phenomenon called GC-biased gene conversion.


Applications


Molecular biology

In polymerase chain reaction (PCR) experiments, the GC-content of short oligonucleotides known as primers is often used to predict their annealing temperature to the template DNA. A higher GC-content level indicates a relatively higher melting temperature. Many sequencing technologies, such as Illumina sequencing, have trouble reading high-GC-content sequences. Bird genomes are known to have many such regions, causing the problem of "missing genes" that were expected to be present on evolutionary and phenotypic grounds but were never sequenced until improved methods were used.
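One widely used rule of thumb for short primers, often called the Wallace rule, estimates the melting temperature as 2 °C per A or T plus 4 °C per G or C. The rule is not given in the text above and is shown here only as an illustrative sketch of how GC-content raises the predicted melting temperature; it is a rough approximation for short oligonucleotides, and the function name is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius for a short DNA primer,
    using the Wallace rule: Tm = 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Two made-up 20-mers: the GC-rich one gets the higher estimate.
    print(wallace_tm("ATATATATATATATATATAT"))
    print(wallace_tm("GCGCGCGCGCGCGCGCGCGC"))
```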


Systematics

The species problem in non-eukaryotic taxonomy has led to various suggestions for classifying bacteria, and the ''ad hoc committee on reconciliation of approaches to bacterial systematics'' of 1987 recommended the use of GC-ratios in higher-level hierarchical classification. For example, the Actinomycetota are characterised as "high GC-content bacteria". In ''Streptomyces coelicolor'' A3(2), GC-content is 72%. With the use of more reliable, modern methods of molecular systematics, the GC-content definition of Actinomycetota has been abolished and low-GC bacteria of this clade have been found.


Software tools

GCSpeciesSorter and TopSort are software tools for classifying species based on their GC-contents.


See also

* Codon usage bias


External links


Table with GC-content of all sequenced prokaryotes

Taxonomic browser of bacteria based on GC ratio on NCBI website

GC ratio in diverse species