Genome size is the total amount of
DNA contained within one copy of a single complete
genome. It is typically measured in terms of
mass in picograms (trillionths (10⁻¹²) of a
gram, abbreviated pg) or less frequently in
daltons, or as the total number of
nucleotide base pairs, usually in megabases (millions of base pairs, abbreviated Mb or Mbp). One picogram is equal to 978 megabases.
In
diploid
organisms, genome size is often used interchangeably with the term
C-value.
An organism's complexity is not directly proportional to its genome size; total DNA content is widely variable between biological taxa. Some single-celled organisms have much more DNA than humans, for reasons that remain unclear (see
non-coding DNA and
C-value enigma).
Origin of the term
The term "genome size" is often erroneously attributed to a 1976 paper by Ralph Hinegardner,
even in discussions dealing specifically with terminology in this area of research (e.g., Greilhuber 2005
). Notably, Hinegardner
used the term only once: in the title. The term actually seems to have first appeared in 1968, when Hinegardner wondered, in the last paragraph of another article, whether "
cellular DNA content does, in fact, reflect genome size".
In this context, "genome size" was being used in the sense of
genotype
to mean the number of
genes.
In a paper submitted only two months later, Wolf et al. (1969)
used the term "genome size" throughout and in its present usage; therefore these authors should probably be credited with originating the term in its modern sense. By the early 1970s, "genome size" was in common usage with its present definition, probably as a result of its inclusion in
Susumu Ohno's influential book ''Evolution by Gene Duplication'', published in 1970.
Variation in genome size and gene content
With the emergence of various molecular techniques in the past 50 years, the genome sizes of thousands of
eukaryotes have been analyzed, and these data are available in online databases for animals, plants, and fungi (see external links). Nuclear genome size is typically measured in eukaryotes using either
densitometric measurements of
Feulgen-stained nuclei (previously using specialized densitometers, now more commonly using computerized
image analysis
) or
flow cytometry. In
prokaryotes,
pulsed field gel electrophoresis and complete
genome sequencing are the predominant methods of genome size determination.
Nuclear genome sizes are well known to vary enormously among eukaryotic species. In animals they range more than 3,300-fold, and in land plants they differ by a factor of about 1,000.
Protist genomes have been reported to vary more than 300,000-fold in size, but the high end of this range (''
Amoeba'') has been called into question. In eukaryotes (but not prokaryotes), genome size is not proportional to the number of
genes present in the genome, an observation that was deemed wholly counter-intuitive before the discovery of
non-coding DNA and which became known as the "
C-value paradox" as a result. However, although there is no longer any paradoxical aspect to the discrepancy between genome size and gene number, the term remains in common usage. For conceptual clarity, one author has suggested that the various puzzles that remain with regard to genome size variation are more accurately described as an enigma (the so-called "
C-value enigma").
Genome size correlates with a range of measurable characteristics at the
cell and organism levels, including cell size,
cell division rate, and, depending on the
taxon, body size,
metabolic rate, developmental rate,
organ
complexity, geographical distribution, or
extinction risk.
Based on currently available completely sequenced genome data (as of April 2009), log-transformed gene number forms a linear correlation with log-transformed genome size in bacteria, archaea, viruses, and organelles combined, whereas a nonlinear (semi-natural logarithm) correlation is seen for eukaryotes.
Although the latter contrasts with the previous view that no correlation exists for the eukaryotes, the observed nonlinear correlation for eukaryotes may reflect disproportionately fast-increasing
non-coding DNA in increasingly large eukaryotic genomes. Although sequenced genome data are practically biased toward small genomes, which may compromise the accuracy of the empirically derived correlation, and ultimate proof of the correlation remains to be obtained by sequencing some of the largest eukaryotic genomes, current data do not seem to rule out a possible correlation.
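The log-log relationship described above can be checked on any table of genome sizes and gene counts with an ordinary least-squares fit on the log-transformed values. The following is a minimal sketch, not the method used in the cited analysis: the function name `fit_log_log` is hypothetical, and the input values are synthetic placeholders (exactly one gene per 1,000 bp), not measurements.

```python
import math


def fit_log_log(genome_sizes_bp, gene_counts):
    """Least-squares fit of log10(gene count) against log10(genome size).

    Returns (slope, intercept). A slope near 1 means gene number grows
    roughly in proportion to genome size.
    """
    if len(genome_sizes_bp) != len(gene_counts) or len(genome_sizes_bp) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    xs = [math.log10(s) for s in genome_sizes_bp]
    ys = [math.log10(g) for g in gene_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("genome sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic placeholder data (NOT real genomes): one gene per 1,000 bp.
sizes = [5e5, 1e6, 2e6, 5e6]
genes = [s / 1000 for s in sizes]
slope, intercept = fit_log_log(sizes, genes)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

With real data, a slope that falls well below 1 at large genome sizes would be consistent with the disproportionate growth of non-coding DNA suggested for eukaryotes.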
Genome reduction
Genome reduction, also known as
genome degradation, is the process by which an organism's genome shrinks relative to that of its ancestors. Genomes fluctuate in size regularly, and
genome size reduction is most significant in
bacteria.
The most evolutionarily significant cases of genome reduction may be observed in the eukaryotic
organelles known to be derived from bacteria:
mitochondria
and
plastids. These organelles are descended from primordial
endosymbionts, which were capable of surviving within the host cell and which the host cell likewise needed for survival. Many present-day mitochondria have fewer than 20 genes in their entire genome, whereas a modern free-living bacterium generally has at least 1,000 genes. Many genes have apparently been transferred to the host
nucleus, while others have simply been lost and their function replaced by host processes.
Other bacteria have become endosymbionts or obligate intracellular
pathogens and experienced extensive genome reduction as a result. This process seems to be dominated by
genetic drift resulting from small
population size, low
recombination rates, and high
mutation rates, as opposed to
selection for smaller genomes. Some free-living marine bacterioplankton also show signs of genome reduction, which is hypothesized to be driven by natural selection.
In obligate endosymbiotic species
Obligate endosymbiotic species are characterized by a complete inability to survive external to their
host environment. These species have become a considerable threat to human health, as they are often capable of evading human immune systems and manipulating the host environment to acquire nutrients. A common explanation for these manipulative abilities is their consistently compact and efficient genomic structure. These small genomes are the result of massive losses of extraneous DNA, an occurrence that is exclusively associated with the loss of a free-living stage. As much as 90% of the genetic material can be lost when a species makes the evolutionary transition from a free-living to an obligate intracellular lifestyle. During this process the future parasite is exposed to a metabolite-rich environment in which it must also hide within the host cell; these factors relax selection for gene retention and increase genetic drift, accelerating the loss of non-essential genes.
Common examples of species with reduced genomes include ''
Buchnera aphidicola'', ''
Rickettsia prowazekii'', and ''
Mycobacterium leprae''. One obligate endosymbiont of
leafhoppers, ''Nasuia deltocephalinicola'', has the smallest genome currently known among cellular organisms at 112 kb. Despite the pathogenicity of most endosymbionts, some obligate intracellular species have positive fitness effects on their hosts.
The reductive evolution model has been proposed as an effort to define the genomic commonalities seen in all obligate endosymbionts.
This model illustrates four general features of reduced genomes and obligate intracellular species:
#"genome streamlining" resulting from relaxed selection on genes that are superfluous in the intracellular environment;
#a bias towards
deletions (rather than insertions), which heavily affects genes that have been disrupted by accumulation of mutations (
pseudogenes);
#very little or no capability for acquiring new DNA; and
#considerable reduction of
effective population size in endosymbiotic populations, particularly in species that rely on
vertical transmission
of genetic material.
Based on this model, it is clear that endosymbionts face different adaptive challenges than free-living species. Moreover, comparisons among different parasites show that their gene inventories differ widely, which suggests that genome miniaturization follows a different pattern in different symbionts.
Conversion from picograms (pg) to base pairs (bp)
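:<math>\text{number of base pairs} = \text{mass in pg} \times 9.78 \times 10^8</math>
or simply:
:<math>1\ \text{pg} = 978\ \text{Mbp}</math>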
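The conversion can also be written as a small helper. The sketch below assumes only the factor stated earlier in the article (one picogram equals 978 megabases); the names `BP_PER_PG`, `pg_to_bp`, and `bp_to_pg` are illustrative.

```python
# Conversion factor taken from the article text: 1 pg = 978 Mb.
BP_PER_PG = 978e6


def pg_to_bp(mass_pg):
    """Convert a DNA mass in picograms to a number of base pairs."""
    if mass_pg < 0:
        raise ValueError("mass must be non-negative")
    return mass_pg * BP_PER_PG


def bp_to_pg(n_bp):
    """Convert a number of base pairs to a DNA mass in picograms."""
    if n_bp < 0:
        raise ValueError("base-pair count must be non-negative")
    return n_bp / BP_PER_PG


# Example: the 4.6 Mb E. coli genome mentioned in this article,
# expressed as a mass in picograms.
ecoli_pg = bp_to_pg(4.6e6)
print(f"{ecoli_pg:.4f} pg")
```

Both functions are exact inverses of each other up to floating-point rounding, so a value converted to base pairs and back returns the original mass.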
Drake's rule
In 1991,
John W. Drake proposed a general rule: that the mutation rate within a genome and its size are inversely correlated.
This rule has been found to be approximately correct for simple genomes such as those in
DNA viruses and unicellular organisms. Its basis is unknown.
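One way to read the rule is that the mutation rate per genome per replication stays roughly constant, so the rate per site falls as the genome grows. The sketch below illustrates that inverse relation. The constant of about 0.003 mutations per genome per replication is an assumption brought in for illustration (it is the order of magnitude usually associated with Drake's work on DNA-based microbes and is not stated in this article), and the genome sizes are hypothetical.

```python
# ASSUMPTION: roughly constant per-genome mutation rate per replication.
# The value 0.003 is illustrative and is not taken from this article.
DRAKE_PER_GENOME_RATE = 0.003


def per_site_mutation_rate(genome_size_bp, per_genome_rate=DRAKE_PER_GENOME_RATE):
    """Per-site mutation rate implied by a constant per-genome rate."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return per_genome_rate / genome_size_bp


# Hypothetical genome sizes: a 100-fold larger genome implies a
# 100-fold lower per-site rate under this reading of the rule.
small_genome_rate = per_site_mutation_rate(1e5)
large_genome_rate = per_site_mutation_rate(1e7)
print(small_genome_rate, large_genome_rate)
```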
It has been proposed that the small size of
RNA viruses is locked into a three-part relation between replication fidelity, genome size, and genetic complexity. The majority of RNA viruses lack an RNA proofreading facility, which limits their replication fidelity and hence their genome size. This has also been described as the "Eigen paradox". An exception to the rule of small genome sizes in RNA viruses is found in the
Nidoviruses. These viruses appear to have acquired a
3′-to-5′ exoribonuclease (ExoN) which has allowed for an increase in genome size.
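The link between replication fidelity and genome size can be illustrated with the standard error-threshold approximation from quasispecies theory, in which the maximum maintainable genome length is about ln(σ) divided by the per-site error rate, where σ is the selective superiority of the fittest sequence. This formula is brought in here as background and is not given in the article; the function name and the parameter values are hypothetical.

```python
import math


def max_genome_length(per_site_error_rate, superiority):
    """Approximate error-threshold genome length: ln(superiority) / error rate.

    per_site_error_rate: probability of a copying error per nucleotide.
    superiority: replication advantage of the fittest sequence (> 1).
    """
    if not 0 < per_site_error_rate < 1:
        raise ValueError("error rate must be between 0 and 1")
    if superiority <= 1:
        raise ValueError("superiority must be greater than 1")
    return math.log(superiority) / per_site_error_rate


# Hypothetical values: a 100-fold improvement in fidelity (for example
# through a proofreading activity) raises the length limit 100-fold.
without_proofreading = max_genome_length(1e-4, math.e)
with_proofreading = max_genome_length(1e-6, math.e)
print(without_proofreading, with_proofreading)
```

Under this approximation, acquiring a proofreading enzyme such as ExoN lowers the error rate and therefore relaxes the length limit, which is consistent with the larger genomes described for the Nidoviruses.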
Genome miniaturization and optimal size
In 1972 Michael David Bennett hypothesized that DNA content correlates with nuclear volume, while
Commoner
and
van’t Hof and Sparrow before him postulated that even cell size and cell-cycle length were controlled by the amount of DNA. More recent theories have raised the possibility of a mechanism that physically constrains the development of the genome to an optimal size.
Those explanations have been disputed by Cavalier-Smith, who argued that the relation between genome size and cell volume is best understood through the skeletal DNA theory. The core of this theory is that cell volume is determined by an adaptive balance between the advantages and disadvantages of larger cell size, together with the optimization of the nucleus:cytoplasm ratio (karyoplasmic ratio) and the concept that larger genomes are more prone to the accumulation of duplicative transposons as a consequence of their higher content of non-coding skeletal DNA.
Cavalier-Smith also proposed that, as a consequence of a reduction in cell size, the nucleus will be more prone to selection favoring deletion over duplication.
From an economic point of view, since phosphorus and energy are scarce, a reduction in DNA should always be favored by evolution unless the DNA confers a benefit. Random deletions will then be mainly deleterious and not selected for, because they reduce fitness, but occasionally an elimination will be advantageous as well. This trade-off between economy and accumulation of non-coding DNA is the key to the maintenance of the karyoplasmic ratio.
Mechanisms of genome miniaturization
The basic question behind the process of genome miniaturization is whether it occurs through large steps or through a constant erosion of the gene content. To assess how this process evolved, it is necessary to compare an ancestral genome with the one in which the shrinkage is supposed to have occurred. Thanks to the similarity between the gene content of ''Buchnera aphidicola'' and that of the enteric bacterium ''Escherichia coli'' (89% identity for the 16S rDNA and 62% for orthologous genes), it was possible to shed light on the mechanism of genome miniaturization.
The genome of the endosymbiont ''B. aphidicola'' is seven times smaller than that of ''E. coli'' (643 kb compared to 4.6 Mb) and can be viewed as a subset of the enteric bacterial gene inventory.
Comparison of the two genomes showed that some genes persist in a partially degraded form, indicating that the function was lost during the process and that subsequent erosion events shortened the genes, as documented in ''Rickettsia''.
This hypothesis is confirmed by the analysis of the pseudogenes of ''Buchnera'', where the number of deletions was more than ten times higher than the number of insertions.
Like other small-genome bacteria, the mutualistic endosymbiont ''Buchnera aphidicola'' has experienced a vast reduction of functional activity, with one major exception compared to parasites such as ''Rickettsia prowazekii'': it still retains the biosynthetic ability to produce the amino acids needed by its host.
The effects of genome shrinkage shared by this endosymbiont and the other parasites are a reduced ability to produce phospholipids and to carry out repair and recombination, and an overall shift in gene composition toward a higher A-T content due to mutation and substitutions.
Evidence of the loss of the repair and recombination functions is the loss of the gene ''rec''A, which is involved in the recombinase pathway. This event happened during the removal of a larger region containing ten genes for a total of almost 10 kb.
The same fate befell ''uvr''A, ''uvr''B and ''uvr''C, genes encoding excision enzymes involved in the repair of DNA damaged by UV exposure.
One of the most plausible mechanisms for genome shrinkage is chromosomal rearrangement, because insertions and deletions of larger portions of sequence arise more readily during homologous recombination than during illegitimate recombination; the spread of
transposable elements will therefore increase the rate of deletion.
The loss of those genes in the early stages of miniaturization not only removed this function but must also have played a role in the evolution of the subsequent deletions. Evidence that larger removal events occurred before smaller deletions emerged from the comparison of the genome of ''Buchnera'' with a reconstructed ancestor: the genes that have been lost are not randomly dispersed in the ancestral genome but clustered, and there is a negative relation between the number of lost genes and the length of the spacers.
Small local indels play a marginal role in genome reduction, especially in the early stages, when a larger number of genes became superfluous.
Single events instead occurred due to the lack of selection pressure for the retention of genes, especially those belonging to a pathway that lost its function during a previous deletion. An example is the deletion of ''rec''F, a gene required for the function of ''rec''A, together with its flanking genes. The elimination of such large amounts of sequence also affected the regulation of the remaining genes. The loss of large sections of the genome could lead to a loss of promoter sequences, which in turn could have driven selection for the evolution of
polycistronic regions, with a positive effect on both size reduction and transcription efficiency.
Evidence of genome miniaturization
One example of genome miniaturization occurred in the
microsporidia, anaerobic intracellular parasites of arthropods that evolved from aerobic fungi.
During this process the
mitosomes were formed following the reduction of the mitochondria to relics devoid of genomes and of metabolic activity, except for the production of iron-sulfur centers and the capacity to enter into the host cells. Except for the
ribosomes, which were miniaturized as well, many other organelles have been almost lost during the formation of the smallest genome found in the eukaryotes.
From their possible ancestor, a
zygomycotine fungus, the microsporidia shrank their genome, eliminating almost 1000 genes, and even reduced the size of proteins and protein-coding genes. This extreme process was possible thanks to selection for a smaller cell size imposed by parasitism.
Another example of miniaturization is represented by
nucleomorphs, enslaved nuclei found inside the cells of two different groups of algae,
cryptophytes
and
chlorarachneans.
Nucleomorphs are characterized by some of the smallest genomes known (551 and 380 kb) and, as noted for microsporidia, some genomes are noticeably reduced in length compared to other eukaryotes due to a virtual lack of non-coding DNA.
The most interesting factor is the coexistence of those small nuclei inside a cell that contains another nucleus that never experienced such genome reduction. Moreover, even though the host cells differ in volume from species to species, with a consequent variability in genome size, the nucleomorph genomes remain invariant, denoting a double effect of selection within the same cell.
See also
* Animal Genome Size Database
* Bacterial genome size
* C-value
* Cell nucleus
* Comparative genomics
* Comparison of different genome sizes
* Human genome
* Junk DNA
* List of sequenced eukaryotic genomes
* Non-coding DNA
* Plant DNA C-values Database
* Selfish DNA
* Transposable elements
Further reading
* Evolution of Chlamydiaceae
External links
* Animal Genome Size Database
* Plant DNA C-values Database
* Fungal Genome Size Database
* Fungal Database, by CBS