CpG islands

CpG sites, or CG sites, are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands (or CG islands). Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines. Enzymes that add a methyl group are called DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosine within a gene can change its expression, a mechanism that is part of a larger field of science studying gene regulation that is called epigenetics. Methylated cytosines often mutate to thymines. In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a CpG island.


CpG characteristics


Definition

''CpG'' is shorthand for ''5'-C-phosphate-G-3' '', that is, cytosine and guanine separated by only one phosphate group; phosphate links any two nucleosides together in DNA. The ''CpG'' notation is used to distinguish this single-stranded linear sequence from the ''CG'' base-pairing of cytosine and guanine for double-stranded sequences. The CpG notation is therefore to be interpreted as the cytosine being 5' to the guanine base. ''CpG'' should not be confused with ''GpC'', the latter meaning that a guanine is followed by a cytosine in the 5' → 3' direction of a single-stranded sequence.
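
The distinction between CpG and GpC can be illustrated with a minimal sketch that counts each dinucleotide on a single strand read 5' → 3'. The function name `count_dinucleotide` and the example sequence are hypothetical, chosen only for illustration.

```python
def count_dinucleotide(seq: str, dinucleotide: str) -> int:
    """Count occurrences of a two-base motif on one strand, read 5' -> 3'."""
    seq = seq.upper()
    dinucleotide = dinucleotide.upper()
    # Slide a window of width 2 along the strand and compare each pair.
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide)


# Hypothetical single-stranded sequence, written 5' -> 3'.
example = "TCGACGGCAT"

cpg_count = count_dinucleotide(example, "CG")  # C followed by G
gpc_count = count_dinucleotide(example, "GC")  # G followed by C
print(cpg_count, gpc_count)
```

The same bases in the opposite order are counted separately, which is why the two notations are not interchangeable.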


Under-representation caused by high mutation rate

CpG dinucleotides have long been observed to occur with a much lower frequency in the sequence of vertebrate genomes than would be expected due to random chance. For example, in the human genome, which has a 42% GC content, a pair of nucleotides consisting of cytosine followed by guanine would be expected to occur 0.21 × 0.21 = 4.41% of the time. The frequency of CpG dinucleotides in human genomes is less than one-fifth of the expected frequency. This underrepresentation is a consequence of the high mutation rate of methylated CpG sites: the spontaneously occurring deamination of a methylated cytosine results in a thymine, and the resulting G:T mismatched bases are often improperly resolved to A:T; whereas the deamination of unmethylated cytosine results in a uracil, which as a foreign base is quickly replaced by a cytosine by the base excision repair mechanism. The C to T transition rate at methylated CpG sites is ~10 fold higher than at unmethylated sites.
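
The 4.41% figure above can be reproduced with a short calculation. This is a sketch under two assumptions that the text implies but does not state: C and G are equally frequent (each half of the GC content), and neighbouring bases are independent. The function name `expected_cpg_frequency` is hypothetical.

```python
def expected_cpg_frequency(gc_content: float) -> float:
    """Expected CpG frequency if C and G each make up half of the GC content
    and adjacent bases are drawn independently (simplifying assumptions)."""
    p_c = gc_content / 2  # probability that a given base is C
    p_g = gc_content / 2  # probability that the next base is G
    return p_c * p_g


expected = expected_cpg_frequency(0.42)  # human genome GC content from the text
upper_bound_observed = expected / 5      # "less than one-fifth of the expected"

print(f"expected CpG frequency: {expected:.4%}")
print(f"observed is below:      {upper_bound_observed:.4%}")
```

Under these assumptions the observed CpG frequency in the human genome is below roughly 0.88%.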


Genomic distribution

CpG dinucleotides frequently occur in CpG islands (see definition of CpG islands, below). There are 28,890 CpG islands in the human genome (50,267 if one includes CpG islands in repeat sequences). This is in agreement with the 28,519 CpG islands found by Venter et al., since the Venter et al. genome sequence did not include the interiors of highly similar repetitive elements and the extremely dense repeat regions near the centromeres. Since CpG islands contain multiple CpG dinucleotide sequences, there appear to be more than 20 million CpG dinucleotides in the human genome.


CpG islands

CpG islands (or CG islands) are regions with a high frequency of CpG sites. Though objective definitions for CpG islands are limited, the usual formal definition is a region of at least 200 bp with a GC percentage greater than 50% and an observed-to-expected CpG ratio greater than 60%. The "observed-to-expected CpG ratio" can be derived where the observed is calculated as (number of CpGs) and the expected as (number of C × number of G) / length of sequence or ((number of C + number of G)/2)^2 / length of sequence.

Many genes in mammalian genomes have CpG islands associated with the start of the gene (promoter regions). Because of this, the presence of a CpG island is used to help in the prediction and annotation of genes. In mammalian genomes, CpG islands are typically 300–3,000 base pairs in length, and have been found in or near approximately 40% of promoters of mammalian genes. Over 60% of human genes and almost all house-keeping genes have their promoters embedded in CpG islands. Given the frequency of GC two-nucleotide sequences, the number of CpG dinucleotides is much lower than would be expected.

A 2002 study revised the rules of CpG island prediction to exclude other GC-rich genomic sequences such as Alu repeats. Based on an extensive search on the complete sequences of human chromosomes 21 and 22, DNA regions greater than 500 bp were found more likely to be the "true" CpG islands associated with the 5' regions of genes if they had a GC content greater than 55%, and an observed-to-expected CpG ratio of 65%.

CpG islands are characterized by CpG dinucleotide content of at least 60% of that which would be statistically expected (~4–6%), whereas the rest of the genome has much lower CpG frequency (~1%), a phenomenon called CG suppression. Unlike CpG sites in the coding region of a gene, in most instances the CpG sites in the CpG islands of promoters are unmethylated if the genes are expressed. This observation led to the speculation that methylation of CpG sites in the promoter of a gene may inhibit gene expression. Methylation, along with histone modification, is central to imprinting. Most of the methylation differences between tissues, or between normal and cancer samples, occur a short distance from the CpG islands (at "CpG island shores") rather than in the islands themselves.

CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates. A C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the cytosines in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over time methylated cytosines tend to turn into thymines because of spontaneous deamination. There is a special enzyme in humans (thymine-DNA glycosylase, or TDG) that specifically replaces T's from T/G mismatches. However, due to the rarity of CpGs, it is theorised to be insufficiently effective in preventing a possibly rapid mutation of the dinucleotides. The existence of CpG islands is usually explained by the existence of selective forces for relatively high CpG content, or low levels of methylation in that genomic area, perhaps having to do with the regulation of gene expression. A 2011 study showed that most CpG islands are a result of non-selective forces.
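
The usual formal definition given in this section (at least 200 bp, GC percentage above 50%, observed-to-expected CpG ratio above 60%) can be sketched as a simple check on a single candidate region. The function names `cpg_stats` and `looks_like_cpg_island` are hypothetical, and the expected count uses the first of the two formulas, (number of C × number of G) / length of sequence. Real island finders scan a genome with sliding windows and apply further rules, which this sketch leaves out.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed-to-expected CpG ratio) for one region."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    num_c = seq.count("C")
    num_g = seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the full CpG count.
    observed = seq.count("CG")
    expected = (num_c * num_g) / length
    gc_fraction = (num_c + num_g) / length
    obs_exp = observed / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.50,
                          min_obs_exp: float = 0.60) -> bool:
    """Apply the three thresholds of the usual formal definition."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


# Hypothetical toy regions, not real genomic sequence.
cpg_rich = "CG" * 100   # 200 bp, all CpG
at_rich = "AT" * 100    # 200 bp, no C or G
too_short = "CG" * 50   # 100 bp, below the length threshold

print(looks_like_cpg_island(cpg_rich))
print(looks_like_cpg_island(at_rich))
print(looks_like_cpg_island(too_short))
```

Passing `min_len=500, min_gc=0.55, min_obs_exp=0.65` gives a rough approximation of the stricter 2002 criteria described above.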


Methylation, silencing, cancer, and aging


CpG islands in promoters

In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a CpG island. Distal promoter elements also frequently contain CpG islands. An example is the DNA repair gene ''ERCC1'', where the CpG island-containing element is located about 5,400 nucleotides upstream of the transcription start site of the ''ERCC1'' gene. CpG islands also occur frequently in promoters for functional noncoding RNAs such as microRNAs.


Methylation of CpG islands stably silences genes

In humans, DNA methylation occurs at the 5 position of the pyrimidine ring of the cytosine residues within CpG sites to form 5-methylcytosines. The presence of multiple methylated CpG sites in CpG islands of promoters causes stable silencing of genes. Silencing of a gene may be initiated by other mechanisms, but this is often followed by methylation of CpG sites in the promoter CpG island to cause the stable silencing of the gene.


Promoter CpG hyper/hypo-methylation in cancer

In cancers, loss of expression of genes occurs about 10 times more frequently by hypermethylation of promoter CpG islands than by mutations. For example, in a colorectal cancer there are usually about 3 to 6 driver mutations and 33 to 66 hitchhiker or passenger mutations. In contrast, in one study of colon tumors compared to adjacent normal-appearing colonic mucosa, 1,734 CpG islands were heavily methylated in tumors whereas these CpG islands were not methylated in the adjacent mucosa. Half of the CpG islands were in promoters of annotated protein coding genes, suggesting that about 867 genes in a colon tumor have lost expression due to CpG island methylation. A separate study found an average of 1,549 differentially methylated regions (hypermethylated or hypomethylated) in the genomes of six colon cancers (compared to adjacent mucosa), of which 629 were in known promoter regions of genes. A third study found more than 2,000 genes differentially methylated between colon cancers and adjacent mucosa. Using gene set enrichment analysis, 569 out of 938 gene sets were hypermethylated and 369 were hypomethylated in cancers. Hypomethylation of CpG islands in promoters results in overexpression of the genes or gene sets affected.

One 2012 study listed 147 specific genes with colon cancer-associated hypermethylated promoters, along with the frequency with which these hypermethylations were found in colon cancers. At least 10 of those genes had hypermethylated promoters in nearly 100% of colon cancers. They also indicated 11 microRNAs whose promoters were hypermethylated in colon cancers at frequencies between 50% and 100% of cancers. MicroRNAs (miRNAs) are small endogenous RNAs that pair with sequences in messenger RNAs to direct post-transcriptional repression. On average, each microRNA represses several hundred target genes. Thus microRNAs with hypermethylated promoters may be allowing over-expression of hundreds to thousands of genes in a cancer.

The information above shows that, in cancers, promoter CpG hyper/hypo-methylation of genes and of microRNAs causes loss of expression (or sometimes increased expression) of far more genes than does mutation.


DNA repair genes with hyper/hypo-methylated promoters in cancers

DNA repair genes are frequently repressed in cancers due to hypermethylation of CpG islands within their promoters. In head and neck squamous cell carcinomas at least 15 DNA repair genes have frequently hypermethylated promoters; these genes are ''XRCC1, MLH3, PMS1, RAD51B, XRCC3, RAD54B, BRCA1, SHFM1, GEN1, FANCE, FAAP20, SPRTN, SETMAR, HUS1,'' and ''PER1''. About seventeen types of cancer are frequently deficient in one or more DNA repair genes due to hypermethylation of their promoters. As an example, promoter hypermethylation of the DNA repair gene ''MGMT'' occurs in 93% of bladder cancers, 88% of stomach cancers, 74% of thyroid cancers, 40%-90% of colorectal cancers and 50% of brain cancers. Promoter hypermethylation of ''LIG4'' occurs in 82% of colorectal cancers. Promoter hypermethylation of ''NEIL1'' occurs in 62% of head and neck cancers and in 42% of non-small-cell lung cancers. Promoter hypermethylation of ''ATM'' occurs in 47% of non-small-cell lung cancers. Promoter hypermethylation of ''MLH1'' occurs in 48% of non-small-cell lung cancer squamous cell carcinomas. Promoter hypermethylation of ''FANCB'' occurs in 46% of head and neck cancers.

On the other hand, the promoters of two genes, ''PARP1'' and ''FEN1'', were hypomethylated and these genes were over-expressed in numerous cancers. ''PARP1'' and ''FEN1'' are essential genes in the error-prone and mutagenic DNA repair pathway microhomology-mediated end joining. If this pathway is over-expressed, the excess mutations it causes can lead to cancer. PARP1 is over-expressed in tyrosine kinase-activated leukemias, in neuroblastoma, in testicular and other germ cell tumors, and in Ewing's sarcoma. FEN1 is over-expressed in the majority of cancers of the breast, prostate, stomach, pancreas, and lung, and in the majority of neuroblastomas.

DNA damage appears to be the primary underlying cause of cancer. If accurate DNA repair is deficient, DNA damages tend to accumulate. Such excess DNA damage can increase mutational errors during DNA replication due to error-prone translesion synthesis. Excess DNA damage can also increase epigenetic alterations due to errors during DNA repair. Such mutations and epigenetic alterations can give rise to cancer (see malignant neoplasms). Thus, CpG island hyper/hypo-methylation in the promoters of DNA repair genes is likely central to progression to cancer.


Methylation of CpG sites with age

Since age has a strong effect on DNA methylation levels on tens of thousands of CpG sites, one can define a highly accurate biological clock (referred to as epigenetic clock or DNA methylation age) in humans and chimpanzees.


Unmethylated sites

Unmethylated CpG dinucleotide sites can be detected by Toll-like receptor 9 (TLR 9) on plasmacytoid dendritic cells, monocytes, natural killer (NK) cells, and B cells in humans. This is used to detect intracellular viral infection.


Role of CpG sites in memory

In mammals, DNA methyltransferases (which add methyl groups to DNA bases) exhibit a sequence preference for cytosines within CpG sites. In the mouse brain, 4.2% of all cytosines are methylated, primarily in the context of CpG sites, forming 5mCpG. Most hypermethylated 5mCpG sites increase the repression of associated genes. As reviewed by Duke et al., neuron DNA methylation (repressing expression of particular genes) is altered by neuronal activity. Neuron DNA methylation is required for synaptic plasticity and is modified by experiences, and active DNA methylation and demethylation are required for memory formation and maintenance.

In 2016 Halder et al. using mice, and in 2017 Duke et al. using rats, subjected the rodents to contextual fear conditioning, causing an especially strong long-term memory to form. At 24 hours after the conditioning, in the hippocampus brain region of rats, the expression of 1,048 genes was down-regulated (usually associated with 5mCpG in gene promoters) and the expression of 564 genes was up-regulated (often associated with hypomethylation of CpG sites in gene promoters). At 24 hours after training, 9.2% of the genes in the rat genome of hippocampus neurons were differentially methylated. However, while the hippocampus is essential for learning new information, it does not store information itself. In the mouse experiments of Halder, 1,206 differentially methylated genes were seen in the hippocampus one hour after contextual fear conditioning, but these altered methylations were reversed and not seen after four weeks. In contrast with the absence of long-term CpG methylation changes in the hippocampus, substantial differential CpG methylation could be detected in cortical neurons during memory maintenance. There were 1,223 differentially methylated genes in the anterior cingulate cortex of mice four weeks after contextual fear conditioning.


Demethylation at CpG sites requires ROS activity

In adult somatic cells DNA methylation typically occurs in the context of CpG dinucleotides (
CpG sites The CpG sites or CG sites are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG isl ...
), forming 5-methylcytosine-pG, or 5mCpG. Reactive oxygen species (ROS) may attack guanine at the dinucleotide site, forming
8-hydroxy-2'-deoxyguanosine 8-Oxo-2'-deoxyguanosine (8-oxo-dG) is an oxidized derivative of deoxyguanosine. 8-Oxo-dG is one of the major products of DNA oxidation. Concentrations of 8-oxo-dG within a cell are a measurement of oxidative stress. In DNA Steady-state levels ...
(8-OHdG), and resulting in a 5mCp-8-OHdG dinucleotide site. The
base excision repair Base excision repair (BER) is a cellular mechanism, studied in the fields of biochemistry and genetics, that repairs damaged DNA throughout the cell cycle. It is responsible primarily for removing small, non-helix-distorting base lesions from t ...
enzyme OGG1 targets 8-OHdG and binds to the lesion without immediate excision. OGG1, present at a 5mCp-8-OHdG site recruits TET1 and TET1 oxidizes the 5mC adjacent to the 8-OHdG. This initiates demethylation of 5mC. As reviewed in 2018, in brain neurons, 5mC is oxidized by the ten-eleven translocation (TET) family of dioxygenases ( TET1,
TET2 Tet methylcytosine dioxygenase 2 (''TET2'') is a human gene. It resides at chromosome 4q24, in a region showing recurrent microdeletions and copy-neutral loss of heterozygosity (CN-LOH) in patients with diverse myeloid malignancies. Function ' ...
, TET3) to generate 5-hydroxymethylcytosine (5hmC). In successive steps TET enzymes further hydroxylate 5hmC to generate 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC).
Thymine-DNA glycosylase (TDG) recognizes the intermediate bases 5fC and 5caC and cleaves the glycosidic bond, resulting in an apyrimidinic site (AP site). In an alternative oxidative deamination pathway, 5hmC can be oxidatively deaminated by activation-induced cytidine deaminase/apolipoprotein B mRNA editing complex (AID/APOBEC) deaminases to form 5-hydroxymethyluracil (5hmU), or 5mC can be converted to thymine (Thy). 5hmU can be cleaved by TDG, single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1), Nei-like DNA glycosylase 1 (NEIL1), or methyl-CpG binding protein 4 (MBD4). AP sites and T:G mismatches are then repaired by base excision repair (BER) enzymes to yield cytosine (Cyt).

Two reviews summarize the large body of evidence for the critical and essential role of ROS in
memory formation. The DNA demethylation of thousands of CpG sites during memory formation depends on initiation by ROS. In 2016, Zhou et al. showed that ROS have a central role in DNA demethylation. TET1 is a key enzyme involved in demethylating 5mCpG. However, TET1 is only able to act on 5mCpG if an ROS has first acted on the guanine to form 8-hydroxy-2'-deoxyguanosine (8-OHdG), resulting in a 5mCp-8-OHdG dinucleotide (see first figure in this section). After formation of 5mCp-8-OHdG, the base excision repair enzyme OGG1 binds to the 8-OHdG lesion without immediate excision. Binding of OGG1 to the 5mCp-8-OHdG site recruits TET1, allowing TET1 to oxidize the 5mC adjacent to 8-OHdG, as shown in the first figure in this section. This initiates the demethylation pathway shown in the second figure in this section. Altered protein expression in neurons, controlled by ROS-dependent demethylation of CpG sites in gene promoters within neuron DNA, is central to memory formation.
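
The TET/TDG/BER route described above can be read as a sequence of state transitions. The following is a minimal illustrative sketch in Python, not a biochemical simulation: the table `TET_TDG_PATHWAY` and the function `demethylate` are hypothetical names introduced here. The sketch leaves out the ROS/OGG1 initiation step and the alternative AID/APOBEC deamination branch.

```python
# Illustrative model only: each entry maps (current base, enzyme class)
# to the product, following the TET -> TDG -> BER route in the text.
TET_TDG_PATHWAY = {
    ("5mC", "TET"): "5hmC",
    ("5hmC", "TET"): "5fC",
    ("5fC", "TET"): "5caC",
    ("5fC", "TDG"): "AP site",   # TDG recognizes 5fC ...
    ("5caC", "TDG"): "AP site",  # ... and 5caC
    ("AP site", "BER"): "C",
}


def demethylate(start, enzymes):
    """Apply the enzymes in order and return every state visited."""
    states = [start]
    for enzyme in enzymes:
        key = (states[-1], enzyme)
        if key not in TET_TDG_PATHWAY:
            raise ValueError(
                f"{enzyme} does not act on {states[-1]} in this model"
            )
        states.append(TET_TDG_PATHWAY[key])
    return states


route = demethylate("5mC", ["TET", "TET", "TDG", "BER"])
print(" -> ".join(route))
```

In this toy model, a request such as `demethylate("5mC", ["BER"])` raises a `ValueError`, mirroring the point that base excision repair acts only after a glycosylase has produced an AP site.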


CpG loss

CpG depletion has been observed in connection with DNA methylation of transposable elements (TEs): TEs contribute not only to genome expansion but also to CpG loss in host DNA. TEs can act as "methylation centers": once a TE has inserted into host DNA, methylation spreads from the TE into the flanking DNA. Because methylated cytosines are prone to conversion to thymine, this spreading may subsequently result in CpG loss over evolutionary time. Flanking DNA around evolutionarily older insertions shows greater CpG loss than flanking DNA around younger insertions. DNA methylation can therefore eventually lead to a noticeable loss of CpG sites in neighboring DNA.
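
CpG depletion of this kind is often quantified with an observed/expected CpG ratio. The sketch below is a hedged illustration in Python: `cpg_obs_exp` uses the commonly cited formula (number of CpG × sequence length) / (number of C × number of G), and `deaminate_cpgs` is a deliberately crude, hypothetical toy model in which every CpG on one strand mutates to TpG. The example sequence is made up for illustration and is not real genomic data.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio is undefined without both bases; report 0
    return seq.count("CG") * len(seq) / (c_count * g_count)


def deaminate_cpgs(seq):
    """Toy model (assumption): every CpG is methylated and mutates to TpG."""
    return seq.upper().replace("CG", "TG")


# Made-up sequence, for illustration only.
ancestral = "CCGATCGGCATCGA"
derived = deaminate_cpgs(ancestral)

print(ancestral, cpg_obs_exp(ancestral))
print(derived, cpg_obs_exp(derived))
```

The ratio falls after the simulated mutations because CpG dinucleotides are lost while cytosines outside CpG sites are left untouched.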


Genome size and CpG ratio are negatively correlated

Previous studies have confirmed that genome size varies among species: both invertebrates and vertebrates include species with genomes smaller and larger than the human genome. Genome size is strongly connected to the number of transposable elements. There is also a negative correlation between the amount of TE methylation and CpG content. This correlation reflects CpG depletion caused by intergenic DNA methylation, which is mostly attributed to the methylation of TEs. Overall, this contributes to a noticeable amount of CpG loss in the genomes of different species.


Alu elements as promoters of CpG loss

Alu elements are known as the most abundant type of transposable element. Some studies have used Alu elements to investigate which factors are responsible for genome expansion. Unlike LINEs and ERVs, Alu elements are CpG-rich over a longer stretch of their sequence. An Alu can act as a methylation center: its insertion into host DNA can induce DNA methylation that spreads into the flanking DNA. This spreading is associated with both considerable CpG loss and a considerable increase in genome size. The effect accumulates over time: older Alu elements show more CpG loss in neighboring DNA than younger ones.


See also

* TLR9, detector of unmethylated CpG sites
* DNA methylation age

