A regulatory sequence is a segment of a nucleic acid molecule which is capable of increasing or decreasing the expression of specific genes within an organism. Regulation of gene expression is an essential feature of all living organisms and viruses.
Description
In DNA, regulation of gene expression normally happens at the level of RNA biosynthesis (transcription). It is accomplished through the sequence-specific binding of proteins (transcription factors) that activate or inhibit transcription. Transcription factors may act as activators, repressors, or both. Repressors often act by preventing RNA polymerase from forming a productive complex with the transcriptional initiation region (promoter), while activators facilitate formation of a productive complex. Furthermore, DNA motifs have been shown to be predictive of epigenomic modifications, suggesting that transcription factors play a role in regulating the epigenome.
In RNA, regulation may occur at the level of protein biosynthesis (translation), RNA cleavage, RNA splicing, or transcriptional termination. Regulatory sequences are frequently associated with messenger RNA (mRNA) molecules, where they are used to control mRNA biogenesis or translation. A variety of biological molecules may bind to the RNA to accomplish this regulation, including proteins (e.g., translational repressors and splicing factors), other RNA molecules (e.g., miRNA) and small molecules, in the case of riboswitches.
Activation and implementation
A regulatory DNA sequence does not regulate unless it is activated. Different regulatory sequences are activated and then implement their regulation by different mechanisms.
Enhancer activation and implementation
Expression of genes in mammals can be upregulated when signals are transmitted to the promoters associated with the genes.
''Cis''-regulatory DNA sequences that are located in DNA regions distant from the promoters of genes can have very large effects on gene expression, with some genes undergoing up to 100-fold increased expression due to such a ''cis''-regulatory sequence.
These ''cis''-regulatory sequences include enhancers, silencers, insulators and tethering elements.
Among this constellation of sequences, enhancers and their associated transcription factor proteins have a leading role in the regulation of gene expression.
Enhancers are sequences of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene expression programs, most often by looping through long distances to come in physical proximity with the promoters of their target genes.
In a study of brain cortical neurons, 24,937 loops were found, bringing enhancers to promoters.
Multiple enhancers, each often tens or hundreds of thousands of nucleotides distant from their target genes, loop to their target gene promoters and coordinate with each other to control expression of their common target gene.
The schematic illustration in this section shows an enhancer looping around to come into close physical proximity with the promoter of a target gene. The loop is stabilized by a dimer of a connector protein (e.g. dimer of CTCF or YY1), with one member of the dimer anchored to its binding motif on the enhancer and the other member anchored to its binding motif on the promoter (represented by the red zigzags in the illustration).
Several cell-function-specific transcription factor proteins (in 2018, Lambert et al. indicated there were about 1,600 transcription factors in a human cell) generally bind to specific motifs on an enhancer, and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene.
Mediator (a complex usually consisting of about 26 proteins in an interacting structure) communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (RNAP II) enzyme bound to the promoter.
Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two eRNAs as illustrated in the Figure.
An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it, and that activated transcription factor may then activate the enhancer to which it is bound (see small red star representing phosphorylation of a transcription factor bound to an enhancer in the illustration).
An activated enhancer begins transcription of its RNA before activating a promoter to initiate transcription of messenger RNA from its target gene.
CpG island methylation and demethylation
5-Methylcytosine (5-mC) is a methylated form of the DNA base cytosine (see figure). 5-mC is an epigenetic marker found predominantly on cytosines within CpG dinucleotides, which consist of a cytosine followed by a guanine reading in the 5′ to 3′ direction along the DNA strand (CpG sites). About 28 million CpG dinucleotides occur in the human genome.
In most tissues of mammals, on average, 70% to 80% of CpG cytosines are methylated (forming 5-methyl-CpG, or 5-mCpG).
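As a minimal illustration of the definition above, the following Python sketch lists the CpG sites in a short sequence read 5′ to 3′ and applies the 70% to 80% methylation range to the genome-wide figure of about 28 million CpG dinucleotides. The function name and the example sequence are hypothetical and chosen only for illustration.

```python
def find_cpg_sites(seq: str) -> list[int]:
    """Return 0-based positions of the C in every CpG dinucleotide (5'->3')."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# Hypothetical example sequence, not taken from any real genome.
example = "ATCGGCGTACGCA"
sites = find_cpg_sites(example)
print(sites)  # positions of the cytosine in each CpG

# Rough arithmetic from the figures quoted in the text:
# about 28 million CpG dinucleotides, 70% to 80% of them methylated.
total_cpg = 28_000_000
low = total_cpg * 70 // 100
high = total_cpg * 80 // 100
print(f"Methylated CpGs: roughly {low:,} to {high:,}")
```

Integer arithmetic is used for the percentages so that the estimate is not affected by floating-point rounding.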
CpG sites often occur in clusters, called CpG islands. About 59% of promoter sequences have a CpG island while only about 6% of enhancer sequences have a CpG island.
CpG islands constitute regulatory sequences, since if CpG islands are methylated in the promoter of a gene this can reduce or silence gene expression.
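One common way to make the notion of a CpG island quantitative is to compare the GC fraction and the observed-to-expected CpG ratio of a sequence window against thresholds. This approach is an assumption here and is not stated in the text above; the default values below follow the widely cited Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6). The function names are illustrative.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from base composition."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")       # "CG" cannot overlap itself
    expected = c * g / len(seq)      # expectation if C and G were placed independently
    return observed / expected


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the three illustrative thresholds to a single window."""
    return (len(seq) >= min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)
```

A real island finder would slide such a window along the genome and merge adjacent qualifying windows; this sketch only evaluates one window at a time.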
DNA methylation regulates gene expression through interaction with methyl binding domain (MBD) proteins, such as MeCP2, MBD1 and MBD2. These MBD proteins bind most strongly to highly methylated CpG islands.
These MBD proteins have both a methyl-CpG-binding domain and a transcriptional repression domain.
They bind to methylated DNA and guide or direct protein complexes with chromatin remodeling and/or histone modifying activity to methylated CpG islands. MBD proteins generally repress local chromatin by means such as catalyzing the introduction of repressive histone marks or creating an overall repressive chromatin environment through nucleosome remodeling and chromatin reorganization.
Transcription factors are proteins that bind to specific DNA sequences in order to regulate the expression of a given gene. The binding sequence for a transcription factor in DNA is usually about 10 or 11 nucleotides long. There are approximately 1,400 different transcription factors encoded in the human genome and they constitute about 6% of all human protein coding genes.
About 94% of transcription factor binding sites that are associated with signal-responsive genes occur in enhancers while only about 6% of such sites occur in promoters.
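Because a binding site is a short sequence pattern, candidate sites can be located by scanning a sequence for a consensus motif. The sketch below is a simplified illustration: it converts a consensus written in IUPAC nucleotide codes into a regular expression and scans both strands. The motif and sequence in the usage lines are hypothetical, and practical analyses typically use position weight matrices instead of exact consensus matching.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return sorted (position, strand) pairs for every consensus match.

    Positions are 0-based coordinates on the forward strand. A lookahead
    is used so that overlapping matches are also reported.
    """
    seq = seq.upper()
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile(f"(?=({body}))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    n = len(consensus)
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert a reverse-strand index back to forward-strand coordinates.
        hits.append((len(seq) - m.start() - n, "-"))
    return sorted(hits)


# Hypothetical 9-nucleotide consensus and sequence, for illustration only.
print(find_motif("TTGCGTGGGCGAA", "GCGKGGGCG"))
```

A palindromic consensus is reported once per strand at the same forward coordinate, which is expected for this simple approach.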
EGR1 is a transcription factor important for regulation of methylation of CpG islands. An EGR1 transcription factor binding site is frequently located in enhancer or promoter sequences.
There are about 12,000 binding sites for EGR1 in the mammalian genome and about half of EGR1 binding sites are located in promoters and half in enhancers.
The binding of EGR1 to its target DNA binding site is insensitive to cytosine methylation in the DNA.
While only small amounts of EGR1 protein are detectable in cells that are un-stimulated, EGR1 translation into protein at one hour after stimulation is markedly elevated.
Expression of EGR1 in various types of cells can be stimulated by growth factors, neurotransmitters, hormones, stress and injury.
In the brain, when neurons are activated, EGR1 proteins are upregulated, and they bind to (recruit) pre-existing TET1 enzymes, which are highly expressed in neurons.
TET enzymes can catalyze demethylation of 5-methylcytosine. When EGR1 transcription factors bring TET1 enzymes to EGR1 binding sites in promoters, the TET enzymes can demethylate the methylated CpG islands at those promoters. Upon demethylation, these promoters can then initiate transcription of their target genes. Hundreds of genes in neurons are differentially expressed after neuron activation through EGR1 recruitment of TET1 to methylated regulatory sequences in their promoters.
Activation by double- or single-strand breaks
About 600 regulatory sequences in promoters and about 800 regulatory sequences in enhancers appear to depend on double-strand breaks initiated by topoisomerase 2β (TOP2B) for activation.
The induction of particular double-strand breaks is specific with respect to the inducing signal. When neurons are activated ''in vitro'', just 22 TOP2B-induced double-strand breaks occur in their genomes.
However, when contextual fear conditioning is carried out in a mouse, this conditioning causes hundreds of gene-associated double-strand breaks in the medial prefrontal cortex and hippocampus, which are important for learning and memory.
Such TOP2B-induced double-strand breaks are accompanied by at least four enzymes of the non-homologous end joining (NHEJ) DNA repair pathway (DNA-PKcs, KU70, KU80 and DNA ligase IV) (see figure). These enzymes repair the double-strand breaks within about 15 minutes to 2 hours.
The double-strand breaks in the promoter are thus associated with TOP2B and at least these four repair enzymes. These proteins are present simultaneously on a single promoter nucleosome (there are about 147 nucleotides in the DNA sequence wrapped around a single nucleosome) located near the transcription start site of their target gene.
The double-strand break introduced by TOP2B apparently frees the part of the promoter at an RNA polymerase–bound transcription start site to physically move to its associated enhancer. This allows the enhancer, with its bound transcription factors and mediator proteins, to directly interact with the RNA polymerase that had been paused at the transcription start site to start transcription.
Similarly, topoisomerase I (TOP1) enzymes appear to be located at many enhancers, and those enhancers become activated when TOP1 introduces a single-strand break. TOP1 causes single-strand breaks in particular enhancer DNA regulatory sequences when signaled by a specific enhancer-binding transcription factor. Topoisomerase I breaks are associated with different DNA repair factors than those surrounding TOP2B breaks. In the case of TOP1, the breaks are associated most immediately with DNA repair enzymes MRE11, RAD50 and ATR.
Examples
Genomes can be analyzed systematically to identify regulatory regions. Conserved non-coding sequences often contain regulatory regions, and so they are often the subject of these analyses.
* CAAT box
* CCAAT box
* Operator (biology)
* Pribnow box
* TATA box
* SECIS element, mRNA
* Polyadenylation signal, mRNA
* A-box
* Z-box
* C-box
* E-box
* G-box
Insulin gene
Regulatory sequences for the insulin gene are:
* A5
* Z
* negative regulatory element (NRE)
* C2
* E2
* A3
* cAMP response element
* A2
* CAAT enhancer binding (CEB)
* C1
* E1
* G1
See also
* Regulator gene
* Regulation of gene expression
* ''Cis''-acting element
* Gene regulatory network
* Open Regulatory Annotation Database
* Operon
* DNA binding site
* Promoter
* ''Trans''-acting factor
* ORegAnno
External links
ORegAnno - Open Regulatory Annotation Database
ReMap - database of transcriptional regulators