Conserved non-coding sequence

A conserved non-coding sequence (CNS) is a stretch of noncoding DNA that is evolutionarily conserved. These sequences are of interest for their potential to regulate gene expression. CNSs in plants and animals are highly associated with transcription factor binding sites and other cis-acting regulatory elements. Conserved non-coding sequences can be important sites of evolutionary divergence, as mutations in these regions may alter the regulation of conserved genes, producing species-specific patterns of gene expression. These features have made them an invaluable resource in comparative genomics.


Sources

CNSs are generally thought to perform some function, since function is what constrains their evolution, but they can be distinguished by where in the genome they are found and how they arose.


Introns

Introns are stretches of sequence, found mostly in eukaryotic organisms, that interrupt the coding regions of genes; their lengths vary across three orders of magnitude. Intron sequences may be conserved, often because they contain expression-regulating elements that put functional constraints on their evolution. Patterns of conserved introns between species of different kingdoms have been used to make inferences about intron density at different points in evolutionary history. This makes them an important resource for understanding the dynamics of intron gain and loss in eukaryotes (1,28).
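Intron positions are usually compared on a multiple alignment of orthologous genes. The Python sketch below is a minimal illustration of that comparison and makes several assumptions. Each species' intron positions are taken to be already mapped to a shared alignment column plus a splice phase (0, 1 or 2). The function name and the toy positions are hypothetical. The sketch only separates positions shared by every species from positions found in a single species. A real study would then interpret the latter as gains or losses using a species tree.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: an intron position is (alignment column, phase).
IntronPos = Tuple[int, int]


def classify_intron_positions(
    positions: Dict[str, Set[IntronPos]],
) -> Tuple[Set[IntronPos], Dict[str, Set[IntronPos]]]:
    """Split intron positions into those shared by every species and,
    per species, those not found in any other species."""
    all_sets = list(positions.values())
    shared: Set[IntronPos] = set.intersection(*all_sets) if all_sets else set()
    unique: Dict[str, Set[IntronPos]] = {}
    for species, pos_set in positions.items():
        # Union of the intron positions seen in every *other* species.
        others = set().union(*(s for name, s in positions.items() if name != species))
        unique[species] = pos_set - others
    return shared, unique


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any real gene.
    toy = {
        "species_A": {(12, 0), (48, 2), (90, 1)},
        "species_B": {(12, 0), (48, 2)},
        "species_C": {(12, 0), (48, 2), (130, 0)},
    }
    shared, unique = classify_intron_positions(toy)
    print(sorted(shared))               # [(12, 0), (48, 2)]
    print(sorted(unique["species_C"]))  # [(130, 0)]
```

Using sets keeps the comparison independent of the order in which introns are listed. Absent positions are treated simply as missing; the sketch does not attempt to decide whether they were gained or lost.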


Untranslated regions

Some of the most highly conserved noncoding regions are found in the untranslated regions (UTRs) at the 3' end of mature RNA transcripts, rather than in introns. This suggests an important function operating at the post-transcriptional level. If these regions do perform an important regulatory function, the increase in 3' UTR length over evolutionary time suggests that conserved UTRs contribute to organism complexity. Regulatory motifs in UTRs are often conserved in genes belonging to the same metabolic family, and could potentially be used to develop highly specific medicines that target RNA transcripts.


Transposable elements

Repetitive elements can accumulate in an organism's genome as the result of a few different transposition processes. The extent to which this has taken place during the evolution of eukaryotes varies greatly: repetitive DNA accounts for just 3% of the fly genome, but for 50% of the human genome. There are different theories explaining the conservation of transposable elements. One holds that, like pseudogenes, they provide a source of new genetic material, allowing for faster adaptation to changes in the environment. A simpler alternative is that, because eukaryotic genomes may have no means to prevent the proliferation of transposable elements, they are free to accumulate as long as they are not inserted into or near a gene in a way that would disrupt essential functions. A recent study showed that transposons contribute at least 16% of the eutherian-specific CNSs, marking them as a "major creative force" in the evolution of gene regulation in mammals. There are three major classes of transposable elements, distinguished by the mechanisms by which they proliferate.


Classes

DNA transposons encode a transposase protein, and the transposase gene is flanked by inverted repeat sequences. The transposase excises the element and reintegrates it elsewhere in the genome. An element can increase its copy number in the genome through the following mechanism. It excises itself immediately after DNA replication and inserts into target sites that have not yet been replicated.

Retrotransposons use reverse transcriptase to generate a cDNA from the TE transcript. They are further divided into long terminal repeat (LTR) retrotransposons, long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs).

In LTR retrotransposons, the RNA template is first degraded. A DNA strand complementary to the reverse-transcribed cDNA is then synthesized, returning the element to a double-stranded state. Integrase, an enzyme encoded by the LTR retrotransposon, then reincorporates the element at a new target site. LTR retrotransposons are flanked by long terminal repeats (300–500 bp), which mediate the transposition process.

LINEs use a simpler method in which the cDNA is synthesized at the target site following cleavage by a LINE-encoded endonuclease. LINE-encoded reverse transcriptase is not highly sequence-specific. As a result, the LINE machinery can also incorporate unrelated RNA transcripts, giving rise to non-functional processed pseudogenes. If a small gene's promoter is included in the transcribed portion of the gene, the stable transcript can be duplicated and reinserted into the genome multiple times. The elements produced by this process are called SINEs.
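The two kinds of flanking structure described above can be contrasted in a short Python sketch.

- **DNA transposons** carry terminal inverted repeats: the left end reads as the reverse complement of the right end.
- **LTR retrotransposons** carry long terminal repeats in the same orientation.

The function names, the fixed repeat length `k`, and the exact-match rule are simplifying assumptions made for illustration. Real annotation tools tolerate mismatches and search for the repeat boundaries rather than assuming them.

```python
# Minimal sketch (hypothetical helpers): test whether a candidate element is
# bounded by terminal inverted repeats (TIRs, typical of DNA transposons) or by
# direct long terminal repeats (LTRs). Exact matching of k bases is an assumption.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeat(element: str, k: int) -> bool:
    """True if the first k bases are the reverse complement of the last k."""
    if k <= 0 or 2 * k > len(element):
        return False
    return element[:k].upper() == reverse_complement(element[-k:]).upper()


def has_direct_terminal_repeat(element: str, k: int) -> bool:
    """True if the first k bases are identical to the last k (LTR-like)."""
    if k <= 0 or 2 * k > len(element):
        return False
    return element[:k].upper() == element[-k:].upper()


if __name__ == "__main__":
    # Hypothetical toy elements; real LTRs are roughly 300-500 bp long.
    tir_element = "CAGGGT" + "ATGCCCGATTAGC" + "ACCCTG"
    ltr_element = "TGTTGG" + "ATGCCCGATTAGC" + "TGTTGG"
    print(has_terminal_inverted_repeat(tir_element, 6))  # True
    print(has_direct_terminal_repeat(ltr_element, 6))    # True
    print(has_terminal_inverted_repeat(ltr_element, 6))  # False
```

The only difference between the two checks is whether the right end is reverse-complemented before comparison. That single step is what distinguishes an inverted repeat from a direct repeat.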


Conserved regulatory transposable elements

When transposable elements are active in a genome, they can do any of the following:

- introduce new promoter regions;
- disrupt existing regulatory sites;
- alter splicing patterns, if inserted into transcribed regions.

A particular transposed element will be positively selected if the altered expression it produces confers an adaptive advantage. This process has given rise to some of the conserved regions found in humans: nearly 25% of characterized human promoters contain transposed elements. This is of particular interest because most transposable elements in humans are no longer active.


Pseudogenes

Pseudogenes are vestiges of once-functional genes that have been disabled by sequence deletions, insertions, or mutations. The primary evidence for this process is the presence of fully functional orthologues of these inactivated sequences in other, related genomes. Pseudogenes commonly emerge following a gene duplication or polyploidization event. With two functional copies of a gene, there is no selective pressure to keep both expressible, leaving one free to accumulate mutations as a nonfunctioning pseudogene. In this typical case, neutral evolution allows pseudogenes to accumulate mutations. They then serve as "reservoirs" of new genetic material that can potentially be reincorporated into the genome.

However, some pseudogenes have been found to be conserved in mammals (Cooper, DN. Human Gene Evolution. Oxford: BIOS Scientific Publishers, Sept. 1988, pp. 265–292). The simplest explanation is that these noncoding regions serve some biological function, and this has been found to be the case for several conserved pseudogenes. For example, Makorin1 mRNA was found to be stabilized by its paralogous pseudogene, Makorin1-p1, which is conserved in several mouse species. Other pseudogenes have also been found to be conserved between humans and mice and between humans and chimpanzees. These pseudogenes originated from duplication events that preceded the divergence of the species. Evidence that these pseudogenes are transcribed also supports the hypothesis that they have a biological function. Findings of potentially functional pseudogenes make the term difficult to define, since it was originally meant for degenerate sequences with no biological function.

An example of a pseudogene is the gene for L-gulonolactone oxidase. This liver enzyme is necessary for the biosynthesis of L-ascorbic acid (vitamin C) in most birds and mammals. However, the gene is mutated in the Haplorhini suborder of primates, including humans, who must obtain ascorbic acid or ascorbate from food. The remains of this non-functional, heavily mutated gene are still present in the genomes of guinea pigs and humans.
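A common signature of a disabled gene is a premature in-frame stop codon, or a frameshift relative to the functional orthologue. The Python sketch below flags both signals in a minimal way, under two assumptions:

- the candidate sequence already starts at the position corresponding to the orthologue's start codon;
- a length that is not a multiple of three is a crude stand-in for a frameshifting indel.

The function names and toy sequences are hypothetical. Real pseudogene pipelines align the candidate against the functional orthologue's protein instead.

```python
from typing import List

# Minimal sketch (hypothetical helpers): flag a candidate coding sequence as
# likely disabled if it has an in-frame stop codon before its final codon, or
# if its length breaks the reading frame.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_positions(cds: str) -> List[int]:
    """Return 0-based codon indices of in-frame stops before the last codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [
        i for i in range(n_codons - 1)
        if cds[3 * i: 3 * i + 3] in STOP_CODONS
    ]


def looks_disabled(cds: str) -> bool:
    """Heuristic: any premature stop, or a length that is not a multiple of 3."""
    return bool(premature_stop_positions(cds)) or len(cds) % 3 != 0


if __name__ == "__main__":
    functional = "ATGGCTGAAAAATAA"   # hypothetical intact reading frame
    point_mutant = "ATGGCTTAAAAATAA" # GAA -> TAA creates a premature stop
    frameshift = "ATGGCGAAAAATAA"    # one-base deletion shifts the frame
    for name, seq in [("functional", functional),
                      ("point_mutant", point_mutant),
                      ("frameshift", frameshift)]:
        print(name, premature_stop_positions(seq), looks_disabled(seq))
```

The final codon is excluded from the stop search because a stop codon there is the gene's normal terminator, not evidence of inactivation.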


Ultraconserved regions

Ultraconserved regions (UCRs) are regions over 200 bp in length with 100% identity across species. These unique sequences are found mostly in noncoding regions. It is still not fully understood why negative (purifying) selection on these regions is so much stronger than on protein-coding regions. Although these regions can be seen as unique, the distinction between sequences with a high degree of conservation and those with perfect conservation is not necessarily biologically significant. One study in Science found that extremely conserved noncoding sequences have important regulatory functions whether or not the conservation is perfect. This makes the category of ultraconservation appear somewhat arbitrary.
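The UCR definition above can be applied mechanically to a multiple alignment. The Python sketch below finds maximal runs of alignment columns where every species has the same base, and keeps runs at or above a length threshold. It makes these assumptions:

- the input is an existing multiple alignment of equal-length strings;
- a gap character `-` breaks a run;
- the 200-column default is treated as a tunable parameter.

The function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Minimal sketch (hypothetical helper): report maximal runs of perfectly
# identical, gap-free alignment columns that meet a minimum length.


def ultraconserved_runs(alignment: List[str],
                        min_length: int = 200) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges where every aligned
    sequence has the same non-gap base, keeping runs of >= min_length."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first column of the current identical run
    for col in range(width):
        column = {row[col].upper() for row in alignment}
        identical = len(column) == 1 and "-" not in column
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_length:
                runs.append((start, col))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and width - start >= min_length:
        runs.append((start, width))
    return runs


if __name__ == "__main__":
    # Hypothetical three-species toy alignment; column 7 differs in one species.
    aln = ["ACGTACGTTT", "ACGTACGATT", "ACGTACGTTT"]
    print(ultraconserved_runs(aln, min_length=5))  # [(0, 7)]
    print(ultraconserved_runs(aln, min_length=2))  # [(0, 7), (8, 10)]
```

Changing the identity test from "all columns identical" to a percentage threshold would turn this into a search for highly, rather than perfectly, conserved regions. That is exactly the distinction the paragraph above calls somewhat arbitrary.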


In comparative genomics

The conservation of both functional and nonfunctional noncoding regions provides an important tool for comparative genomics, though conservation of cis-regulatory elements has proven particularly useful. In some cases, the presence of CNSs could simply reflect a lack of divergence time. The more common view, however, is that CNSs perform functions that place varying degrees of constraint on their evolution. Consistent with this view, cis-regulatory elements are commonly found in conserved noncoding regions. Sequence similarity is therefore often used to limit the search space when trying to identify regulatory elements conserved across species. This is most useful when analyzing distantly related organisms, since in closer relatives nonfunctional elements are also conserved.

Conservation of noncoding sequence is important for the analysis of paralogs within a single species as well. CNSs shared by paralogous clusters of Hox genes are candidate expression-regulating regions, possibly coordinating the similar expression patterns of these genes. Comparative genomic studies of the promoter regions of orthologous genes can also detect differences in the presence and relative positioning of transcription factor binding sites. Orthologues with high sequence similarity may not share the same regulatory elements, and these differences may account for different expression patterns across species.

The regulatory functions commonly associated with conserved non-coding regions are thought to play a role in the evolution of eukaryotic complexity. On average, plants contain fewer CNSs per gene than mammals. This is thought to be related to their having undergone more polyploidization, or genome duplication, events. During the subfunctionalization that follows gene duplication, the rate of CNS loss per gene can be higher. Thus, genome duplication events may explain why plants have more genes, each with fewer CNSs. If the number of CNSs is taken as a proxy for regulatory complexity, this may account for the disparity in complexity between plants and mammals.

Changes in gene regulation are thought to account for most of the differences between humans and chimpanzees, so researchers have examined CNSs for evidence of this. A subset of the CNSs shared by humans and other primates is enriched for human-specific single-nucleotide polymorphisms (SNPs). This suggests positive selection for these SNPs and accelerated evolution of those CNSs. Many of these SNPs are also associated with changes in gene expression, suggesting that these CNSs played an important role in human evolution.
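Using sequence similarity to narrow the search for conserved regulatory elements can be illustrated with a sliding-window scan over a pairwise alignment of orthologous noncoding sequence. The Python sketch below reports windows at or above a percent-identity cutoff and merges overlapping hits into candidate conserved regions. It makes these assumptions:

- the gapped pairwise alignment already exists;
- gap columns count as mismatches;
- the 100-column window and 70% cutoff are illustrative defaults, not values from this article.

The function names are hypothetical. As noted above, a scan like this discriminates best between distantly related species; closer relatives would need a stricter cutoff.

```python
from typing import List, Tuple

# Minimal sketch (hypothetical helpers): sliding-window percent identity over
# a pairwise alignment, then merging of overlapping conserved windows.


def percent_identity(a: str, b: str) -> float:
    """Percent of columns where both sequences share the same base.
    Columns with a gap ('-') in either sequence count as mismatches."""
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_windows(seq_a: str, seq_b: str, window: int = 100,
                      min_identity: float = 70.0,
                      step: int = 1) -> List[Tuple[int, int, float]]:
    """Return (start, end, identity) for every window meeting the cutoff."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    hits: List[Tuple[int, int, float]] = []
    for start in range(0, len(seq_a) - window + 1, step):
        end = start + window
        pid = percent_identity(seq_a[start:end], seq_b[start:end])
        if pid >= min_identity:
            hits.append((start, end, pid))
    return hits


def merge_windows(hits: List[Tuple[int, int, float]]) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent windows (sorted by start) into regions."""
    merged: List[Tuple[int, int]] = []
    for start, end, _ in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical toy alignment: the first 10 columns are identical.
    a = "A" * 10 + "C" * 10
    b = "A" * 10 + "G" * 10
    hits = conserved_windows(a, b, window=5, min_identity=100.0)
    print(merge_windows(hits))  # [(0, 10)]
```

Reporting merged regions rather than raw windows avoids listing one conserved element many times, once for each overlapping window that covers it.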



