
The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions, across different species and time periods, can provide important information about gene organization and the evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy.


Definition

Although the term is sometimes used interchangeably with exon, the two are not the same: exons comprise the coding region as well as the 3' and 5' untranslated regions of the RNA, so an exon may be only partially made up of coding sequence. The 3' and 5' untranslated regions of the RNA, which do not code for protein, are termed non-coding regions and are not discussed on this page. Coding regions are also often confused with exomes, but the two terms are distinct. While the exome refers to all exons within a genome, the coding region refers to a single section of the DNA or RNA that specifically codes for a certain protein.


History

In 1978, Walter Gilbert published "Why Genes in Pieces?", which first began to explore the idea that the gene is a mosaic: each full nucleic acid strand is not coded continuously but is interrupted by "silent" non-coding regions. This was the first indication that a distinction was needed between the parts of the genome that code for protein, now called coding regions, and those that do not.


Composition

Evidence suggests a general interdependence between base composition patterns and coding region availability. The coding region is thought to contain a higher GC-content than non-coding regions. Further research has found that the longer the coding strand, the higher the GC-content. Short coding strands remain comparatively GC-poor, similar to the low GC-content of the translational stop codons TAG, TAA, and TGA. GC-rich areas are also where the ratio of point mutation types is altered slightly: there are more transitions, which are changes from purine to purine or pyrimidine to pyrimidine, than transversions, which are changes from purine to pyrimidine or pyrimidine to purine. Transitions are less likely to change the encoded amino acid and more likely to remain silent mutations (especially if they occur in the third nucleotide of a codon), which is usually beneficial to the organism during translation and protein formation. This indicates that essential coding regions (gene-rich) are higher in GC-content and more stable and resistant to mutation than accessory and non-essential regions (gene-poor). However, it is still unclear whether this came about through neutral and random mutation or through a pattern of selection. There is also debate on whether the methods used to ascertain the relationship between GC-content and coding regions, such as gene windows, are accurate and unbiased.
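
The two quantities discussed above, GC-content and the transition/transversion distinction, can be illustrated with a minimal Python sketch. The function names `gc_content` and `substitution_type` are hypothetical helpers chosen for this example, and the sketch assumes plain DNA strings containing only A, C, G, and T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def substitution_type(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different DNA bases")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # The three stop codons together are GC-poor: 2 of 9 bases are G or C.
    print(gc_content("TAGTAATGA"))
    print(substitution_type("A", "G"))  # purine to purine
    print(substitution_type("A", "C"))  # purine to pyrimidine
```

Real analyses typically compute GC-content over sliding windows or whole annotated coding sequences; this sketch only shows the per-sequence arithmetic.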


Structure and function

In DNA, the coding region is flanked by the promoter sequence upstream, at the 5' end of the coding strand (the 3' end of the template strand), and by the termination sequence downstream, at the 3' end. During transcription, RNA polymerase (RNAP) binds to the promoter sequence and moves along the template strand to the coding region. RNAP then adds RNA nucleotides complementary to the template strand to form the mRNA, substituting uracil in place of thymine (Overview of transcription, n.d., retrieved from https://www.khanacademy.org/science/biology/gene-expression-central-dogma/transcription-of-dna-into-rna/a/overview-of-transcription). This continues until the RNAP reaches the termination sequence. After transcription and maturation, the mature mRNA encompasses multiple parts important for its eventual translation into protein. The coding region in an mRNA is flanked by the 5' untranslated region (5'-UTR) and the 3' untranslated region (3'-UTR), along with the 5' cap and the poly-A tail. During translation, the ribosome facilitates the attachment of tRNAs to the coding region, 3 nucleotides at a time (codons). The tRNAs transfer their associated amino acids to the growing polypeptide chain, eventually forming the protein defined in the initial DNA coding region.
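
As a rough illustration of the path from template strand to protein, the following Python sketch transcribes a template strand into mRNA and then translates the coding region codon by codon. It assumes the standard genetic code, writes the template strand 3' to 5' so that the mRNA reads 5' to 3', and starts translation at the first AUG; the names `transcribe_template` and `translate` are hypothetical, and real translation initiation is more involved than a simple search for AUG.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe_template(template_3_to_5):
    """Build the mRNA (5'->3') complementary to a template strand written 3'->5'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}  # uracil pairs with adenine
    return "".join(pairing[base] for base in template_3_to_5.upper())


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    dna_like = mrna.upper().replace("U", "T")
    start = dna_like.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna_like) - 2, 3):
        amino_acid = CODON_TABLE[dna_like[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the coding region
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe_template("TACAAACCGATT")
    print(mrna)             # AUGUUUGGCUAA
    print(translate(mrna))  # MFG
```

Bases before the first AUG and after the stop codon are ignored by `translate`, which loosely mirrors how the untranslated regions flank the coding region in a mature mRNA.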


Regulation

The coding region can be modified in order to regulate gene expression. Alkylation is one form of regulation of the coding region. A gene that would otherwise have been transcribed can be silenced by targeting a specific sequence: the bases in this sequence are blocked with alkyl groups, which creates the silencing effect. While the regulation of gene expression manages the abundance of RNA or protein made in a cell, these mechanisms can themselves be controlled by a regulatory sequence found before the start of the open reading frame in a strand of DNA. The regulatory sequence then determines where and when a protein-coding region is expressed. RNA splicing ultimately determines which part of the sequence is translated and expressed; this process involves cutting out introns and joining exons together. Where the spliceosome cuts, however, is guided by the recognition of splice sites, in particular the 5' splice site, which is one of the substrates for the first step in splicing. The coding regions lie within the exons, which become covalently joined together to form the mature messenger RNA.
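
The effect of splicing on the sequence can be sketched in a few lines of Python. This is only a string-level illustration under the assumption that exon boundaries are already known; the function `splice`, the coordinate convention (0-based, end-exclusive), and the example sequence are all hypothetical, and the sketch does not model how the spliceosome recognizes splice sites.

```python
def splice(pre_mrna, exons):
    """Join exon spans in order, dropping the introns between them.

    exons is a list of (start, end) pairs, 0-based and end-exclusive,
    sorted by position and non-overlapping.
    """
    pieces = []
    previous_end = 0
    for start, end in exons:
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"invalid exon span: {(start, end)}")
        pieces.append(pre_mrna[start:end])
        previous_end = end
    return "".join(pieces)


if __name__ == "__main__":
    # Two 6-base exons separated by a 12-base intron (made-up sequence).
    pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
    print(splice(pre_mrna, [(0, 6), (18, 24)]))  # AUGGCCUUUUAA
```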


Mutations

Mutations in the coding region can have very diverse effects on the phenotype of the organism. While some mutations in this region of DNA/RNA can result in advantageous changes, others can be harmful and sometimes even lethal to the organism. Other changes in the coding region do not result in any detectable change in phenotype.


Mutation types

There are various forms of mutations that can occur in coding regions. One form is silent mutations, in which a change in nucleotides does not result in any change in amino acid after transcription and translation (Yang, J. (2016, March 23). What are Genetic Mutation? Retrieved from https://www.singerinstruments.com/resource/what-are-genetic-mutation/). There also exist nonsense mutations, where base alterations in the coding region code for a premature stop codon, producing a shorter final protein. Point mutations, or single base pair changes in the coding region, that code for a different amino acid during translation are called missense mutations. Other types of mutations include frameshift mutations, such as insertions or deletions.
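
The categories above can be made concrete by comparing a reference codon with its mutated counterpart. The following Python sketch assumes the standard genetic code and DNA codons; `classify_codon_change` and `indel_effect` are hypothetical helper names, and the indel rule (a length that is not a multiple of 3 shifts the reading frame) is a simplification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a substitution within one codon as silent, nonsense, or missense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"    # same amino acid (or stop) is still encoded
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # a different amino acid (or loss of a stop codon)


def indel_effect(length):
    """Report whether an insertion or deletion of this length shifts the frame."""
    if length <= 0:
        raise ValueError("length must be positive")
    return "frameshift" if length % 3 else "in-frame"


if __name__ == "__main__":
    print(classify_codon_change("GAA", "GAG"))  # silent: both encode glutamate
    print(classify_codon_change("GAA", "TAA"))  # nonsense: TAA is a stop codon
    print(classify_codon_change("GAA", "GTA"))  # missense: glutamate to valine
    print(indel_effect(1))                      # frameshift
```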


Formation

Some mutations are hereditary (germline mutations), passed on from a parent to its offspring (What is a gene mutation and how do mutations occur? Genetics Home Reference, NIH, n.d., retrieved from https://ghr.nlm.nih.gov/primer/mutationsanddisorders/genemutation). Such mutated coding regions are present in all cells within the organism. Other mutations are acquired (somatic mutations) during an organism's lifetime and may not be the same from cell to cell. These changes can be caused by mutagens, carcinogens, or other environmental agents (e.g. UV). Acquired mutations can also result from copying errors during DNA replication and are not passed down to offspring. Changes in the coding region can also be de novo (new); such changes are thought to occur shortly after fertilization, resulting in a mutation that is present in the offspring's DNA but absent from both the sperm and egg cells.


Prevention

Multiple cellular mechanisms help prevent lethality due to deleterious mutations in the coding region. These include proofreading by some DNA polymerases during replication, mismatch repair following replication, and the "wobble hypothesis", which describes the degeneracy of the third base within an mRNA codon.
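
The degeneracy of the third codon base can be checked directly against the standard genetic code. The Python sketch below lists the two-base codon prefixes for which any third base yields the same amino acid; `fourfold_degenerate_prefixes` is a hypothetical helper name, and the sketch illustrates degeneracy of the code only, not the tRNA base-pairing rules of the wobble hypothesis itself.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_degenerate_prefixes():
    """Return codon prefixes whose encoded amino acid ignores the third base."""
    prefixes = []
    for first, second in product(BASES, repeat=2):
        encoded = {CODON_TABLE[first + second + third] for third in BASES}
        if len(encoded) == 1:  # all four third bases give the same amino acid
            prefixes.append(first + second)
    return sorted(prefixes)


if __name__ == "__main__":
    for prefix in fourfold_degenerate_prefixes():
        print(prefix + "N", "->", CODON_TABLE[prefix + "A"])
```

For these prefixes, a point mutation at the third position is always silent, which is one reason third-position changes are tolerated more often than changes at the first two positions.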


Constrained coding regions (CCRs)

While it is well known that the genome of one individual can differ extensively from the genome of another, recent research has found that some coding regions are highly constrained, or resistant to mutation, between individuals of the same species. This is similar to the concept of interspecies constraint in conserved sequences. Researchers termed these highly constrained sequences constrained coding regions (CCRs), and have also found that such regions may be under strong purifying selection. On average, there is approximately 1 protein-altering mutation every 7 coding bases, but some CCRs can have over 100 bases in sequence with no observed protein-altering mutations, some without even synonymous mutations (Havrilla, J. M., Pedersen, B. S., Layer, R. M., & Quinlan, A. R. (2018). A map of constrained coding regions in the human genome. Nature Genetics, 88–95. doi: 10.1101/220814). These patterns of constraint between genomes may provide clues to the sources of rare developmental diseases or potentially even embryonic lethality. Clinically validated variants and de novo mutations in CCRs have previously been linked to disorders such as infantile epileptic encephalopathy, developmental delay, and severe heart disease.
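
The idea of a long stretch of coding sequence with no observed protein-altering variants can be sketched as a simple gap search. The Python example below is a toy illustration and not the method used in the cited study; the function `variant_free_runs`, the 0-based coordinate convention, the `min_run` threshold, and the example positions are all assumptions made for this sketch.

```python
def variant_free_runs(cds_length, variant_positions, min_run=100):
    """Return (start, end) spans with no listed variants, at least min_run long.

    Positions are 0-based offsets into the coding sequence; spans are
    end-exclusive.
    """
    positions = sorted(set(variant_positions))
    if any(pos < 0 or pos >= cds_length for pos in positions):
        raise ValueError("variant position outside the coding sequence")
    runs = []
    previous = -1
    # Treat the end of the sequence as a final boundary.
    for pos in positions + [cds_length]:
        start, end = previous + 1, pos
        if end - start >= min_run:
            runs.append((start, end))
        previous = pos
    return runs


if __name__ == "__main__":
    # Made-up variant positions in a 300-base coding sequence.
    print(variant_free_runs(300, [7, 14, 150, 290]))  # [(15, 150), (151, 290)]
```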


Coding sequence detection

While identifying open reading frames within a DNA sequence is straightforward, identifying coding sequences is not, because the cell translates only a subset of all open reading frames into proteins. Currently, CDS prediction uses sampling and sequencing of mRNA from cells, although there is still the problem of determining which parts of a given mRNA are actually translated into protein. CDS prediction is a subset of gene prediction; the latter also includes prediction of DNA sequences that code for other functional elements, such as RNA genes and regulatory sequences. Gene overlapping occurs in both prokaryotes and eukaryotes, and relatively often in both DNA and RNA viruses, as an evolutionary advantage that reduces genome size while retaining the ability to produce various proteins from the available coding regions. For both DNA and RNA, pairwise alignments can detect overlapping coding regions, including short open reading frames in viruses, but this requires a known coding strand against which to compare the potential overlapping coding strand. An alternative method that uses single genome sequences does not require multiple genome sequences for comparison, but needs an overlap of at least 50 nucleotides in order to be sensitive.
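
The straightforward part of this problem, listing open reading frames, can be sketched in Python as a scan of all six reading frames for spans that run from ATG to the first in-frame stop codon. The helper names `reverse_complement` and `find_orfs` and the `min_codons` threshold are hypothetical, and the sketch says nothing about which of the reported ORFs a cell actually translates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=2):
    """List (strand, start, end, sequence) for each ATG...stop span.

    Coordinates are 0-based and end-exclusive; for the '-' strand they
    refer to positions on the reverse-complemented sequence.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open a candidate ORF at the first ATG
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, seq[start:i + 3]))
                    start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))  # [('+', 2, 11, 'ATGAAATAG')]
```

An ORF that reaches the end of the sequence without a stop codon is not reported by this sketch; whether to keep such partial ORFs is a design choice that depends on the application.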


See also

* Coding strand: The DNA strand that codes for a protein
* Exon: The entire portion of the strand that is transcribed
* Mature mRNA: The portion of the mRNA transcription product that is translated
* Gene structure: The other elements that make up a gene
* Nested gene: Entire coding sequence lies within the bounds of a larger external gene
* Non-coding DNA: Parts of genomes that do not encode protein-coding genes
* Non-coding RNA: Molecules that do not encode proteins, so have no CDS

