The exome is composed of all of the exons within the genome, the sequences which, when transcribed, remain within the mature RNA after introns are removed by RNA splicing. This includes untranslated regions of messenger RNA (mRNA), and coding regions. Exome sequencing has proven to be an efficient method of determining the genetic basis of more than two dozen Mendelian or single gene disorders.
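The relationship between exons, introns, untranslated regions and the coding region can be illustrated with a toy example. The following is a minimal sketch, not a model of real splicing machinery: the sequence, the exon coordinates and the helper names `splice` and `split_mature_mrna` are all hypothetical, and the coding region is located naively as the first AUG through the first in-frame stop codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def splice(pre_mrna, exons):
    """Join exon intervals (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def split_mature_mrna(mrna):
    """Split a mature mRNA into (5' UTR, coding region, 3' UTR).

    Naive assumption: translation starts at the first AUG and ends at the
    first in-frame stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return mrna, "", ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[:start], mrna[start:i + 3], mrna[i + 3:]
    return mrna[:start], mrna[start:], ""


# Hypothetical transcript: uppercase = exon, lowercase = intron.
pre_mrna = "GGAUGG" + "guaaguacag" + "CCUAAGCA"
exons = [(0, 6), (16, 24)]

mature = splice(pre_mrna, exons)
utr5, cds, utr3 = split_mature_mrna(mature)
print(mature, utr5, cds, utr3)
```

In this sketch both exons belong to the exome in full, even though only part of each exon (the coding region) would be translated; the untranslated ends stay in the mature mRNA.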
Statistics
The human exome consists of roughly 233,785 exons, about 80% of which are less than 200 base pairs in length, constituting a total of about 1.1% of the total genome, or about 30 megabases of DNA. Though it composes a very small fraction of the genome, the exome is thought to harbor about 85% of the mutations that have a large effect on disease.
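These figures can be cross-checked with simple arithmetic. The sketch below uses the exon count and exome size quoted above; the genome size of 3 billion base pairs is an assumed round number, so the resulting share (about 1.0%) differs slightly from the 1.1% quoted, presumably because the source used a different denominator.

```python
EXON_COUNT = 233_785          # from the text
EXOME_BP = 30_000_000         # about 30 megabases, from the text
GENOME_BP = 3_000_000_000     # assumption: round figure for the human genome

mean_exon_length = EXOME_BP / EXON_COUNT
exome_fraction = EXOME_BP / GENOME_BP

print(f"mean exon length: {mean_exon_length:.0f} bp")
print(f"exome share of genome: {exome_fraction:.1%}")
```

A mean exon length of roughly 128 bp is consistent with the statement that most exons are shorter than 200 base pairs.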
Definition
The exome is distinct from the transcriptome, which is all of the transcribed RNA within a cell type. While the exome is constant from cell type to cell type, the transcriptome changes based on the structure and function of the cells. As a result, the entirety of the exome is not translated into protein in every cell. Different cell types transcribe only portions of the exome, and only the coding regions of the exons are eventually translated into proteins.
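The distinction can be pictured with sets: one fixed exome, and a different transcribed subset per cell type. This is only an illustrative sketch; the gene names, exon labels, cell types and the helper `untranscribed` are hypothetical.

```python
# One exome shared by every cell of the organism (hypothetical exon labels).
exome = {"GENE_A:exon1", "GENE_A:exon2", "GENE_B:exon1", "GENE_C:exon1"}

# Exons that end up in mature RNA in each cell type (hypothetical).
transcribed = {
    "neuron": {"GENE_A:exon1", "GENE_A:exon2"},
    "hepatocyte": {"GENE_A:exon1", "GENE_B:exon1"},
}


def untranscribed(exome, expressed):
    """Exons present in the genome but absent from this cell type's RNA."""
    return sorted(exome - expressed)


for cell_type, expressed in transcribed.items():
    # Every transcribed exon is part of the exome, but not the reverse.
    assert expressed <= exome
    print(cell_type, "does not transcribe:", untranscribed(exome, expressed))
```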
Next-generation sequencing (next-gen sequencing) allows for the rapid sequencing of large amounts of DNA, significantly advancing the study of genetics and replacing older methods such as Sanger sequencing. This technology is becoming more common in healthcare and research not only because it is a reliable method of determining genetic variations, but also because it is cost effective and allows researchers to sequence entire genomes in days to weeks, whereas earlier methods may have taken months. Next-gen sequencing includes both whole-exome sequencing and whole-genome sequencing.
Whole-exome sequencing
Sequencing an individual's exome instead of their entire genome has been proposed to be a more cost-effective and efficient way to diagnose rare genetic disorders. It has also been found to be more effective than other methods such as karyotyping and microarrays. This distinction is largely due to the fact that the phenotypes of genetic disorders are largely a result of mutated exons. In addition, since the exome comprises only about 1.5% of the total genome, the process is cheaper and faster, as it involves sequencing around 40 million bases rather than the 3 billion base pairs that make up the genome.
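The size of the saving follows directly from the two figures above. The sketch below is a back-of-the-envelope calculation only: the sequencing depth of 30 is a hypothetical value chosen for illustration, the helper `raw_bases` is not from any library, and real protocols differ in depth and capture efficiency between the two approaches.

```python
EXOME_BASES = 40_000_000        # from the text
GENOME_BASES = 3_000_000_000    # from the text


def raw_bases(target_bases, depth):
    """Bases that must be read to cover a target at a given average depth."""
    return target_bases * depth


fold_reduction = GENOME_BASES / EXOME_BASES
exome_share = EXOME_BASES / GENOME_BASES

DEPTH = 30  # hypothetical average depth, same for both only for comparison
print(f"exome share: {exome_share:.2%}, {fold_reduction:.0f}x fewer bases")
print(f"exome:  {raw_bases(EXOME_BASES, DEPTH):,} bases to read")
print(f"genome: {raw_bases(GENOME_BASES, DEPTH):,} bases to read")
```

Note that 40 million out of 3 billion is about 1.3%, so the 1.5% quoted above is a rounded figure.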
Whole-genome sequencing
Whole genome sequencing, by contrast, has been found to capture a more comprehensive view of variants in the DNA than whole-exome sequencing. Especially for single nucleotide variants, whole genome sequencing is more powerful and more sensitive than whole-exome sequencing in detecting potentially disease-causing mutations within the exome. Non-coding regions can also be involved in the regulation of the exons that make up the exome, so whole-exome sequencing may not show all of the sequences at play in forming the exome.
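The coverage gap can be sketched as an interval lookup: a variant is visible to whole-exome sequencing only if its position falls inside a targeted exon. The coordinates and the helper `in_exome` below are hypothetical, and the sketch deliberately ignores the sensitivity differences inside exons mentioned above.

```python
from bisect import bisect_right


def in_exome(position, exon_intervals):
    """True if position lies in one of the sorted, non-overlapping
    (start, end) intervals; ends are exclusive."""
    starts = [start for start, _ in exon_intervals]
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < exon_intervals[i][1]


# Hypothetical exon coordinates on one chromosome and variant positions.
exons = [(100, 250), (900, 1050), (2000, 2120)]
variants = [120, 600, 1049, 1050, 2500]

captured_by_wes = [v for v in variants if in_exome(v, exons)]
missed_by_wes = [v for v in variants if not in_exome(v, exons)]

print("inside targeted exons:", captured_by_wes)
print("only reachable by whole genome sequencing:", missed_by_wes)
```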
Ethical considerations
With either form of sequencing, whole-exome or whole-genome, some have argued that the practice should be carried out with medical ethics in mind. While physicians strive to preserve patient autonomy, sequencing deliberately asks laboratories to look at genetic variants that may be completely unrelated to the patient's condition at hand and has the potential to reveal findings that were not intentionally sought. In addition, it has been suggested that such testing could lead to discrimination against particular groups for having certain genes, creating the potential for stigma or negative attitudes towards those groups.
Diseases and diagnoses
Rare mutations that affect the function of essential proteins constitute the majority of Mendelian diseases. In addition, the overwhelming majority of disease-causing mutations in Mendelian loci can be found within the coding region.
With the goal of finding methods to best detect harmful mutations and successfully diagnose patients, researchers are looking to the exome for clues to aid in this process.
Whole-exome sequencing is a recent technology that has led to the discovery of various genetic disorders and increased the rate of diagnosis of patients with rare genetic disorders. Overall, whole-exome sequencing has allowed healthcare providers to diagnose 30–50% of patients who were thought to have rare Mendelian disorders. It has been suggested that whole-exome sequencing in clinical settings has many unexplored advantages. Not only can the exome increase our understanding of genetic patterns, but in clinical settings it has the potential to change the management of patients with rare and previously unknown disorders, allowing physicians to develop more targeted and personalized interventions.
For example, Bartter Syndrome, also known as salt-wasting nephropathy, is a hereditary disease of the kidney characterized by hypotension (low blood pressure), hypokalemia (low potassium), and alkalosis (high blood pH), leading to muscle fatigue and varying levels of fatality. It is an example of a rare disease, affecting fewer than one per million people, whose patients have been positively impacted by whole-exome sequencing. Thanks to this method, patients who did not exhibit the classical mutations associated with Bartter Syndrome were formally diagnosed with it after the discovery that the disease also involves mutations outside of the loci of interest. They were thus able to gain more targeted and productive treatment for the disease.
Much of the focus of exome sequencing in the context of disease diagnosis has been on protein-coding "loss of function" alleles. Research suggests, however, that future advances allowing the study of non-coding regions, both inside and outside the exome, may further improve the diagnosis of rare Mendelian disorders.
The exome is the part of the genome composed of exons, the sequences which, when transcribed, remain within the mature RNA after introns are removed by RNA splicing and contribute to the final protein product encoded by that gene. It consists of all DNA that is transcribed into mature RNA in cells of any type, as distinct from the transcriptome, which is the RNA that has been transcribed only in a specific cell population. The exome of the human genome consists of roughly 180,000 exons constituting about 1% of the total genome, or about 30 megabases of DNA. Though composing a very small fraction of the genome, mutations in the exome are thought to harbor 85% of mutations that have a large effect on disease. Exome sequencing has proved to be an efficient strategy to determine the genetic basis of more than two dozen Mendelian or single gene disorders.
See also
* Coding strand
* Exome sequencing
* Gene structure
* Non-coding DNA
* Non-coding RNA
* Transcriptome
* Transcriptomics