Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by DNA duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons.
Most non-bacterial genomes contain many pseudogenes, often as many as functional genes. This is not surprising, since various biological processes are expected to accidentally create pseudogenes, and there are no specialized mechanisms to remove them from genomes. Eventually pseudogenes may be deleted from their genomes by chance DNA replication or DNA repair errors, or they may accumulate so many mutational changes that they are no longer recognizable as former genes. Analysis of these degeneration events helps clarify the effects of non-selective processes in genomes.
Pseudogene sequences may be transcribed into RNA at low levels, due to promoter elements inherited from the ancestral gene or arising by new mutations. Although most of these transcripts will have no more functional significance than chance transcripts from other parts of the genome, some have given rise to beneficial regulatory RNAs and new proteins.
Properties
Pseudogenes are usually characterized by a combination of homology to a known gene and loss of some functionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, it is usually unable to produce a functional final protein product.
Pseudogenes are sometimes difficult to identify and characterize in genomes, because the two requirements of homology and loss of functionality are usually implied through sequence alignments rather than biologically proven.
#Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences have evolved independently (see convergent evolution).
#Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to yield a fully functional protein: transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
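The two criteria above can be illustrated with a small Python sketch. It is not taken from any particular pseudogene-annotation pipeline: the function names, the use of `-` as the gap character, and the choice to ignore gap columns when computing identity are all assumptions made for this example, and other definitions of percent identity use different denominators. The sketch takes two sequences that have already been aligned, computes the percentage of identical bases, and flags gap runs whose length is not a multiple of three, since such gaps shift the reading frame.

```python
from typing import List, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of non-gap alignment columns in which both bases are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def frameshifting_gaps(aligned_a: str, aligned_b: str) -> List[Tuple[int, int]]:
    """Return (start_column, length) for each gap run whose length is not a multiple of 3."""
    shifts = []
    for seq in (aligned_a, aligned_b):
        i = 0
        while i < len(seq):
            if seq[i] != "-":
                i += 1
                continue
            j = i
            while j < len(seq) and seq[j] == "-":
                j += 1  # extend to the end of this gap run
            if (j - i) % 3 != 0:
                shifts.append((i, j - i))
            i = j
    return sorted(shifts)


# Hypothetical toy alignment: the copy has a 1-base deletion and one substitution.
parent = "ATGGCCAAATTTGGG"
pseudo = "ATGGCC-AATTTGGA"
print(round(percent_identity(parent, pseudo), 1))
print(frameshifting_gaps(parent, pseudo))
```

A high identity combined with at least one frameshifting gap is the kind of pattern the list above describes, although neither value alone proves that a sequence is a pseudogene.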
Pseudogenes for RNA genes are usually more difficult to discover as they do not need to be translated and thus do not have "reading frames".
Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
Processed pseudogenes often pose a problem for gene prediction programs, often being misidentified as real genes or exons. It has been proposed that identification of processed pseudogenes can help improve the accuracy of gene prediction methods.
Recently 140 human pseudogenes have been shown to be translated. However, the function, if any, of the protein products is unknown.
Types and origin
There are four main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:
Processed
In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, somewhere between 30 and 44% of the human genome consists of repetitive elements such as SINEs and LINEs (see retrotransposons).
In the process of retrotransposition, a portion of the mRNA or hnRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an ''in vitro'' system that they can create retrotransposed copies of random genes, too.
Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event.
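The two hallmarks just described, a poly-A tail and the absence of introns, can be turned into a rough screening heuristic. The Python sketch below is only an illustration under stated assumptions: the function names, the minimum run of 10 adenines, and the 30-base window at the 3' end are hypothetical choices, and the intron count is assumed to come from a separate comparison against the parent gene's exon structure.

```python
def has_poly_a_tail(sequence: str, min_run: int = 10, window: int = 30) -> bool:
    """True if the last `window` bases contain a run of at least `min_run` adenines.

    Both thresholds are hypothetical; real poly-A remnants degrade over time.
    """
    tail = sequence.upper()[-window:]
    return "A" * min_run in tail


def looks_processed(sequence: str, intron_count: int) -> bool:
    """Heuristic: no introns retained relative to the parent gene, plus a poly-A remnant."""
    return intron_count == 0 and has_poly_a_tail(sequence)


# Hypothetical candidate: intron-less copy ending in a stretch of adenines.
candidate = "GATTACA" * 3 + "A" * 15
print(looks_processed(candidate, intron_count=0))
```

A check like this can only suggest a retrotransposed origin. Old insertions may have lost their poly-A tail to mutation, and a parent gene that never had introns would pass the intron test trivially.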
However, these insertions occasionally contribute exons to existing genes, usually via alternatively spliced transcripts.
A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes.
Processed pseudogenes are continually being created in primates. Human populations, for example, carry distinct sets of processed pseudogenes across their individuals.
Non-processed
Non-processed (or duplicated) pseudogenes.
Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event caused by homologous recombination at, for example, repetitive SINE sequences on misaligned chromosomes and subsequently acquire mutations that cause the copy to lose the original gene's function. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon-intron structure and regulatory sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates.
If pseudogenization is due to gene duplication, it usually occurs in the first few million years after the gene duplication, provided the gene has not been subjected to any selection pressure.
Gene duplication generates functional redundancy and it is not normally advantageous to carry two identical genes. Mutations that disrupt either the structure or the function of either of the two genes are not deleterious and will not be removed through the selection process. As a result, the gene that has been mutated gradually becomes a pseudogene and will be either unexpressed or functionless. This kind of evolutionary fate is shown by population genetic modeling and also by genome analysis.
Depending on the evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes that they are no longer identifiable. Relatively young pseudogenes can be recognized by their sequence similarity to the parental gene.
Unitary pseudogenes
Various mutations (such as indels and nonsense mutations) can prevent a gene from being normally transcribed or translated, and thus the gene may become less- or non-functional or "deactivated". These are the same mechanisms by which non-processed genes become pseudogenes, but the difference in this case is that the gene was not duplicated before pseudogenization. Normally, such a pseudogene would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or, in some cases, natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied besides primates (except guinea pigs), GULO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates.
Another more recent example of a disabled gene links the deactivation of the caspase 12 gene (through a nonsense mutation) to positive selection in humans.
It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes.
Pseudo-pseudogenes
The rapid proliferation of DNA sequencing technologies has led to the identification of many apparent pseudogenes using gene prediction techniques. Pseudogenes are often identified by the appearance of a premature stop codon in a predicted mRNA sequence, which would, in theory, prevent synthesis (translation) of the normal protein product of the original gene. There have been some reports of translational readthrough of such premature stop codons in mammals. As alluded to in the figure above, a small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to natural selection. That appears to have happened during the evolution of ''Drosophila'' species.
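The premature-stop-codon criterion described above can be sketched in a few lines of Python. This is an illustrative assumption about how such a scan might look, not the method of any specific gene prediction tool: the function name is hypothetical, the standard genetic code is assumed, and the input is assumed to be a predicted coding sequence that starts in frame.

```python
from typing import Optional

# Stop codons of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    cds = cds.upper().replace("U", "T")  # accept mRNA or DNA spelling
    n_codons = len(cds) // 3
    for k in range(n_codons - 1):  # the final codon is the expected stop
        if cds[3 * k:3 * k + 3] in STOP_CODONS:
            return k
    return None


# Hypothetical toy sequences: the second has TAA in place of the third codon.
intact = "ATGGCCAAATTTTAA"
broken = "ATGGCCTAATTTTAA"
print(first_premature_stop(intact), first_premature_stop(broken))
```

As the readthrough reports above indicate, a hit from a scan like this is evidence of a disablement in the predicted sequence, not proof that no functional protein is made.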
In 2016 it was reported that four predicted pseudogenes in multiple ''Drosophila'' species actually encode proteins with biologically important functions, "suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon". For example, the functional protein (an olfactory receptor) is found only in neurons. This finding of tissue-specific biologically-functional genes that could have been classified as pseudogenes by ''in silico'' analysis complicates the analysis of sequence data. In the human genome, a number of examples have been identified that were originally classified as pseudogenes but later discovered to have a functional, although not necessarily protein-coding, role.
As of 2012, it appeared that there are approximately 12,000–14,000 pseudogenes in the human genome.
A 2016 proteogenomics analysis using mass spectrometry of peptides identified at least 19,262 human proteins produced from 16,271 genes or clusters of genes, with 8 new protein-coding genes identified that were previously considered pseudogenes.
Examples of pseudogene function
While the vast majority of pseudogenes have lost their function, some cases have emerged in which a pseudogene either re-gained its original or a similar function or evolved a new function. Examples include the following:
''Drosophila'' glutamate receptor. The term "pseudo-pseudogene" was coined for the gene encoding the chemosensory ionotropic glutamate receptor Ir75a of ''Drosophila sechellia'', which bears a premature termination codon (PTC) and was thus classified as a pseudogene. However, ''in vivo'' the ''D. sechellia'' Ir75a locus produces a functional receptor, owing to translational read-through of the PTC. Read-through is detected only in neurons and depends on the nucleotide sequence downstream of the PTC.
siRNAs. Some endogenous siRNAs appear to be derived from pseudogenes, and thus some pseudogenes play a role in regulating protein-coding transcripts. One of the many examples is psiPPM1K. Processing of RNAs transcribed from psiPPM1K yields siRNAs that can act to suppress the most common type of liver cancer, hepatocellular carcinoma. This and much other research has led to considerable excitement about the possibility of targeting pseudogenes with, or using them as, therapeutic agents.
piRNAs. Some piRNAs are derived from pseudogenes located in piRNA clusters. Those piRNAs regulate genes via the piRNA pathway in mammalian testes and are crucial for limiting transposable element damage to the genome.
microRNAs. There are many reports of pseudogene transcripts acting as microRNA decoys. Perhaps the earliest definitive example of such a pseudogene involved in cancer is the pseudogene of BRAF. The BRAF gene is a proto-oncogene that, when mutated, is associated with many cancers. Normally, the amount of BRAF protein is kept under control in cells through the action of miRNA. In normal situations, RNA from BRAF and RNA from the pseudogene BRAFP1 compete for miRNA, but the balance of the two RNAs is such that cells grow normally. However, when BRAFP1 RNA expression is increased (either experimentally or by natural mutations), less miRNA is available to control the expression of BRAF, and the increased amount of BRAF protein causes cancer. This sort of competition for regulatory elements by RNAs that are endogenous to the genome has given rise to the term ceRNA.
PTEN. The PTEN gene is a known tumor suppressor gene. The PTEN pseudogene, PTENP1, is a processed pseudogene that is very similar in its genetic sequence to the wild-type gene. However, PTENP1 has a missense mutation which eliminates the codon for the initiating methionine and thus prevents translation of the normal PTEN protein. In spite of that, PTENP1 appears to play a role in oncogenesis. The 3' UTR of PTENP1 mRNA functions as a decoy of PTEN mRNA by targeting microRNAs due to its similarity to the PTEN gene, and overexpression of the 3' UTR resulted in an increase of PTEN protein level. That is, overexpression of the PTENP1 3' UTR leads to increased regulation and suppression of cancerous tumors. The biology of this system is basically the inverse of the BRAF system described above.
Potogenes. Pseudogenes can, over evolutionary time scales, participate in gene conversion and other mutational events that may give rise to new or newly functional genes. This has led to the concept that ''pseudo''genes could be viewed as ''pot''ogenes: ''pot''ential genes for evolutionary diversification.
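The microRNA-decoy examples in this list (BRAFP1 and PTENP1) rest on the same competition logic, which a toy calculation can make concrete. The Python sketch below is a deliberately simplified illustration and not a published model: it assumes a fixed miRNA pool that is shared between the parent transcript and the pseudogene transcript in proportion to their abundance, and the function name and all numbers are hypothetical.

```python
def mirna_on_target(target_rna: float, decoy_rna: float, mirna_pool: float) -> float:
    """Amount of miRNA bound to the target transcript.

    Toy assumption: the pool is split in proportion to transcript abundance,
    with equal affinity for the target and the decoy.
    """
    total = target_rna + decoy_rna
    if total <= 0:
        return 0.0
    return mirna_pool * target_rna / total


# Hypothetical abundances in arbitrary units.
baseline = mirna_on_target(target_rna=100.0, decoy_rna=0.0, mirna_pool=50.0)
with_decoy = mirna_on_target(target_rna=100.0, decoy_rna=100.0, mirna_pool=50.0)
print(baseline, with_decoy)
```

Under these assumptions, raising the decoy level lowers the miRNA load on the target, so more target protein would be expected. Whether that outcome promotes growth (an oncogene such as BRAF) or restrains it (a tumor suppressor such as PTEN) depends on the parent gene, which is why the two systems behave as inverses.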
Misidentified pseudogenes
Sometimes genes are thought to be pseudogenes, usually on the basis of bioinformatic analysis, but then turn out to be functional. Examples include the ''Drosophila'' ''jingwei'' gene, which encodes a functional
alcohol dehydrogenase
enzyme ''in vivo''.
Another example is the human gene encoding a phosphoglycerate mutase, which was thought to be a pseudogene but turned out to be a functional gene and has since been renamed. Mutations in it cause infertility.
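The sequence-level checks behind such bioinformatic calls can be sketched as follows. This is a minimal illustration, assuming a DNA coding sequence given as a plain string and the standard stop codons. The function name `coding_defects` is hypothetical, and real annotation pipelines rely on many more signals, such as homology, regulatory sequences and expression data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def coding_defects(cds):
    """Return a list of obvious defects in a putative coding sequence.

    Illustrative heuristic only: it flags a missing start codon, a length
    that is not a multiple of three, and the first in-frame premature stop.
    """
    seq = cds.upper()
    defects = []
    if not seq.startswith("ATG"):
        defects.append("missing start codon")
    if len(seq) % 3 != 0:
        defects.append("length not a multiple of 3 (possible frameshift)")
    # Split into complete in-frame codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere before the final codon is premature.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            defects.append(f"premature stop codon at codon {index + 1}")
            break
    return defects
```

A sequence that fails these checks is only a candidate pseudogene. As the examples above show, genes flagged this way can still prove functional when tested ''in vivo''.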
Bacterial pseudogenes
Pseudogenes are found in
bacteria
. Most are found in bacteria that are not free-living; that is, they are either
symbiont
s or
obligate intracellular parasite
s. Thus, they do not require many genes that are needed by free-living bacteria, such as genes associated with metabolism and DNA repair. However, there is no fixed order in which functional
gene
s are lost first. For example, the oldest pseudogenes in ''
Mycobacterium leprae
'' are in
RNA polymerase
s and the
biosynthesis
of
secondary metabolite
s while the oldest ones in ''
Shigella flexneri
'' and ''
Salmonella typhi'' are in
DNA replication
, recombination, and
repair
.
Because most bacteria that carry pseudogenes are either symbionts or obligate intracellular parasites, their genomes eventually shrink. An extreme example is the genome of ''
Mycobacterium leprae
'', an obligate parasite and the causative agent of
leprosy
. It has been reported to have 1,133 pseudogenes which give rise to approximately 50% of its
transcriptome
.
The effect of pseudogenes and genome reduction can be seen further by comparing ''Mycobacterium leprae'' with ''
Mycobacterium marinum
'', a
pathogen
from the same family. ''Mycobacterium marinum'' has a larger genome than ''Mycobacterium leprae'' because it can survive outside the host and must therefore retain the genes needed to do so.
Although genome reduction discards unneeded genes by eliminating pseudogenes, selective pressure from the host can influence which genes are kept. In the case of a symbiont from the ''
Verrucomicrobiota
'' phylum, there are seven additional copies of the gene coding for the mandelalide pathway.
The host, a species of ''Lissoclinum'', uses mandelalides as part of its defense mechanism.
The relationship between
epistasis
and the domino theory of gene loss was observed in ''Buchnera aphidicola''. The domino theory suggests that if one gene in a cellular process becomes inactivated, selection on the other genes involved relaxes, leading to further gene loss.
When comparing ''
Buchnera aphidicola
'' and ''
Escherichia coli
'', it was found that positive epistasis furthers gene loss while negative epistasis hinders it.
See also
*
List of disabled human pseudogenes
*
Molecular evolution
*
Molecular paleontology
*
Pseudogene (database)
*
Retroposon
*
Retrotransposon
External links
* Pseudogene interaction database, miRNA-pseudogene and protein-pseudogene interaction maps database
* Yale University pseudogene database
* (homologous processed pseudogenes)
* RCPedia - Processed Pseudogene database