![Pseudogene defects](https://upload.wikimedia.org/wikipedia/commons/7/7c/Pseudogene_defects.png)
Pseudogenes are nonfunctional segments of DNA that resemble functional genes. Most arise as superfluous copies of functional genes, either directly by DNA duplication or indirectly by reverse transcription of an mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons.
Most non-bacterial genomes contain many pseudogenes, often as many as functional genes. This is not surprising, since various biological processes are expected to accidentally create pseudogenes, and there are no specialized mechanisms to remove them from genomes. Eventually pseudogenes may be deleted from their genomes by chance DNA replication or DNA repair errors, or they may accumulate so many mutational changes that they are no longer recognizable as former genes. Analysis of these degeneration events helps clarify the effects of non-selective processes in genomes.
Pseudogene sequences may be transcribed into RNA at low levels, due to promoter elements inherited from the ancestral gene or arising by new mutations. Although most of these transcripts will have no more functional significance than chance transcripts from other parts of the genome, some have given rise to beneficial regulatory RNAs and new proteins.
Properties
Pseudogenes are usually characterized by a combination of homology to a known gene and loss of some functionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, it is usually unable to produce a functional final protein product.
Pseudogenes are sometimes difficult to identify and characterize in genomes, because the two requirements of homology and loss of functionality are usually implied through sequence alignments rather than biologically proven.
1. Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences have evolved independently (see convergent evolution).
2. Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to yield a fully functional protein: transcription, pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
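The two criteria above can be illustrated with a minimal Python sketch. It assumes the pseudogene and parent sequences have already been aligned by some other tool, with gaps written as `-`. The function names are hypothetical and were chosen for this illustration, and the frameshift test is deliberately simplistic: it flags any gap run whose length is not a multiple of three and ignores compensating indels.

```python
import re
from typing import Optional

# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of non-gap alignment columns in which both bases are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def first_premature_stop(cds: str) -> Optional[int]:
    """0-based index of the first in-frame stop codon before the last codon, else None."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the final codon is allowed to be a stop
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def has_frameshift_indel(aligned_parent: str, aligned_candidate: str) -> bool:
    """True if either aligned sequence contains a gap run whose length is not a multiple of 3."""
    for seq in (aligned_parent, aligned_candidate):
        for run in re.finditer(r"-+", seq):
            if len(run.group()) % 3 != 0:
                return True
    return False


if __name__ == "__main__":
    parent = "ATGGCTGAATGGTAA"     # ATG GCT GAA TGG TAA
    candidate = "ATGGCTTAATGGTAA"  # GAA -> TAA introduces an early stop
    print(round(percent_identity(parent, candidate), 1))
    print(first_premature_stop(candidate))
```

A real pipeline would also need a threshold for how much identity counts as homology; no such threshold is implied here.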
Pseudogenes for RNA genes are usually more difficult to discover as they do not need to be translated and thus do not have "reading frames".
Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by PCR (polymerase chain reaction) may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
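The co-amplification problem can be sketched in a few lines of Python. This is only an illustration under strong assumptions: primers are taken to anneal only at exact matches, whereas real primers tolerate some mismatches, and the sequences and function names are invented for the example.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template: str, forward_primer: str, reverse_primer: str) -> Optional[str]:
    """Return the product an exact-match primer pair would amplify from template, or None."""
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the given
    # strand its binding site reads as the primer's reverse complement.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


if __name__ == "__main__":
    # Invented toy sequences that differ at a single base (T vs A).
    gene = "TTACGGATCCAAGTCTGACCGTTAGCAA"
    pseudogene = "TTACGGATCCAAGACTGACCGTTAGCAA"
    reverse = reverse_complement("GACCGTTAGC")

    # Primers placed in shared sequence amplify both loci.
    print(amplicon(gene, "ACGGATCC", reverse))
    print(amplicon(pseudogene, "ACGGATCC", reverse))

    # A forward primer that covers the differing base amplifies only the gene.
    print(amplicon(pseudogene, "CCAAGTCT", reverse))
```

Placing a primer over a position that distinguishes the gene from its pseudogene, as in the last call, is one way the sketch avoids the unwanted product.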
Processed pseudogenes often pose a problem for gene prediction programs, often being misidentified as real genes or exons. It has been proposed that identification of processed pseudogenes can help improve the accuracy of gene prediction methods.
Recently 140 human pseudogenes have been shown to be translated. However, the function, if any, of the protein products is unknown.
Types and origin
![Pseudo gene schematic](https://upload.wikimedia.org/wikipedia/commons/3/32/Pseudo_gene_schematic.png)
There are four main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:
Processed
![Pseudogene2jpg](https://upload.wikimedia.org/wikipedia/commons/3/3f/Pseudogene2jpg.jpg)
In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, somewhere between 30 and 44% of the human genome consists of repetitive elements such as SINEs and LINEs (see retrotransposons).
In the process of retrotransposition, a portion of the mRNA or hnRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an ''in vitro'' system that they can create retrotransposed copies of random genes, too.
Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event.
However, these insertions occasionally contribute exons to existing genes, usually via alternatively spliced transcripts.
A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes.
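The hallmarks described in this section (a poly-A tract, missing introns, and 5' truncation) can be checked with a short Python sketch. The `Candidate` record, its field names, and the thresholds (`min_run=10`, `window=50`) are assumptions made for this illustration. They are not taken from any published pipeline, and the intron counts and alignment start are assumed to come from an upstream annotation step.

```python
import re
from dataclasses import dataclass
from typing import Dict


@dataclass
class Candidate:
    sequence: str              # candidate locus including some 3' flanking sequence
    n_introns: int             # introns found in the candidate (assumed upstream annotation)
    parent_n_introns: int      # introns in the parent gene
    aligned_parent_start: int  # first parent mRNA position covered by the alignment (0 = intact 5' end)


def has_poly_a_tract(sequence: str, min_run: int = 10, window: int = 50) -> bool:
    """True if the last `window` bases contain a run of at least `min_run` adenines."""
    tail = sequence.upper()[-window:]
    return re.search("A{%d,}" % min_run, tail) is not None


def processed_hallmarks(candidate: Candidate) -> Dict[str, bool]:
    """Report which hallmarks of a processed pseudogene the candidate shows."""
    return {
        "poly_a_tract": has_poly_a_tract(candidate.sequence),
        "introns_lost": candidate.parent_n_introns > 0 and candidate.n_introns == 0,
        "five_prime_truncated": candidate.aligned_parent_start > 0,
    }


if __name__ == "__main__":
    retrocopy = Candidate("GATTACA" * 5 + "A" * 15 + "GCGT", 0, 4, 120)
    print(processed_hallmarks(retrocopy))
```

No single hallmark is decisive: an intronless parent gene, for instance, can never show intron loss, so the sketch reports each feature separately instead of returning a single verdict.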
Processed pseudogenes are continually being created in primates. Human populations, for example, carry distinct sets of processed pseudogenes across their individuals.
Non-processed
![Pseudogene3jpg](https://upload.wikimedia.org/wikipedia/commons/b/b2/Pseudogene3jpg.jpg)
Non-processed (or duplicated) pseudogenes.
Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event caused by homologous recombination at, for example, repetitive SINE sequences on misaligned chromosomes and subsequently acquire mutations that cause the copy to lose the original gene's function. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon-intron structure and regulatory sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates.
If pseudogenization is due to gene duplication, it usually occurs in the first few million years after the gene duplication, provided the gene has not been subjected to any selection pressure.
Gene duplication generates functional redundancy and it is not normally advantageous to carry two identical genes. Mutations that disrupt either the structure or the function of either of the two genes are not deleterious and will not be removed through the selection process. As a result, the gene that has been mutated gradually becomes a pseudogene and will be either unexpressed or functionless. This kind of evolutionary fate is shown by population genetic modeling and also by genome analysis.
Depending on the evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes that they are no longer identifiable. Relatively young pseudogenes can be recognized by their sequence similarity.
Unitary pseudogenes
![Pseudogene4jpg](https://upload.wikimedia.org/wikipedia/commons/8/8d/Pseudogene4jpg.jpg)
Various mutations (such as indels and nonsense mutations) can prevent a gene from being normally transcribed or translated, and thus the gene may become less- or non-functional or "deactivated". These are the same mechanisms by which non-processed genes become pseudogenes, but the difference in this case is that the gene was not duplicated before pseudogenization. Normally, such a pseudogene would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or, in some cases, natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied besides primates (except guinea pigs), GULO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates.
Another more recent example of a disabled gene links the deactivation of the caspase 12 gene (through a nonsense mutation) to positive selection in humans.
It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes.
Pseudo-pseudogenes
![Drosophila melanogaster - side (aka)](https://upload.wikimedia.org/wikipedia/commons/4/4c/Drosophila_melanogaster_-_side_%28aka%29.jpg)
The rapid proliferation of DNA sequencing technologies has led to the identification of many apparent pseudogenes using gene prediction techniques. Pseudogenes are often identified by the appearance of a premature stop codon in a predicted mRNA sequence, which would, in theory, prevent synthesis (translation) of the normal protein product of the original gene. There have been some reports of translational readthrough of such premature stop codons in mammals. As alluded to in the figure above, a small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to natural selection. That appears to have happened during the evolution of ''Drosophila'' species.
In 2016 it was reported that four predicted pseudogenes in multiple ''Drosophila'' species actually encode proteins with biologically important functions, "suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon". For example, the functional protein (an olfactory receptor) is found only in neurons. This finding of tissue-specific biologically-functional genes that could have been classified as pseudogenes by ''in silico'' analysis complicates the analysis of sequence data. In the human genome, a number of examples have been identified that were originally classified as pseudogenes but later discovered to have a functional, although not necessarily protein-coding, role.
As of 2012, it appeared that there are approximately 12,000–14,000 pseudogenes in the human genome.
A 2016 proteogenomics analysis using mass spectrometry of peptides identified at least 19,262 human proteins produced from 16,271 genes or clusters of genes, with 8 new protein-coding genes identified that were previously considered pseudogenes.
Examples of pseudogene function
While the vast majority of pseudogenes have lost their function, some cases have emerged in which a pseudogene either re-gained its original or a similar function or evolved a new function. Examples include the following:
''Drosophila'' glutamate receptor. The term "pseudo-pseudogene" was coined for the gene encoding the chemosensory ionotropic glutamate receptor Ir75a of ''Drosophila sechellia'', which bears a premature termination codon (PTC) and was thus classified as a pseudogene. However, ''in vivo'' the ''D. sechellia'' Ir75a locus produces a functional receptor, owing to translational read-through of the PTC. Read-through is detected only in neurons and depends on the nucleotide sequence downstream of the PTC.
siRNAs. Some endogenous siRNAs appear to be derived from pseudogenes, and thus some pseudogenes play a role in regulating protein-coding transcripts. One of the many examples is psiPPM1K. Processing of RNAs transcribed from psiPPM1K yields siRNAs that can act to suppress the most common type of liver cancer, hepatocellular carcinoma. This and much other research has led to considerable excitement about the possibility of targeting pseudogenes with, or using them as, therapeutic agents.
piRNAs. Some piRNAs are derived from pseudogenes located in piRNA clusters. Those piRNAs regulate genes via the piRNA pathway in mammalian testes and are crucial for limiting transposable element damage to the genome.
![BrafDFPjpg](https://upload.wikimedia.org/wikipedia/commons/1/16/BrafDFPjpg.jpg)
microRNAs. There are many reports of pseudogene transcripts acting as microRNA decoys. Perhaps the earliest definitive example of such a pseudogene involved in cancer is the pseudogene of BRAF. The BRAF gene is a proto-oncogene that, when mutated, is associated with many cancers. Normally, the amount of BRAF protein is kept under control in cells through the action of miRNA. In normal situations, RNA from BRAF and RNA from the pseudogene BRAFP1 compete for miRNA, but the balance of the two RNAs is such that cells grow normally. However, when BRAFP1 RNA expression is increased (either experimentally or by natural mutations), less miRNA is available to control the expression of BRAF, and the increased amount of BRAF protein causes cancer. This sort of competition for regulatory elements by RNAs that are endogenous to the genome has given rise to the term ceRNA (competing endogenous RNA).
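The decoy effect can be made concrete with a toy calculation. The sketch below is not a published model of BRAF regulation. It assumes that a fixed pool of miRNA is shared between target and decoy transcripts in proportion to their abundance, with one miRNA per transcript, and all numbers are arbitrary units chosen for illustration.

```python
def unrepressed_target(target_rna: float, decoy_rna: float, mirna: float) -> float:
    """Target transcripts left free of miRNA under a proportional-sharing assumption.

    All quantities are in the same arbitrary units; one miRNA is assumed to
    occupy one transcript, and binding is assumed not to distinguish the
    target from the decoy.
    """
    total_rna = target_rna + decoy_rna
    if total_rna <= 0:
        return 0.0
    bound_total = min(mirna, total_rna)                # miRNA cannot bind more RNA than exists
    bound_target = bound_total * target_rna / total_rna
    return target_rna - bound_target


if __name__ == "__main__":
    # Raising decoy (pseudogene) RNA from 20 to 100 units frees more target RNA.
    print(unrepressed_target(100, 20, 60))
    print(unrepressed_target(100, 100, 60))
```

Under these assumptions, raising the decoy level while holding the target and miRNA levels fixed always leaves more target RNA unrepressed, which is the qualitative behavior the ceRNA concept describes.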
PTEN. The PTEN gene is a known tumor suppressor gene. The PTEN pseudogene, PTENP1, is a processed pseudogene that is very similar in its genetic sequence to the wild-type gene. However, PTENP1 has a missense mutation which eliminates the codon for the initiating methionine and thus prevents translation of the normal PTEN protein. In spite of that, PTENP1 appears to play a role in oncogenesis. Because of its similarity to the PTEN gene, the 3' UTR of PTENP1 mRNA acts as a decoy for the microRNAs that target PTEN mRNA, and overexpression of the 3' UTR resulted in an increase of PTEN protein level. That is, overexpression of the PTENP1 3' UTR leads to increased regulation and suppression of cancerous tumors. The biology of this system is basically the inverse of the BRAF system described above.
Potogenes. Pseudogenes can, over evolutionary time scales, participate in gene conversion and other mutational events that may give rise to new or newly functional genes. This has led to the concept that ''pseudo''genes could be viewed as ''pot''ogenes: ''pot''ential genes for evolutionary diversification.
Misidentified pseudogenes
Sometimes genes are thought to be pseudogenes, usually based on bioinformatic analysis, but then turn out to be functional genes. Examples include the ''Drosophila'' jingwei gene, which encodes a functional alcohol dehydrogenase enzyme ''in vivo''.
Another example is the human gene encoding phosphoglycerate mutase, which was thought to be a pseudogene but turned out to be a functional gene. Mutations in it cause infertility.
Bacterial pseudogenes
Pseudogenes are found in bacteria. Most are found in bacteria that are not free-living; that is, they are either symbionts or obligate intracellular parasites. Thus, they do not require many genes that are needed by free-living bacteria, such as genes associated with metabolism and DNA repair. However, there is no fixed order in which functional genes are lost. For example, the oldest pseudogenes in ''Mycobacterium leprae'' are in RNA polymerases and the biosynthesis of secondary metabolites, while the oldest ones in ''Shigella flexneri'' and ''Salmonella typhi'' are in DNA replication, recombination, and repair.
Since most bacteria that carry pseudogenes are either symbionts or obligate intracellular parasites, genome size eventually shrinks. An extreme example is the genome of ''Mycobacterium leprae'', an obligate parasite and the causative agent of leprosy. It has been reported to have 1,133 pseudogenes which give rise to approximately 50% of its transcriptome.
The effect of pseudogenes and genome reduction can be further seen when ''M. leprae'' is compared to ''Mycobacterium marinum'', a pathogen from the same family. ''Mycobacterium marinum'' has a larger genome than ''Mycobacterium leprae'' because it can survive outside the host; therefore, its genome must contain the genes needed to do so.
Although genome reduction tends to eliminate genes that are no longer needed by removing pseudogenes, selective pressures from the host can influence what is kept. In the case of a symbiont from the ''Verrucomicrobiota'' phylum, there are seven additional copies of the gene coding the mandelalide pathway.
The hosts, species of ''Lissoclinum'', use mandelalides as part of their defense mechanism.
The relationship between epistasis and the domino theory of gene loss was observed in ''Buchnera aphidicola''. The domino theory suggests that if one gene of a cellular process becomes inactivated, then selection on the other genes involved relaxes, leading to gene loss.
When comparing ''Buchnera aphidicola'' and ''Escherichia coli'', it was found that positive epistasis furthers gene loss while negative epistasis hinders it.
See also
* List of disabled human pseudogenes
* Molecular evolution
* Molecular paleontology
* Pseudogene (database)
* Retroposon
* Retrotransposon
External links
* Pseudogene interaction database, miRNA-pseudogene and protein-pseudogene interaction maps database
* Yale University pseudogene database (homologous processed pseudogenes)
* RCPedia - Processed Pseudogene database