Pseudogenes are nonfunctional segments of DNA that resemble functional

gene In biology, the word gene (from , ; "...Wilhelm Johannsen coined the word gene to describe the Mendelian units of heredity..." meaning ''generation'' or ''birth'' or ''gender'') can have several different meanings. The Mendelian gene is a b ...

s. Most arise as superfluous copies of functional genes, either directly by DNA duplication or indirectly by reverse transcription of an

mRNA In molecular biology, messenger ribonucleic acid (mRNA) is a single-stranded molecule of RNA that corresponds to the genetic sequence of a gene, and is read by a ribosome in the process of synthesizing a protein. mRNA is created during the ...

transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or

translation Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transla ...

, or whose coding sequences are obviously defective due to frameshifts or premature stop codons. Most non-bacterial genomes contain many pseudogenes, often as many as functional genes. This is not surprising, since various biological processes are expected to accidentally create pseudogenes, and there are no specialized mechanisms to remove them from genomes. Eventually pseudogenes may be deleted from their genomes by chance

DNA replication In molecular biology, DNA replication is the biological process of producing two identical replicas of DNA from one original DNA molecule. DNA replication occurs in all living organisms acting as the most essential part for biological inheritan ...

DNA repair DNA repair is a collection of processes by which a cell identifies and corrects damage to the DNA molecules that encode its genome. In human cells, both normal metabolic activities and environmental factors such as radiation can cause DNA d ...

errors, or they may accumulate so many

mutation In biology, a mutation is an alteration in the nucleic acid sequence of the genome of an organism, virus, or extrachromosomal DNA. Viral genomes contain either DNA or RNA. Mutations result from errors during DNA or viral replication, m ...

al changes that they are no longer recognizable as former genes. Analysis of these degeneration events helps clarify the effects of non-selective processes in genomes. Pseudogene sequences may be transcribed into RNA at low levels, due to promoter elements inherited from the ancestral gene or arising by new mutations. Although most of these transcripts will have no more functional significance than chance transcripts from other parts of the genome, some have given rise to beneficial regulatory RNAs and new proteins.

Properties

Pseudogenes are usually characterized by a combination of homology to a known gene and loss of some functionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, they are usually unable to produce functional final protein products. Pseudogenes are sometimes difficult to identify and characterize in genomes, because the two requirements of homology and loss of functionality are usually implied through sequence alignments rather than biologically proven. #Homology is implied by sequence identity between the DNA sequences of the pseudogene and parent gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences have evolved independently (see

Convergent evolution Convergent evolution is the independent evolution of similar features in species of different periods or epochs in time. Convergent evolution creates analogous structures that have similar form or function but were not present in the last com ...

). #Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to a fully functional protein: Transcription,

pre-mRNA processing Transcriptional modification or co-transcriptional modification is a set of biological processes common to most eukaryotic cells by which an RNA primary transcript is chemically altered following transcription from a gene to produce a mature, ...

, and

protein folding Protein folding is the physical process by which a protein chain is translated to its native three-dimensional structure, typically a "folded" conformation by which the protein becomes biologically functional. Via an expeditious and reproduc ...

are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature stop codons and frameshifts, which almost universally prevent the translation of a functional protein product. Pseudogenes for RNA genes are usually more difficult to discover as they do not need to be translated and thus do not have "reading frames". Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in

genome In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ...

sequences. Processed pseudogenes often pose a problem for

gene prediction In computational biology, gene prediction or gene finding refers to the process of identifying the regions of genomic DNA that encode genes. This includes protein-coding genes as well as RNA genes, but may also include prediction of other functio ...

programs, often being misidentified as real genes or exons. It has been proposed that identification of processed pseudogenes can help improve the accuracy of gene prediction methods. Recently 140 human pseudogenes have been shown to be translated. However, the function, if any, of the protein products is unknown.

Types and origin

There are four main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:

Processed

In higher eukaryotes, particularly

mammal Mammals () are a group of vertebrate animals constituting the class Mammalia (), characterized by the presence of mammary glands which in females produce milk for feeding (nursing) their young, a neocortex (a region of the brain), fur ...

retrotransposition A transposable element (TE, transposon, or jumping gene) is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Tra ...

is a fairly common event that has had a huge impact on the composition of the genome. For example, somewhere between 30 and 44% of the human genome consists of repetitive elements such as

SINEs Sines () is a city and a municipality in Portugal. The municipality, divided into two parishes, has around 14,214 inhabitants (2021) in an area of . Sines holds an important oil refinery and several petrochemical industries. It is also a popular ...

and LINEs (see retrotransposons). In the process of retrotransposition, a portion of the

or hnRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an ''in vitro'' system that they can create retrotransposed copies of random genes, too. Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event. However, these insertions occasionally contribute exons to existing genes, usually via

alternatively spliced Alternative splicing, or alternative RNA splicing, or differential splicing, is an alternative splicing process during gene expression that allows a single gene to code for multiple proteins. In this process, particular exons of a gene may b ...

transcripts. A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes. Processed pseudogenes are continually being created in primates. Human populations, for example, have distinct sets of processed pseudogenes across its individuals.

Non-processed

Non-processed (or duplicated) pseudogenes. Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event caused by

homologous recombination Homologous recombination is a type of genetic recombination in which genetic information is exchanged between two similar or identical molecules of double-stranded or single-stranded nucleic acids (usually DNA as in cellular organisms but may ...

at, for example, repetitive sine sequences on misaligned chromosomes and subsequently acquire

s that cause the copy to lose the original gene's function. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon- intron structure and regulatory sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates. If pseudogenization is due to gene duplication, it usually occurs in the first few million years after the gene duplication, provided the gene has not been subjected to any selection pressure. Gene duplication generates functional redundancy and it is not normally advantageous to carry two identical genes. Mutations that disrupt either the structure or the function of either of the two genes are not deleterious and will not be removed through the selection process. As a result, the gene that has been mutated gradually becomes a pseudogene and will be either unexpressed or functionless. This kind of evolutionary fate is shown by population genetic modeling and also by genome analysis. According to evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes so that they will no longer be identifiable. Relatively young pseudogenes can be recognized due to their sequence similarity.

Unitary pseudogenes

Various mutations (such as

indel Indel is a molecular biology term for an insertion or deletion of bases in the genome of an organism. It is classified among small genetic variations, measuring from 1 to 10 000 base pairs in length, including insertion and deletion events that ...

s and nonsense mutations) can prevent a gene from being normally transcribed or

translated Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transla ...

, and thus the gene may become less- or non-functional or "deactivated". These are the same mechanisms by which non-processed genes become pseudogenes, but the difference in this case is that the gene was not duplicated before pseudogenization. Normally, such a pseudogene would be unlikely to become fixed in a population, but various population effects, such as

genetic drift Genetic drift, also known as allelic drift or the Wright effect, is the change in the frequency of an existing gene variant (allele) in a population due to random chance. Genetic drift may cause gene variants to disappear completely and there ...

, a population bottleneck, or, in some cases,

natural selection Natural selection is the differential survival and reproduction of individuals due to differences in phenotype. It is a key mechanism of evolution, the change in the heritable traits characteristic of a population over generations. Cha ...

, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably coded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied besides primates (except guinea pigs), GULO aids in the biosynthesis of

ascorbic acid Vitamin C (also known as ascorbic acid and ascorbate) is a water-soluble vitamin found in citrus and other fruits and vegetables, also sold as a dietary supplement and as a topical 'serum' ingredient to treat melasma (dark pigment spots) ...

(vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates. Another more recent example of a disabled gene links the deactivation of the

caspase 12 Caspase 12 is a protein that in humans is encoded by the ''CASP12'' gene. The protein belongs to a family of enzymes called caspases which cleave their substrates at C-terminal aspartic acid residues. It is closely related to caspase 1 and oth ...

gene (through a nonsense mutation) to positive selection in humans. It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes.

Pseudo-pseudogenes

The rapid proliferation of

DNA sequencing DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. T ...

technologies has led to the identification of many apparent pseudogenes using

techniques. Pseudogenes are often identified by the appearance of a premature stop codon in a predicted mRNA sequence, which would, in theory, prevent synthesis (

) of the normal

protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, res ...

product of the original gene. There have been some reports of translational readthrough of such premature stop codons in mammals. As alluded to in the figure above, a small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to

. That appears to have happened during the evolution of ''

Drosophila ''Drosophila'' () is a genus of flies, belonging to the family Drosophilidae, whose members are often called "small fruit flies" or (less frequently) pomace flies, vinegar flies, or wine flies, a reference to the characteristic of many speci ...

species In biology, a species is the basic unit of classification and a taxonomic rank of an organism, as well as a unit of biodiversity. A species is often defined as the largest group of organisms in which any two individuals of the appropriat ...

. In 2016 it was reported that four predicted pseudogenes in multiple ''Drosophila'' species actually encode proteins with biologically important functions, "suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon". For example, the functional protein (an olfactory receptor) is found only in neurons. This finding of tissue-specific biologically-functional genes that could have been classified as pseudogenes by '' in silico'' analysis complicates the analysis of sequence data. In the human genome, a number of examples have been identified that were originally classified as pseudogenes but later discovered to have a functional, although not necessarily protein-coding, role. As of 2012, it appeared that there are approximately 12,000–14,000 pseudogenes in the human genome. A 2016 proteogenomics analysis using mass spectrometry of peptides identified at least 19,262 human proteins produced from 16,271 genes or clusters of genes, with 8 new protein-coding genes identified that were previously considered pseudogenes.

Examples of pseudogene function

While the vast majority of pseudogenes have lost their function, some cases have emerged in which a pseudogene either re-gained its original or a similar function or evolved a new function. Examples include the following: ''Drosophila'' glutamate receptor. The term "pseudo-pseudogene" was coined for the gene encoding the chemosensory ionotropic glutamate receptor Ir75a of '' Drosophila sechellia'', which bears a premature termination codon (PTC) and was thus classified as a pseudogene. However, ''in vivo'' the ''D. sechellia'' Ir75a locus produces a functional receptor, owing to translational read-through of the PTC. Read-through is detected only in neurons and depends on the nucleotide sequence downstream of the PTC. siRNAs. Some endogenous siRNAs appear to be derived from pseudogenes, and thus some pseudogenes play a role in regulating protein-coding transcripts, as reviewed. One of the many examples is psiPPM1K. Processing of RNAs transcribed from psiPPM1K yield siRNAs that can act to suppress the most common type of liver cancer, hepatocellular carcinoma. This and much other research has led to considerable excitement about the possibility of targeting pseudogenes with/as therapeutic agents piRNAs. Some piRNAs are derived from pseudogenes located in piRNA clusters. Those piRNAs regulate genes via the piRNA pathway in mammalian testes and are crucial for limiting

transposable element A transposable element (TE, transposon, or jumping gene) is a nucleic acid sequence in DNA that can change its position within a genome, sometimes creating or reversing mutations and altering the cell's genetic identity and genome size. Transp ...

damage to the genome. BrafDFPjpg

microRNAs. There are many reports of pseudogene transcripts acting as microRNA decoys. Perhaps the earliest definitive example of such a pseudogene involved in cancer is the pseudogene of BRAF. The BRAF gene is a proto-oncogene that, when mutated, is associated with many cancers. Normally, the amount of BRAF protein is kept under control in cells through the action of miRNA. In normal situations, the amount of RNA from BRAF and the pseudogene BRAFP1 compete for miRNA, but the balance of the 2 RNAs is such that cells grow normally. However, when BRAFP1 RNA expression is increased (either experimentally or by natural mutations), less miRNA is available to control the expression of BRAF, and the increased amount of BRAF protein causes cancer. This sort of competition for regulatory elements by RNAs that are endogenous to the genome has given rise to the term ceRNA. PTEN. The PTEN gene is a known

tumor suppressor gene A tumor suppressor gene (TSG), or anti-oncogene, is a gene that regulates a cell during cell division and replication. If the cell grows uncontrollably, it will result in cancer. When a tumor suppressor gene is mutated, it results in a loss or re ...

. The PTEN pseudogene, PTENP1 is a processed pseudogene that is very similar in its genetic sequence to the wild-type gene. However, PTENP1 has a missense mutation which eliminates the

codon The genetic code is the set of rules used by living cells to translate information encoded within genetic material ( DNA or RNA sequences of nucleotide triplets, or codons) into proteins. Translation is accomplished by the ribosome, which links ...

for the initiating methionine and thus prevents translation of the normal PTEN protein. In spite of that, PTENP1 appears to play a role in

oncogenesis Carcinogenesis, also called oncogenesis or tumorigenesis, is the formation of a cancer, whereby normal cells are transformed into cancer cells. The process is characterized by changes at the cellular, genetic, and epigenetic levels and abnor ...

. The 3' UTR of PTENP1 mRNA functions as a decoy of PTEN mRNA by targeting micro RNAs due to its similarity to the PTEN gene, and overexpression of the 3' UTR resulted in an increase of PTEN protein level. That is, overexpression of the PTENP1 3' UTR leads to increased regulation and suppression of cancerous tumors. The biology of this system is basically the inverse of the BRAF system described above. Potogenes. Pseudogenes can, over evolutionary time scales, participate in

gene conversion Gene conversion is the process by which one DNA sequence replaces a homologous sequence such that the sequences become identical after the conversion event. Gene conversion can be either allelic, meaning that one allele of the same gene replaces a ...

and other mutational events that may give rise to new or newly functional genes. This has led to the concept that ''pseudo''genes could be viewed as ''pot''ogenes: ''pot''ential genes for evolutionary diversification.

Misidentified pseudogenes

Sometimes genes are thought to be pseudogenes, usually based on bioinformatic analysis, but then turn out to be functional genes. Examples include the ''Drosophila'' jingwei gene which encodes a functional alcohol dehydrogenase enzyme ''in vivo''. Another example is the human gene encoding phosphoglycerate mutase which was thought to be a pseudogene but which turned out to be a functional gene, now named . Mutations in it cause infertility.

Bacterial pseudogenes

Pseudogenes are found in

bacteria Bacteria (; singular: bacterium) are ubiquitous, mostly free-living organisms often consisting of one biological cell. They constitute a large domain of prokaryotic microorganisms. Typically a few micrometres in length, bacteria were am ...

. Most are found in bacteria that are not free-living; that is, they are either symbionts or obligate intracellular parasites. Thus, they do not require many genes that are needed by free-living bacteria, such as gene associated with metabolism and DNA repair. However, there is not an order to which functional

s are lost first. For example, the oldest pseudogenes in '' Mycobacterium leprae'' are in RNA polymerases and the biosynthesis of secondary metabolites while the oldest ones in ''

Shigella flexneri ''Shigella flexneri'' is a species of Gram-negative bacteria in the genus ''Shigella'' that can cause diarrhea in humans. Several different serogroups of ''Shigella'' are described; ''S. flexneri'' belongs to group ''B''. ''S. flexneri'' infec ...

'' and '' Shigella typhi'' are in

, recombination, and repair. Since most bacteria that carry pseudogenes are either symbionts or obligate intracellular parasites, genome size eventually reduces. An extreme example is the genome of '' Mycobacterium leprae'', an obligate parasite and the causative agent of

leprosy Leprosy, also known as Hansen's disease (HD), is a long-term infection by the bacteria '' Mycobacterium leprae'' or '' Mycobacterium lepromatosis''. Infection can lead to damage of the nerves, respiratory tract, skin, and eyes. This nerve d ...

. It has been reported to have 1,133 pseudogenes which give rise to approximately 50% of its transcriptome. The effect of pseudogenes and genome reduction can be further seen when compared to '' Mycobacterium marinum'', a

pathogen In biology, a pathogen ( el, πάθος, "suffering", "passion" and , "producer of") in the oldest and broadest sense, is any organism or agent that can produce disease. A pathogen may also be referred to as an infectious agent, or simply a g ...

from the same family. ''Mycobacteirum marinum'' has a larger genome compared to ''Mycobacterium leprae'' because it can survive outside the host; therefore, the genome must contain the genes needed to do so. Although genome reduction focuses on what genes are not needed by getting rid of pseudogenes, selective pressures from the host can sway what is kept. In the case of a symbiont from the '' Verrucomicrobiota'' phylum, there are seven additional copies of the gene coding the mandelalide pathway. The host, species from ''Lissoclinum'', use mandelalides as part of its defense mechanism. The relationship between epistasis and the domino theory of gene loss was observed in ''Buchnera aphidicola''. The domino theory suggests that if one gene of a cellular process becomes inactivated, then selection in other genes involved relaxes, leading to gene loss. When comparing '' Buchnera aphidicola'' and ''

Escherichia coli ''Escherichia coli'' (),Wells, J. C. (2000) Longman Pronunciation Dictionary. Harlow ngland Pearson Education Ltd. also known as ''E. coli'' (), is a Gram-negative, facultative anaerobic, rod-shaped, coliform bacterium of the genus '' Esc ...

,'' it was found that positive epistasis furthers gene loss while negative epistasis hinders it. Pseudogenes Mycobacteria

References

External links

Pseudogene interaction database, miRNA-pseudogene and protein-pseudogene interaction maps database

Yale University pseudogene database

(homologous processed pseudogenes)
RCPedia - Processed Pseudogene database
{{Repeated sequence Non-coding DNA