LTR Retrotransposons

LTR retrotransposons are class I transposable elements characterized by the presence of long terminal repeats (LTRs) directly flanking an internal coding region. As retrotransposons, they mobilize through reverse transcription of their mRNA and integration of the newly created cDNA into another location. Their mechanism of retrotransposition is shared with retroviruses, with the difference that most LTR retrotransposons do not form infectious particles that leave the cell and therefore replicate only inside their genome of origin. Those that do (occasionally) form virus-like particles are classified under ''Ortervirales''. Their size ranges from a few hundred base pairs to 25 kb, the upper end being reached by, for example, the Ogre retrotransposon in the pea genome. In plant genomes, LTR retrotransposons are the major repetitive sequence class, constituting, for example, more than 75% of the maize genome. LTR retrotransposons make up about 8% of the human genome and approximately 10% of the mouse genome.
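
The defining layout described above, a direct repeat at each end of an internal region, can be illustrated with a minimal Python sketch. The function name `terminal_direct_repeat`, the `min_length` threshold, and the toy sequence are assumptions made for illustration only. The sketch also assumes the two LTR copies are exactly identical, which need not hold for real elements.

```python
def terminal_direct_repeat(element: str, min_length: int = 3) -> str:
    """Return the longest sequence present at both ends of `element`
    in the same orientation, or "" if none reaches `min_length`.

    Illustrative only: exact identity of the two repeats is assumed.
    """
    max_len = len(element) // 2  # the two repeats must not overlap
    for k in range(max_len, min_length - 1, -1):
        if element[:k] == element[-k:]:
            return element[:k]
    return ""


# Toy element: a 9-base "LTR", an internal region, and the same "LTR" again.
toy_element = "TGTTGGAAT" + "ATGGCCCGA" + "TGTTGGAAT"
print(terminal_direct_repeat(toy_element))
```

Searching from the longest candidate downward returns the full repeat rather than a shorter stretch that happens to match at both ends.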


Structure and propagation

LTR retrotransposons have direct long terminal repeats that range from ~100 bp to over 5 kb in size. LTR retrotransposons are further sub-classified into the Ty1-''copia''-like (Pseudoviridae), Ty3-like (Metaviridae, formerly referred to as Gypsy-like, a name that is being considered for retirement), and BEL-Pao-like (Belpaoviridae) groups based on both their degree of sequence similarity and the order of encoded gene products. The Ty1-''copia'' and Ty3 groups of retrotransposons are commonly found in high copy number (up to a few million copies per haploid nucleus) in animal, fungal, protist, and plant genomes. BEL-Pao-like elements have so far only been found in animals.

All functional LTR retrotransposons encode a minimum of two genes, ''gag'' and ''pol'', that are sufficient for their replication. ''Gag'' encodes a polyprotein with a capsid and a nucleocapsid domain. Gag proteins form virus-like particles in the cytoplasm, inside which reverse transcription occurs. The ''pol'' gene produces three proteins: a protease (PR), a reverse transcriptase with an RT (reverse transcriptase) domain and an RNase H domain, and an integrase (IN).

Typically, LTR retrotransposon mRNAs are produced by the host RNA pol II acting on a promoter located in their 5’ LTR. The ''gag'' and ''pol'' genes are encoded in the same mRNA. Depending on the host species, two different strategies can be used to express the two polyproteins: a fusion into a single open reading frame (ORF) that is then cleaved, or the introduction of a frameshift between the two ORFs. In the latter case, occasional ribosomal frameshifting allows the production of both proteins while ensuring that much more Gag protein is produced to form virus-like particles.

Reverse transcription usually initiates at a short sequence located immediately downstream of the 5’ LTR, termed the primer binding site (PBS). Specific host tRNAs bind to the PBS and act as primers for reverse transcription, which occurs in a complex, multi-step process that ultimately produces a double-stranded cDNA molecule. The cDNA is finally integrated into a new location, creating short target site duplications (TSDs) and adding a new copy to the host genome.
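
The final integration step can be sketched as a string operation to show why a target site duplication ends up on both sides of the new copy. This is a toy model, not a description of the enzymatic mechanism. The function name `integrate`, the default TSD length of 5 bases, and the example sequences are all assumptions chosen for illustration.

```python
def integrate(genome: str, element: str, position: int, tsd_length: int = 5) -> str:
    """Insert `element` into `genome` at `position`, duplicating the
    `tsd_length` bases of the target site so that they flank the new copy.

    `tsd_length=5` is an assumed illustrative value, not taken from the text.
    """
    if not 0 <= position <= len(genome) - tsd_length:
        raise ValueError("target site does not fit inside the genome")
    target_site = genome[position:position + tsd_length]
    # The target site appears once before and once after the inserted element.
    return (
        genome[:position]
        + target_site
        + element
        + target_site
        + genome[position + tsd_length:]
    )


# Toy genome with the target site "TGCAT" starting at index 4.
before = "AAAATGCATCCCC"
after = integrate(before, "[LTR-gag-pol-LTR]", position=4)
print(after)  # AAAATGCAT[LTR-gag-pol-LTR]TGCATCCCC
```

The output is longer than the input by the element length plus one TSD length, matching the statement that each integration adds a new copy to the host genome.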


Types


Ty1-''copia'' retrotransposons

Ty1-''copia'' retrotransposons are abundant in species ranging from single-celled algae to bryophytes, gymnosperms, and angiosperms. They encode four protein domains in the following order: protease, integrase, reverse transcriptase, and ribonuclease H. At least two classification systems exist for the subdivision of Ty1-''copia'' retrotransposons into five lineages: ''Sireviruses''/Maximus, Oryco/Ivana, Retrofit/Ale, TORK (subdivided into Angela/Sto, TAR/Fourf, and GMR/Tork), and Bianca. ''Sireviruses''/Maximus retrotransposons contain an additional putative envelope gene. This lineage is named for the founder element SIRE1 in the ''Glycine max'' genome, and was later described in many species such as ''Zea mays'', ''Arabidopsis thaliana'', ''Beta vulgaris'', and ''Pinus pinaster''. Plant ''Sireviruses'' from many sequenced plant genomes are summarized in the MASIVEdb ''Sirevirus'' database.


Ty3-retrotransposons (formerly gypsy)

Ty3-retrotransposons are widely distributed in the plant kingdom, including both gymnosperms and angiosperms. They encode at least four protein domains in the order: protease, reverse transcriptase, ribonuclease H, and integrase. Based on structure, presence or absence of specific protein domains, and conserved protein sequence motifs, they can be subdivided into several lineages.

''Errantiviruses'' contain an additional defective envelope ORF with similarities to the retroviral envelope gene. First described as Athila elements in ''Arabidopsis thaliana'', they were later identified in many species, such as ''Glycine max'' and ''Beta vulgaris''.

''Chromoviruses'' contain an additional chromodomain (chromatin organization modifier domain) at the C-terminus of their integrase protein. They are widespread in plants and fungi, probably having retained these protein domains during the evolution of the two kingdoms. It is thought that the chromodomain directs retrotransposon integration to specific target sites. According to the sequence and structure of the chromodomain, chromoviruses are subdivided into four clades: CRM, Tekay, Reina, and Galadriel. Chromoviruses from each clade show distinctive integration patterns, e.g. into centromeres or into the rRNA genes.

Ogre elements are gigantic Ty3-retrotransposons reaching lengths of up to 25 kb. They were first described in ''Pisum sativum''.

''Metaviruses'' are conventional Ty3-''gypsy'' retrotransposons that do not contain additional domains or ORFs.
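
The two domain orders given above (protease, integrase, reverse transcriptase, ribonuclease H for Ty1-''copia''; protease, reverse transcriptase, ribonuclease H, integrase for Ty3) can be compared with a minimal Python sketch. The function name, the two-letter domain labels, and the extra labels "CHD" and "ENV" are hypothetical, introduced only for illustration. The text states that classification also relies on sequence similarity, so domain order alone, as used here, would not be sufficient in practice.

```python
# Domain orders as stated in the text; the abbreviations are illustrative labels.
TY1_COPIA_ORDER = ("PR", "IN", "RT", "RH")
TY3_ORDER = ("PR", "RT", "RH", "IN")
CORE_DOMAINS = {"PR", "IN", "RT", "RH"}


def classify_by_domain_order(domains):
    """Label an element by the order of its four core pol domains.

    Additional domains (for example a chromodomain or an envelope ORF)
    are ignored, so only the relative order of the core domains matters.
    """
    core = tuple(d for d in domains if d in CORE_DOMAINS)
    if core == TY1_COPIA_ORDER:
        return "Ty1-copia order"
    if core == TY3_ORDER:
        return "Ty3 order"
    return "unclassified"


print(classify_by_domain_order(["PR", "IN", "RT", "RH"]))         # Ty1-copia order
print(classify_by_domain_order(["PR", "RT", "RH", "IN", "CHD"]))  # Ty3 order
print(classify_by_domain_order(["RT", "RH"]))                     # unclassified
```

Filtering to the core domains first means a chromovirus-like element with an extra C-terminal domain is still matched on the position of its integrase.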


BEL/pao family

The BEL/pao family is found in animals.


Endogenous retroviruses (ERV)

Although retroviruses are often classified separately, they share many features with LTR retrotransposons. A major difference from Ty1-''copia'' and Ty3-''gypsy'' retrotransposons is that retroviruses have an envelope protein (ENV). A retrovirus can be transformed into an LTR retrotransposon through inactivation or deletion of the domains that enable extracellular mobility. If a retrovirus infects germ line cells and subsequently inserts itself into their genome, it may be transmitted vertically and become an endogenous retrovirus.


Terminal repeat retrotransposons in miniature (TRIMs)

Some LTR retrotransposons lack all of their coding domains. Due to their short size, they are referred to as terminal repeat retrotransposons in miniature (TRIMs). Nevertheless, TRIMs may still be able to retrotranspose, as they can rely on the coding domains of autonomous Ty1-''copia'' or Ty3-''gypsy'' retrotransposons. Among the TRIMs, the Cassandra family stands out, as it is unusually widespread among higher plants. In contrast to all other characterized TRIMs, Cassandra elements harbor a 5S rRNA promoter in their LTR sequence. Due to their short overall length and the relatively high contribution of the flanking LTRs, TRIMs are prone to rearrangements by recombination.

