
The 1000 Plant Transcriptomes Initiative (1KP) was an international research effort to establish the most detailed catalogue of genetic variation in plants. It was announced in 2008 and headed by Gane Ka-Shu Wong and Michael Deyholos of the University of Alberta. The project successfully sequenced the transcriptomes (expressed genes) of 1000 different plant species by 2014; its final capstone products were published in 2019.

1KP was one of the large-scale (involving many organisms) sequencing projects designed to take advantage of the wider availability of high-throughput ("next-generation") DNA sequencing technologies. The similar 1000 Genomes Project, for example, obtained high-coverage genome sequences of 1000 individual people between 2008 and 2015, to better understand human genetic variation. This project provided a template for further planetary-scale genome projects, including the 10KP Project, which aims to sequence the whole genomes of 10,000 plants, and the Earth BioGenome Project, which aims to sequence, catalog, and characterize the genomes of all of Earth's eukaryotic biodiversity.


Goals

The number of classified green plant species has been estimated at around 370,000, although there are probably many thousands more yet unclassified. Despite this number, very few of these species had detailed DNA sequence information available: 125,426 species were represented in GenBank, but most (>95%) had DNA sequence for only one or two genes. As one assessment put it, "...almost none of the roughly half million plant species known to humanity has been touched by genomics at any level". The 1000 Plant Genomes Project aimed to produce a roughly 100x increase in the number of plant species with available broad genome sequence.


Evolutionary relationships

There have been efforts to determine the evolutionary relationships between the known plant species, but phylogenies (or phylogenetic trees) created solely from morphological data, cellular structures, single enzymes, or only a few sequences (like rRNA) can be prone to error. Morphological features are especially vulnerable when two species look physically similar though they are not closely related (as a result of convergent evolution, for example), or when two closely related species look very different because, for example, they are able to change readily in response to their environment. These situations are very common in the plant kingdom.

An alternative method for constructing evolutionary relationships is to compare changes in the DNA sequence of many genes between the different species, which is often more robust to the problem of similar-appearing species. With the amount of sequence produced by this project, many predicted evolutionary relationships could be better tested by sequence alignment to improve their certainty. The final analysis used 383,679 nuclear gene family phylogenies and 2,306 gene age distributions with ''Ks'' plots, which were shared in GigaDB alongside the capstone paper.
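To make the idea of comparing many genes concrete, the following is a minimal sketch of a pairwise distance averaged over several aligned genes. It is not the method used by the project: the p-distance (the fraction of aligned sites that differ) is a deliberately simple stand-in, and the function names, gene names, and sequences are all hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def mean_distance(genes_a: dict, genes_b: dict) -> float:
    """Average p-distance over the genes present in both species."""
    shared = sorted(set(genes_a) & set(genes_b))
    if not shared:
        raise ValueError("no shared genes")
    return sum(p_distance(genes_a[g], genes_b[g]) for g in shared) / len(shared)


# Hypothetical, already-aligned toy sequences for two species.
species_1 = {"geneA": "ATGGCGTACG", "geneB": "ATGAAACCCG"}
species_2 = {"geneA": "ATGGCATACG", "geneB": "ATGAAACC-G"}
print(mean_distance(species_1, species_2))
```

Averaging over many genes means that a single unusual gene has less influence on the estimate, which is the intuition behind using thousands of gene families rather than one marker.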


Biotechnology applications

The list of plant species sequenced in the project was not random; instead, plants that produce valuable chemicals or other products (secondary metabolites in many cases) were prioritized in the hope that characterizing the genes involved would allow the underlying biosynthetic processes to be used or modified. For example, many plants are known to produce oils (like olives), and the oils from certain plants, such as the oil palm and hydrocarbon-producing species, bear a strong chemical resemblance to petroleum products. If these plant mechanisms could be used to produce mass quantities of industrially useful oil, or modified so that they do, they would be of great value. Knowing the sequence of the plant genes involved in the metabolic pathway producing the oil is a large first step toward such utilization. An example of how engineering natural biochemical pathways works is Golden rice, whose biosynthetic pathway has been genetically modified so that a precursor to vitamin A is produced in large quantities, making the golden-colored rice a potential solution for vitamin A deficiency. This concept of engineering plants to do "work" is popular, and its potential would increase dramatically as a result of gene information on these 1000 plant species. Biosynthetic pathways could also be used for mass production of medicinal compounds in plants, rather than through the manual organic chemical reactions by which most are currently made.

One of the most unexpected results of the project was the discovery of multiple novel light-sensitive ion channels, now used extensively for optogenetic control of neurons, through the sequencing and physiological characterization of opsins from over 100 algal species. The characterization of these novel channelrhodopsin sequences provided resources for protein engineers who would normally have no interest in, or ability to generate, sequence data from these many plant species. A number of biotech companies are developing these channelrhodopsin proteins for medical purposes, with many of these optogenetic therapy candidates in clinical trials to restore vision in retinal blindness. The first published results of these trials, for the treatment of retinitis pigmentosa, came out in July 2021.


Project approach

Sequencing was initially done on the Illumina Genome Analyzer GAII next-generation DNA sequencing platform at the Beijing Genomics Institute (BGI Shenzhen, China), but later samples were run on the faster Illumina HiSeq 2000 platform. The institute started with 28 Illumina Genome Analyzer machines, which were eventually upgraded to 100 HiSeq 2000 sequencers. The initial capacity of 3 Gb per run (3 billion base pairs per experiment) of each of these machines enabled fast and accurate sequencing of the plant samples.


Species selection

The list of plant species to be sequenced was compiled through an international collaboration of the various funding agencies and research groups expressing their interest in certain plants. The focus was on plant species known to have useful biosynthetic capacity, to support the biotechnology goals of the project, with other species selected to fill in gaps and resolve some unknown evolutionary relationships in the current plant phylogeny. In addition to species with the capacity to synthesize industrial compounds, plant species known or suspected to produce medically active chemicals (such as poppies producing opiates) were assigned a high priority, to better understand the synthesis process, explore commercial production potential, and discover new pharmaceutical options. A large number of plant species with medicinal properties were selected from traditional Chinese medicine (TCM). The completed list of selected species can be viewed publicly on the project website, and the methodology and data access arrangements have been published in detail.


Transcriptome vs. genome sequencing

Rather than sequencing the entire genome (all DNA sequence) of the various plant species, the project sequenced only those regions of the genome that produce a protein product (coding genes): the transcriptome. This approach is justified by the focus on biochemical pathways, where only the genes producing the proteins involved are required to understand the synthetic mechanism, and because these thousands of sequences provide adequate detail to construct very robust evolutionary relationships through sequence comparison. The number of coding genes in plant species can vary considerably, but all have tens of thousands or more, making the transcriptome a large collection of information. Non-coding sequence nevertheless makes up the majority (>90%) of the genome content. Although this approach is conceptually similar to expressed sequence tags (ESTs), it is fundamentally different in that the entire sequence of each gene is acquired with high coverage, rather than just the small portion of the gene sequence obtained with an EST. To distinguish the two, the non-EST method is known as "transcriptome shotgun sequencing".


Transcriptome shotgun sequencing

mRNA (messenger RNA) is collected from a sample, converted to cDNA by a reverse transcriptase enzyme, and then fragmented so that it can be sequenced. Besides transcriptome shotgun sequencing, this technique has been called RNA-seq and whole transcriptome shotgun sequencing (WTSS). Once the cDNA fragments are sequenced, they are assembled ''de novo'' (without aligning to a reference genome sequence) back into the complete gene sequence by combining all of the fragments from that gene during the data analysis phase. A new ''de novo'' transcriptome assembler designed specifically for RNA-Seq, SOAPdenovo-Trans, was produced for this project as part of the SOAP suite of genome assembly tools from the BGI.
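The principle of ''de novo'' assembly, joining fragments by their shared ends without a reference, can be illustrated with a toy greedy overlap merger. This is only a sketch of the general idea under simplifying assumptions (error-free fragments, a single strand, no repeats); it is not the algorithm of SOAPdenovo-Trans, and the function names and sequences are hypothetical.

```python
def overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left that overlaps enough to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical fragments of one short transcript.
reads = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]
print(greedy_assemble(reads))
```

Real transcriptome data adds sequencing errors, alternative transcripts, and widely varying read depth per gene, which is why dedicated assemblers are needed in practice.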


Plant tissue sampling

The samples came from around the world, with a number of particularly rare species being supplied by botanical gardens such as the Fairy Lake Botanical Garden (Shenzhen, China). The type of tissue collected was determined by the expected location of biosynthetic activity; for example, if an interesting process or chemical was known to occur primarily in the leaves, a leaf sample was used. A number of RNA-sequencing protocols were adapted and tested for different tissue types, and these were openly shared via the protocols.io platform.


Potential limitations

Since only the transcriptome was sequenced, the project did not reveal information about gene regulatory sequences, non-coding RNAs, DNA repetitive elements, or other genomic features that are not part of the coding sequence. Based on the few whole plant genomes collected so far, these non-coding regions make up the majority of the genome, and the non-coding DNA may actually be the primary driver of trait differences seen between species.

Since mRNA was the starting material, the amount of sequence obtained for a given gene depends on its expression level (how many mRNA molecules it produces). Highly expressed genes therefore get better coverage because there is more sequence to work from. As a result, some genes may not have been reliably detected by the project if they are expressed at a low level, even though they have important biochemical functions.

Many plant species (especially agriculturally manipulated ones) are known to have undergone large genome-wide changes through duplication of the whole genome. Rice and wheat, for example, have undergone such duplications, with wheat carrying 4-6 copies of the whole genome, whereas animals typically have only 2 (diploidy). These duplicated genes may pose a problem for the ''de novo'' assembly of sequence fragments, because repeated sequences confuse the computer programs that put the fragments together, and they can be difficult to track through evolution.
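The effect of expression level on coverage can be sketched with a simple expected-value calculation. The model below is an assumption made for illustration (reads are drawn in proportion to the total bases of each transcript in the sample, with no bias), and the transcript names and numbers are hypothetical.

```python
def expected_coverage(transcripts, total_reads, read_length):
    """Expected per-base coverage of each transcript.

    transcripts maps a name to (length_bp, copies_in_sample).
    Reads are assumed to be sampled in proportion to length * copies.
    """
    total_bases = sum(length * copies for length, copies in transcripts.values())
    coverage = {}
    for name, (length, copies) in transcripts.items():
        share = length * copies / total_bases  # fraction of reads from this transcript
        reads = total_reads * share
        coverage[name] = reads * read_length / length
    return coverage


# Two hypothetical genes of equal length but very different expression.
sample = {"abundant_gene": (1000, 990), "rare_gene": (1000, 10)}
cov = expected_coverage(sample, total_reads=10_000, read_length=100)
print(cov["abundant_gene"] / cov["rare_gene"])
```

Under this model coverage scales directly with the number of mRNA copies, so a gene expressed at one hundredth of the level of another receives about one hundredth of the coverage from the same sequencing run.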


Comparison with the 1000 Genomes Project


Similarities

Just as the Beijing Genomics Institute in Shenzhen, China, was one of the major genomics centers involved in the 1000 Genomes Project, the institute was the site of sequencing for the 1000 Plant Genomes Project. Both projects were large-scale efforts to obtain detailed DNA sequence information to improve our understanding of the organisms, and both used next-generation sequencing to facilitate timely completion.


Differences

The goals of the two projects were significantly different. While the 1000 Genomes Project focused on genetic variation in a single species, the 1000 Plant Genomes Project looked at the evolutionary relationships and genes of 1000 different plant species. The 1000 Genomes Project was estimated to cost up to $50 million USD, and the 1000 Plant Genomes Project was not as expensive; the difference in cost came from the target sequence in the genomes. Because the 1000 Plant Genomes Project sequenced only the transcriptome, whereas the human project sequenced as much of the genome as was deemed feasible, this more specific approach needed much less sequencing effort. While this meant less overall sequence output relative to the 1000 Genomes Project, the non-coding portions of the genomes excluded by the 1000 Plant Genomes Project were not as important to its goals as they were to those of the human project. The more focused approach of the 1000 Plant Genomes Project thus minimized cost while still achieving its goals.


Funding

The project was funded by Alberta Innovates - Technology Futures (merger of iCOR ...), Genome Alberta, the University of Alberta, the Beijing Genomics Institute (BGI), and Musea Ventures (a USA-based private investment firm). The project received $1.5 million CAD from the Alberta Government and another $0.5 million from Musea Ventures. In January 2010, BGI announced that it would be contributing $100 million to large-scale sequencing projects of plants and animals (including the 1000 Plant Genomes Project, and then following on to the 10,000 Plant Genome Project).


Related projects

* The 1000 Genomes Project – A Deep Catalog of Human Genetic Variation
* The 1001 Genomes Project – Sequencing the whole genome of 1,001 Arabidopsis strains
* Genome 10K – Whole genome sequence of 10,000 vertebrate species





External links

* Official website: http://www.onekp.com