The Earth Microbiome Project (EMP) is an initiative founded by Janet Jansson, Jack Gilbert and Rob Knight in 2010 to collect natural samples and analyze microbial communities around the globe. Microbes are highly abundant and diverse, and they play an important role in ecological systems. For example, the ocean contains an estimated 1.3 × 10^28 archaeal cells, 3.1 × 10^28 bacterial cells, and 1 × 10^30 virus particles. Bacterial diversity, a measure of the number of types of bacteria in a community, is estimated at about 160 per mL of ocean water, 6,400–38,000 per g of soil, and 70 per mL in sewage works. Yet it was estimated that the total global environmental DNA sequencing effort had produced less than 1 percent of the total DNA found in a liter of seawater or a gram of soil, and the specific interactions between microbes are largely unknown. The EMP aims to process as many as 200,000 samples from different biomes, generating a complete database of microbes on Earth to characterize environments and ecosystems by microbial composition and interaction. Using these data, new ecological and evolutionary theories can be proposed and tested.
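
The diversity figures quoted above count the number of types in a community. As a minimal illustration, the sketch below computes that count (richness) for a list of per-read type labels. It also computes the Shannon index, a commonly used diversity measure that weights types by abundance; including it is an assumption made here for illustration, not something the EMP description specifies. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter


def richness(labels):
    """Number of distinct types observed in a sample."""
    return len(set(labels))


def shannon_index(labels):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over the observed types."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical sample: each entry is the type assigned to one sequenced read.
sample = ["A", "A", "A", "B", "B", "C"]

print(richness(sample))       # three distinct types: A, B and C
print(shannon_index(sample))  # higher when types are more evenly represented
```

Richness alone treats a type seen once the same as a dominant type, which is why abundance-weighted measures are often reported alongside it.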


Actors

The non-governmental international project was launched in 2010. As of January 2018, it listed 161 institutions, all of them universities and university-affiliated institutions, except for IBM Research and the Atlanta Zoo. Support has come from the John Templeton Foundation, the W. M. Keck Foundation, the U.S. Department of Energy's Argonne National Laboratory, the Australian Research Council, the Tula Foundation, and the Samuel Lawrence Foundation. Companies that have provided in-kind support include MO BIO Laboratories, Luca Technologies, Eppendorf, Boreal Genomics, Illumina, Roche and Integrated DNA Technologies.


Goals

The primary goal of the Earth Microbiome Project (EMP) has been to survey microbial composition in many environments across the planet, across time as well as space, using a standard set of protocols. The development of standardized protocols is vital, because variations in sample extraction, amplification, sequencing and analysis introduce biases that would invalidate comparisons of microbial community structure. Another important goal is to determine how reconstruction of microbial communities is affected by analytic biases. The rate of technological advance is rapid, and it is necessary to understand how data using updated protocols will compare with data collected using earlier techniques. Information from this project will be archived in a database to facilitate analysis. Other outputs will include a global atlas of protein function and a catalog of reassembled genomes classified by their taxonomic distributions.


Methods

Standard protocols for sampling, DNA extraction, 16S rRNA amplification, 18S rRNA amplification, and "shotgun" metagenomics have been developed or are under development.


Sample collection

Samples will be collected using appropriate methods from various environments including deep ocean, fresh water lakes, desert sand, and soil. Standardized collection protocols will be used when possible, so that the results are comparable. Microbes from natural samples cannot always be cultured. Because of this, metagenomic methods will be employed to sequence all the DNA or RNA in a sample in a culture-independent fashion.


Wet lab

The wet lab usually needs to perform a series of procedures to select and purify the microbial portion of the samples. The purification process may differ greatly depending on the type of sample. DNA may be extracted from soil particles, or microbes may be concentrated using a series of filtration techniques. In addition, various amplification techniques may be used to increase DNA yield. For example, non-PCR-based multiple displacement amplification is preferred by some researchers. DNA extraction, the choice of primers, and PCR protocols must all follow carefully standardized procedures in order to avoid bias.


Sequencing

Depending on the biological question, researchers can sequence a metagenomic sample using one of two main approaches. If the question is which types of organisms are present and in what abundance, the preferred approach is to target and amplify a specific gene that is highly conserved among the species of interest. The 16S ribosomal RNA gene for bacteria and the 18S ribosomal RNA gene for protists are often used as target genes for this purpose. The advantage of targeting a specific gene is that the gene can be amplified and sequenced at very high coverage. This approach, called "deep sequencing", allows rare species to be identified in a sample. However, it does not enable assembly of whole genomes, nor does it provide information on how organisms may interact with each other. The second approach is called shotgun metagenomics, in which all the DNA in the sample is sheared and the random fragments are sequenced. In principle, this approach allows for the assembly of whole microbial genomes and the inference of metabolic relationships. However, if most of the microbes in a given environment are uncharacterized, ''de novo'' assembly will be computationally expensive.
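
The trade-off between the two approaches can be made concrete with a back-of-the-envelope calculation. The sketch below is a simplification under stated assumptions: every amplicon read covers the target gene, relative abundance is taken as the organism's share of the reads, and the read count, read length, abundance and genome size are hypothetical numbers chosen only for illustration.

```python
def amplicon_reads_for_taxon(total_reads, relative_abundance):
    """Expected number of marker-gene reads from one taxon in an amplicon run."""
    return total_reads * relative_abundance


def shotgun_genome_coverage(total_reads, read_length, relative_abundance, genome_size):
    """Expected fold coverage of one taxon's genome in a shotgun run."""
    return total_reads * read_length * relative_abundance / genome_size


# Hypothetical run: one million reads of 150 bases each.
TOTAL_READS = 1_000_000
READ_LENGTH = 150

# Hypothetical rare organism: 0.01% of the community, 5 Mb genome.
RARE_ABUNDANCE = 1e-4
GENOME_SIZE = 5_000_000

marker_reads = amplicon_reads_for_taxon(TOTAL_READS, RARE_ABUNDANCE)
fold_coverage = shotgun_genome_coverage(
    TOTAL_READS, READ_LENGTH, RARE_ABUNDANCE, GENOME_SIZE
)

print(f"amplicon reads from the rare organism: {marker_reads:.0f}")
print(f"shotgun fold coverage of its genome:   {fold_coverage:.4f}")
```

With these numbers the rare organism contributes roughly 100 marker-gene reads in the amplicon run, enough to detect it, but only about 0.003-fold coverage of its genome in the shotgun run, far too little for assembly.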


Data analysis

The EMP proposes to standardize the bioinformatics aspects of sample processing. Data analysis usually includes the following steps: 1) Data cleanup, a preprocessing step that removes reads with low quality scores and any sequences containing "N" or other ambiguous nucleotides; and 2) Taxonomy assignment, which is usually done using tools such as BLAST or RDP. Very often, novel sequences are discovered that cannot be mapped to existing taxonomy. In this case, taxonomy is derived from a phylogenetic tree built from the novel sequences and a pool of closely related known sequences. Depending on the sequencing technology and the underlying biological question, additional methods may be employed. For example, if the sequenced reads are too short to infer any useful information, an assembly will be required. An assembly can also be used to construct whole genomes, which will provide useful information on the species. Furthermore, if the metabolic relationships within a microbial metagenome are to be understood, DNA sequences need to be translated into amino acid sequences, for example using gene prediction tools such as GeneMark or FragGeneScan.
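
The data cleanup step described above can be sketched as a simple read filter. This is a minimal illustration, not the EMP pipeline: it assumes four-line FASTQ records with Phred+33 quality encoding, and the mean-quality threshold of 20 and all function names are hypothetical choices.

```python
import io


def parse_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(sequence, quality, min_mean_quality=20):
    """Keep a read only if it has no ambiguous bases and adequate mean quality."""
    if not sequence or len(sequence) != len(quality):
        return False
    if any(base not in "ACGT" for base in sequence.upper()):
        return False  # contains 'N' or another ambiguity code
    return mean_phred(quality) >= min_mean_quality


def clean_reads(handle, min_mean_quality=20):
    """Return the records that pass the filter."""
    return [
        record
        for record in parse_fastq(handle)
        if keep_read(record[1], record[2], min_mean_quality)
    ]


# Hypothetical input: three reads, of which only the first should survive.
example = (
    "@read1\nACGTACGT\n+\nIIIIIIII\n"  # high quality, kept
    "@read2\nACGNACGT\n+\nIIIIIIII\n"  # contains N, dropped
    "@read3\nACGTACGT\n+\n########\n"  # low quality, dropped
)
kept = clean_reads(io.StringIO(example))
print([header for header, _, _ in kept])  # ['@read1']
```

Production pipelines usually trim low-quality read ends rather than discard whole reads on mean quality alone; the all-or-nothing rule here keeps the sketch short.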


Project output

The four key outputs planned for the EMP are:
* All primary data generated by the Earth Microbiome Project, regardless of their degree of conclusiveness, will be stored in a centralized database called the "Gene Atlas" (GA). The GA will hold sequence data, annotations and environmental metadata. Known as well as unknown sequences, ''i.e.'' "Dark Matter", will be included in the hope that, given time, the unknown sequences may eventually be characterized.
* Assembled genomes, annotated using an automated pipeline, will be stored as "Earth Microbiome Assembled Genomes" (EM-AG) in public repositories. These will enable comparative genomic analysis.
* Interactive visualizations of the data will be provided through the "Earth Microbiome Visualization Portal" (EM-VIP), which will allow the relationship between microbial makeup, environmental parameters, and genomic function to be viewed.
* Reconstructed metabolic profiles will be offered through "Earth Microbiome Metabolic Reconstruction" (EMMR).


Challenges

The large amounts of sequence data generated from analyzing diverse microbial communities are a challenge to store, organize and analyze. The problem is exacerbated by the short reads produced by the high-throughput sequencing platform that will be the standard instrument used in the EMP. Improved algorithms, improved analysis tools, huge amounts of computer storage, and access to many thousands of hours of supercomputer time will be necessary.

Another challenge will be the large number of sequencing errors that are expected. Next-generation sequencing technologies provide enormous throughput but lower accuracy than older sequencing methods. When sequencing a single genome, the intrinsically lower accuracy of these methods is more than compensated for by the ability to cover the entire genome multiple times in opposite directions from multiple start points, but this capability provides no improvement in accuracy when sequencing a diverse mixture of genomes. The question will be how sequencing errors can be distinguished from actual diversity in the collected microbial samples. Despite the issuance of standard protocols, systematic biases from lab to lab are expected. The need to amplify DNA from samples with low biomass will introduce additional distortions of the data. Assembling the genomes of even the dominant organisms in a diverse sample requires gigabytes of sequence data.

The EMP must also avoid a problem that has become prevalent in the public sequence databases. With the advancement of high-throughput sequencing technologies, many sequences are entering public databases with no experimentally determined function, annotated only on the basis of observed homology with a known sequence. A known sequence is used to annotate a first unknown sequence, which is legitimate, but that first unknown sequence is then used to annotate a second unknown sequence, and so on, so that annotations drift further from any experimental evidence. Sequence homology is only a modestly reliable predictor of function.
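
The point about coverage and accuracy can be illustrated with a small calculation. The sketch below assumes that errors at a position are independent between reads and, pessimistically, that a consensus base call is wrong whenever a strict majority of the reads covering the position are wrong. The error rate and coverage values are hypothetical, and the function name is chosen here for illustration.

```python
from math import comb


def consensus_error_probability(per_read_error, coverage):
    """Probability that a strict majority of `coverage` reads are wrong at a position.

    Assumes independent errors with the same rate in every read; ties at even
    coverage are counted as correct calls.
    """
    threshold = coverage // 2 + 1  # smallest number of wrong reads forming a majority
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


PER_READ_ERROR = 0.01  # hypothetical 1% per-base error rate

for coverage in (1, 3, 9, 31):
    probability = consensus_error_probability(PER_READ_ERROR, coverage)
    print(f"coverage {coverage:>2}: consensus error probability {probability:.3e}")
```

At a coverage of 1 the consensus error equals the raw error rate, which is the situation for a rare organism in a mixed sample that is hit by a single read. For a single genome sequenced many times over, the same raw error rate is driven down rapidly by the majority vote.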


See also

* Earth BioGenome Project
* Human Microbiome Project
* Microbiome
* Metagenomics
* Skin flora



External links


* The Earth Microbiome Project
* US NIH human microbiome project page
* The International Human Microbiome Consortium
* Microbiome.org, a microbiome wiki portal site