Genome Project

Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus) and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete diploid genome sequence will involve 46 separate chromosome sequences (a haploid reference covers 24 distinct chromosomes: the 22 autosomes plus X and Y). The Human Genome Project is a well-known example of a genome project.


Genome assembly

Genome assembly refers to the process of taking a large number of short DNA sequences and reassembling them to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines. A genome assembly algorithm works by taking all the pieces, aligning them to one another, and detecting all places where two of the short sequences, or ''reads'', overlap. These overlapping reads can be merged, and the process continues.

Genome assembly is a very difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long and occur in different locations, especially in the large genomes of plants and animals.

The resulting (draft) genome sequence is produced by combining the sequenced contigs and then employing linking information to create scaffolds. Scaffolds are positioned along the physical map of the chromosomes, creating a "golden path".
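
The overlap-and-merge idea described above can be sketched in a few lines of Python. This is only a toy illustration, not the method used by any production assembler: the function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical, reads are assumed to be error-free and to come from one strand, and the greedy strategy shown here is known to misassemble repeats.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        # Drop sequences that are wholly contained in another sequence.
        contigs = [r for r in contigs
                   if not any(r != o and r in o for o in contigs)]
        best_size, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            size = overlap(a, b, min_len)
            if size > best_size:
                best_size, best_pair = size, (a, b)
        if best_pair is None:
            return contigs  # nothing left to merge: contigs stay separate
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_size:])  # merge, keeping the overlap once


if __name__ == "__main__":
    # Three overlapping reads from a short, repeat-free toy sequence.
    print(greedy_assemble(["TACCTGA", "ATGGCGT", "GCGTACC"]))
```

When the reads do not overlap by at least `min_len` bases, the function returns several separate contigs instead of one sequence, which loosely mirrors how a draft assembly is left as a set of contigs until linking information is available.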


Assembly software

Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. An example of such an assembler is the ''Short Oligonucleotide Analysis Package'', developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.


Genome annotation

Since the 1980s, molecular biology and bioinformatics have created the need for DNA annotation. DNA annotation or genome annotation is the process of attaching biological information to sequences, particularly identifying the locations of genes and determining what those genes do.
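
As a rough illustration of what "identifying the locations of genes" can mean computationally, the sketch below scans the forward strand of a sequence for open reading frames (ORFs). It is a deliberately naive stand-in, not a real annotation pipeline: the function name `find_orfs` and the `min_codons` threshold are hypothetical, only the standard stop codons TAA, TAG and TGA are assumed, and the reverse strand, introns and RNA genes are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) positions of forward-strand open reading frames.

    An ORF here runs from an ATG to the first in-frame stop codon. Positions
    are 0-based, `end` is exclusive and includes the stop codon. `min_codons`
    counts the codons before the stop, including the ATG itself.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon seen in this frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAACCCGGGTAACC"
    for start, end in find_orfs(toy):
        print(start, end, toy[start:end])
```

Finding where a candidate gene lies is only the first half of annotation as described above; determining what the gene does would need additional evidence that this sketch does not attempt to model.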


Time of completion

When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to describe the status of such genome projects more accurately. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts, as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may account for only a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions. Also, as scientists understand more about the role of noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways genome projects do not confine themselves to determining only the DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.
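
One simple way to put a number on how "complete" a draft sequence is might look like the sketch below. It relies on an assumption that is not stated in the text above: that unresolved positions in the draft are written as runs of the letter N, as is conventional in FASTA-style sequence files. The function name `gap_summary` and the keys of the returned dictionary are hypothetical.

```python
import re


def gap_summary(sequence: str) -> dict:
    """Summarise unresolved stretches (runs of N) in a draft sequence."""
    seq = sequence.upper()
    # Each maximal run of N is treated as one gap (assumed convention).
    gaps = [m.end() - m.start() for m in re.finditer(r"N+", seq)]
    unresolved = sum(gaps)
    return {
        "length": len(seq),
        "gap_count": len(gaps),
        "unresolved_bases": unresolved,
        "resolved_fraction": (len(seq) - unresolved) / len(seq) if seq else 0.0,
    }


if __name__ == "__main__":
    print(gap_summary("ACGTNNNNACGTNNAC"))
```

A resolved fraction of 1.0 would still not mean the sequence is error-free, since, as noted above, sequencing itself is not a completely accurate process.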


Historical and technological perspectives

Historically, when sequencing eukaryotic genomes (such as the worm ''Caenorhabditis elegans'') it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with prior knowledge of approximately where each piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go (although this approach has caveats when compared with the traditional approach). Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide what new genomes to sequence, the emphasis has been on species which are either of high importance as model organisms, or have relevance to human health (e.g. pathogenic bacteria or vectors of disease such as mosquitoes), or have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (e.g. the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.


Examples

Many organisms have genome projects that have either been completed or will be completed shortly, including:
* Humans, ''Homo sapiens''; see Human Genome Project
* Humans, ''Homo sapiens''; see The Human Genome Project–Write
* Palaeo-Eskimo, an ancient human
* Neanderthal, ''Homo sapiens neanderthalensis'' (partial); see Neanderthal Genome Project
* Common chimpanzee, ''Pan troglodytes''; see Chimpanzee Genome Project
* Woolly mammoth, ''Mammuthus primigenius''
* Domestic cow, ''Bos taurus''
* Bovine genome
* Honey Bee Genome Sequencing Consortium
* Horse genome
* Human Microbiome Project
* International Grape Genome Program
* International HapMap Project
* Tomato 150+ genome resequencing project
* 100,000 Genomes Project
* 100K Pathogen Genome Project
* International Mouse Phenotyping Consortium (IMPC)
* Knockout Mouse Phenotyping Project (KOMP2)
* Giant sequoia, ''Sequoiadendron giganteum''


See also

* Joint Genome Institute
* Illumina, private company involved in genome sequencing
* Knome, private company offering genome analysis and sequencing
* Model organism
* National Center for Biotechnology Information


External links


* GOLD: Genomes OnLine Database
* Genome Project Database
* The Protein Naming Utility
* SUPERFAMILY
* EchinoBase, an echinoderm genomic database (previously SpBase, a sea urchin genome database)
* NRCPB
* Global Invertebrate Genomics Alliance (GIGA)
* Wellcome Sanger Institute
* Wellcome Genome Campus