Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus) and to annotate protein-coding genes and other important genome-encoded features. The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences. The Human Genome Project is a well-known example of a genome project.


Genome assembly

Genome assembly refers to the process of taking a large number of short DNA sequences and reassembling them to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines. A genome assembly algorithm works by taking all the pieces, aligning them to one another, and detecting all places where two of the short sequences, or ''reads'', overlap. These overlapping reads can be merged, and the process continues.

Genome assembly is a very difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long and occur at different locations, especially in the large genomes of plants and animals.

The resulting (draft) genome sequence is produced by combining the information from sequenced contigs and then employing linking information to create scaffolds. Scaffolds are positioned along the physical map of the chromosomes, creating a "golden path".
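
The overlap-and-merge idea described in this section can be illustrated with a toy example. The following is a minimal sketch, not how any production assembler works: it assumes error-free reads from a single strand, uses a simple greedy strategy (repeatedly merging the pair of sequences with the longest suffix-prefix overlap), and makes no attempt to resolve repeats. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical, chosen only for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the best-overlapping pair until no overlaps remain.

    Returns the remaining sequences (toy "contigs").
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        # Compare every ordered pair and remember the longest overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Merge the pair: keep `a` whole and append the non-overlapping tail of `b`.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
    print(greedy_assemble(reads))
```

With the three example reads, the sketch reconstructs a single sequence. If the source contained a repeat longer than the reads, the greedy choice could merge the wrong pair, which is one way to see why repeats make assembly hard.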


Assembly software

Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. An example of such an assembler is the ''Short Oligonucleotide Analysis Package'' (SOAP), developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.


Genome annotation

Since the 1980s, molecular biology and bioinformatics have created the need for DNA annotation. DNA annotation or genome annotation is the process of attaching biological information to sequences, particularly identifying the locations of genes and determining what those genes do.
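
One very simple way to illustrate "identifying the locations of genes" is an open reading frame (ORF) scan. The sketch below is only a toy under stated assumptions: it scans the three forward-strand reading frames for an `ATG` start codon followed in-frame by a standard stop codon (`TAA`, `TAG` or `TGA`), and ignores the reverse strand, introns, and all statistical evidence that real gene-prediction tools use. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs in `seq`.

    `start` is the 0-based index of the ATG; `end` is exclusive and includes
    the stop codon. ORFs with fewer than `min_codons` codons before the stop
    are discarded.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None  # close this ORF and look for the next start
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTGGGTAACC"))
```

Each reported tuple is a candidate location only; determining what a candidate gene does requires further evidence, such as similarity to known genes or expression data.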


Time of completion

When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to describe the status of such genome projects more accurately. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts, as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may account for only a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways, genome projects do not confine themselves to determining the DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.


Historical and technological perspectives

Historically, when sequencing eukaryotic genomes (such as the worm ''Caenorhabditis elegans''), it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with prior knowledge of approximately where each piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go (though there are caveats to this approach when compared to the traditional approach). Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide what new genomes to sequence, the emphasis has been on species that are either of high importance as model organisms, relevant to human health (e.g. pathogenic bacteria or vectors of disease such as mosquitoes), or of commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (e.g. the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.


Examples

Many organisms have genome projects that have either been completed or will be completed shortly, including:
* Humans, ''Homo sapiens''; see Human Genome Project
* Humans, ''Homo sapiens''; see The Human Genome Project–Write
* Palaeo-Eskimo, an ancient human
* Neanderthal, ''Homo sapiens neanderthalensis'' (partial); see Neanderthal Genome Project
* Common chimpanzee, ''Pan troglodytes''; see Chimpanzee Genome Project
* Woolly mammoth, ''Mammuthus primigenius''
* Domestic cow, ''Bos taurus''
* Bovine genome
* Honey Bee Genome Sequencing Consortium
* Horse genome
* HRDetect
* Human microbiome project
* International Grape Genome Program
* International HapMap Project
* Tomato 150+ genome resequencing project
* 100,000 Genomes Project
* 100K Pathogen Genome Project
* International Mouse Phenotyping Consortium (IMPC)
* Knockout Mouse Phenotyping Project (KOMP2)
* Giant sequoia, ''Sequoiadendron giganteum''


See also

* Joint Genome Institute
* Illumina, private company involved in genome sequencing
* Knome, private company offering genome analysis and sequencing
* Model organism
* National Center for Biotechnology Information


External links


* GOLD: Genomes OnLine Database
* Genome Project Database
* The Protein Naming Utility
* SUPERFAMILY
* EchinoBase, an echinoderm genomic database (previously SpBase, a sea urchin genome database)
* NRCPB
* Global Invertebrate Genomics Alliance (GIGA)
* Wellcome Sanger Institute
* Wellcome Genome Campus