HOME
The Info List - Ensembl


--- Advertisement ---



Ensembl
Ensembl
genome database project is a joint scientific project between the European Bioinformatics Institute
European Bioinformatics Institute
and the Wellcome Trust
Wellcome Trust
Sanger Institute, which was launched in 1999 in response to the imminent completion of the Human Genome
Genome
Project.[2] Ensembl
Ensembl
aims to provide a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms.[3] Ensembl
Ensembl
is one of several well known genome browsers for the retrieval of genomic information. Similar databases and browsers are found at NCBI and the University of California, Santa Cruz (UCSC).

Contents

1 Background 2 Displaying genomic data 3 Alternative access methods 4 Current species 5 See also 6 References 7 External links

Background[edit] The human genome consists of three billion base pairs, which code for approximately 20,000–25,000 genes. However the genome alone is of little use, unless the locations and relationships of individual genes can be identified. One option is manual annotation, whereby a team of scientists tries to locate genes using experimental data from scientific journals and public databases. However this is a slow, painstaking task. The alternative, known as automated annotation, is to use the power of computers to do the complex pattern-matching of protein to DNA.[citation needed] In the Ensembl
Ensembl
project, sequence data are fed into the gene annotation system (a collection of software "pipelines" written in Perl) which creates a set of predicted gene locations and saves them in a MySQL database for subsequent analysis and display. Ensembl
Ensembl
makes these data freely accessible to the world research community. All the data and code produced by the Ensembl
Ensembl
project is available to download,[4] and there is also a publicly accessible database server allowing remote access. In addition, the Ensembl
Ensembl
website provides computer-generated visual displays of much of the data. Over time the project has expanded to include additional species (including key model organisms such as mouse, fruitfly and zebrafish) as well as a wider range of genomic data, including genetic variations and regulatory features. Since April 2009, a sister project, Ensembl Genomes, has extended the scope of Ensembl
Ensembl
into invertebrate metazoa, plants, fungi, bacteria, and protists, whilst the original project continues to focus on vertebrates. Displaying genomic data[edit]

Gene
Gene
SGCB aligned to the human genome

Central to the Ensembl
Ensembl
concept is the ability to automatically generate graphical views of the alignment of genes and other genomic data against a reference genome. These are shown as data tracks, and individual tracks can be turned on and off, allowing the user to customise the display to suit their research interests. The interface also enables the user to zoom in to a region or move along the genome in either direction. Other displays show data at varying levels of resolution, from whole karyotypes down to text-based representations of DNA
DNA
and amino acid sequences, or present other types of display such as trees of similar genes (homologues) across a range of species. The graphics are complemented by tabular displays, and in many cases data can be exported directly from the page in a variety of standard file formats such as FASTA. Externally produced data can also be added to the display, either via a DAS (Distributed Annotation System) server on the internet, or by uploading a suitable file in one of the supported formats, such as BAM, BED, or PSL. Graphics are generated using a suite of custom Perl
Perl
modules based on GD, the standard Perl
Perl
graphics display library. Alternative access methods[edit] In addition to its website, Ensembl
Ensembl
provides a Perl
Perl
API[5] (Application Programming Interface) that models biological objects such as genes and proteins, allowing simple scripts to be written to retrieve data of interest. The same API is used internally by the web interface to display the data. It is divided in sections like the core API, the compara API (for comparative genomics data), the variation API (for accessing SNPs, SNVs, CNVs..), and the functional genomics API (to access regulatory data). The Ensembl
Ensembl
website provides extensive information on how to install and use the API. This software can be used to access the public MySQL
MySQL
database, avoiding the need to download enormous datasets. The users could even choose to retrieve data from the MySQL
MySQL
with direct SQL queries, but this requires an extensive knowledge of the current database schema. Large datasets can be retrieved using the BioMart
BioMart
data-mining tool. It provides a web interface for downloading datasets using complex queries. Last, there is an FTP server which can be used to download entire MySQL
MySQL
databases as well some selected data sets in other formats. Current species[edit] The annotated genomes include most fully sequenced vertebrates and selected model organisms. All of them are eukaryotes, there are no prokaryotes. As of 2008[update], this includes:

Chordata

Mammalia

Euarchontoglires

Primates: bushbaby, chimp, human, macaque, mouse lemur, orangutan, tarsier; Scandentia: tree shrew ; Glires
Glires
(= Rodents + Lagomorphs): guineapig, kangaroo rat, mouse, rat, ground squirrel, pika, rabbit ;

Laurasiatheria: cow, dolphin, alpaca, pig, cat, dog, horse, megabat, microbat, hedgehog, shrew ; Afrotheria: elephant, hyrax, tenrec Xenarthra: armadillo, sloth ; Marsupialia: opossum, wallaby ; Monotremes: platypus;

Birds: chicken, zebra finch; Lepidosauria: anole lizard (pre); Lissamphibia: Xenopus tropicalis; Teleost
Teleost
fishes: Takifugu rubripes (fugu), Tetraodon nigroviridis (green spotted pufferfish), Danio rerio (zebrafish), Oryzias latipes (medaka), Gasterosteus aculeatus (stickleback); Cyclostomata: Petromyzon marinus (sea lamprey) (pre); Tunicates: Ciona intestinalis, Ciona savignyi;

Non-vertebrates

Insects: Drosophila melanogaster
Drosophila melanogaster
(fruitfly), Anopheles gambiae (mosquito), Aedes aegypti (mosquito) Worm: Caenorhabditis elegans

Yeast: Saccharomyces cerevisiae
Saccharomyces cerevisiae
(baker's yeast)

See also[edit]

List of sequenced eukaryotic genomes Sequence analysis Sequence profiling tool Sequence motif UCSC Genome
Genome
Browser

References[edit]

^ Hubbard T.; et al. (January 2002). "The Ensembl
Ensembl
genome database project". Nucleic Acids Res. 30 (1): 38–41. doi:10.1093/nar/30.1.38. PMC 99161 . PMID 11752248. Retrieved 11 November 2014.  ^ Flicek P, Amode MR, Barrell D, et al. (November 2010). "Ensembl 2011". Nucleic Acids Res. 39 ( Database
Database
issue): D800–D806. doi:10.1093/nar/gkq1064. PMC 3013672 . PMID 21045057.  ^ Flicek P, Aken BL, Ballester B, et al. (January 2010). "Ensembl's 10th year". Nucleic Acids Res. 38 ( Database
Database
issue): D557–62. doi:10.1093/nar/gkp972. PMC 2808936 . PMID 19906699.  ^ Ruffier, Magali; Kähäri, Andreas; Komorowska, Monika; Keenan, Stephen; Laird, Matthew; Longden, Ian; Proctor, Glenn; Searle, Steve; Staines, Daniel; Taylor, Kieron; Vullo, Alessandro; Yates, Andrew; Zerbino, Daniel; Flicek, Paul (January 2017). " Ensembl
Ensembl
core software resources: storage and programmatic access for DNA
DNA
sequence and genome annotation". Database. 2017 (1). doi:10.1093/database/bax020.  ^ Stabenau A, McVicker G, Melsopp C, Proctor G, Clamp M, Birney E (February 2004). "The Ensembl
Ensembl
Core Software Libraries". Genome Research. 14 (5): 929–933. doi:10.1101/gr.1857204. PMC 479122 . PMID 15123588. 

External links[edit]

Wikimedia Commons has media related to Ensembl.

Official website Vega Pre-Ensembl Ensembl
Ensembl
genomes UCSC Genome
Genome
Browser NCBI Ensembl: Browsing chordate genomes on EBI Train OnLine

v t e

Bioinformatics

Databases

Sequence databases: GenBank, European Nucleotide Archive
European Nucleotide Archive
and DNA
DNA
Data Bank of Japan Secondary databases: UniProt, database of protein sequences grouping together Swiss-Prot, TrEMBL
TrEMBL
and Protein
Protein
Information Resource Other databases: Protein
Protein
Data Bank, Ensembl
Ensembl
and InterPro Specialised genomic databases: BOLD, Saccharomyces Genome
Genome
Database, FlyBase, VectorBase, WormBase, PHI-base, Arabidopsis Information Resource and Zebrafish
Zebrafish
Information Network

Software

BLAST Bowtie Clustal HMMER MUSCLE SAMtools TopHat

Other

Server: ExPASy Ontology: Gene
Gene
Ontology

Institutions

European Bioinformatics
Bioinformatics
Institute US National Center for Biotechnology Information Swiss Institute of Bioinformatics Japanese Institute of Genetics Broad Institute Wellcome Sanger Institute

Organizations

International Society for Computational Biology
International Society for Computational Biology
(ISCB) European Molecular Biology network (EMBnet) African Society for Bioinformatics
Bioinformatics
and Computational Biology (ASBCB) Japanese Society for Bioinformatics
Bioinformatics
(JSBi)

Meetings

Intelligent Systems for Molecular Biology
Intelligent Systems for Molecular Biology
(ISMB) Research in Computational Molecular Biology
Research in Computational Molecular Biology
(RECOMB) European Conference on Computational Biology
European Conference on Computational Biology
(ECCB) Pacific Symposium on Biocomputing (PSB) ISCB Africa ASBCB Conference on Bioinformatics Basel Computational Biology Conference‎ ([BC2]) International Conference on Bioinformatics
Bioinformatics
(InCoB)

Computational biology List of biological databases Sequencing Sequence database Sequence alignment Molecular phylogenetics

v t e

Wellcome Trust

Centres and institutes

Current

Francis Crick Institute Gurdon Institute Sainsbury Wellcome Centre for Neural Circuits and Behaviour Science Learning Centres WTC for Cell-Matrix Research WTC for Gene
Gene
Regulation and Expression WTC for Human Genetics WTC for Mitochondrial Research WTC for Molecular Parasitology WTC for Neuroimaging WTC for Stem Cell Research Wellcome Sanger Institute

Former

Wellcome Trust
Wellcome Trust
Centre for the History of Medicine Wellcome Research Laboratories

Projects and facilities

1000 Genomes Project Big Picture Cambridge Biomedical Campus Cancer Genome
Genome
Project ChEMBL COSMIC cancer database DECIPHER Diamond Light Source eLife Ensembl Farmcare Genome
Genome
Reference Consortium Human Genome
Genome
Project Malawi Liverpool Wellcome Trust MEROPS Pfam Rfam UK Biobank Wellcome Collection Wellcome Genome
Genome
Campus Wellcome Library WormBase

Board of Governors

Tobias Bonhoeffer Alan Brown Damon Buffini William Burns Kay Davies Michael Ferguson Bryan Grenfell Richard Hynes Anne Johnson Eliza Manningham-Buller Peter Rigby

Executive Board

Jeremy Farrar Stephen Caddick Simon Chaplin Tim Livett Clare Matterson Ted Smith Danny Truell

Former directors

Peter Williams (1965–1991) Bridget Ogilvie
Bridget Ogilvie
(1991–1998) Michael Dexter
Michael Dexter
(1998–2008) Mark Walport
Mark Walport
(2003–2013)

Other key people

William Castell Dominic Cadbury Harold Cook Oliver Franks Roger Gibbs Henry Dale Roy Porter David Steel David Stuart John Sulston Henry Wellcome

Awards and fellowships

Capital Awards Collaborative Awards in Science Investigator Awards in Science Institutional Strategic Support Fund Science Strategic Award Sir Henry Dale Fellowship Sir Henry Wellcome
Henry Wellcome
Postdoctoral Fellowship Wellcome Book Prize Wellcome Image Awards Wellcome Trust
Wellcome Trust
Centre Wellcome Trust
Wellcome Trust
Principal Research Fellow Wellcome Trust
Wellcome Trust
Senior Research Fell

.