The UCSC Genome Browser is an online and downloadable genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations. The Browser is a graphical viewer optimized to support fast interactive performance and is an open-source, web-based tool suite built on top of a MySQL database for rapid visualization, examination, and querying of the data at many levels. The Genome Browser Database, browsing tools, downloadable data files, and documentation can all be found on the UCSC Genome Bioinformatics website.


History

The UCSC Genome Browser was initially built in 2000, and is still managed, by Jim Kent, then a graduate student, and David Haussler, professor of Computer Science (now Biomolecular Engineering) at the University of California, Santa Cruz. It began as a resource for distributing the initial fruits of the Human Genome Project. Funded by the Howard Hughes Medical Institute (HHMI) and the National Human Genome Research Institute (NHGRI), one of the US National Institutes of Health, the browser offered a graphical display of the first full-chromosome draft assembly of the human genome sequence. Today the browser is used by geneticists, molecular biologists and physicians, as well as by students and teachers of evolution, to access genomic information.


Genomes

In the years since its inception, the UCSC Browser has expanded to accommodate genome sequences of all vertebrate species and selected invertebrates for which high-coverage genomic sequences are available, now including 108 species. High coverage is necessary so that overlaps between sequence reads can guide the construction of larger contiguous regions. Genomic sequences with lower coverage are included in multiple-alignment tracks on some browsers, but the fragmented nature of these assemblies makes them unsuitable for building full-featured browsers (more on multiple-alignment tracks below). The species hosted with full-featured genome browsers are shown in the table.

Apart from these 108 species and their assemblies, the UCSC Genome Browser also offers Assembly Hubs: web-accessible directories of genomic data that can be viewed on the browser and that include assemblies not hosted natively on it. Through them, users can load and annotate unique assemblies for which UCSC does not provide an annotation database. A full list of species and their assemblies can be viewed in the GenArk Portal, which includes 2,589 assemblies hosted across the UCSC Genome Browser database and Assembly Hubs. An example can be seen in the … assembly hub.


Browser functionality

The large amount of data about biological systems accumulating in the literature makes it necessary to collect and digest information using the tools of bioinformatics. The UCSC Genome Browser presents a diverse collection of annotation datasets (known as "tracks" and presented graphically), including mRNA alignments, mappings of DNA repeat elements, gene predictions, gene-expression data, disease-association data (representing the relationships of genes to diseases), and mappings of commercially available gene chips (e.g., Illumina and Agilent). The basic paradigm of display is to show the genome sequence along the horizontal axis, with graphical representations of the locations of the mRNAs, gene predictions, and other features. Blocks of color along the coordinate axis show the locations of the alignments of the various data types. The ability to show this large variety of data types on a single coordinate axis makes the browser a handy tool for the vertical integration of the data.

To find a specific gene or genomic region, the user may type in the gene name, a DNA sequence, an accession number for an RNA, the name of a cytological band (e.g., 20p13 for band 13 on the short arm of chr20) or a chromosomal position (e.g., chr17:38,450,000-38,531,000 for the region around the gene BRCA1). Because the data are presented graphically, each annotation can link to detailed information about it. The gene details page of the UCSC Genes track provides a large number of links to more specific information about the gene at many other data resources, such as Online Mendelian Inheritance in Man (OMIM) and SwissProt.

Designed for the presentation of complex and voluminous data, the UCSC Browser is optimized for speed. By pre-aligning millions of RNA sequences from GenBank to each of the 244 genome assemblies (many of the 108 species have more than one assembly), the browser allows instant access to the alignments of any RNA to any of the hosted species. The juxtaposition of the many types of data allows researchers to display exactly the combination of data that will answer specific questions. A PDF/PostScript output function allows export of a camera-ready image for publication in academic journals.

One feature that distinguishes the UCSC Browser from other genome browsers is the continuously variable scale of the display. Sequence of any size can be displayed, from a single DNA base up to an entire chromosome (human chr1 = 245 million bases, Mb) with full annotation tracks. Researchers can display a single gene, a single exon, or an entire chromosome band, showing dozens or hundreds of genes and any combination of the many annotations. A drag-and-zoom feature allows the user to select any region in the genome image and expand it to fill the screen.

Researchers may also use the browser to display their own data via the Custom Tracks tool, which lets users upload a file of their own data and view it in the context of the reference genome assembly. Users may also work with the data hosted by UCSC, using the Table Browser tool to create subsets of their choosing (such as only the SNPs that change the amino acid sequence of a protein) and display that subset in the browser as a Custom Track. Any browser view created by a user, including those containing Custom Tracks, may be shared with other users via the Saved Sessions tool.
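The position syntax and the Custom Tracks upload described above can be illustrated together. The Python sketch below parses a position string of the `chrom:start-end` form and writes it out as a one-feature BED custom track. It assumes the usual UCSC convention that the position box uses 1-based, fully closed coordinates while BED files use 0-based, half-open ones; the helper names (`parse_position`, `to_bed_line`, `custom_track`) and the track name are hypothetical, and the sketch handles only explicit positions, not gene names, accessions or band names such as 20p13.

```python
import re
from dataclasses import dataclass
from typing import Iterable

# Hypothetical pattern for "chrom:start-end"; commas are allowed as thousands separators.
_POSITION_RE = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 1-based, inclusive (as typed into the position box)
    end: int    # 1-based, inclusive


def parse_position(text: str) -> Region:
    """Parse a position such as 'chr17:38,450,000-38,531,000'."""
    match = _POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid range in {text!r}")
    return Region(chrom, start, end)


def to_bed_line(region: Region, name: str) -> str:
    """Convert a 1-based closed region to a 0-based, half-open BED line."""
    return f"{region.chrom}\t{region.start - 1}\t{region.end}\t{name}"


def custom_track(regions: Iterable[Region], track_name: str) -> str:
    """Build a minimal BED custom track: one track line, then one line per region."""
    header = f'track name="{track_name}" description="{track_name} regions"'
    lines = [to_bed_line(r, f"{track_name}_{i}") for i, r in enumerate(regions, 1)]
    return "\n".join([header, *lines]) + "\n"


if __name__ == "__main__":
    brca1_region = parse_position("chr17:38,450,000-38,531,000")
    print(brca1_region)
    print(custom_track([brca1_region], "myRegions"), end="")
```

Under these assumptions, the BRCA1-region example becomes the BED interval `chr17 38449999 38531000`: the start shifts down by one base while the end stays the same, which is the whole difference between the two coordinate conventions.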


Tracks

Below the displayed image, the UCSC Genome Browser offers eleven categories of additional tracks that can be selected and displayed alongside the original data. Researchers can select the tracks that best fit their query, so that the data displayed suit the type and depth of the research being done. These categories are as follows:


Analysis tools

The UCSC site hosts a set of genome analysis tools, including a full-featured graphical interface for mining the information in the browser database, and BLAT, a sequence alignment tool that accepts FASTA-format input and is also useful for simply finding sequences within the massive sequence (human genome = 3.23 billion bases, Gb) of any of the featured genomes. A liftOver tool uses whole-genome alignments to convert genomic coordinates from one assembly to another or between species. The Genome Graphs tool allows users to view all chromosomes at once and display the results of genome-wide association studies (GWAS). The Gene Sorter displays genes grouped by parameters not linked to genome location, such as expression pattern in tissues.
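The idea behind liftOver, carrying a coordinate across a whole-genome alignment, can be sketched with a toy model. Here the alignment is reduced to a sorted list of gap-free blocks, each pairing a start in the old assembly with a start in the new one; positions inside a block shift by a constant offset, and positions in alignment gaps do not map. This is a simplified illustration, not liftOver's implementation or its chain-file format: it ignores strand, chromosome changes and sequence identity, and the `Block` type, the function names and the `min_match` threshold (loosely modeled on liftOver's `-minMatch` option, with an assumed default) are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence, Tuple


class Block(NamedTuple):
    """One gap-free aligned block between two assemblies (hypothetical layout)."""
    src_start: int  # 0-based start in the old assembly
    dst_start: int  # 0-based start in the new assembly
    size: int       # block length in bases


def lift_position(blocks: Sequence[Block], pos: int) -> Optional[int]:
    """Map one 0-based position through sorted, non-overlapping blocks.

    Returns None when the position lies outside every block (an alignment gap).
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos >= block.src_start + block.size:
        return None  # pos is past the end of that block, i.e. in a gap
    return block.dst_start + (pos - block.src_start)


def lift_interval(blocks: Sequence[Block], start: int, end: int,
                  min_match: float = 0.95) -> Optional[Tuple[int, int]]:
    """Map a 0-based, half-open interval [start, end).

    The interval lifts only if at least `min_match` of its bases fall inside
    aligned blocks; the result spans the outermost mapped positions.
    """
    if end <= start:
        raise ValueError("interval must be non-empty")
    hits = [p for p in (lift_position(blocks, q) for q in range(start, end))
            if p is not None]
    if not hits or len(hits) / (end - start) < min_match:
        return None
    return (min(hits), max(hits) + 1)


if __name__ == "__main__":
    # Toy alignment: old 0-49 -> new 100-149, old 50-59 unaligned,
    # old 60-99 -> new 200-239.
    alignment = [Block(0, 100, 50), Block(60, 200, 40)]
    print(lift_position(alignment, 10))                     # 110
    print(lift_position(alignment, 55))                     # None
    print(lift_interval(alignment, 10, 20))                 # (110, 120)
    print(lift_interval(alignment, 40, 70))                 # None
    print(lift_interval(alignment, 40, 70, min_match=0.5))  # (140, 210)
```

In this toy alignment, old positions 50 to 59 have no counterpart, so an interval spanning them lifts only when the fraction of aligned bases meets `min_match`; that is the same trade-off a real coordinate conversion has to make across gaps.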


Open source / mirrors

The UCSC Browser code base is open-source for non-commercial use, and is mirrored locally by many research groups, allowing private display of data in the context of the public data. The UCSC Browser is mirrored at several locations worldwide, as shown in the table. The Browser code is also used in separate installations by the UCSC Malaria Genome Browser and the Archaea Browser.


See also

* Ensembl
* ENCODE
* List of biological databases




External links

* Online Training/Tutorials & User's Guides
* YouTube tutorials