GenePattern is a freely available, open-source computational biology software package for the analysis of genomic data, originally created and developed at the Broad Institute. Designed to enable researchers to develop, capture, and reproduce genomic analysis methodologies, GenePattern was first released in 2004. It is currently developed at the University of California, San Diego.
Functionality
GenePattern is a scientific workflow system that provides access to hundreds of genomic analysis tools. Users combine these tools as building blocks into analysis pipelines that capture the methods, parameters, and data used to produce a result. Pipelines can be created, edited, and shared, which makes in silico results reproducible.
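The idea that a pipeline records the methods and parameters behind a result can be illustrated with a minimal sketch. The code below is hypothetical: it is not GenePattern's pipeline format or API, and the `Step` and `Pipeline` classes, module names, and parameters are invented placeholders. It assumes that recording each step's module, version, and parameters, then hashing that record deterministically, is enough to tell whether two runs followed the same recipe.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    """One pipeline step: the module that ran, its version, and its parameters."""
    module: str   # hypothetical module name, not a real GenePattern module
    version: str  # pinning the version is what makes a re-run comparable
    params: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    """An ordered list of steps; order matters, so it is kept as a list."""
    name: str
    steps: list = field(default_factory=list)

    def add(self, module: str, version: str, **params) -> "Pipeline":
        self.steps.append(Step(module, version, dict(params)))
        return self  # return self so steps can be chained

    def fingerprint(self) -> str:
        # asdict() recursively converts nested Step objects to plain dicts;
        # sort_keys=True makes the JSON text independent of keyword order.
        record = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(record.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = Pipeline("demo").add("Normalize", "1.0", method="quantile")
    b = Pipeline("demo").add("Normalize", "1.0", method="quantile")
    c = Pipeline("demo").add("Normalize", "1.1", method="quantile")
    print(a.fingerprint() == b.fingerprint())  # same recipe
    print(a.fingerprint() == c.fingerprint())  # module version changed
```

Because the fingerprint covers module versions as well as parameters, upgrading a module changes it even when the parameters stay the same. A real provenance system would also record the input data, which this sketch leaves out.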
Project Objectives
# Accessibility: Run over 200 regularly updated analysis and visualization tools (supporting data preprocessing, gene expression analysis, proteomics, single nucleotide polymorphism (SNP) analysis, flow cytometry, and next-generation sequencing) and create analytic workflows without any programming through a point-and-click user interface.
# Reproducibility: Automated history and provenance tracking, with versioning, so that any user can share, repeat, and understand a complete computational analysis.
# Extensibility: Computational users can import their own methods and code and share them, using tools that simplify module creation and integration.
# Multiple interfaces: Web browser, application, and programmatic interfaces, along with a publicly hosted server, make analysis modules and pipelines available to a broad range of users.
Features
* A regularly updated repository of hundreds of computational analysis modules that support data preprocessing, gene expression analysis, proteomics, single nucleotide polymorphism (SNP) analysis, flow cytometry, and short-read sequencing.
* A programmatic interface that makes analysis modules available to computational biologists and developers from Python, Java, MATLAB, and R.
* The GenePattern Notebook Environment: Built on the Jupyter Notebook environment, GenePattern Notebook allows researchers to run GenePattern analyses within notebooks that interleave text, graphics, and executable code, creating a single "research narrative."
* GParc: A repository and community where GenePattern users share and discuss their own GenePattern modules.
Availability
GenePattern is available:
# As a free public web application, hosted on Amazon Web Services. Users can create accounts, perform analyses, and create pipelines on the server.
# As open-source software that can be downloaded and installed locally.
# On public web servers hosted by other organizations.
External links
* Official GenePattern website
* Official GenePattern Notebook website
* GParc
* https://github.com/genepattern/genepattern-server
Related software:
* GenomeSpace
* Integrative Genomics Viewer