GeneNetwork

GeneNetwork is a combined database and open-source bioinformatics data analysis software resource for systems genetics. This resource is used to study gene regulatory networks that link DNA sequence differences to corresponding differences in gene and protein expression and to variation in traits such as health and disease risk. Data sets in GeneNetwork are typically made up of large collections of genotypes (e.g., SNPs) and phenotypes from groups of individuals, including humans, strains of mice and rats, and organisms as diverse as Drosophila melanogaster, Arabidopsis thaliana, and barley. The inclusion of genotypes makes it practical to carry out web-based gene mapping to discover those regions of genomes that contribute to differences among individuals in mRNA, protein, and metabolite levels, as well as differences in cell function, anatomy, physiology, and behavior.


History

Development of GeneNetwork started at the University of Tennessee Health Science Center in 1994 as a web-based version of the Portable Dictionary of the Mouse Genome (1994). GeneNetwork is both the first and the longest continuously operating web service in biomedical research (see https://en.wikipedia.org/wiki/List_of_websites_founded_before_1995). In 1999 the Portable Gene Dictionary was combined with Kenneth F. Manly's Map Manager QT mapping program to produce an online system for real-time genetic analysis. In early 2003, the first large Affymetrix gene expression data sets (whole mouse brain mRNA and hematopoietic stem cells) were incorporated and the system was renamed WebQTL. GeneNetwork is now developed by an international group of developers and has mirror and development sites in Europe, Asia, and Australia. Production services are hosted on systems at the University of Tennessee Health Science Center with a backup instance in Europe.

The current production version of GeneNetwork (also known as GN2) was released in 2016. It uses the same database as its predecessor, GN1, but has much more modular and maintainable open source code (available on GitHub). GeneNetwork now also has significant new features, including support for:
* Genetically complex populations, using linear mixed models implemented with an updated version of GEMMA
* R/qtl modules with many mapping options, including mapping of 4-way intercrosses and heterogeneous stock
* Weighted correlation network analysis, also known as WGCNA
* Cytoscape network display
* Correlated trait loci mapping (doi:10.21105/joss.00087)
* A genome browser to display genetic and genomic data that is based on Biodalliance
* Linked modules to the Bayesian Network Webserver for causal modeling


Organization and use

GeneNetwork consists of two major components:
* Massive collections of genetic, genomic, and phenotype data for large cohorts of individuals
* Sophisticated statistical analysis and gene mapping software that enables analysis of molecular and cellular networks and genotype-to-phenotype relations

Four levels of data are usually obtained for each family or population:
# DNA sequences and genotypes
# Molecular expression data often generated using arrays, RNA-seq, epigenomic, proteomic, metabolomic, and metagenomic methods (molecular phenotypes)
# Standard quantitative phenotypes that are often parts of a typical medical record (e.g., blood chemistry, body weight)
# Annotation files and metadata for traits and data sets

The combined data types are housed together in a relational database and IPFS fileserver, and are conceptually organized and grouped by species, cohort, and family. The system is implemented as a LAMP stack. Code and a simplified version of the MariaDB database are available on GitHub.

GeneNetwork is primarily used by researchers, but has also been adopted successfully for undergraduate and graduate courses in genetics and bioinformatics (see YouTube example), physiology, and psychology. Researchers and students typically retrieve sets of genotypes and phenotypes from one or more families and use built-in statistical and mapping functions to explore relations among variables and to assemble networks of associations. Key steps include the analysis of these factors:
# The range of variation of traits
# Covariation among traits (scatterplots and correlations, principal component analysis)
# Architecture of larger networks of traits
# Quantitative trait locus mapping and causal models of the linkage between sequence differences and phenotype differences
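
The first two analysis steps (range of variation and covariation among traits) can be illustrated with a short, self-contained sketch. This is not GeneNetwork code: the strain names and trait values are invented for illustration, and the helper names (`shared_values`, `trait_range`, `pearson`, `ranks`, `spearman`) are hypothetical. The sketch pairs two traits by strain, reports the range of each, and computes Pearson and rank-order (Spearman) correlations.

```python
import math

# Hypothetical trait values keyed by strain name (all numbers are invented).
body_weight = {"BXD1": 22.1, "BXD2": 25.4, "BXD5": 19.8, "BXD6": 27.0, "BXD8": 23.5}
expression = {"BXD1": 8.2, "BXD2": 9.1, "BXD5": 7.9, "BXD6": 9.6, "BXD9": 8.8}


def shared_values(a, b):
    """Return paired value lists for the strains measured in both traits."""
    strains = sorted(set(a) & set(b))
    return [a[s] for s in strains], [b[s] for s in strains]


def trait_range(values):
    """Range of variation of a trait as (minimum, maximum)."""
    return min(values), max(values)


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x, y = shared_values(body_weight, expression)
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print("strains in common:", len(x))
print("body weight range:", trait_range(x))
print("Pearson r:", round(r_pearson, 3), "Spearman rho:", round(r_spearman, 3))
```

Only strains measured for both traits enter the calculation, which mirrors the general need to match individuals across data sets before correlating them.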


Data sources

Traits and molecular expression data sets are submitted by researchers directly or are extracted from repositories such as the National Center for Biotechnology Information Gene Expression Omnibus. Data cover a variety of cells and tissues, from single cell populations of the immune system and specific tissues (retina, prefrontal cortex) to entire systems (whole brain, lung, muscle, heart, fat, kidney, flower, whole plant embryos). A typical data set covers hundreds of fully genotyped individuals and may also include technical and biological replicates. Genotypes and phenotypes are usually taken from peer-reviewed papers. GeneNetwork includes annotation files for several RNA profiling platforms (Affymetrix, Illumina, and Agilent). RNA-seq and quantitative proteomic, metabolomic, epigenetic, and metagenomic data are also available for several species, including mouse and human.


Tools and features

The site provides tools for a wide range of functions: simple graphical displays of variation in gene expression or other phenotypes, scatter plots of pairs of traits (Pearson or rank order), construction of both simple and complex network graphs, analysis of principal components and synthetic traits, and QTL mapping using marker regression, interval mapping, and pair scans for epistatic interactions. Most functions work with up to 100 traits and several functions work with an entire transcriptome. The database can be browsed and searched at the main search page. An on-line tutorial is available. Users can also download the primary data sets as text files, Excel, or in the case of network graphs, as SBML. As of 2017, GN2 is available as a beta release.
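
As a rough illustration of QTL mapping by marker regression, the sketch below regresses one trait on each marker's genotype and reports a LOD score per marker. It is a textbook single-marker regression, not the implementation used by GeneNetwork or its mapping back ends. The marker names, the 0/1 genotype coding, the trait values, and the function names `residual_ss` and `lod_score` are all assumptions made for the example.

```python
import math

# Hypothetical data: genotypes coded 0 or 1 (the two parental alleles) for
# each marker across six strains, plus one trait value per strain.
genotypes = {
    "marker_a": [0, 0, 0, 1, 1, 1],
    "marker_b": [0, 1, 0, 1, 0, 1],
}
trait = [10.1, 9.8, 10.3, 12.2, 11.9, 12.4]


def residual_ss(y, x=None):
    """Residual sum of squares for y ~ 1 (x is None) or y ~ 1 + x."""
    n = len(y)
    my = sum(y) / n
    ss_total = sum((yi - my) ** 2 for yi in y)
    if x is None:
        return ss_total
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        # Monomorphic marker: the genotype explains nothing.
        return ss_total
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Least-squares fit removes sxy**2 / sxx from the total sum of squares.
    return ss_total - sxy ** 2 / sxx


def lod_score(y, x):
    """LOD = (n / 2) * log10(RSS_null / RSS_marker).

    Assumes the marker does not fit the trait perfectly (RSS_marker > 0).
    """
    n = len(y)
    return (n / 2) * math.log10(residual_ss(y) / residual_ss(y, x))


lods = {name: lod_score(trait, g) for name, g in genotypes.items()}
best_marker = max(lods, key=lods.get)
for name, lod in lods.items():
    print(name, round(lod, 2))
print("strongest association:", best_marker)
```

In practice a genome scan repeats this calculation over thousands of markers and judges significance by permutation, which this sketch omits.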


Code

GeneNetwork is an open source project released under the Affero General Public License (AGPLv3). The majority of code is written in Python, but includes modules and other code written in C, R, and JavaScript. The older code is mainly Python 2.4, whereas GN2 is mainly written in Python 2.7 (in a Flask framework with Jinja2 HTML templates), with conversion to Python 3.X planned over the next few years. GN2 calls many statistical procedures written in the R programming language. The original source code from 2010, along with a compact database, is available on SourceForge. While GN1 was actively maintained on GitHub through 2019, as of 2020 all work is focused on GN2.


See also

* Computational genomics
* Cytoscape
* KEGG (The Kyoto Encyclopedia of Genes and Genomes)
* Reactome
* WikiPathways




External links


* GeneNetwork homepage

Related resources (other systems genetics and network databases):
* BioGPS
* Sage Bionetworks
* AmiGo
* WikiPathways
* Cytoscape
* esyN
* GeneNetwork, Netherlands