Weighted correlation network analysis, also known as weighted gene co-expression network analysis (WGCNA), is a widely used data mining method especially for studying biological networks based on pairwise correlations between variables. While it can be applied to most high-dimensional data sets, it has been most widely used in genomic applications. It allows one to define modules (clusters), intramodular hubs, and network nodes with regard to module membership, to study the relationships between co-expression modules, and to compare the network topology of different networks (differential network analysis). WGCNA can be used as a data reduction technique (related to oblique factor analysis), as a clustering method (fuzzy clustering), as a feature selection method (e.g. as a gene screening method), as a framework for integrating complementary (genomic) data (based on weighted correlations between quantitative variables), and as a data exploratory technique.
Although WGCNA incorporates traditional data exploratory techniques, its network language and analysis framework go beyond those of standard analysis techniques. Because it uses network methodology and is well suited for integrating complementary genomic data sets, it can be interpreted as a systems biology or systems genetics data analysis method. By selecting intramodular hubs in consensus modules, WGCNA also gives rise to network-based meta-analysis techniques.
History
The WGCNA method was developed by Steve Horvath, a professor of human genetics at the David Geffen School of Medicine at UCLA and of biostatistics at the UCLA Fielding School of Public Health, together with his colleagues at UCLA and (former) lab members (in particular Peter Langfelder, Bin Zhang and Jun Dong). Much of the work arose from collaborations with applied researchers. In particular, weighted correlation networks were developed in joint discussions with cancer researchers Paul Mischel and Stanley F. Nelson, and neuroscientists Daniel H. Geschwind and Michael C. Oldham (according to the acknowledgement section in ). There is a vast literature on dependency networks, scale-free networks and co-expression networks.
Comparison between weighted and unweighted correlation networks
A weighted correlation network can be interpreted as a special case of a weighted network, dependency network or correlation network. Weighted correlation network analysis can be attractive for the following reasons:
* The network construction (based on soft thresholding the correlation coefficient) preserves the continuous nature of the underlying correlation information. For example, weighted correlation networks that are constructed on the basis of correlations between numeric variables do not require the choice of a hard threshold. Dichotomizing information and (hard) thresholding may lead to information loss.
* The network construction gives highly robust results with respect to different choices of the soft threshold.
In contrast, results based on unweighted networks, constructed by thresholding a pairwise association measure, often strongly depend on the threshold.
* Weighted correlation networks facilitate a geometric interpretation based on the angular interpretation of the correlation, chapter 6 in.
* Resulting network statistics can be used to enhance standard data-mining methods such as cluster analysis, since (dis)similarity measures can often be transformed into weighted networks;
see chapter 6 in.
* WGCNA provides powerful module preservation statistics, which can be used to quantify similarity to another condition. Module preservation statistics also allow one to study differences between the modular structure of networks.
* Weighted networks and correlation networks can often be approximated by "factorizable" networks.
Such approximations are often difficult to achieve for sparse, unweighted networks. Therefore, weighted (correlation) networks allow for a parsimonious parametrization (in terms of modules and module membership) (chapters 2, 6 in
) and.
Method
First, one defines a gene co-expression similarity measure, which is used to define the network. We denote the gene co-expression similarity measure of a pair of genes i and j by $s_{ij}$. Many co-expression studies use the absolute value of the correlation as an unsigned co-expression similarity measure,
$$s_{ij}^{unsigned} = |\operatorname{cor}(x_i, x_j)|,$$
where gene expression profiles $x_i$ and $x_j$ consist of the expression of genes i and j across multiple samples. However, using the absolute value of the correlation may obfuscate biologically relevant information, since no distinction is made between gene repression and activation. In contrast, in signed networks the similarity between genes reflects the sign of the correlation of their expression profiles. To define a signed co-expression measure between gene expression profiles $x_i$ and $x_j$, one can use a simple transformation of the correlation:
$$s_{ij}^{signed} = 0.5 + 0.5 \operatorname{cor}(x_i, x_j).$$
Like the unsigned measure $s_{ij}^{unsigned}$, the signed similarity $s_{ij}^{signed}$ takes on a value between 0 and 1. Note that the unsigned similarity between two oppositely expressed genes ($\operatorname{cor}(x_i, x_j) = -1$) equals 1, while the signed similarity equals 0. Similarly, the unsigned co-expression measure of two genes with zero correlation remains zero, whereas the signed similarity equals 0.5.
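The two similarity measures can be sketched in Python with NumPy. This is a minimal illustration that assumes an expression matrix with samples in rows and genes in columns, and Pearson correlation for cor. The function names and the three gene profiles are hypothetical and are not part of the WGCNA package. The example reproduces the two special cases discussed above: opposite expression and zero correlation.

```python
import numpy as np

def unsigned_similarity(expr: np.ndarray) -> np.ndarray:
    """s_ij = |cor(x_i, x_j)|; expr has samples in rows and genes in columns."""
    return np.abs(np.corrcoef(expr, rowvar=False))

def signed_similarity(expr: np.ndarray) -> np.ndarray:
    """s_ij = 0.5 + 0.5 * cor(x_i, x_j), which maps [-1, 1] onto [0, 1]."""
    return 0.5 + 0.5 * np.corrcoef(expr, rowvar=False)

# Hypothetical profiles of three genes across five samples.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = -x1                                      # perfectly anti-correlated with x1
x3 = np.array([2.0, -1.0, 0.0, -1.0, 2.0])    # uncorrelated with x1
expr = np.column_stack([x1, x2, x3])

s_unsigned = unsigned_similarity(expr)
s_signed = signed_similarity(expr)
# Opposite expression (genes 0, 1): unsigned similarity 1, signed similarity 0.
# Zero correlation (genes 0, 2): unsigned similarity 0, signed similarity 0.5.
print(np.round(s_unsigned, 2))
print(np.round(s_signed, 2))
```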
Next, an adjacency matrix (network), $A = [a_{ij}]$, is used to quantify how strongly genes are connected to one another. $A$ is defined by thresholding the co-expression similarity matrix $S = [s_{ij}]$. 'Hard' thresholding (dichotomizing) the similarity measure $S$ results in an unweighted gene co-expression network. Specifically, an unweighted network adjacency is defined to be 1 if $s_{ij} > \tau$ and 0 otherwise, where $\tau$ is the hard threshold.
Because hard thresholding encodes gene connections in a binary fashion, it can be sensitive to the choice of the threshold and result in the loss of co-expression information.
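A minimal NumPy sketch of hard thresholding illustrates the sensitivity to the threshold. Two nearly identical similarities can fall on opposite sides of the cutoff, so one edge is kept and the other is dropped. The similarity values and the helper name `unweighted_adjacency` are hypothetical.

```python
import numpy as np

def unweighted_adjacency(sim: np.ndarray, tau: float) -> np.ndarray:
    """Hard thresholding: a_ij = 1 if s_ij > tau, else 0."""
    return (sim > tau).astype(int)

# Hypothetical similarity matrix for three genes.
sim = np.array([
    [1.00, 0.81, 0.79],
    [0.81, 1.00, 0.20],
    [0.79, 0.20, 1.00],
])

# Similarities 0.81 and 0.79 are nearly identical, yet tau = 0.80 keeps one edge
# and drops the other. Lowering tau slightly to 0.78 keeps both.
adj_080 = unweighted_adjacency(sim, tau=0.80)
adj_078 = unweighted_adjacency(sim, tau=0.78)
print(adj_080)
print(adj_078)
```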
The continuous nature of the co-expression information can be preserved by employing soft thresholding, which results in a weighted network. Specifically, WGCNA uses the following power function to assess the connection strength between genes i and j:
$$a_{ij} = (s_{ij})^{\beta},$$
where the power $\beta$ is the soft thresholding parameter. The default values $\beta = 6$ and $\beta = 12$ are used for unsigned and signed networks, respectively. Alternatively, $\beta$ can be chosen using the scale-free topology criterion, which amounts to choosing the smallest value of $\beta$ such that approximate scale-free topology is reached.
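Soft thresholding and the scale-free topology criterion can be sketched as follows. This is a simplified illustration under several assumptions:

* Connectivity $k_i = \sum_{j \ne i} a_{ij}$ is binned into 10 equal-width bins.
* The fit index is the R² of a linear regression of $\log_{10} p(k)$ on $\log_{10} k$. It is given a positive sign when $p(k)$ decreases with $k$.
* The cutoff is 0.85.

The helper names, bin count, cutoff, candidate range for β and the simulated data are all assumptions. This is not a reproduction of the WGCNA implementation.

```python
import numpy as np

def soft_adjacency(sim: np.ndarray, beta: float) -> np.ndarray:
    """Soft thresholding: a_ij = s_ij ** beta, applied elementwise."""
    return sim ** beta

def scale_free_fit(adj: np.ndarray, n_bins: int = 10) -> float:
    """Signed R^2 of the fit log10 p(k) ~ log10 k (NaN if too few usable bins)."""
    k = adj.sum(axis=1) - np.diag(adj)            # connectivity, excluding self-adjacency
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    idx = np.digitize(k, edges[1:-1])             # equal-width bin index in 0..n_bins-1
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=k, minlength=n_bins)
    keep = (counts > 0) & (sums > 0)              # log10 needs positive k and p(k)
    if keep.sum() < 3:
        return float("nan")
    log_k = np.log10(sums[keep] / counts[keep])   # mean connectivity per bin
    log_p = np.log10(counts[keep] / k.size)       # relative frequency per bin
    if log_p.max() == log_p.min():
        return float("nan")
    slope = np.polyfit(log_k, log_p, 1)[0]
    r2 = np.corrcoef(log_k, log_p)[0, 1] ** 2
    return float(-np.sign(slope) * r2)            # positive when p(k) decreases with k

def pick_soft_threshold(sim, betas=range(1, 21), r2_cut=0.85):
    """Smallest beta whose scale-free fit reaches r2_cut (None if none does)."""
    for beta in betas:
        if scale_free_fit(soft_adjacency(sim, beta)) >= r2_cut:
            return beta
    return None

# Hypothetical data: 40 samples, 200 genes; the first 50 genes share a common factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(40, 1))
expr = rng.normal(size=(40, 200))
expr[:, :50] += factor
sim = np.abs(np.corrcoef(expr, rowvar=False))
print("chosen beta:", pick_soft_threshold(sim))
```

Returning `None` when no candidate reaches the cutoff makes a failed criterion explicit. The alternative would be to fall back silently to a default power.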
Since $\log(a_{ij}) = \beta \log(s_{ij})$, the weighted network adjacency is linearly related to the co-expression similarity on a logarithmic scale. Note that a high power $\beta$ transforms high similarities into high adjacencies, while pushing low similarities towards 0. Because this soft-thresholding procedure, applied to a pairwise correlation matrix, leads to a weighted adjacency matrix, the ensuing analysis is referred to as weighted gene co-expression network analysis.
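A short numerical check illustrates both points: the logarithmic linearity and the way a high power separates high from low similarities. The three similarity values are hypothetical. β = 6 is the default power for unsigned networks.

```python
import numpy as np

beta = 6                              # default power for unsigned networks
s = np.array([0.9, 0.5, 0.3])         # hypothetical co-expression similarities
a = s ** beta                         # soft-thresholded adjacencies

# On a log scale the adjacency is a linear function of the similarity: log a = beta * log s.
assert np.allclose(np.log(a), beta * np.log(s))

# A similarity ratio of 0.9 / 0.3 = 3 becomes an adjacency ratio of about 3 ** 6 = 729.
print(a[0] / a[2])

for s_ij, a_ij in zip(s, a):
    print(f"s = {s_ij:.1f} -> a = {a_ij:.4f}")
# s = 0.9 -> a = 0.5314
# s = 0.5 -> a = 0.0156
# s = 0.3 -> a = 0.0007
```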
A major step in the module-centric analysis is to cluster genes into network modules using a network proximity measure. Roughly speaking, a pair of genes has a high proximity if it is closely interconnected. By convention, the maximal proximity between two genes is 1 and the minimum proximity is 0. Typically, WGCNA uses the topological overlap measure (TOM) as proximity, which can also be defined for weighted networks.
The TOM combines the adjacency of two genes and the connection strengths these two genes share with other "third party" genes. The TOM is a highly robust measure of network interconnectedness (proximity). This proximity is used as input to average linkage hierarchical clustering. Modules are defined as branches of the resulting cluster tree using the dynamic branch cutting approach.
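A minimal NumPy sketch of the topological overlap computation follows. It assumes the common weighted formulation $\mathrm{TOM}_{ij} = (l_{ij} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$, with shared-neighbour term $l_{ij} = \sum_{u \ne i,j} a_{iu} a_{uj}$ and connectivity $k_i = \sum_{u \ne i} a_{iu}$. The function name and the four-gene network are hypothetical.

```python
import numpy as np

def topological_overlap(adj: np.ndarray) -> np.ndarray:
    """Weighted topological overlap matrix for a symmetric adjacency with entries in [0, 1]."""
    a = np.array(adj, dtype=float)
    np.fill_diagonal(a, 0.0)                  # ignore self-connections
    k = a.sum(axis=1)                         # connectivity k_i = sum_{u != i} a_iu
    shared = a @ a                            # l_ij = sum_u a_iu * a_uj (u = i, j contribute 0)
    k_min = np.minimum.outer(k, k)            # min(k_i, k_j)
    tom = (shared + a) / (k_min + 1.0 - a)
    np.fill_diagonal(tom, 1.0)                # maximal proximity of a gene with itself
    return tom

# Hypothetical unweighted network: genes 0, 1, 2 form a triangle and gene 3 attaches to gene 2.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
tom = topological_overlap(adj)
diss_tom = 1.0 - tom                          # dissimilarity used as clustering input
print(np.round(tom, 2))
```

Genes 0 and 3 are not directly connected but share neighbour 2, so their overlap is 0.5 rather than 0. The clustering step is not shown. One option, offered as an assumption rather than the WGCNA pipeline, is SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` on the condensed form of `diss_tom` (via `scipy.spatial.distance.squareform`). The Dynamic Tree Cut branch cutting is not reproduced here.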
Next, the genes inside a given module are summarized with the module eigengene, which can be considered as the best summary of the standardized module expression data. The module eigengene of a given module is defined as the first principal component of the standardized expression profiles. Eigengenes define robust biomarkers and can be used as features in complex machine learning and predictive models, including decision trees and Bayesian networks. To find modules that relate to a clinical trait of interest, module eigengenes are correlated with the clinical trait of interest, which gives rise to an eigengene significance measure.
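The module eigengene and the eigengene significance can be sketched with a singular value decomposition of the standardized module expression. This is a minimal illustration. The sign of a principal component is arbitrary, and aligning it with the average standardized expression is an assumption made here. The function names and the simulated module and trait are hypothetical.

```python
import numpy as np

def module_eigengene(expr_module: np.ndarray) -> np.ndarray:
    """First principal component (sample scores) of the standardized module expression.

    expr_module has samples in rows and the module's genes in columns.
    """
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0, ddof=1)
    u, d, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * d[0]
    # The sign of a principal component is arbitrary; align it with the average
    # standardized expression so that the eigengene follows the module's genes.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

def eigengene_significance(me: np.ndarray, trait: np.ndarray) -> float:
    """Correlation between a module eigengene and a clinical trait."""
    return float(np.corrcoef(me, trait)[0, 1])

# Hypothetical module: 10 genes driven by a shared factor across 30 samples,
# plus a trait that partly depends on the same factor.
rng = np.random.default_rng(1)
factor = rng.normal(size=30)
expr_module = factor[:, None] + 0.5 * rng.normal(size=(30, 10))
trait = factor + rng.normal(size=30)

me = module_eigengene(expr_module)
print("eigengene significance:", round(eigengene_significance(me, trait), 2))
```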
One can also construct co-expression networks between module eigengenes (eigengene networks), i.e. networks whose nodes are modules.
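An eigengene network can be sketched by correlating the eigengenes of different modules. The example uses the signed transformation 0.5 + 0.5·cor, mirroring the signed similarity defined earlier. That transformation choice, the function name and the simulated eigengenes are assumptions for illustration.

```python
import numpy as np

def eigengene_network(eigengenes: np.ndarray) -> np.ndarray:
    """Signed adjacency 0.5 + 0.5 * cor between module eigengenes (samples in rows, modules in columns)."""
    return 0.5 + 0.5 * np.corrcoef(eigengenes, rowvar=False)

# Hypothetical eigengenes of three modules across 25 samples; modules 0 and 1 are related.
rng = np.random.default_rng(2)
me0 = rng.normal(size=25)
me1 = me0 + 0.3 * rng.normal(size=25)
me2 = rng.normal(size=25)
net = eigengene_network(np.column_stack([me0, me1, me2]))
print(np.round(net, 2))
```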
To identify intramodular hub genes inside a given module, one can use two types of connectivity measures. The first, referred to as kME, is defined by correlating each gene with the respective module eigengene. The second, referred to as kIN, is defined as a sum of adjacencies with respect to the module genes. In practice, these two measures are approximately equivalent.
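Both connectivity measures can be sketched in NumPy. This minimal illustration assumes an unsigned soft-thresholded adjacency with β = 6 and computes the eigengene as the first principal component of the standardized module expression. The function names and the simulated five-gene module are hypothetical. In the simulation, one gene is built to correlate more strongly with the shared factor than the others, so it should rank first under both measures.

```python
import numpy as np

def module_eigengene(expr: np.ndarray) -> np.ndarray:
    """First principal component of the standardized module expression (samples x genes)."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0]
    # Resolve the arbitrary PC sign by aligning with the average standardized expression.
    return me if np.corrcoef(me, z.mean(axis=1))[0, 1] >= 0 else -me

def kme(expr: np.ndarray, me: np.ndarray) -> np.ndarray:
    """kME_i = cor(x_i, ME) for every gene column of expr."""
    return np.array([np.corrcoef(expr[:, i], me)[0, 1] for i in range(expr.shape[1])])

def kin(adj_module: np.ndarray) -> np.ndarray:
    """kIN_i = sum over module genes j != i of a_ij."""
    return adj_module.sum(axis=1) - np.diag(adj_module)

# Hypothetical module of five genes; the last gene is constructed to be the hub
# (its correlation with the shared factor is 0.95 versus 0.5 for the others).
rng = np.random.default_rng(42)
n_samples = 2000
loadings = np.array([0.5, 0.5, 0.5, 0.5, 0.95])
factor = rng.normal(size=n_samples)
noise = rng.normal(size=(n_samples, loadings.size))
expr = factor[:, None] * loadings + noise * np.sqrt(1.0 - loadings**2)

me = module_eigengene(expr)
adj = np.abs(np.corrcoef(expr, rowvar=False)) ** 6   # unsigned soft threshold, beta = 6
print("kME:", np.round(kme(expr, me), 2))
print("kIN:", np.round(kin(adj), 3))
print("hub by kME:", int(np.argmax(kme(expr, me))), "| hub by kIN:", int(np.argmax(kin(adj))))
```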
To test whether a module is preserved in another data set, one can use various network statistics, e.g. $Z_{summary}$.
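The permutation idea behind such preservation statistics can be sketched with a single, simplified statistic. The sketch takes the mean intramodular adjacency (density) of a module in the test data and compares it with random gene sets of the same size, giving a Z-score. This is not $Z_{summary}$, which combines several density and connectivity statistics (the WGCNA R package computes it in its `modulePreservation` function). The helper names, the number of permutations and the simulated data are assumptions.

```python
import numpy as np

def mean_adjacency(adj: np.ndarray, idx: np.ndarray) -> float:
    """Mean off-diagonal adjacency among the genes in idx (a module density)."""
    sub = adj[np.ix_(idx, idx)]
    n = len(idx)
    return float((sub.sum() - np.trace(sub)) / (n * (n - 1)))

def density_z(adj_test: np.ndarray, module_idx: np.ndarray, n_perm: int = 200, seed: int = 0) -> float:
    """Z-score of a module's density in test data against random gene sets of the same size."""
    rng = np.random.default_rng(seed)
    observed = mean_adjacency(adj_test, module_idx)
    n_genes = adj_test.shape[0]
    null = np.array([
        mean_adjacency(adj_test, rng.choice(n_genes, size=len(module_idx), replace=False))
        for _ in range(n_perm)
    ])
    return float((observed - null.mean()) / null.std(ddof=1))

# Hypothetical test data set: 50 samples, 60 genes; genes 0 to 9 form the module
# (as if it had been defined in a separate reference data set).
rng = np.random.default_rng(3)
factor = rng.normal(size=(50, 1))
expr_test = rng.normal(size=(50, 60))
expr_test[:, :10] += 1.5 * factor
adj_test = np.abs(np.corrcoef(expr_test, rowvar=False)) ** 6
module_idx = np.arange(10)
print("density Z:", round(density_z(adj_test, module_idx), 1))
```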
Applications
WGCNA has been widely used for analyzing gene expression data (i.e. transcriptional data), e.g. to find intramodular hub genes.
For example, a WGCNA study revealed novel transcription factors associated with Bisphenol A (BPA) dose-response.
It is often used as a data reduction step in systems genetics applications, where modules are represented by "module eigengenes". Module eigengenes can be used to correlate modules with clinical traits. Eigengene networks are co-expression networks between module eigengenes (i.e. networks whose nodes are modules).
WGCNA is widely used in neuroscientific applications and for analyzing genomic data, including microarray data, single-cell RNA-Seq data, DNA methylation data, miRNA data, peptide counts and microbiota data (16S rRNA gene sequencing).
Other applications include brain imaging data, e.g. functional MRI data.
R software package
The WGCNA R software package provides functions for carrying out all aspects of weighted network analysis (module construction, hub gene selection, module preservation statistics, differential network analysis, network statistics). The WGCNA package is available from the Comprehensive R Archive Network (CRAN), the standard repository for R add-on packages.
References
{{reflist, 2, refs=
[{{cite journal , last1=Chen , first1=Y , last2=Zhu , first2=J , last3=Lum , first3=PY , last4=Yang , first4=X , last5=Pinto , first5=S , last6=MacNeil , first6=DJ , last7=Zhang , first7=C , last8=Lamb , first8=J , last9=Edwards , first9=S , last10=Sieberts , first10=SK , last11=Leonardson , first11=A , last12=Castellini , first12=LW , last13=Wang , first13=S , last14=Champy , first14=MF , last15=Zhang , first15=B , last16=Emilsson , first16=V , last17=Doss , first17=S , last18=Ghazalpour , first18=A , last19=Horvath , first19=S , last20=Drake , first20=TA , last21=Lusis , first21=AJ , last22=Schadt , first22=EE , title=Variations in DNA elucidate molecular networks that cause disease , journal=Nature , date=27 March 2008 , volume=452 , issue=7186 , pages=429–35 , doi=10.1038/nature06757 , pmid=18344982, name-list-style=vanc, pmc=2841398 , bibcode=2008Natur.452..429C ]
[{{cite journal , last1=Dong , first1=J , last2=Horvath , first2=S , title=Understanding network concepts in modules , journal=BMC Systems Biology , date=4 June 2007 , volume=1 , pages=24 , doi=10.1186/1752-0509-1-24 , pmid=17547772, pmc=3238286 , name-list-style=vanc]
[{{cite journal , last1=Foroushani , first1=Amir , last2=Agrahari , first2=Rupesh , last3=Docking , first3=Roderick , last4=Chang , first4=Linda , last5=Duns , first5=Gerben , last6=Hudoba , first6=Monika , last7=Karsan , first7=Aly , last8=Zare , first8=Habil , title=Large-scale gene network analysis reveals the significance of extracellular matrix pathway and homeobox genes in acute myeloid leukemia: an introduction to the Pigengene package and its applications , journal=BMC Medical Genomics , date=16 March 2017 , volume=10 , issue=1 , pages=16 , doi=10.1186/s12920-017-0253-6, pmid=28298217 , pmc=5353782 , name-list-style=vanc]
[{{cite journal , last1=Hawrylycz , first1=MJ , last2=Lein , first2=ES , last3=Guillozet-Bongaarts , first3=AL , last4=Shen , first4=EH , last5=Ng , first5=L , last6=Miller , first6=JA , last7=van de Lagemaat , first7=LN , last8=Smith , first8=KA , last9=Ebbert , first9=A , last10=Riley , first10=ZL , last11=Abajian , first11=C , last12=Beckmann , first12=CF , last13=Bernard , first13=A , last14=Bertagnolli , first14=D , last15=Boe , first15=AF , last16=Cartagena , first16=PM , last17=Chakravarty , first17=MM , last18=Chapin , first18=M , last19=Chong , first19=J , last20=Dalley , first20=RA , last21=David Daly , first21=B , last22=Dang , first22=C , last23=Datta , first23=S , last24=Dee , first24=N , last25=Dolbeare , first25=TA , last26=Faber , first26=V , last27=Feng , first27=D , last28=Fowler , first28=DR , last29=Goldy , first29=J , last30=Gregor , first30=BW , last31=Haradon , first31=Z , last32=Haynor , first32=DR , last33=Hohmann , first33=JG , last34=Horvath , first34=S , last35=Howard , first35=RE , last36=Jeromin , first36=A , last37=Jochim , first37=JM , last38=Kinnunen , first38=M , last39=Lau , first39=C , last40=Lazarz , first40=ET , last41=Lee , first41=C , last42=Lemon , first42=TA , last43=Li , first43=L , last44=Li , first44=Y , last45=Morris , first45=JA , last46=Overly , first46=CC , last47=Parker , first47=PD , last48=Parry , first48=SE , last49=Reding , first49=M , last50=Royall , first50=JJ , last51=Schulkin , first51=J , last52=Sequeira , first52=PA , last53=Slaughterbeck , first53=CR , last54=Smith , first54=SC , last55=Sodt , first55=AJ , last56=Sunkin , first56=SM , last57=Swanson , first57=BE , last58=Vawter , first58=MP , last59=Williams , first59=D , last60=Wohnoutka , first60=P , last61=Zielke , first61=HR , last62=Geschwind , first62=DH , last63=Hof , first63=PR , last64=Smith , first64=SM , last65=Koch , first65=C , last66=Grant , first66=SGN , last67=Jones , first67=AR , title=An anatomically comprehensive atlas of the adult human brain transcriptome , journal=Nature , date=20 September 2012 , volume=489 , issue=7416 , pages=391–399 , doi=10.1038/nature11405 , pmid=22996553, pmc=4243026 , name-list-style=vanc, bibcode=2012Natur.489..391H ]
[{{cite journal , last1=Horvath , first1=S , last2=Zhang , first2=B , last3=Carlson , first3=M , last4=Lu , first4=KV , last5=Zhu , first5=S , last6=Felciano , first6=RM , last7=Laurance , first7=MF , last8=Zhao , first8=W , last9=Shu , first9=Q , last10=Lee , first10=Y , last11=Scheck , first11=AC , last12=Liau , first12=LM , last13=Wu , first13=H , last14=Geschwind , first14=DH , last15=Febbo , first15=PG , last16=Kornblum , first16=HI , last17=Cloughesy , first17=TF , last18=Nelson , first18=SF , last19=Mischel , first19=PS , authorlink17=Timothy Cloughesy , year=2006 , title=Analysis of Oncogenic Signaling Networks in Glioblastoma Identifies ASPM as a Novel Molecular Target , journal=PNAS , volume=103 , issue=46 , pages=17402–17407 , doi=10.1073/pnas.0608396103 , pmid=17090670 , pmc=1635024 , name-list-style=vanc, bibcode=2006PNAS..10317402H , doi-access=free ]
[{{cite journal , last1=Horvath , first1=S , last2=Dong , first2=J , year=2008 , title=Geometric Interpretation of Gene Coexpression Network Analysis , journal=PLOS Computational Biology , volume=4 , issue=8 , page=e1000117 , pmid=18704157 , doi=10.1371/journal.pcbi.1000117 , pmc=2446438, name-list-style=vanc, bibcode=2008PLSCB...4E0117H ]
[{{cite book , last1=Horvath , first1=Steve , title=Weighted Network Analysis: Application in Genomics and Systems Biology , date=2011 , publisher=Springer , location=New York, NY , isbn=978-1-4419-8818-8 , name-list-style=vanc]
[{{cite journal , last1=Horvath , first1=S , last2=Zhang , first2=Y , last3=Langfelder , first3=P , last4=Kahn , first4=RS , last5=Boks , first5=MP , last6=van Eijk , first6=K , last7=van den Berg , first7=LH , last8=Ophoff , first8=RA , title=Aging effects on DNA methylation modules in human brain and blood tissue , journal=Genome Biology , date=3 October 2012 , volume=13 , issue=10 , pages=R97 , doi=10.1186/gb-2012-13-10-r97 , pmid=23034122, pmc=4053733 , name-list-style=vanc]
[{{cite journal , last1=Langfelder , first1=P , last2=Zhang , first2=B , last3=Horvath , first3=S , year=2007 , title=Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut library for R , url= https://semanticscholar.org/paper/20d16d229ed5fddcb9ac1c3a7925582c286d3927, journal=Bioinformatics , volume=24 , issue= 5, pages=719–20 , pmid=18024473 , doi=10.1093/bioinformatics/btm563, s2cid=1095190 , name-list-style=vanc]
[{{cite journal , last1=Langfelder , first1=P , last2=Horvath , first2=S , year=2007 , title=Eigengene networks for studying the relationships between co-expression modules , journal=BMC Systems Biology , volume=1 , issue=1 , page=54 , pmid=18031580 , doi=10.1186/1752-0509-1-54 , pmc=2267703, name-list-style=vanc]
[{{cite journal , last1=Langfelder , first1=P , last2=Horvath , first2=S , title=WGCNA: an R package for weighted correlation network analysis , journal=BMC Bioinformatics , date=29 December 2008 , volume=9 , pages=559 , doi=10.1186/1471-2105-9-559 , pmid=19114008 , pmc=2631488, name-list-style=vanc]
[{{cite journal , last1=Langfelder , first1=P , last2=Luo , first2=R , last3=Oldham , first3=MC , last4=Horvath , first4=S , title=Is my network module preserved and reproducible? , journal=PLOS Computational Biology , date=20 January 2011 , volume=7 , issue=1 , pages=e1001057 , doi=10.1371/journal.pcbi.1001057 , pmid=21283776, pmc=3024255 , name-list-style=vanc, bibcode=2011PLSCB...7E1057L ]
[{{cite journal , last1=Langfelder , first1=Peter , last2=Mischel , first2=Paul S. , last3=Horvath , first3=Steve , last4=Ravasi , first4=Timothy , title=When Is Hub Gene Selection Better than Standard Meta-Analysis? , journal=PLOS ONE , date=17 April 2013 , volume=8 , issue=4 , pages=e61505 , doi=10.1371/journal.pone.0061505, pmid=23613865 , pmc=3629234 , name-list-style=vanc, bibcode=2013PLoSO...861505L , doi-access=free ]
[{{cite journal , last1=Mumford , first1=JA , last2=Horvath , first2=S , last3=Oldham , first3=MC , last4=Langfelder , first4=P , last5=Geschwind , first5=DH , last6=Poldrack , first6=RA , title=Detecting network modules in fMRI time series: a weighted network analysis approach , journal=NeuroImage , date=1 October 2010 , volume=52 , issue=4 , pages=1465–76 , doi=10.1016/j.neuroimage.2010.05.047 , pmid=20553896, name-list-style=vanc, pmc=3632300 ]
[{{cite journal , last1=Oldham , first1=MC , last2=Langfelder , first2=P , last3=Horvath , first3=S , title=Network methods for describing sample relationships in genomic datasets: application to Huntington's disease , journal=BMC Systems Biology , date=12 June 2012 , volume=6 , pages=63 , doi=10.1186/1752-0509-6-63 , pmid=22691535, pmc=3441531 , name-list-style=vanc]
[{{cite journal , last1=Plaisier , first1=Christopher L. , last2=Horvath , first2=Steve , last3=Huertas-Vazquez , first3=Adriana , last4=Cruz-Bautista , first4=Ivette , last5=Herrera , first5=Miguel F. , last6=Tusie-Luna , first6=Teresa , last7=Aguilar-Salinas , first7=Carlos , last8=Pajukanta , first8=Päivi , last9=Storey , first9=John D. , title=A Systems Genetics Approach Implicates USF1, FADS3, and Other Causal Candidate Genes for Familial Combined Hyperlipidemia , journal=PLOS Genetics , date=11 September 2009 , volume=5 , issue=9 , pages=e1000642 , doi=10.1371/journal.pgen.1000642, pmid=19750004 , pmc=2730565 , name-list-style=vanc]
[{{cite journal , last1=Ranola , first1=JM , last2=Langfelder , first2=P , last3=Lange , first3=K , last4=Horvath , first4=S , title=Cluster and propensity based approximation of a network , journal=BMC Systems Biology , date=14 March 2013 , volume=7 , pages=21 , doi=10.1186/1752-0509-7-21 , pmid=23497424, pmc=3663730 , name-list-style=vanc]
[{{cite journal , last1=Ravasz , first1=E , last2=Somera , first2=AL , last3=Mongru , first3=DA , last4=Oltvai , first4=ZN , last5=Barabasi , first5=AL , year=2002 , title=Hierarchical organization of modularity in metabolic networks , journal=Science , volume=297 , issue=5586 , pages=1551–1555 , doi=10.1126/science.1073374 , pmid=12202830 , arxiv=cond-mat/0209244 , name-list-style=vanc, bibcode=2002Sci...297.1551R , s2cid=14452443 ]
[{{cite journal , last1=Shirasaki , first1=DI , last2=Greiner , first2=ER , last3=Al-Ramahi , first3=I , last4=Gray , first4=M , last5=Boontheung , first5=P , last6=Geschwind , first6=DH , last7=Botas , first7=J , last8=Coppola , first8=G , last9=Horvath , first9=S , last10=Loo , first10=JA , last11=Yang , first11=XW , title=Network organization of the huntingtin proteomic interactome in mammalian brain , journal=Neuron , date=12 July 2012 , volume=75 , issue=1 , pages=41–57 , doi=10.1016/j.neuron.2012.05.024 , pmid=22794259, pmc=3432264 , name-list-style=vanc]
[{{cite Q , Q21559533]
[{{cite journal , last1=Voineagu , first1=I , last2=Wang , first2=X , last3=Johnston , first3=P , last4=Lowe , first4=JK , last5=Tian , first5=Y , last6=Horvath , first6=S , last7=Mill , first7=J , last8=Cantor , first8=RM , last9=Blencowe , first9=BJ , last10=Geschwind , first10=DH , title=Transcriptomic analysis of autistic brain reveals convergent molecular pathology , journal=Nature , date=25 May 2011 , volume=474 , issue=7351 , pages=380–4 , doi=10.1038/nature10110 , pmid=21614001, pmc=3607626 , name-list-style=vanc]
[{{cite journal , last1=Xue , first1=Z , last2=Huang , first2=K , last3=Cai , first3=C , last4=Cai , first4=L , last5=Jiang , first5=CY , last6=Feng , first6=Y , last7=Liu , first7=Z , last8=Zeng , first8=Q , last9=Cheng , first9=L , last10=Sun , first10=YE , last11=Liu , first11=JY , last12=Horvath , first12=S , last13=Fan , first13=G , title=Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing , journal=Nature , date=29 August 2013 , volume=500 , issue=7464 , pages=593–7 , doi=10.1038/nature12364 , pmid=23892778, pmc=4950944 , name-list-style=vanc, bibcode=2013Natur.500..593X ]
[{{cite journal , last1=Yip , first1=AM , last2=Horvath , first2=S , title=Gene network interconnectedness and the generalized topological overlap measure , url= , journal=BMC Bioinformatics , date=24 January 2007 , volume=8 , pages=22 , doi=10.1186/1471-2105-8-22 , pmid=17250769, pmc=1797055 , name-list-style=vanc]
[{{cite journal , last1=Zhang , first1=B , last2=Horvath , first2=S , title=A general framework for weighted gene co-expression network analysis , journal=Statistical Applications in Genetics and Molecular Biology , date=2005 , volume=4 , pages=17 , doi=10.2202/1544-6115.1128 , pmid=16646834 , url=http://dibernardo.tigem.it/files/papers/2008/zhangbin-statappsgeneticsmolbio.pdf, name-list-style=vanc, citeseerx=10.1.1.471.9599 , s2cid=7756201 ]