Pathway Analysis

Pathway is the term from molecular biology for a curated schematic representation of a well-characterized segment of the molecular physiological machinery, such as a metabolic pathway describing an enzymatic process within a cell or tissue, or a signaling pathway model representing a regulatory process that might, in turn, enable a metabolic or another regulatory process downstream. A typical pathway model starts with an extracellular signaling molecule that activates a specific receptor, thus triggering a chain of molecular interactions. A pathway is most often represented as a relatively small graph with gene, protein, and/or small-molecule nodes connected by edges of known functional relations. While a simpler pathway might appear as a chain, complex pathway topologies with loops and alternative routes are much more common. Computational analyses employ special formats of pathway representation. In the simplest form, however, a pathway might be represented as a list of member molecules with order and relations unspecified. Such a representation, generally called a Functional Gene Set (FGS), can also refer to other functionally characterised groups such as protein families, Gene Ontology (GO) terms, and Disease Ontology (DO) terms.

In bioinformatics, methods of pathway analysis might be used to identify key genes or proteins within a previously known pathway in relation to a particular experiment or pathological condition, or to build a pathway de novo from proteins that have been identified as key affected elements. By examining changes in, for example, gene expression in a pathway, its biological activity can be explored. Most frequently, however, pathway analysis refers to a method of initial characterization and interpretation of an experimental (or pathological) condition that was studied with omics tools or a genome-wide association study. Such studies might identify long lists of altered genes. Visual inspection is then challenging and the information is hard to summarize, since the altered genes map to a broad range of pathways, processes, and molecular functions (with a large fraction of genes lacking any annotation). In such situations, the most productive way of exploring the list is to identify enrichment of specific FGSs in it. The general approach of enrichment analyses is to identify FGSs whose members were most frequently or most strongly altered in the given condition, in comparison to a gene set sampled by chance. In other words, enrichment can map canonical prior knowledge, structured in the form of FGSs, to the condition represented by altered genes.
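The two representations described above, a graph with typed edges and a plain FGS member list, can be illustrated with a minimal sketch. The node names and relation labels below are hypothetical placeholders, not taken from any real pathway database, and the functions are illustrative helpers rather than part of any established tool.

```python
# Toy pathway as a list of directed, typed edges: (source, target, relation).
# All node names and relations are hypothetical, for illustration only.
pathway_edges = [
    ("LIGAND", "RECEPTOR", "activates"),
    ("RECEPTOR", "KINASE", "activates"),
    ("KINASE", "TF", "activates"),
    ("TF", "RECEPTOR", "inhibits"),  # a feedback loop, so the graph is not a simple chain
]


def to_adjacency(edges):
    """Build an adjacency map: node -> list of (target, relation)."""
    adjacency = {}
    for source, target, relation in edges:
        adjacency.setdefault(source, []).append((target, relation))
        adjacency.setdefault(target, [])  # make sure sink nodes appear as keys
    return adjacency


def to_gene_set(edges):
    """Collapse the graph into an FGS: members only, order and relations dropped."""
    return {node for source, target, _ in edges for node in (source, target)}


adjacency = to_adjacency(pathway_edges)
fgs = to_gene_set(pathway_edges)
```

The adjacency map keeps the topology that graph-aware methods need, while the set keeps only membership, which is all that list-based enrichment methods use.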


Use

The data for pathway analysis come from high-throughput biology. This includes high-throughput sequencing data and microarray data. Before pathway analysis can be done, each gene's alteration should be evaluated using the omics dataset in either a quantitative way (differential expression analysis) or a qualitative way (detection of somatic point mutations, or mapping neighbor genes to a disease-associated SNP). It is also possible to combine datasets from different research groups or multiple omics platforms with a meta-analysis and cross-platform regularization. Next, a list in which gene identifiers are accompanied by the alteration attributes is subjected to pathway analysis. By using pathway analysis software, researchers can determine which FGSs are enriched with the altered experimental genes.

For example, pathway analysis of several independent microarray experiments (a meta-analysis) helped to discover potential biomarkers in a single pathway important for the fast-to-slow fiber type transition in Duchenne muscular dystrophy. In another study, a meta-analysis identified two biomarkers in the blood of patients with Parkinson's disease, which can be useful for monitoring the disease. Candidate gene alleles causative of Alzheimer's disease and elderly dementia were first discovered via a genome-wide association study and further validated with network enrichment analysis against an FGS consisting of known Alzheimer's genes.


Databases

Pathway collections and interaction networks constitute the knowledge base required for a pathway analysis. Pathway content, structure, format, and functionality vary between different database resources such as KEGG, WikiPathways, or Reactome. Proprietary pathway collections also exist, used by, for example, the Pathway Studio and Ingenuity Pathway Analysis tools. Public online tools can provide pre-compiled and ready-to-go menus of pathways and networks from different open sources (e.g. EviNet).


Methods and software

Pathway analysis software can be found in the form of desktop programs, web-based applications, or packages coded in languages such as R and Python and shared openly through the Bioconductor and GitHub projects. The methodology of pathway analysis evolves fast and its classification is still debated, with the following main categories of pathway enrichment analysis applicable to high-throughput data:


Over-representation analysis (ORA)

This method measures the overlap between, on the one hand, a set of genes (or proteins) in an FGS and, on the other hand, a list of the most altered genes, generally called an Altered Gene Set (AGS). A typical AGS example is a list of the top N differentially expressed genes from an RNA-Seq assay. The basic assumption behind ORA is that a biologically relevant pathway can be identified by an excess of altered genes in it compared to the number expected by chance. The aim of ORA is to identify such enriched pathways, judging by the statistical significance of the overlap between FGS and AGS as determined either by an appropriate statistic, such as the Jaccard index, or by a statistical test producing p-values (Fisher's exact test or a test using the hypergeometric distribution).
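The overlap test described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the implementation of any particular ORA tool: the function names, the gene identifiers, and the choice of a one-sided upper-tail hypergeometric p-value are all choices made for this example, and the "universe" is assumed to be the full set of genes measured in the experiment.

```python
from math import comb


def hypergeom_pvalue(overlap, fgs_size, ags_size, universe_size):
    """Upper-tail probability P(X >= overlap) of the hypergeometric distribution.

    X is the number of FGS members among ags_size genes drawn without
    replacement from a universe of universe_size genes that contains
    fgs_size FGS members.
    """
    max_overlap = min(fgs_size, ags_size)
    total = comb(universe_size, ags_size)
    tail = sum(
        comb(fgs_size, k) * comb(universe_size - fgs_size, ags_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ora(ags, fgs, universe):
    """Return (overlap size, p-value) for one AGS/FGS pair (illustrative helper)."""
    universe = set(universe)
    ags = set(ags) & universe  # genes outside the universe cannot be tested
    fgs = set(fgs) & universe
    overlap = len(ags & fgs)
    return overlap, hypergeom_pvalue(overlap, len(fgs), len(ags), len(universe))


# Hypothetical toy data: 20 measured genes, a 5-gene FGS, a 4-gene AGS.
universe = [f"g{i}" for i in range(1, 21)]
fgs = ["g1", "g2", "g3", "g4", "g5"]
ags = ["g1", "g2", "g3", "g10"]
overlap, p_value = ora(ags, fgs, universe)
```

In this toy case three of the four altered genes fall in the FGS, and the p-value equals 155/4845, roughly 0.032. In a real analysis many FGSs are tested at once, so the resulting p-values would normally also need a multiple-testing correction, which this sketch leaves out.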


Functional class scoring (FCS)

This method identifies enriched FGSs by considering the relative positions of their member genes in the full list of genes studied in the experiment. This full list should therefore be ranked in advance by a statistic (such as mRNA expression fold change, the statistic of a Student's t-test, etc.) or by a p-value, in which case the direction of fold change must be tracked separately, since p-values are non-directional. Thus FCS takes into account every FGS gene regardless of its statistical significance and does not require pre-compiled AGSs. One of the first and most popular methods deploying the FCS approach was Gene Set Enrichment Analysis (GSEA).
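The idea of scoring an FGS by the positions of its members in a ranked list can be sketched as a simple running sum. The code below is a simplified, unweighted statistic in the spirit of GSEA; it is an assumption-laden illustration, not the published GSEA algorithm, and the function name and gene identifiers are hypothetical.

```python
def running_sum_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up at each gene-set member and
    down at each non-member, and return the extreme deviation from zero.
    Step sizes are chosen so that the walk ends at zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses
    running = 0.0
    extreme = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical ranking, most up-regulated gene first.
ranked = ["a", "b", "c", "d", "e", "f"]
score_top = running_sum_score(ranked, {"a", "b"})     # members at the top
score_bottom = running_sum_score(ranked, {"e", "f"})  # members at the bottom
```

A set concentrated at the top of the ranking gets a score near +1 and a set concentrated at the bottom gets a score near -1, so the sign carries the direction of change. Judging significance would additionally require a null distribution, for example from permutations, which is omitted here.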


Pathway topology analysis (PTA)

Similarly to FCS, PTA accounts for high-throughput data for every gene. In addition, it uses specific topological information about the role, position, and interaction directions of the pathway genes. This requires additional input data from a pathway database in a pre-specified format, such as KEGG Markup Language (KGML). Using this information, PTA estimates pathway significance by considering how much each individual gene alteration might have affected the whole pathway. Multiple alteration types can be used in parallel (somatic copy-number variations, point mutations, etc.) when available. The set of PTA methods includes Impact Analysis, EnrichNet, GGEA, and TopoGSA.


Network enrichment analysis (NEA)

Network enrichment analysis (NEA) is an extension of gene-set enrichment analysis to the domain of global gene networks. The major principle of NEA can be understood in comparison with ORA, where enrichment of an FGS in genes of the AGS is determined by how many genes are directly shared by AGS and FGS. In NEA, on the contrary, the global network is searched for network edges that connect any genes of the AGS with any genes of the FGS. Since enrichment significance is influenced by the highly variable node degrees of individual AGS and FGS genes, it should be determined by a dedicated statistical test, which compares the observed number of network edges to the number expected by chance in the same network context. Some valuable properties of NEA are that: (1) it is more robust to biological and technical variability between sample replicates; (2) AGS genes need not be annotated as pathway members; (3) FGS members do not have to be altered themselves, but are still accounted for because they possess network links to AGS genes.
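The comparison of observed and expected network edges can be sketched as follows. The expected count here uses a simple degree-product approximation, which is an assumption made for this example; actual NEA tools may instead use network randomization or other null models. The network, gene names, and function names are hypothetical.

```python
from collections import Counter


def count_links(edges, ags, fgs):
    """Count undirected network edges that join an AGS gene to an FGS gene."""
    ags, fgs = set(ags), set(fgs)
    links = 0
    for u, v in edges:
        if (u in ags and v in fgs) or (v in ags and u in fgs):
            links += 1
    return links


def expected_links(edges, ags, fgs):
    """Degree-aware expectation (assumed null): deg(AGS) * deg(FGS) / (2 * |E|)."""
    if not edges:
        return 0.0
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ags_degree = sum(degree[g] for g in set(ags))
    fgs_degree = sum(degree[g] for g in set(fgs))
    return ags_degree * fgs_degree / (2 * len(edges))


# Hypothetical toy network; AGS and FGS share no genes at all.
network = [
    ("a1", "f1"), ("a1", "f2"), ("a2", "f1"),
    ("a2", "x"), ("x", "y"), ("y", "f3"),
]
ags = {"a1", "a2"}
fgs = {"f1", "f2", "f3"}
observed = count_links(network, ags, fgs)
expected = expected_links(network, ags, fgs)
```

In this toy network the two sets have no gene in common, so a direct overlap count would be zero, yet three edges connect them against an expectation of about 1.33. Turning that difference into a p-value or z-score is the job of the dedicated test mentioned above and is not attempted in this sketch.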


Commercial solutions

Beyond open-source tools, such as STRING or Cytoscape, a number of companies sell licensed software products to analyse gene sets. While most of the publicly available solutions use online and public pathway collections, the commercial products mostly promote their own proprietary pathways and networks. The choice of such products might be driven by customers' skills, financial and time resources, and needs. Ingenuity, for example, maintains a knowledge base for comparative analysis of gene expression data. Pathway Studio is commercial software that allows users to search for biologically relevant facts, analyze experiments, and create pathways. Pathway Studio Viewer is a free resource from the same company for presenting the Pathway Studio interactive pathway collection and database. Two commercial solutions offer PTA: iPathwayGuide from Advaita Corporation and MetaCore from Thomson Reuters. Advaita uses the peer-reviewed Impact Analysis method, while the MetaCore method is unpublished.


Limitations


Lack of annotations

Application of pathway analysis methods depends on annotations found in existing databases, such as gene set membership in pathways, pathway topology, presence of genes in the global network, etc. These annotations, however, are far from complete and have highly variable degrees of confidence. In addition, such information is usually general, i.e. it lacks context such as cell type, compartment, or developmental stage. Therefore, interpretation of pathway analysis results for omics datasets should be done with caution. The problem can be partially addressed by analysing larger gene sets, such as big pathway collections or global interaction networks.


See also

* Biological pathway

