
Computational biology refers to the use of data analysis, mathematical modeling, and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has foundations in applied mathematics, chemistry, and genetics. It differs from biological computing, a subfield of computer engineering which uses bioengineering to build computers.


History

Bioinformatics, the analysis of informatics processes in biological systems, began in the early 1970s. At this time, research in artificial intelligence was using network models of the human brain in order to generate new algorithms. This use of biological data pushed biological researchers to use computers to evaluate and compare large data sets in their own field. By 1982, researchers shared information via punch cards. The amount of data grew exponentially by the end of the 1980s, requiring new computational methods for quickly interpreting relevant information.

Perhaps the best-known example of computational biology, the Human Genome Project, officially began in 1990. By 2003, the project had mapped around 85% of the human genome, satisfying its initial goals. Work continued, however, and by 2021 a "complete genome" level was reached, with only 0.3% of bases remaining covered by potential issues. The missing Y chromosome was added in January 2022.

Since the late 1990s, computational biology has become an important part of biology, leading to numerous subfields. Today, the International Society for Computational Biology recognizes 21 different 'Communities of Special Interest', each representing a slice of the larger field. In addition to helping sequence the human genome, computational biology has helped create accurate models of the human brain, map the 3D structure of genomes, and model biological systems.


Applications


Anatomy

Computational anatomy is the study of anatomical shape and form at the visible or gross anatomical 50-100 μm scale of morphology. It involves the development of computational, mathematical, and data-analytical methods for modeling and simulating biological structures. It focuses on the anatomical structures being imaged, rather than the medical imaging devices. Due to the availability of dense 3D measurements via technologies such as magnetic resonance imaging, computational anatomy has emerged as a subfield of medical imaging and bioengineering for extracting anatomical coordinate systems at the morphome scale in 3D. The original formulation of computational anatomy is as a generative model of shape and form from exemplars acted upon via transformations. The diffeomorphism group is used to study different coordinate systems via coordinate transformations, as generated via the Lagrangian and Eulerian velocities of flow from one anatomical configuration in ℝ³ to another. It relates to shape statistics and morphometrics, with the distinction that diffeomorphisms are used to map coordinate systems, whose study is known as diffeomorphometry.


Data and modeling

Mathematical biology is the use of mathematical models of living organisms to examine the systems that govern structure, development, and behavior in biological systems. This entails a more theoretical approach to problems, rather than its more empirically-minded counterpart of experimental biology. Mathematical biology draws on discrete mathematics, topology (also useful for computational modeling), Bayesian statistics, linear algebra, and Boolean algebra.

These mathematical approaches have enabled the creation of databases and other methods for storing, retrieving, and analyzing biological data, a field known as bioinformatics. Usually, this process involves genetics and analyzing genes.

Gathering and analyzing large datasets has made way for growing research fields such as data mining and computational biomodeling, which refers to building computer models and visual simulations of biological systems. This allows researchers to predict how such systems will react to different environments, which is useful for determining if a system can "maintain their state and functions against external and internal perturbations". While current techniques focus on small biological systems, researchers are working on approaches that will allow for larger networks to be analyzed and modeled. A majority of researchers believe that this will be essential in developing modern medical approaches to creating new drugs and gene therapy. A useful modeling approach is to use Petri nets via tools such as esyN.

Along similar lines, until recent decades theoretical ecology has largely dealt with analytic models that were detached from the statistical models used by empirical ecologists. However, computational methods have aided in developing ecological theory via simulation of ecological systems, in addition to increasing application of methods from computational statistics in ecological analyses.
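
The Petri net approach mentioned above can be illustrated with a small sketch. This is not the esyN interface; it is a minimal, hypothetical place/transition net written in plain Python, assuming a toy enzyme reaction (E + S -> ES -> E + P) chosen only for illustration. The names `Transition`, `is_enabled`, `fire`, and `run` are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Marking = Dict[str, int]  # place name -> number of tokens


@dataclass
class Transition:
    name: str
    inputs: Dict[str, int]   # tokens consumed from each input place
    outputs: Dict[str, int]  # tokens produced in each output place


def is_enabled(marking: Marking, t: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in t.inputs.items())


def fire(marking: Marking, t: Transition) -> Marking:
    """Return the marking reached by firing t; the input marking is not modified."""
    if not is_enabled(marking, t):
        raise ValueError(f"transition {t.name!r} is not enabled")
    new = dict(marking)
    for p, w in t.inputs.items():
        new[p] = new.get(p, 0) - w
    for p, w in t.outputs.items():
        new[p] = new.get(p, 0) + w
    return new


def run(marking: Marking, transitions: List[Transition],
        max_steps: int = 1000) -> Tuple[Marking, List[str]]:
    """Fire the first enabled transition repeatedly until none is enabled."""
    fired: List[str] = []
    for _ in range(max_steps):
        ready = [t for t in transitions if is_enabled(marking, t)]
        if not ready:
            break
        marking = fire(marking, ready[0])
        fired.append(ready[0].name)
    return marking, fired


# Hypothetical toy system: one enzyme molecule converts three substrate molecules.
bind = Transition("bind", {"E": 1, "S": 1}, {"ES": 1})
release = Transition("release", {"ES": 1}, {"E": 1, "P": 1})
final, fired = run({"E": 1, "S": 3, "ES": 0, "P": 0}, [bind, release])
```

Firing the first enabled transition makes the run deterministic; a stochastic simulation would instead choose among the enabled transitions at random.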


Systems biology

Systems biology consists of computing the interactions between various biological systems, ranging from the cellular level to entire populations, with the goal of discovering emergent properties. This process usually involves networking cell signaling and metabolic pathways. Systems biology often uses computational techniques from biological modeling and graph theory to study these complex interactions at cellular levels.


Evolutionary biology

Computational biology has assisted evolutionary biology by:
* Using DNA data to reconstruct the tree of life with computational phylogenetics
* Fitting population genetics models (either forward time or backward time) to DNA data to make inferences about demographic or selective history
* Building population genetics models of evolutionary systems from first principles in order to predict what is likely to evolve
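
As an illustration of a forward-time population genetics model, the sketch below simulates the Wright-Fisher model, swapped in here as a standard textbook example rather than a method named in the text. It assumes a constant-size population of `pop_size` gene copies, two alleles, no selection, and no mutation; the function name and parameters are hypothetical.

```python
import random
from typing import List


def wright_fisher(pop_size: int, init_count: int, generations: int,
                  seed: int = 0) -> List[float]:
    """Simulate neutral genetic drift forward in time.

    Each generation, every one of the pop_size gene copies picks its parent
    uniformly at random from the previous generation, so the new allele count
    is a binomial draw with success probability equal to the current frequency.
    Returns the allele frequency at generation 0, 1, ..., generations.
    """
    rng = random.Random(seed)
    count = init_count
    trajectory = [count / pop_size]
    for _ in range(generations):
        freq = count / pop_size
        # Binomial(pop_size, freq) sampled as a sum of Bernoulli trials.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        trajectory.append(count / pop_size)
    return trajectory


# Hypothetical usage: 100 gene copies, allele starting at 50% frequency.
trajectory = wright_fisher(pop_size=100, init_count=50, generations=200)
```

Once the frequency reaches 0 or 1 it stays there, which corresponds to loss or fixation of the allele. Fitting such a model to DNA data, as described above, would require comparing many simulated trajectories against observed frequencies, which this sketch does not attempt.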


Genomics

Computational genomics is the study of the genomes of cells and organisms. The Human Genome Project is one example of computational genomics. This project set out to sequence the entire human genome into a set of data. Once fully implemented, this could allow doctors to analyze the genome of an individual patient. This opens the possibility of personalized medicine, prescribing treatments based on an individual's pre-existing genetic patterns. Researchers are looking to sequence the genomes of animals, plants, bacteria, and all other types of life.

One of the main ways that genomes are compared is by sequence homology. Homology refers to biological structures and nucleotide sequences in different organisms that come from a common ancestor. Research suggests that between 80 and 90% of genes in newly sequenced prokaryotic genomes can be identified this way. Sequence alignment is another process for comparing and detecting similarities between biological sequences or genes. Sequence alignment is useful in a number of bioinformatics applications, such as computing the longest common subsequence of two genes or comparing variants of certain diseases.

A largely unexplored area in computational genomics is the analysis of intergenic regions, which comprise roughly 97% of the human genome. Researchers are working to understand the functions of non-coding regions of the human genome through the development of computational and statistical methods and via large consortia projects such as ENCODE and the Roadmap Epigenomics Project.

Understanding how individual genes contribute to the biology of an organism at the molecular, cellular, and organism levels is known as gene ontology. The Gene Ontology Consortium's mission is to develop an up-to-date, comprehensive, computational model of biological systems, from the molecular level to larger pathways, cellular, and organism-level systems. The Gene Ontology resource provides a computational representation of current scientific knowledge about the functions of genes (or, more properly, the protein and non-coding RNA molecules produced by genes) from many different organisms, from humans to bacteria.

3D genomics is a subsection in computational biology that focuses on the organization and interaction of genes within a eukaryotic cell. One method used to gather 3D genomic data is Genome Architecture Mapping (GAM). GAM measures 3D distances of chromatin and DNA in the genome by combining cryosectioning, the process of cutting a strip from the nucleus to examine the DNA, with laser microdissection. A nuclear profile is simply this strip or slice that is taken from the nucleus. Each nuclear profile contains genomic windows, which are certain sequences of nucleotides, the base unit of DNA. GAM captures a genome network of complex, multi-enhancer chromatin contacts throughout a cell.
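
The longest common subsequence mentioned above in connection with sequence alignment can be computed with a standard dynamic-programming table. The following is a minimal sketch, assuming the two genes are given as plain strings of nucleotide letters; the function name is hypothetical, and production alignment tools use more elaborate scoring than this.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b.

    table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    Time and memory are both O(len(a) * len(b)).
    """
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the subsequence itself.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


# Hypothetical usage with two short made-up sequences.
shared = longest_common_subsequence("AGGTAB", "GXTXAYB")
```

Unlike a common substring, the matched letters do not need to be adjacent in either input, which is what lets the method tolerate insertions and deletions between two sequences.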


Neuroscience

Computational neuroscience is the study of brain function in terms of the information processing properties of the nervous system. A subset of neuroscience, it looks to model the brain to examine specific aspects of the neurological system. Models of the brain include:
* Realistic Brain Models: These models look to represent every aspect of the brain, including as much detail at the cellular level as possible. Realistic models provide the most information about the brain, but also have the largest margin for error. More variables in a brain model create the possibility for more error to occur. These models do not account for parts of the cellular structure that scientists do not know about. Realistic brain models are the most computationally heavy and the most expensive to implement.
* Simplifying Brain Models: These models look to limit the scope of a model in order to assess a specific physical property of the neurological system. This allows for the intensive computational problems to be solved, and reduces the amount of potential error from a realistic brain model.
It is the work of computational neuroscientists to improve the algorithms and data structures currently used to increase the speed of such calculations.

Computational neuropsychiatry is an emerging field that uses mathematical and computer-assisted modeling of brain mechanisms involved in mental disorders. Several initiatives have demonstrated that computational modeling is an important contribution to understanding neuronal circuits that could generate mental functions and dysfunctions.


Pharmacology

Computational pharmacology is "the study of the effects of genomic data to find links between specific genotypes and diseases and then screening drug data". The pharmaceutical industry requires a shift in methods to analyze drug data. Pharmacologists were able to use Microsoft Excel to compare chemical and genomic data related to the effectiveness of drugs. However, the industry has reached what is referred to as the Excel barricade. This arises from the limited number of cells accessible on a spreadsheet. This development led to the need for computational pharmacology. Scientists and researchers develop computational methods to analyze these massive data sets. This allows for an efficient comparison between the notable data points and allows for more accurate drugs to be developed.

Analysts project that if major medications fail due to patents, computational biology will be necessary to replace current drugs on the market. Doctoral students in computational biology are being encouraged to pursue careers in industry rather than take post-doctoral positions. This is a direct result of major pharmaceutical companies needing more qualified analysts of the large data sets required for producing new drugs.

Similarly, computational oncology aims to determine the future mutations in cancer through algorithmic approaches. Research in this field has led to the use of high-throughput measurement that collects millions of data points using robotics and other sensing devices. This data is collected from DNA, RNA, and other biological structures. Areas of focus include determining the characteristics of tumors, analyzing molecules that are deterministic in causing cancer, and understanding how the human genome relates to the causation of tumors and cancer.


Techniques

Computational biologists use a wide range of software and algorithms to carry out their research.


Unsupervised learning

Unsupervised learning is a type of algorithm that finds patterns in unlabeled data. One example is k-means clustering, which aims to partition ''n'' data points into ''k'' clusters, in which each data point belongs to the cluster with the nearest mean. A related method is the k-medoids algorithm, which always picks one of the data points in the set as the cluster center (the medoid), rather than an average of the cluster's points. The algorithm follows these steps:
# Randomly select ''k'' distinct data points. These are the initial cluster centers (medoids).
# Measure the distance between each point and each of the ''k'' medoids.
# Assign each point to the cluster of its nearest medoid.
# Find the new center (medoid) of each cluster.
# Repeat steps 2 to 4 until the clusters no longer change.
# Assess the quality of the clustering by adding up the variation within each cluster.
# Repeat the process with different values of ''k''.
# Pick the best value for ''k'' by finding the "elbow" in the plot of within-cluster variation against ''k'', the point beyond which adding more clusters reduces the variation only slightly.
One application in biology is the 3D mapping of a genome. Data on the HIST1 region of mouse chromosome 13 is gathered from the Gene Expression Omnibus. This data records which nuclear profiles show up in which genomic regions. With this information, the Jaccard distance can be used to find a normalized distance between all the loci.
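The clustering steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in any particular study: each locus is represented as the set of nuclear-profile IDs in which it appears, the IDs are invented toy values, and the function names `jaccard_distance` and `k_medoids` are hypothetical.

```python
import random


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty sets count as identical
    return 1.0 - len(a & b) / len(union)


def k_medoids(points, k, distance, max_iter=100, seed=0):
    """Return (medoid_indices, labels, total_cost) for k clusters."""
    rng = random.Random(seed)
    n = len(points)
    # Precompute every pairwise distance once.
    dist = [[distance(p, q) for q in points] for p in points]

    def assign(medoids):
        # Label each point with the position of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Pick k distinct data points as the initial medoids.
    medoids = rng.sample(range(n), k)
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # the medoids, and therefore the clusters, are stable
        medoids = new_medoids

    labels = assign(medoids)
    # Within-cluster variation: summed distance of points to their medoid.
    cost = sum(dist[i][medoids[labels[i]]] for i in range(n))
    return medoids, labels, cost


# Toy data (assumed): each locus is the set of nuclear profiles containing it.
loci = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3}, {7, 8, 9}, {7, 8}, {8, 9, 10}]
medoids, labels, cost = k_medoids(loci, k=2, distance=jaccard_distance)
print(medoids, labels, round(cost, 3))
```

Running `k_medoids` for several values of `k` and plotting the returned `cost` against `k` would give the curve in which the "elbow" is sought.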


Graph Analytics

Graph analytics, or network analysis, is the study of graphs that represent connections between different objects. Graphs can represent many kinds of networks in biology, such as protein-protein interaction networks, regulatory networks, and metabolic and biochemical networks. There are many ways to analyze these networks, one of which is to look at centrality. Centrality measures rank the nodes of a graph by how central or well connected they are, which helps identify the most important nodes. For example, given data on the activity of genes over a time period, degree centrality can show which genes are most active in the network, that is, which genes interact with the most other genes. This can help explain what roles certain genes play in the network. There are many ways to calculate centrality, each of which gives a different kind of information. In biology, centrality measures are applied to gene regulatory, protein interaction, and metabolic networks, among others.
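As a rough illustration of degree centrality, the sketch below counts each node's neighbors in an undirected graph and divides by the largest possible number of neighbors, `n - 1`. The edge list and the gene names are made up for the example, and `degree_centrality` is a hypothetical helper written here rather than a call into any graph library.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected simple graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical gene interaction pairs (assumed, for illustration only).
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
]
centrality = degree_centrality(interactions)
# List the genes from most to least connected.
for gene, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(gene, round(score, 3))
```

Other centrality measures, such as betweenness or closeness, use shortest paths rather than neighbor counts and can rank the same nodes differently.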


Supervised Learning

Supervised learning is a type of algorithm that learns from labeled data and learns how to assign labels to future, unlabeled data. In biology, supervised learning is helpful when we have data that we know how to categorize and would like to sort more data into those categories. A common supervised learning algorithm is the random forest, which uses numerous decision trees to train a model to classify a dataset. Forming the basis of the random forest, a decision tree is a structure that aims to classify, or label, some set of data using certain known features of that data. A practical biological example would be taking an individual's genetic data and predicting whether that individual is predisposed to develop a certain disease or cancer. At each internal node, the algorithm checks the data for exactly one feature (a specific gene in the previous example) and then branches left or right based on the result. At each leaf node, the decision tree assigns a class label. In practice, the algorithm walks a specific root-to-leaf path through the decision tree based on the input data, and the leaf it reaches gives the classification. Commonly, decision trees have target variables that take on discrete values, like yes/no, in which case the tree is called a classification tree; if the target variable is continuous, it is called a regression tree. To construct a decision tree, it must first be trained using a training set to identify which features are the best predictors of the target variable.
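The root-to-leaf walk described above, and the way a random forest combines several trees by majority vote, can be sketched as follows. This is a simplified illustration: the trees are written by hand rather than learned from a training set, and the feature names (`variant_1` and so on), the class labels, and the helper names are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label assigned when a walk ends here


@dataclass
class Split:
    feature: str  # the single feature tested at this internal node
    if_absent: Union["Split", Leaf]   # branch taken when the feature is absent
    if_present: Union["Split", Leaf]  # branch taken when the feature is present


def classify(node, sample):
    """Walk one root-to-leaf path and return the label of the leaf reached."""
    while isinstance(node, Split):
        node = node.if_present if sample.get(node.feature, False) else node.if_absent
    return node.label


def forest_classify(trees, sample):
    """Majority vote over the labels predicted by the individual trees."""
    votes = Counter(classify(tree, sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hand-written toy trees (assumed); a real forest would be trained on data.
tree_1 = Split(
    "variant_1",
    Leaf("low risk"),
    Split("variant_2", Leaf("low risk"), Leaf("high risk")),
)
tree_2 = Split("variant_2", Leaf("low risk"), Leaf("high risk"))
tree_3 = Split("variant_3", Leaf("low risk"), Leaf("high risk"))
forest = [tree_1, tree_2, tree_3]

# One individual's (made-up) genetic features.
individual = {"variant_1": True, "variant_2": True, "variant_3": False}
prediction = forest_classify(forest, individual)
print(prediction)
```

An odd number of trees is used here so that a two-class vote cannot tie; training, which chooses the feature tested at each node, is left out of the sketch.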


Open source software

Open source software provides a platform for computational biology where everyone can access and benefit from software developed in research. PLOS cites four main reasons for the use of open source software:
* Reproducibility: Researchers can use the exact methods that were used to calculate the relations between biological data.
* Faster development: Developers and researchers do not have to reinvent existing code for minor tasks. Instead, they can use pre-existing programs to save time on the development and implementation of larger projects.
* Increased quality: Input from multiple researchers studying the same topic makes it less likely that errors in the code go unnoticed.
* Long-term availability: Open source programs are not tied to any businesses or patents. This allows them to be posted to multiple web pages, which helps ensure that they remain available in the future.


Research

There are several large conferences that are concerned with computational biology. Some notable examples are Intelligent Systems for Molecular Biology, the European Conference on Computational Biology, and Research in Computational Molecular Biology. There are also numerous journals dedicated to computational biology. Some notable examples include the Journal of Computational Biology and PLOS Computational Biology, a peer-reviewed open access journal that features many notable research projects in the field of computational biology. The journal provides reviews of software and tutorials for open source software, and displays information on upcoming computational biology conferences.


Related fields

Computational biology, bioinformatics and mathematical biology are all interdisciplinary approaches to the life sciences that draw from quantitative disciplines such as mathematics and information science. The NIH describes computational/mathematical biology as the use of computational/mathematical approaches to address theoretical and experimental questions in biology and, by contrast, bioinformatics as the application of information science to understand complex life-sciences data. While each field is distinct, there may be significant overlap at their interface, so much so that many use the terms bioinformatics and computational biology interchangeably. The terms computational biology and evolutionary computation sound similar but should not be confused. Unlike computational biology, evolutionary computation is not concerned with modeling and analyzing biological data. It instead creates algorithms based on the ideas of evolution across species. Sometimes referred to as genetic algorithms, the research of this field can be applied to computational biology. While evolutionary computation is not inherently a part of computational biology, computational evolutionary biology is a subfield of it.


External links


bioinformatics.org