Multi-state modeling of biomolecules refers to a series of techniques used to represent and compute the behaviour of biological molecules or complexes that can adopt a large number of possible functional states. Biological signaling systems often rely on complexes of biological macromolecules that can undergo several functionally significant modifications that are mutually compatible. Thus, they can exist in a very large number of functionally different states. Modeling such multi-state systems poses two problems: how to describe and specify a multi-state system (the "specification problem") and how to use a computer to simulate the progress of the system over time (the "computation problem"). To address the specification problem, modelers have in recent years moved away from explicit specification of all possible states and towards rule-based modeling approaches that allow for implicit model specification, including the κ-calculus, BioNetGen, the Allosteric Network Compiler and others. To tackle the computation problem, they have turned to particle-based methods, which have in many cases proved more computationally efficient than population-based methods based on ordinary differential equations, partial differential equations, or the Gillespie stochastic simulation algorithm. Given current computing technology, particle-based methods are sometimes the only possible option. Particle-based simulators fall into two categories: non-spatial simulators, such as StochSim, DYNSTOC, RuleMonkey and NFSim, and spatial simulators, including Meredys, SRSim and MCell (Stiles JR, Bartol TM (2001). In: De Schutter E (ed). Computational Neuroscience: Realistic Modeling for Experimentalists. CRC Press, Boca Raton). Modelers can thus choose from a variety of tools, with the best choice depending on the particular problem. Development of faster and more powerful methods is ongoing, promising the ability to simulate ever more complex signaling processes in the future.


Introduction


Multi-state biomolecules in signal transduction

In living cells, signals are processed by networks of proteins that can act as complex computational devices. These networks rely on the ability of single proteins to exist in a variety of functionally different states achieved through multiple mechanisms, including post-translational modifications, ligand binding, conformational change, or formation of new complexes. Similarly, nucleic acids can undergo a variety of transformations, including protein binding, binding of other nucleic acids, conformational change and DNA methylation. In addition, several types of modifications can co-exist, exerting a combined influence on a biological macromolecule at any given time. Thus, a biomolecule or complex of biomolecules can often adopt a very large number of functionally distinct states. The number of states scales exponentially with the number of possible modifications, a phenomenon known as "combinatorial explosion". This is of concern for computational biologists who model or simulate such biomolecules, because it raises questions about how such large numbers of states can be represented and simulated.
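The exponential scaling can be made explicit with a short counting argument. As a sketch, assume a molecule has $n$ features that vary independently, where feature $i$ can take $k_i$ values (the symbols $n$, $k_i$ and $N$ are introduced here for illustration only):

```latex
N = \prod_{i=1}^{n} k_i ,
\qquad \text{so that for } n \text{ binary features} \qquad
N = 2^{n} .
```

Under this assumption, ten binary modification sites already give $2^{10} = 1024$ states, and each further binary site doubles the count. If some combinations of features are incompatible, $N$ is an upper bound rather than an exact count.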


Examples of combinatorial explosion

Biological signaling networks incorporate a wide array of reversible interactions, post-translational modifications and conformational changes. Furthermore, it is common for a protein to be composed of several identical or nonidentical subunits, and for several proteins and/or nucleic acid species to assemble into larger complexes. A molecular species with several of those features can therefore exist in a large number of possible states.

For instance, it has been estimated that the yeast scaffold protein Ste5 can be a part of 25666 unique protein complexes. In ''E. coli'', chemotaxis receptors of four different kinds interact in groups of three, and each individual receptor can exist in at least two possible conformations and has up to eight methylation sites, resulting in billions of potential states. The protein kinase CaMKII is a dodecamer: twelve catalytic subunits arranged in two hexameric rings. Each subunit can exist in at least two distinct conformations, and each subunit features various phosphorylation and ligand binding sites. A recent model incorporated conformational states, two phosphorylation sites and two modes of binding calcium/calmodulin, for a total of around one billion possible states per hexameric ring. A model of coupling of the EGF receptor to a MAP kinase cascade presented by Danos and colleagues (Danos V, Feret J, Fontana W, Harmer R, Krivine J (2007). Rule-Based Modelling of Cellular Signalling. Proceedings of the Eighteenth International Conference on Concurrency Theory, CONCUR 2007, Lisbon, Portugal) accounts for \sim 10^ distinct molecular species, yet the authors note several points at which the model could be further extended. A more recent model of ErbB receptor signalling even accounts for more than one googol (10^{100}) distinct molecular species. The problem of combinatorial explosion is also relevant to synthetic biology, with a recent model of a relatively simple synthetic eukaryotic gene circuit featuring 187 species and 1165 reactions.

Of course, not all of the possible states of a multi-state molecule or complex will necessarily be populated. Indeed, in systems where the number of possible states is far greater than the number of molecules in the compartment (e.g. the cell), they cannot all be populated. In some cases, empirical information can be used to rule out certain states if, for instance, some combinations of features are incompatible. In the absence of such information, however, all possible states need to be considered ''a priori''. In such cases, computational modeling can be used to uncover to what extent the different states are populated.

The existence (or potential existence) of such large numbers of molecular species is a combinatorial phenomenon: it arises from a relatively small set of features or modifications (such as post-translational modification or complex formation) that combine to dictate the state of the entire molecule or complex, in the same way that the existence of just a few choices in a coffee shop (small, medium or large; with or without milk; decaf or not; with or without an extra shot of espresso) quickly leads to a large number of possible beverages (24 in this case; each additional binary choice will double that number). Although it is difficult to grasp the total number of possible combinations, it is usually not conceptually difficult to understand the (much smaller) set of features or modifications and the effect each of them has on the function of the biomolecule.

The rate at which a molecule undergoes a particular reaction will usually depend mainly on a single feature or a small subset of features. It is the presence or absence of those features that dictates the reaction rate. The reaction rate is the same for two molecules that differ only in features which do not affect this reaction. Thus, the number of parameters will be much smaller than the number of reactions. (In the coffee shop example, adding an extra shot of espresso will cost 40 cents, no matter what size the beverage is and whether or not it has milk in it.) It is such "local rules" that are usually discovered in laboratory experiments. Thus, a multi-state model can be conceptualised in terms of combinations of modular features and local rules. This means that even a model that can account for a vast number of molecular species and reactions is not necessarily conceptually complex.
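The coffee shop analogy can be worked through in a few lines of Python. The following is a minimal sketch: the feature names, the base prices and the milk surcharge are hypothetical values chosen for illustration, and only the 40-cent extra shot is taken from the text. It shows that 24 distinct beverages can be priced with five parameters, because each pricing rule looks at one feature and ignores the others.

```python
from itertools import product

# Independent features of a beverage, following the coffee shop example.
FEATURES = {
    "size": ["small", "medium", "large"],
    "milk": [False, True],
    "decaf": [False, True],
    "extra_shot": [False, True],
}

# Hypothetical prices in cents; only the 40-cent extra shot comes from the text.
BASE_PRICE = {"small": 200, "medium": 250, "large": 300}
MILK_PRICE = 30
EXTRA_SHOT_PRICE = 40


def all_beverages():
    """Enumerate every combination of feature values."""
    names = list(FEATURES)
    return [dict(zip(names, combo)) for combo in product(*FEATURES.values())]


def price(beverage):
    """Apply local rules: each rule inspects one feature and ignores the rest."""
    total = BASE_PRICE[beverage["size"]]
    if beverage["milk"]:
        total += MILK_PRICE
    if beverage["extra_shot"]:
        total += EXTRA_SHOT_PRICE
    return total


beverages = all_beverages()
n_parameters = len(BASE_PRICE) + 2  # three base prices, milk, extra shot
print(len(beverages), "beverages priced with", n_parameters, "parameters")
```

Adding another binary choice would double the number of beverages but add at most one parameter, which mirrors the relationship between molecular states and experimentally measured local rules.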


Specification vs computation

The combinatorial complexity of signaling systems involving multi-state proteins poses two kinds of problems. The first problem is concerned with how such a system can be specified; i.e. how a modeler can specify all complexes, all changes those complexes undergo and all parameters and conditions governing those changes in a robust and efficient way. This problem is called the "specification problem". The second problem concerns computation. It asks whether a combinatorially complex model, once specified, is computationally tractable, given the large number of states and the even larger number of possible transitions between states; whether it can be stored electronically; and whether it can be evaluated in a reasonable amount of computing time. This problem is called the "computation problem". Among the approaches that have been proposed to tackle combinatorial complexity in multi-state modeling, some are mainly concerned with addressing the specification problem, while others focus on finding effective methods of computation. Some tools address both specification and computation. The sections below discuss rule-based approaches to the specification problem and particle-based approaches to solving the computation problem. A wide range of computational tools exist for multi-state modeling (Chylek LA, Stites EC, Posner RG, Hlavacek WS (2013). Innovations of the rule-based modeling approach. In: Prokop A, Csukás B (eds). Systems Biology: Integrative Biology and Simulation Tools, Volume 1. Springer).


The specification problem


Explicit specification

The most naïve way of specifying, e.g., a protein in a biological model is to specify each of its states explicitly and use each of them as a molecular species in a simulation framework that allows transitions from state to state. For instance, if a protein can be ligand-bound or not, exist in two conformational states (e.g. open or closed) and be located in two possible subcellular areas (e.g. cytosolic or membrane-bound), then the eight possible resulting states can be explicitly enumerated as:

* bound, open, cytosol
* bound, open, membrane
* bound, closed, cytosol
* bound, closed, membrane
* unbound, open, cytosol
* unbound, open, membrane
* unbound, closed, cytosol
* unbound, closed, membrane

Enumerating all possible states is a lengthy and potentially error-prone process. For macromolecular complexes that can adopt multiple states, enumerating each state quickly becomes tedious, if not impossible. Moreover, the addition of a single modification or feature to the model of the complex under investigation will double the number of possible states (if the modification is binary), and it will more than double the number of transitions that need to be specified.
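The enumeration above, and the effect of adding one more binary feature, can be reproduced with a short Python sketch. The transition count rests on a simplifying assumption made here for illustration: from every state, any single feature may switch to any of its other values. The "phosphorylation" feature is a hypothetical addition, not part of the example in the text.

```python
from itertools import product


def enumerate_states(features):
    """Return every explicit state as a tuple of feature values."""
    return list(product(*features.values()))


def count_transitions(features):
    """Count directed transitions that change exactly one feature.

    Assumes, for illustration, that every feature can switch to any of its
    other values from every state.
    """
    n_states = len(enumerate_states(features))
    # From each state, a feature with k values can move to k - 1 other values.
    per_state = sum(len(values) - 1 for values in features.values())
    return n_states * per_state


features = {
    "ligand": ["bound", "unbound"],
    "conformation": ["open", "closed"],
    "location": ["cytosol", "membrane"],
}

states = enumerate_states(features)
for state in states:
    print(", ".join(state))
print(len(states), "states,", count_transitions(features), "transitions")

# Add one binary feature ("phosphorylation" is a hypothetical example).
extended = dict(features, phosphorylation=["unphosphorylated", "phosphorylated"])
print(len(enumerate_states(extended)), "states,",
      count_transitions(extended), "transitions")
```

Under this assumption the three-feature protein has 8 states and 24 single-feature transitions, while the four-feature version has 16 states and 64 transitions: the state count doubles and the transition count more than doubles.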


Rule-based model specification

It is clear that an explicit description, which lists all possible molecular species (including all their possible states), all possible reactions or transitions these species can undergo, and all parameters governing these reactions, very quickly becomes unwieldy as the complexity of the biological system increases. Modelers have therefore looked for
implicit Implicit may refer to: Mathematics * Implicit function * Implicit function theorem * Implicit curve * Implicit surface * Implicit differential equation Other uses * Implicit assumption, in logic * Implicit-association test, in social psychology ...
, rather than explicit, ways of specifying a biological signaling system. An implicit description is one that groups
reactions Reaction may refer to a process or to a response to an action, event, or exposure: Physics and chemistry *Chemical reaction *Nuclear reaction *Reaction (physics), as defined by Newton's third law *Chain reaction (disambiguation). Biology and me ...
and parameters that apply to many types of molecular species into one reaction template. It might also add a set of conditions that govern reaction parameters, i.e. the likelihood or rate at which a reaction occurs, or whether it occurs at all. Only properties of the molecule or complex that matter to a given reaction (either affecting the reaction or being affected by it) are explicitly mentioned, and all other properties are ignored in the specification of the reaction. For instance, the rate of ligand
dissociation Dissociation, in the wide sense of the word, is an act of disuniting or separating a complex object into parts. Dissociation may also refer to: * Dissociation (chemistry), general process in which molecules or ionic compounds (complexes, or salts) ...
from a protein might depend on the conformational state of the protein, but not on its subcellular localization. An implicit description would therefore list two dissociation processes (with different rates, depending on conformational state), but would ignore attributes referring to subcellular localization, because they do not affect the rate of ligand dissociation, nor are they affected by it. This specification rule has been summarized as "Don't care, don't write". Since it is not written in terms of reactions, but in terms of more general "reaction rules" encompassing sets of reactions, this kind of specification is often called "rule-based". This description of the system in terms of modular rules relies on the assumption that only a subset of features or attributes are relevant for a particular reaction rule. Where this assumption holds, a set of reactions can be coarse-grained into one reaction rule. This coarse-graining preserves the important properties of the underlying reactions. For instance, if the reactions are based on chemical kinetics, so are the rules derived from them. Many rule-based specification methods exist. In general, the specification of a model is a separate task from the execution of the simulation. Therefore, among the existing rule-based model specification systems, some concentrate on model specification only, allowing the user to then export the specified model into a dedicated simulation engine. However, many solutions to the specification problem also contain a method of interpreting the specified model. This is done by providing a method to simulate the model or a method to convert it into a form that can be used for simulations in other programs. An early rule-based specification method is the κ-calculus, a
process algebra In computer science, the process calculi (or process algebras) are a diverse family of related approaches for formally modelling concurrent systems. Process calculi provide a tool for the high-level description of interactions, communications, and ...
that can be used to encode macromolecules with internal states and binding sites and to specify rules by which they interact. The κ-calculus is merely concerned with providing a language to encode multi-state models, not with interpreting the models themselves. A simulator compatible with Kappa is KaSim. BioNetGen is a software suite that provides both specification and simulation capacities. Rule-based models can be written down using a specified syntax, the BioNetGen language (BNGL). The underlying concept is to represent biochemical systems as
graphs Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discre ...
, where molecules are represented as nodes (or collections of nodes) and chemical bonds as edges. A reaction rule, then, corresponds to a graph rewriting rule. BNGL provides a syntax for specifying these graphs and the associated rules as structured strings. BioNetGen can then use these rules to generate ordinary differential equations (ODEs) to describe each biochemical reaction. Alternatively, it can generate a list of all possible species and reactions in
SBML The Systems Biology Markup Language (SBML) is a representation format, based on XML, for communicating and storing computational models of biological processes. It is a free and open standard with widespread software support and a community of use ...
, which can then be exported to simulation software packages that can read
SBML The Systems Biology Markup Language (SBML) is a representation format, based on XML, for communicating and storing computational models of biological processes. It is a free and open standard with widespread software support and a community of use ...
. One can also make use of BioNetGen's own ODE-based simulation software and its capability to generate reactions on-the-fly during a stochastic simulation. In addition, a model specified in BNGL can be read by other simulation software, such as DYNSTOC, RuleMonkey, and NFSim. Another tool that generates full reaction networks from a set of rules is the Allosteric Network Compiler (ANC). Conceptually, ANC sees molecules as allosteric devices with a Monod-Wyman-Changeux (MWC) type regulation mechanism, whose interactions are governed by their internal state, as well as by external modifications. A very useful feature of ANC is that it automatically computes dependent parameters, thereby imposing
thermodynamic Thermodynamics is a branch of physics that deals with heat, work, and temperature, and their relation to energy, entropy, and the physical properties of matter and radiation. The behavior of these quantities is governed by the four laws of ther ...
correctness. An extension of the κ-calculus is provided by ''React(C)''.John, M., Lhoussaine, C., Niehren, J., & Versari, C. (2011). Biochemical reaction rules with constraints. In Programming Languages and Systems (pp. 338-357). Springer Berlin Heidelberg. The authors of ''React C'' show that it can express the stochastic π calculus. They also provide a stochastic simulation algorithm based on the Gillespie stochastic algorithm for models specified in ''React(C)''. ML-Rules is similar to React(C), but provides the added possibility of nesting: A component species of the model, with all its attributes, can be part of a higher-order component species. This enables ML-Rules to capture multi-level models that can bridge the gap between, for instance, a series of biochemical processes and the macroscopic behaviour of a whole cell or group of cells. For instance, a proof-of-concept model of cell division in
fission yeast ''Schizosaccharomyces pombe'', also called "fission yeast", is a species of yeast used in traditional brewing and as a model organism in molecular and cell biology. It is a unicellular eukaryote, whose cells are rod-shaped. Cells typically measu ...
includes
cyclin Cyclin is a family of proteins that controls the progression of a cell through the cell cycle by activating cyclin-dependent kinase (CDK) enzymes or group of enzymes required for synthesis of cell cycle. Etymology Cyclins were originally disco ...
/
cdc2 Cyclin-dependent kinase 1 also known as CDK1 or cell division cycle protein 2 homolog is a highly conserved protein that functions as a serine/threonine protein kinase, and is a key player in cell cycle regulation. It has been highly studied in th ...
binding and activation,
pheromone A pheromone () is a secreted or excreted chemical factor that triggers a social response in members of the same species. Pheromones are chemicals capable of acting like hormones outside the body of the secreting individual, to affect the behavio ...
secretion and diffusion,
cell division Cell division is the process by which a parent cell (biology), cell divides into two daughter cells. Cell division usually occurs as part of a larger cell cycle in which the cell grows and replicates its chromosome(s) before dividing. In eukar ...
and movement of cells. Models specified in ML-Rules can be simulated using the James II simulation framework. A similar nested language to represent multi-level biological systems has been proposed by Oury and Plotkin. A specification formalism based on molecular
finite automata A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
(MFA) framework can then be used to generate and simulate a system of ODEs or for
stochastic simulation A stochastic simulation is a simulation of a system that has variables that can change stochastically (randomly) with individual probabilities.DLOUHÝ, M.; FÁBRY, J.; KUNCOVÁ, M.. Simulace pro ekonomy. Praha : VŠE, 2005. Realizations of these ...
using a kinetic
Monte Carlo Monte Carlo (; ; french: Monte-Carlo , or colloquially ''Monte-Carl'' ; lij, Munte Carlu ; ) is officially an administrative area of the Principality of Monaco, specifically the ward of Monte Carlo/Spélugues, where the Monte Carlo Casino is ...
algorithm. Some rule-based specification systems and their associated network generation and simulation tools have been designed to accommodate spatial heterogeneity, in order to allow for the realistic simulation of interactions within biological compartments. For instance, the Simmune project includes a spatial component: Users can specify their multi-state biomolecules and interactions within membranes or compartments of arbitrary shape. The reaction volume is then divided into interfacing voxels, and a separate reaction network generated for each of these subvolumes. The Stochastic Simulator Compiler (SSC) allows for rule-based, modular specification of interacting biomolecules in regions of arbitrarily complex geometries. Again, the system is represented using graphs, with chemical interactions or diffusion events formalised as graph-rewriting rules. The compiler then generates the entire reaction network before launching a stochastic reaction-diffusion algorithm. A different approach is taken by PySB, where model specification is embedded in the programming language
Python. A model (or part of a model) is represented as a Python programme. This allows users to store higher-order biochemical processes such as catalysis or polymerisation as macros and re-use them as needed. The models can be simulated and analysed using Python libraries, but PySB models can also be exported into BNGL, Kappa, and SBML. Models involving multi-state and multi-component species can also be specified in Level 3 of the Systems Biology Markup Language (SBML) using the multi package. A draft specification is available. Thus, by only considering states and features important for a particular reaction, rule-based model specification eliminates the need to explicitly enumerate every possible molecular state that can undergo a similar reaction, and thereby allows for efficient specification.
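The saving described above can be illustrated with a minimal Python sketch. It assumes a hypothetical molecule with eight binary phosphorylation sites and a single rule, "phosphorylate site 0 regardless of the other sites". The names `enumerate_states`, `matches` and `expand_rule` are invented for this illustration and do not correspond to any of the tools mentioned here.

```python
from itertools import product


def enumerate_states(n_sites):
    """List every explicit state of a molecule with n binary sites."""
    return list(product((0, 1), repeat=n_sites))


def matches(state, pattern):
    """A pattern maps site index -> required value; unlisted sites are ignored."""
    return all(state[i] == value for i, value in pattern.items())


def expand_rule(pattern, change, n_sites):
    """Expand one rule into the explicit state-to-state reactions it implies."""
    reactions = []
    for state in enumerate_states(n_sites):
        if matches(state, pattern):
            new_state = list(state)
            for i, value in change.items():
                new_state[i] = value
            reactions.append((state, tuple(new_state)))
    return reactions


n_sites = 8                  # hypothetical number of phosphorylation sites
rule_pattern = {0: 0}        # site 0 must be unphosphorylated
rule_change = {0: 1}         # firing the rule phosphorylates site 0
reactions = expand_rule(rule_pattern, rule_change, n_sites)

print(len(enumerate_states(n_sites)))  # 256 explicit states
print(len(reactions))                  # 128 explicit reactions behind one rule
```

In this toy case the modeler writes one rule, while an explicit specification would need 128 separate reactions for the same modification.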


The computation problem

When running simulations
on a biological model, any simulation software evaluates a set of rules, starting from a specified set of initial conditions, and usually iterating through a series of time steps until a specified end time. One way to classify simulation algorithms is by looking at the level of analysis at which the rules are applied: they can be population-based, single-particle-based or hybrid.


Population-based rule evaluation

In population-based rule evaluation, rules are applied to populations. All molecules of the same species
in the same state are pooled together. Application of a specific rule reduces or increases the size of one of the pools, possibly at the expense of another. Some of the best-known classes of simulation approaches in computational biology belong to the population-based family, including those based on the numerical integration of ordinary and partial differential equations and the Gillespie stochastic simulation algorithm.
Differential equations describe changes in molecular concentrations over time in a deterministic manner. Simulations based on differential equations usually do not attempt to solve those equations analytically, but employ a suitable numerical solver. The stochastic Gillespie algorithm changes the composition of pools of molecules through a progression of random reaction events, the probability of which is computed from reaction rates and from the numbers of molecules, in accordance with the stochastic master equation. In population-based approaches, one can think of the system being modeled as being in a given state at any given time point, where a state is defined according to the nature and size of the populated pools of molecules. This means that the space of all possible states can become very large. With some simulation methods implementing numerical integration of ordinary and partial differential equations or the Gillespie stochastic algorithm, all possible pools of molecules and the reactions they undergo are defined at the start of the simulation, even if they are empty. Such "generate-first" methods scale poorly with increasing numbers of molecular states. For instance, it has recently been estimated that even for a simple model of CaMKII with just 6 states per subunit and 10 subunits, it would take 290 years to generate the entire reaction network on a 2.54 GHz Intel
Xeon
processor. In addition, the model generation step in generate-first methods does not necessarily terminate, for instance when the model includes assembly of proteins into complexes of arbitrarily large size, such as
actin
filaments. In these cases, a termination condition needs to be specified by the user. Even if a large reaction system can be successfully generated, its simulation using population-based rule evaluation can run into computational limits. In a recent study, a powerful computer was shown to be unable to simulate a protein with more than 8
phosphorylation
sites (2^8=256 phosphorylation states) using ordinary differential equations. Methods have been proposed to reduce the size of the state space. One is to consider only the states adjacent to the present state (i.e. the states that can be reached within the next iteration) at each time point. This eliminates the need for enumerating all possible states at the beginning. Instead, reactions are generated "on-the-fly" at each iteration. These methods are available both for stochastic and deterministic algorithms. These methods still rely on the definition of an (albeit reduced) reaction network - in contrast to the "network-free" methods discussed below. Even with "on-the-fly" network generation, networks generated for population-based rule evaluation can become quite large, and thus difficult - if not impossible - to handle computationally. An alternative approach is provided by particle-based rule evaluation.
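As an illustration of population-based rule evaluation, the following is a minimal sketch of the Gillespie direct method operating on pools of molecules. It assumes only first-order conversions between pools. The pool names `A` and `Ap`, the rate constants and the function name `gillespie` are hypothetical choices for this example, not taken from any particular tool.

```python
import math
import random


def gillespie(counts, reactions, t_end, rng):
    """Gillespie direct method for first-order conversions between pools.

    counts:    dict mapping pool name -> number of molecules
    reactions: list of (rate_constant, source_pool, target_pool)
    """
    t = 0.0
    counts = dict(counts)
    trajectory = [(t, dict(counts))]
    while True:
        # Propensity of each reaction: rate constant times size of the source pool.
        propensities = [k * counts[src] for k, src, _ in reactions]
        total = sum(propensities)
        if total == 0:
            break
        # Waiting time to the next event is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to propensity.
        index = rng.choices(range(len(reactions)), weights=propensities)[0]
        _, src, dst = reactions[index]
        counts[src] -= 1
        counts[dst] += 1
        trajectory.append((t, dict(counts)))
    return trajectory


rng = random.Random(42)
pools = {"A": 100, "Ap": 0}                          # hypothetical pools
reactions = [(1.0, "A", "Ap"), (0.5, "Ap", "A")]    # hypothetical rate constants
trajectory = gillespie(pools, reactions, t_end=5.0, rng=rng)
print(trajectory[-1])
```

Note that every pool and every reaction has to be listed before the simulation starts, which is the property that makes this family of methods sensitive to the number of molecular states.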


Particle-based rule evaluation

In particle-based (sometimes called "agent-based") simulations, proteins, nucleic acids, macromolecular complexes or small molecules are represented as individual software objects, and their progress is tracked through the course of the entire simulation. Because particle-based rule evaluation keeps track of individual particles rather than populations, it comes at a higher computational cost when modeling systems with a high total number of particles, but a small number of kinds (or pools) of particles. In cases of combinatorial complexity, however, the modeling of individual particles is an advantage because, at any given point in the simulation, only existing molecules, their states and the reactions they can undergo need to be considered. Particle-based rule evaluation does not require the generation of complete or partial reaction networks at the start of the simulation or at any other point in the simulation and is therefore called "network-free". This method reduces the complexity of the model at the simulation stage, and thereby saves time and computational power (Hogg, J. S., Harris, L. A., Stover, L. J., Nair, N. S., & Faeder, J. R. (2013). Exact hybrid particle/population simulation of rule-based models of biochemical systems. arXiv preprint arXiv:1301.6854). The simulation follows each particle, and at each simulation step, a particle only "sees" the reactions (or rules) that apply to it. This depends on the state of the particle and, in some implementations, on the states of its neighbours in a holoenzyme or complex. As the simulation proceeds, the states of particles are updated according to the rules that are fired. Some particle-based simulation packages use an ad-hoc formalism for specification of reactants, parameters and rules. Others can read files in a recognised rule-based specification format such as BNGL.
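The network-free idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm of any specific simulator: the `Particle` and `Rule` classes, the per-step firing probabilities and the site count are all assumptions made for the example, and the time stepping is not kinetically exact.

```python
import random
from dataclasses import dataclass


@dataclass
class Particle:
    sites: list  # one entry per modification site: 0 = unmodified, 1 = modified


@dataclass(frozen=True)
class Rule:
    site: int           # index of the site this rule acts on
    before: int         # value the site must have for the rule to apply
    after: int          # value written to the site when the rule fires
    probability: float  # hypothetical per-step firing probability

    def applies_to(self, particle):
        return particle.sites[self.site] == self.before

    def fire(self, particle):
        particle.sites[self.site] = self.after


def make_rules(n_sites, p_on, p_off):
    """Two rules per site (modify, unmodify), independent of all other sites."""
    rules = []
    for site in range(n_sites):
        rules.append(Rule(site, 0, 1, p_on))
        rules.append(Rule(site, 1, 0, p_off))
    return rules


def step(particles, rules, rng):
    """Pick one particle and evaluate only the rules that apply to it."""
    particle = rng.choice(particles)
    applicable = [rule for rule in rules if rule.applies_to(particle)]
    if not applicable:
        return
    rule = rng.choice(applicable)
    if rng.random() < rule.probability:
        rule.fire(particle)


rng = random.Random(0)
n_sites = 20
particles = [Particle([0] * n_sites) for _ in range(100)]
rules = make_rules(n_sites, p_on=0.5, p_off=0.2)
for _ in range(5000):
    step(particles, rules, rng)

occupied = {tuple(p.sites) for p in particles}
print(len(rules), 2 ** n_sites, len(occupied))
```

The sketch never builds a reaction network: it stores 40 rules and 100 particles, although the molecule has 2^20 possible states, and the number of states actually occupied can never exceed the number of particles.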


Non-spatial particle-based methods

StochSim is a particle-based stochastic
simulator used mainly to model chemical reactions and other molecular transitions. The algorithm used in StochSim is different from the more widely known Gillespie stochastic algorithm in that it operates on individual entities, not entity pools, making it particle-based rather than population-based. In StochSim, each molecular species can be equipped with a number of binary state
flags
representing a particular modification. Reactions can be made dependent on a set of state flags set to particular values. In addition, the outcome of a reaction can include a state flag being changed. Moreover, entities can be arranged in geometric
arrays
(for instance, for holoenzymes consisting of several subunits), and reactions can be "neighbor-sensitive", i.e. the probability of a reaction for a given entity is affected by the value of a state flag on a neighboring entity. These properties make StochSim ideally suited to modeling multi-state molecules arranged in holoenzymes or complexes of specified size. Indeed, StochSim has been used to model clusters of
bacterial chemotactic
receptors, and CaMKII holoenzymes. An extension of StochSim is the particle-based simulator DYNSTOC, which uses a StochSim-like algorithm to simulate models specified in the BioNetGen language (BNGL), and improves the handling of molecules within macromolecular complexes. Another particle-based stochastic simulator that can read BNGL input files is RuleMonkey. Its simulation algorithm differs from the algorithms underlying both StochSim and DYNSTOC in that the simulation time step is variable. The Network-Free Stochastic Simulator (NFSim) differs from those described above by allowing for the definition of reaction rates as arbitrary mathematical or conditional expressions, and thereby facilitates selective coarse-graining of models. RuleMonkey and NFSim implement distinct but related simulation algorithms. A detailed review and comparison of both tools is given by Yang and Hlavacek.

It is easy to imagine a biological system where some components are complex multi-state molecules, whereas others have few possible states (or even just one) and exist in large numbers. A hybrid approach has been proposed to model such systems: within the Hybrid Particle/Population (HPP) framework, the user can specify a rule-based model, but can designate some species to be treated as populations (rather than particles) in the subsequent simulation. This method combines the computational advantages of particle-based modeling for multi-state systems with relatively low molecule numbers and of population-based modeling for systems with high molecule numbers and a small number of possible states. Specification of HPP models is supported by BioNetGen, and simulations can be performed with NFSim.
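The combination of binary state flags, geometric arrays and neighbor-sensitive reactions described for StochSim can be sketched as follows. This is a simplified illustration with a fixed time step, not StochSim's actual algorithm or input format: the function `simulate_ring`, the ring geometry and all probabilities are assumptions made for the example.

```python
import random


def simulate_ring(n_subunits, n_steps, p_base, p_boost, p_off, rng):
    """Fixed-time-step simulation of one ring-shaped holoenzyme.

    flags[i] is a binary state flag of subunit i (1 = active, 0 = inactive).
    Activation is neighbor-sensitive: it uses p_boost instead of p_base
    when at least one adjacent subunit in the ring is already active.
    """
    flags = [0] * n_subunits
    for _ in range(n_steps):
        i = rng.randrange(n_subunits)  # pick one subunit per time step
        if flags[i] == 0:
            left = flags[(i - 1) % n_subunits]
            right = flags[(i + 1) % n_subunits]
            p_activate = p_boost if (left or right) else p_base
            if rng.random() < p_activate:
                flags[i] = 1
        elif rng.random() < p_off:
            flags[i] = 0
    return flags


rng = random.Random(3)
flags = simulate_ring(
    n_subunits=10, n_steps=1000,
    p_base=0.01, p_boost=0.3, p_off=0.05, rng=rng,
)
print(flags)
```

Because the reaction probability is looked up from the flags of the selected subunit and its neighbors, the explicit states of the whole ring are never enumerated.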


Spatial particle-based methods

Spatial particle-based methods differ from the methods described above by their explicit representation of space. One example of a particle-based simulator that allows for a representation of cellular compartments is SRSim. SRSim is integrated in the LAMMPS molecular dynamics simulator and allows the user to specify the model in BNGL. SRSim allows users to specify the geometry of the particles in the simulation, as well as interaction sites. It is therefore especially well suited to simulating the assembly and structure of large biomolecular complexes, as evidenced by a recent model of the inner kinetochore. MCell allows individual molecules to be traced in arbitrarily complex geometric environments which are defined by the user. This allows for simulations of biomolecules in realistic reconstructions of living cells, including cells with complex geometries like those of neurons. In one such model, the reaction compartment is a reconstruction of a dendritic spine. MCell uses its own ad-hoc formalism to specify a multi-state model: it is possible to assign "slots" to any molecular species. Each slot stands for a particular modification, and any number of slots can be assigned to a molecule. Each slot can be occupied by a particular state. The states are not necessarily binary. For instance, a slot describing binding of a particular ligand to a protein of interest could take the states "unbound", "partially bound", and "fully bound". The slot-and-state syntax in MCell can also be used to model multimeric proteins or macromolecular complexes. When used in this way, a slot is a placeholder for a subunit or a molecular component of a complex, and the state of the slot will indicate whether a specific protein component is absent or present in the complex. A way to think about this is that MCell macromolecules can have several dimensions: a "state dimension" and one or more "spatial dimensions". The "state dimension" is used to describe the multiple possible states making up a multi-state protein, while the spatial dimension(s) describe topological relationships between neighboring subunits or members of a macromolecular complex. One drawback of this method for representing protein complexes, compared to Meredys, is that MCell does not allow for the diffusion of complexes, and hence, of multi-state molecules. This can in some cases be circumvented by adjusting the diffusion constants of ligands that interact with the complex, by using checkpointing functions or by combining simulations at different levels.


Examples of multi-state models in biology

A (by no means exhaustive) selection of models of biological systems involving multi-state molecules and using some of the tools discussed here is given in the table below.


See also

* Multiscale modeling
* Rule-based modeling

