Systems Biology Markup Language

The Systems Biology Markup Language (SBML) is a representation format, based on XML, for communicating and storing computational models of biological processes. It is a free and open standard with widespread software support and a community of users and developers. SBML can represent many different classes of biological phenomena, including metabolic networks, cell signaling pathways, regulatory networks, infectious diseases, and many others. It has been proposed as a standard for representing computational models in systems biology today.


History

Late in 1999 through early 2000, with funding from the Japan Science and Technology Corporation (JST), Hiroaki Kitano and John C. Doyle assembled a small team of researchers to work on developing better software infrastructure for computational modeling in systems biology. Hamid Bolouri was the leader of the development team, which consisted of Andrew Finney, Herbert Sauro, and Michael Hucka. Bolouri identified the need for a framework to enable interoperability and sharing between the different simulation software systems for biology in existence during the late 1990s, and he organized an informal workshop in December 1999 at the California Institute of Technology to discuss the matter. In attendance at that workshop were the groups responsible for the development of DBSolve, E-Cell, Gepasi, Jarnac, StochSim, and The Virtual Cell. Separately, earlier in 1999, some members of these groups had also discussed the creation of a portable file format for metabolic network models in the BioThermoKinetics (BTK) group.

The same groups who attended the first Caltech workshop met again on April 28–29, 2000, at the first of a newly created meeting series called ''Workshop on Software Platforms for Systems Biology''. It became clear during this second meeting that a ''common model representation format'' was needed to enable the exchange of models between software tools as part of any functioning interoperability framework, and the workshop attendees decided the format should be encoded in XML. The Caltech ERATO team developed a proposal for this XML-based format and, in August 2000, circulated the draft definition to the attendees of the 2nd Workshop on Software Platforms for Systems Biology. This draft underwent extensive discussion over mailing lists and during the 2nd Workshop itself, held in Tokyo, Japan, in November 2000 as a satellite workshop of the ICSB 2000 conference. After further revisions, discussions and software implementations, the Caltech team issued a specification for SBML Level 1, Version 1 in March 2001.

SBML Level 2 was conceived at the 5th Workshop on Software Platforms for Systems Biology, held in July 2002 at the University of Hertfordshire, UK. By this time, far more people were involved than the original group of SBML collaborators, and the continued evolution of SBML became a larger community effort, with many new tools having been enhanced to support SBML. The workshop participants in 2002 collectively decided to revise the form of SBML in Level 2. The first draft of the Level 2 Version 1 specification was released in August 2002, and the final set of features was finalized in May 2003 at the 7th Workshop on Software Platforms for Systems Biology in Ft. Lauderdale, Florida.

The next iteration of SBML took two years, in part because software developers requested time to absorb and understand the larger and more complex SBML Level 2. The inevitable discovery of limitations and errors led to the development of SBML Level 2 Version 2, issued in September 2006. By this time, the team of SBML Editors (who reconcile proposals for changes and write a coherent final specification document) had changed and now consisted of Andrew Finney, Michael Hucka and Nicolas Le Novère.

SBML Level 2 Version 3 was published in 2007 after numerous contributions by and discussions with the SBML community. 2007 also saw the election of two more SBML Editors as part of the introduction of the modern SBML Editor organization in the context of the SBML development process. SBML Level 2 Version 4 was published in 2008 after the community requested certain changes in Level 2. (For example, an electronic vote by the SBML community in late 2007 indicated a majority preferred not to require strict unit consistency before an SBML model is considered valid.) Version 4 was finalized after the SBML Forum meeting held in Gothenburg, Sweden, as a satellite workshop of ICSB 2008 in the fall of 2008.

SBML Level 3 Version 1 Core was published in final form in 2010, after prolonged discussion and revision by the SBML Editors and the SBML community. It contains numerous significant changes in syntax and constructs from Level 2 Version 4, but also represents a new modular base for continued expansion of SBML's features and capabilities going into the future. SBML Level 2 Version 5 was published in 2015. This revision included a number of textual (but not structural) changes in response to user feedback, thereby addressing the list of errata collected over many years for the SBML Level 2 Version 4 specification. In addition, Version 5 introduced a facility to use nested annotations within SBML's annotation format (an annotation format that is based on a subset of RDF).


The language

SBML is sometimes incorrectly assumed to be limited in scope only to biochemical network models because the original publications and early software focused on this domain. In reality, although the central features of SBML are indeed oriented towards representing chemical reaction-like processes that act on entities, this same formalism serves analogously for many other types of processes; moreover, SBML has language features supporting the direct expression of mathematical formulas and discontinuous events separate from reaction processes, allowing SBML to represent much more than solely biochemical reactions. Evidence for SBML's ability to be used for more than merely descriptions of biochemistry can be seen in the variety of models available from BioModels Database.


Purposes

SBML has three main purposes:
* enable the use of multiple software tools without having to rewrite models to conform to every tool's idiosyncratic file format;
* enable models to be shared and published in a form that other researchers can use even when working with different software environments;
* ensure the survival of models beyond the lifetime of the software used to create them.

SBML is not an attempt to define a universal language for quantitative models. SBML's purpose is to serve as a ''lingua franca'': an exchange format used by different present-day software tools to communicate the essential aspects of a computational model.


Main capabilities

SBML can encode models consisting of entities (called ''species'' in SBML) acted upon by processes (called ''reactions''). An important principle is that models are decomposed into explicitly labeled constituent elements, the set of which resembles a verbose rendition of chemical reaction equations (if the model uses reactions) together with optional explicit equations (again, if the model uses these); the SBML representation deliberately does not cast the model directly into a set of differential equations or other specific interpretation of the model. This explicit, modeling-framework-agnostic decomposition makes it easier for a software tool to interpret the model and translate the SBML form into whatever internal form the tool actually uses.

A software package can read an SBML model description and translate it into its own internal format for model analysis. For example, a package might provide the ability to simulate the model by constructing differential equations and then perform numerical time integration on the equations to explore the model's dynamic behavior. Or, alternatively, a package might construct a discrete stochastic representation of the model and use a Monte Carlo simulation method such as the Gillespie algorithm.

SBML allows models of arbitrary complexity to be represented. Each type of component in a model is described using a specific type of data structure that organizes the relevant information. The data structures determine how the resulting model is encoded in XML.

In addition to the elements above, another important feature of SBML is that every entity can have machine-readable annotations attached to it. These annotations can be used to express relationships between the entities in a given model and entities in external resources such as databases. A good example of the value of this is in BioModels Database, where every model is annotated and linked to relevant data resources such as publications, databases of compounds and pathways, controlled vocabularies, and more. With annotations, a model becomes more than simply a rendition of a mathematical construct; it becomes a semantically enriched framework for communicating knowledge.
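The idea that one species-and-reactions description can be interpreted in more than one way can be sketched in a few lines of Python. The data layout, the function names (`advance`, `simulate_ode`, `simulate_ssa`) and the parameter values below are illustrative assumptions, not part of SBML or of any SBML library; the rate expressions are reused directly as stochastic propensities, which is only reasonable here because the toy model uses mass-action kinetics on molecule counts in a unit volume.

```python
import random

# Hypothetical in-memory model: species amounts, parameters, and reactions
# given as (reactant stoichiometry, product stoichiometry, rate function).
species = {"S": 10.0, "E": 5.0, "C": 0.0, "P": 0.0}
params = {"k1": 1.0, "k1r": 2.0, "kcat": 3.0}
reactions = [
    ({"S": 1, "E": 1}, {"C": 1}, lambda x, p: p["k1"] * x["S"] * x["E"]),
    ({"C": 1}, {"S": 1, "E": 1}, lambda x, p: p["k1r"] * x["C"]),
    ({"C": 1}, {"P": 1, "E": 1}, lambda x, p: p["kcat"] * x["C"]),
]


def advance(state, reaction, extent):
    """Return a new state after the reaction has proceeded by `extent`."""
    reactants, products, _ = reaction
    new_state = dict(state)
    for name, n in reactants.items():
        new_state[name] -= n * extent
    for name, n in products.items():
        new_state[name] += n * extent
    return new_state


def simulate_ode(state, t_end, dt=1e-3):
    """Deterministic reading: forward-Euler integration of the rate equations."""
    state = dict(state)
    for _ in range(int(round(t_end / dt))):
        # Evaluate all rates at the start of the step, then apply them.
        rates = [rate(state, params) for _, _, rate in reactions]
        for reaction, rate in zip(reactions, rates):
            state = advance(state, reaction, rate * dt)
    return state


def simulate_ssa(state, t_end, seed=0):
    """Stochastic reading: Gillespie's direct method on the same reactions."""
    rng = random.Random(seed)
    state = dict(state)
    t = 0.0
    while True:
        propensities = [rate(state, params) for _, _, rate in reactions]
        total = sum(propensities)
        if total <= 0:
            break  # nothing can fire any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        fired = rng.choices(reactions, weights=propensities)[0]
        state = advance(state, fired, 1)
    return state


ode_result = simulate_ode(species, t_end=5.0)
ssa_result = simulate_ssa(species, t_end=5.0)
print("ODE:", ode_result)
print("SSA:", ssa_result)
```

Both functions consume the same `reactions` list; only the interpretation differs, which is the point of keeping the model description free of any particular mathematical framework.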


Levels and versions

SBML is defined in Levels: upward-compatible specifications that add features and expressive power. Software tools that do not need or cannot support the complexity of higher Levels can go on using lower Levels; tools that can read higher Levels are assured of also being able to interpret models defined in the lower Levels. Thus new Levels do not supersede previous ones. However, each Level can have multiple Versions within it, and new Versions of a Level do supersede old Versions of that same Level.

There are currently three Levels of SBML defined. The current Versions within those Levels are the following:
* Level 3 Version 2 Core, for which the final Release 2 specification was issued 26 April 2019
* Level 2 Version 5 Release 1
* Level 1 Version 2

Open-source software infrastructure such as libSBML and JSBML allows developers to support all Levels of SBML in their software with a minimum amount of effort.

The SBML Team maintains a public issue tracker where readers may report errors or other issues in the SBML specification documents. Reported issues are eventually put on the list of official errata associated with each specification release. The lists of errata are documented on the Specifications page of SBML.org.
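As a rough illustration of how a tool might discover which Level and Version a document declares, the following sketch reads the `level` and `version` attributes from the root `sbml` element using only the Python standard library. The sample document is a hand-written, empty Level 3 Version 2 skeleton, and the helper names `sbml_level_and_version` and `can_interpret` are hypothetical; a real application would more likely rely on libSBML or JSBML.

```python
import xml.etree.ElementTree as ET

# Hand-written skeleton of an SBML Level 3 Version 2 document (no content).
SBML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version2/core" level="3" version="2">
  <model id="example"/>
</sbml>"""


def sbml_level_and_version(text):
    """Return (level, version) as declared on the root <sbml> element."""
    root = ET.fromstring(text.encode("utf-8"))
    return int(root.get("level")), int(root.get("version"))


def can_interpret(highest_supported_level, document_level):
    """A tool that reads a given Level is assumed to read all lower Levels."""
    return document_level <= highest_supported_level


level, version = sbml_level_and_version(SBML_TEXT)
print(f"Level {level} Version {version}")
print("Readable by a Level 2 tool:", can_interpret(2, level))
print("Readable by a Level 3 tool:", can_interpret(3, level))
```

The comparison in `can_interpret` only encodes the upward-compatibility rule stated above; it ignores Versions and Level 3 packages, which a real tool would also have to check.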


Level 3 packages

Development of SBML Level 3 has been proceeding in a modular fashion. The ''Core'' specification is a complete format that can be used alone. Additional Level 3 packages can be layered on to this core to provide additional, optional features.


Hierarchical Model Composition

The Hierarchical Model Composition package, known as "''comp''", was released in November 2012. This package provides the ability to include models as submodels inside another model. The goal is to support the ability of modelers and software tools to do such things as (1) decompose larger models into smaller ones, as a way to manage complexity; (2) incorporate multiple instances of a given model within one or more enclosing models, to avoid literal duplication of repeated elements; and (3) create libraries of reusable, tested models, much as is done in software development and other engineering fields. The specification was the culmination of years of discussion by a large number of people.
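The benefit of instantiating one model definition several times can be illustrated with a small sketch that "flattens" two instances of a submodel into a single model by prefixing identifiers. Everything here is an assumption made for illustration: the dictionary layout, the `instantiate` and `flatten` helpers, and the `instance__id` naming scheme are not taken from the ''comp'' specification, which also defines further constructs that this toy ignores.

```python
# Hypothetical model definition reused by several instances.
enzyme_module = {
    "species": {"Substrate": 10, "Enzyme": 5, "Complex": 0, "Product": 0},
    "reactions": [
        {"id": "Binding", "reactants": ["Substrate", "Enzyme"], "products": ["Complex"]},
        {"id": "Conversion", "reactants": ["Complex"], "products": ["Product", "Enzyme"]},
    ],
}


def instantiate(definition, instance_id):
    """Copy a definition, prefixing every identifier with the instance id."""
    def rename(name):
        return f"{instance_id}__{name}"

    return {
        "species": {rename(s): amount for s, amount in definition["species"].items()},
        "reactions": [
            {
                "id": rename(r["id"]),
                "reactants": [rename(s) for s in r["reactants"]],
                "products": [rename(s) for s in r["products"]],
            }
            for r in definition["reactions"]
        ],
    }


def flatten(definition, instance_ids):
    """Merge several instances of one definition into a single flat model."""
    flat = {"species": {}, "reactions": []}
    for instance_id in instance_ids:
        instance = instantiate(definition, instance_id)
        flat["species"].update(instance["species"])
        flat["reactions"].extend(instance["reactions"])
    return flat


flat_model = flatten(enzyme_module, ["cellA", "cellB"])
for reaction in flat_model["reactions"]:
    print(reaction["id"], reaction["reactants"], "->", reaction["products"])
```

The definition is written once and reused twice, which is the "avoid literal duplication" goal described above; the flat result is what a simulator without hierarchy support would need.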


Flux Balance Constraints

The Flux Balance Constraints package (nicknamed "''fbc''") was first released in February 2013. Important revisions were introduced as part of Version 2, released in September 2015. The "''fbc''" package provides support for constraint-based modeling, frequently used to analyze and study biological networks on both a small and large scale. This SBML package makes use of standard components from the SBML Level 3 core specification, including species and reactions, and extends them with additional attributes and structures to allow modelers to define such things as flux bounds and optimization functions.
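To make "flux bounds and optimization functions" concrete, here is a minimal sketch of the kind of information a constraint-based model carries. It does not solve the optimization problem; it only checks whether a candidate flux distribution respects the bounds and the steady-state condition (no net production of any species) and evaluates a linear objective. The three-reaction network, its bounds, and all names are invented for illustration and are not drawn from the ''fbc'' specification.

```python
# Hypothetical toy network: uptake of A, conversion A -> B, export of B.
stoichiometry = {
    "R_in": {"A": 1},
    "R_conv": {"A": -1, "B": 1},
    "R_out": {"B": -1},
}
flux_bounds = {"R_in": (0, 10), "R_conv": (0, 8), "R_out": (0, 1000)}
objective = {"R_out": 1.0}  # linear objective: maximize export of B


def net_production(fluxes):
    """Net production rate of each species for the given reaction fluxes."""
    net = {}
    for reaction, coefficients in stoichiometry.items():
        for name, coefficient in coefficients.items():
            net[name] = net.get(name, 0.0) + coefficient * fluxes[reaction]
    return net


def is_feasible(fluxes, tol=1e-9):
    """True if every flux is within bounds and every species is balanced."""
    within_bounds = all(
        low - tol <= fluxes[reaction] <= high + tol
        for reaction, (low, high) in flux_bounds.items()
    )
    balanced = all(abs(rate) <= tol for rate in net_production(fluxes).values())
    return within_bounds and balanced


def objective_value(fluxes):
    """Value of the linear objective for the given fluxes."""
    return sum(weight * fluxes[reaction] for reaction, weight in objective.items())


candidate = {"R_in": 8.0, "R_conv": 8.0, "R_out": 8.0}
print("feasible:", is_feasible(candidate))
print("objective:", objective_value(candidate))
```

In practice a linear-programming solver searches over all feasible flux vectors for the one with the best objective value; the sketch only shows what "feasible" and "objective" mean for such a search.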


Qualitative Models

The Qualitative Models or "''qual''" package for SBML Level 3 was released in May 2013. This package supports the representation of models where an in-depth knowledge of the biochemical reactions and their kinetics is missing and a qualitative approach must be used. Examples of phenomena that have been modeled in this way include gene regulatory networks and signaling pathways, basing the model structure on the definition of regulatory or influence graphs. The definition and use of some components of this class of models differ from the way that species and reactions are defined and used in ''core'' SBML models. For example, qualitative models typically associate discrete levels of activities with entity pools; consequently, the processes involving them cannot be described as reactions per se, but rather as transitions between states. These systems can be viewed as reactive systems whose dynamics are represented by means of state transition graphs (or other Kripke structures) in which the nodes are the reachable states and the edges are the state transitions.
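A state transition graph of the kind described above can be enumerated directly for a small Boolean model. The sketch below uses an invented two-gene network in which each gene represses the other, updated synchronously; the rule set, the update scheme, and the helper names are assumptions chosen for brevity and say nothing about how the ''qual'' package itself encodes such models.

```python
from itertools import product

# Hypothetical Boolean rules: each gene is active only if the other is not.
rules = {
    "A": lambda state: int(not state["B"]),
    "B": lambda state: int(not state["A"]),
}
names = sorted(rules)


def successor(levels):
    """Apply every rule at once (synchronous update) to a tuple of levels."""
    state = dict(zip(names, levels))
    return tuple(rules[name](state) for name in names)


def state_transition_graph():
    """Map every possible state to its successor state."""
    return {levels: successor(levels) for levels in product((0, 1), repeat=len(names))}


graph = state_transition_graph()
fixed_points = sorted(state for state, target in graph.items() if state == target)

for state, target in sorted(graph.items()):
    print(state, "->", target)
print("fixed points:", fixed_points)
```

Nodes of `graph` are the discrete states and each entry is one transition, so stable states of the network show up as nodes that map to themselves.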


Layout

The SBML ''layout'' package originated as a set of annotation conventions usable in SBML Level 2. It was introduced at the SBML Forum in St. Louis in 2004. Ralph Gauges wrote the specification and provided an implementation that was widely used. This original definition was reformulated as an SBML Level 3 package, and a specification was formally released in August 2013. The SBML Level 3 Layout package provides a specification for how to represent a reaction network in a graphical form. It is thus better tailored to the task than the use of an arbitrary drawing or graph. The SBML Level 3 package only deals with the information necessary to define the position and other aspects of a graph's layout; the additional details necessary to complete the graph, namely how the visual aspects are meant to be rendered, are the purview of the separate SBML Level 3 package called ''Rendering'' (nicknamed "''render''"). As of November 2015, a draft specification for the "''render''" package is available, but it has not yet been officially finalized.


Packages under development

Development of SBML Level 3 packages is being undertaken such that specifications are reviewed and implementations attempted during the development process. Once a specification is stable and there are two implementations that support it, the package is considered accepted. The packages detailed above have all reached the acceptance stage. The table below gives a brief summary of packages that are currently in the development phase.


Structure

A model definition in SBML Levels 2 and 3 consists of lists of one or more of the following components:
* Function definition: A named mathematical function that may be used throughout the rest of a model.
* Unit definition: A named definition of a new unit of measure, or a redefinition of an existing SBML default unit. Named units can be used in the expression of quantities in a model.
* Compartment type (only in SBML Level 2): A type of location where reacting entities such as chemical substances may be located.
* Species type (only in SBML Level 2): A type of entity that can participate in reactions. Examples of species types include ions such as Ca²⁺, molecules such as glucose or ATP, binding sites on a protein, and more.
* Compartment: A well-stirred container of a particular type and finite size where species may be located. A model may contain multiple compartments of the same compartment type. Every species in a model must be located in a compartment.
* Species: A pool of entities of the same species type located in a specific compartment.
* Parameter: A quantity with a symbolic name. In SBML, the term parameter is used in a generic sense to refer to named quantities regardless of whether they are constants or variables in a model.
* Initial assignment: A mathematical expression used to determine the initial conditions of a model. This type of structure can only be used to define how the value of a variable can be calculated from other values and variables at the start of simulated time.
* Rule: A mathematical expression used in combination with the differential equations constructed based on the set of reactions in a model. It can be used to define how a variable's value can be calculated from other variables or used to define the rate of change of a variable. The set of rules in a model can be used with the reaction rate equations to determine the behavior of the model with respect to time. The set of rules constrains the model for the entire duration of simulated time.
* Constraint: A mathematical expression that defines a constraint on the values of model variables. The constraint applies at all instants of simulated time. The set of constraints in the model should not be used to determine the behavior of the model with respect to time.
* Reaction: A statement describing some transformation, transport or binding process that can change the amount of one or more species. For example, a reaction may describe how certain entities (reactants) are transformed into certain other entities (products). Reactions have associated kinetic rate expressions describing how quickly they take place.
* Event: A statement describing an instantaneous, discontinuous change in a set of variables of any type (species concentration, compartment size or parameter value) when a triggering condition is satisfied.
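The way these components appear as lists inside a model can be seen by walking a small document with a generic XML parser. The sketch below embeds a hand-written Level 3 fragment containing one compartment, four species, one parameter and one reaction, then groups species by compartment and lists each reaction's reactants and products. The fragment is an illustrative assumption that has not been checked with an SBML validator (the kinetic law is left out for brevity), and `summarize` is a hypothetical helper, not a libSBML call.

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://www.sbml.org/sbml/level3/version2/core"}

# Hand-written fragment for illustration; not validated, kinetic law omitted.
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version2/core" level="3" version="2">
  <model id="enzyme">
    <listOfCompartments>
      <compartment id="cell" size="1" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="Substrate" compartment="cell" initialAmount="10"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
      <species id="Enzyme" compartment="cell" initialAmount="5"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
      <species id="Complex" compartment="cell" initialAmount="0"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
      <species id="Product" compartment="cell" initialAmount="0"
               hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="kcat" value="3" constant="true"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="Conversion" reversible="false">
        <listOfReactants>
          <speciesReference species="Complex" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="Product" stoichiometry="1" constant="true"/>
          <speciesReference species="Enzyme" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def summarize(text):
    """Group species by compartment and list reactants/products per reaction."""
    model = ET.fromstring(text).find("s:model", NS)
    species_by_compartment = {}
    for sp in model.findall("s:listOfSpecies/s:species", NS):
        species_by_compartment.setdefault(sp.get("compartment"), []).append(sp.get("id"))
    reactions = {}
    for rxn in model.findall("s:listOfReactions/s:reaction", NS):
        reactants = [r.get("species") for r in rxn.findall("s:listOfReactants/s:speciesReference", NS)]
        products = [p.get("species") for p in rxn.findall("s:listOfProducts/s:speciesReference", NS)]
        reactions[rxn.get("id")] = (reactants, products)
    parameters = {p.get("id"): float(p.get("value"))
                  for p in model.findall("s:listOfParameters/s:parameter", NS)}
    return {"species_by_compartment": species_by_compartment,
            "reactions": reactions, "parameters": parameters}


summary = summarize(SBML_TEXT)
print(summary)
```

Each `listOf...` element corresponds to one of the component types in the list above, which is why a generic XML walk is enough to recover the model's structure without interpreting its mathematics.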


DSLs Supporting SBML

SBML is primarily a format for the exchange of systems biology models between software modeling tools or for archiving models in repositories such as BiGG, BioModels, or JWS Online. Since SBML is encoded in XML and in particular uses MathML for representing mathematics, the format is hard for humans to read and write directly. As a result, other groups have developed human-readable formats that can be converted to and from SBML.


SBML-shorthand

SBML-shorthand is a specification and associated Python tooling to interconvert SBML and the shorthand notation. The format was developed by the UK Newcastle systems biology group sometime before 2006. Its aim was to enable modelers to more rapidly create models without having to either write raw XML or use GUI tools. Two Python tools are provided, mod2sbml.py and sbml2mod.py. The libSBML package for Python is required to assist in the conversion. Currently, SBML-shorthand supports SBML Level 3, Version 1. The following code is an example of SBML-shorthand being used to describe the simple enzyme-substrate mechanism.

    @compartments
     cell=1
    @species
     cell:Substrate=10
     cell:Enzyme=5
     cell:Complex=0
     cell:Product=0
    @parameters
     k1=1
     k1r=2
    @reactions
    @rr=Binding
     Substrate+Enzyme -> Complex
     k1*Substrate*Enzyme-k1r*Complex
    @r=Conversion
     Complex -> Product + Enzyme
     kcat*Complex : kcat=3


Antimony

Antimony is based on an earlier DSL implemented in the Jarnac modeling application. That, in turn, was based on the SCAMP modeling application, which ultimately drew inspiration from the DSL developed by David Garfinkel for the BIOSIM simulator. Like SBML-shorthand, Antimony provides a simplified text representation of SBML. It uses a minimum of punctuation characters, which renders the text easier to read and understand. It also allows users to add comments. Antimony is implemented using C/C++ and Bison as the grammar parser. However, the distribution also includes Python bindings, which can be installed using pip to make it easy to use from Python. It is also available via the Tellurium package. More recently, a JavaScript/WASM version has been generated which allows the Antimony language to be used on the web. The website tool makesbml uses the JavaScript version. Antimony supports SBML Level 3, Version 2. Antimony also supports the following SBML packages: Hierarchical Model Composition, Flux Balance Constraints, and Distributions. The following example illustrates Antimony being used to describe a simple enzyme-kinetics model:

    binding: Substrate + Enzyme -> Complex; k1*Substrate*Enzyme - k1r*Complex;
    Conversion: Complex -> Product + Enzyme; kcat*Complex;

    // Species initializations
    Substrate = 10; Enzyme = 5; Complex = 0; Product = 0;

    // Variable initializations
    k1 = 1; k1r = 2; kcat = 3;


Community

As of February 2020, nearly 300 software systems advertise support for SBML. A current list is available in the form of the SBML Software Guide, hosted at SBML.org.

SBML has been and continues to be developed by the community of people making software platforms for systems biology, through active email discussion lists and biannual workshops. The meetings are often held in conjunction with other biology conferences, especially the International Conference on Systems Biology (ICSB). The community effort is coordinated by an elected editorial board made up of five members. Each editor is elected for a 3-year non-renewable term.

Tools such as an online model validator, as well as open-source libraries for incorporating SBML into software programmed in C, C++, Java, Python, Mathematica, MATLAB and other languages, are developed partly by the SBML Team and partly by the broader SBML community.

SBML is an official IETF MIME type, specified by RFC 3823.


See also

* BioModels Database
* BioPAX
* CellML
* MIASE
* MIRIAM
* Systems Biology Ontology
* Systems Biology Graphical Notation


External links


* SBML home page
* SBML-related presentations and posters from Nature Precedings
* COmputational Modeling in BIology NEtwork