GenoCAD is one of the earliest
computer assisted design tools for
synthetic biology.
The software is a bioinformatics tool developed and maintained by GenoFAB, Inc. GenoCAD facilitates the design of protein expression vectors, artificial gene networks and other genetic constructs for
genetic engineering
and is based on the theory of
formal languages.
History
GenoCAD originated as an offshoot of an attempt to formalize functional constraints of genetic constructs using the theory of
formal languages. In 2007, the website genocad.org (now retired) was set up as a proof of concept by researchers at
Virginia Bioinformatics Institute
,
Virginia Tech. Using the website, users could design genes by repeatedly replacing high-level genetic constructs with lower-level genetic constructs, and eventually with actual
DNA sequences.
On August 31, 2009, the
National Science Foundation awarded a three-year grant of $1,421,725 to Dr. Jean Peccoud, an associate professor at the
Virginia Bioinformatics Institute
at
Virginia Tech, for the development of GenoCAD. GenoCAD was and continues to be developed by
GenoFAB, Inc., a company founded by Peccoud (currently
CSO and acting
CEO
), who was also one of the authors of the originating study.
Source code for GenoCAD was originally released on
SourceForge in December 2009.
GenoCAD version 2.0 was released in November 2011 and included the ability to simulate the behavior of the designed genetic code. This feature was a result of a collaboration with the team behind
COPASI
.
In April 2015, Peccoud and colleagues published a library of biological parts, called GenoLIB, that can be incorporated into the GenoCAD platform.
Goals
The four aims of the project are to develop a:
#computer language to represent the structure of synthetic DNA molecules used in
E. coli
,
yeast,
mice
, and
Arabidopsis thaliana
cells
#compiler capable of translating DNA sequences into mathematical models in order to predict the encoded phenotype
#collaborative workflow environment that allows users to share parts, designs, and fabrication resources
#means to forward the results to the user community through an external advisory board, an annual user conference, and outreach to industry
Features
The features of GenoCAD fall into three main categories.
* Management of genetic sequences: The purpose of this group of features is to help users identify, within large collections of genetic parts, the parts needed for a project and to organize them in project-specific libraries.
**''Genetic parts'': Parts have a unique identifier, a name and a more general description. They also have a
DNA sequence
. Parts are associated with a
grammar and assigned to a parts category such as
promoter,
gene, etc.
** ''Parts libraries'': Collections of parts are organized in libraries. In some cases, parts libraries correspond to parts imported from a single source such as another
sequence database. In other cases, libraries correspond to the parts used for a particular design project. Parts can be moved from one library to another through a temporary storage area called the cart (analogous to e-commerce shopping carts).
** ''Searching parts'': Users can search the parts database using the
Lucene search engine. Basic and advanced search modes are available. Users can develop complex queries and save them for future reuse.
** ''Importing/Exporting parts'': Parts can be imported and exported individually or as entire libraries using standard file formats (e.g.,
GenBank,
tab delimited,
FASTA,
SBML).
* Combining sequences into genetic constructs: The purpose of this group of features is to streamline the process of combining genetic parts into designs compliant with a specific design strategy.
** ''Point-and-click design tool'': This
wizard guides the user through a series of design decisions that determine the design structure and the selection of parts included in the design.
** ''Design management'': Designs can be saved in the user
workspace. Design statuses are regularly updated to warn users of the consequences of editing parts on previously saved designs.
** ''Exporting designs'': Designs can be exported using standard file formats (e.g.,
GenBank,
tab delimited,
FASTA).
** ''Design safety'': Designs are protected from some types of errors by forcing the user to follow the appropriate design strategy.
** ''Simulation'': Sequences designed in GenoCAD can be simulated to display chemical production in the resulting cell.
* User workspace: Users can personalize their
workspace by adding parts to the GenoCAD database, creating specialized libraries corresponding to specific design projects, and saving designs at different stages of development.
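The part attributes and library operations listed above can be pictured with a small data model. The following is only a minimal sketch, not GenoCAD's actual schema or code: the `Part` class, its field names, the `to_fasta` helper, and the placeholder identifiers and sequences are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """Hypothetical record mirroring the attributes described for genetic parts."""
    part_id: str      # unique identifier
    name: str
    description: str
    sequence: str     # DNA sequence
    grammar: str      # grammar the part is associated with
    category: str     # parts category, e.g. "promoter" or "gene"


def to_fasta(parts, width=60):
    """Serialize parts as FASTA text: a '>' header line, then wrapped sequence lines."""
    chunks = []
    for p in parts:
        chunks.append(f">{p.part_id} {p.name} [{p.category}]\n")
        for i in range(0, len(p.sequence), width):
            chunks.append(p.sequence[i:i + width] + "\n")
    return "".join(chunks)


# A project-specific library is modeled here as a plain list of parts.
# Identifiers and sequences are placeholders, not real biological parts.
library = [
    Part("P0001", "demo_promoter", "Placeholder promoter", "TTGACA", "demo", "promoter"),
    Part("G0001", "demo_gene", "Placeholder gene", "ATGAAATAA", "demo", "gene"),
]

# Selecting parts by category, as a user might when browsing a library.
promoters = [p for p in library if p.category == "promoter"]

fasta = to_fasta(library)
print(fasta, end="")
```

Keeping the category on each part is what lets a design tool offer only the parts that are valid at a given position in a design.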
Theoretical foundation
GenoCAD is rooted in the theory of
formal languages; in particular, the design rules describing how to combine different kinds of parts form
context-free grammars.
A context-free grammar can be defined by its terminals, variables, start variable and substitution rules.
In GenoCAD, the terminals of the grammar are sequences of
DNA that perform a particular biological purpose (e.g. a
promoter). The variables are less homogeneous: they can represent longer sequences that have multiple functions, or a section of DNA that can contain any one of several different sequences that perform the same function (e.g. a variable that represents the set of promoters). GenoCAD includes built-in substitution rules to ensure that the DNA sequence is biologically viable. Users can also define their own sets of rules for other purposes.
Designing a sequence of DNA in GenoCAD is much like creating a derivation in a context-free grammar. The user starts with the start variable and repeatedly selects a variable and a substitution for it until only terminals are left.
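The derivation process can be illustrated with a toy grammar. This is a minimal sketch under stated assumptions, not one of GenoCAD's built-in grammars: the variable names (`S`, `CASSETTE`, `PRO`, `RBS`, `GENE`, `TER`), the part names, the placeholder sequences, and the `derive` and `to_dna` helpers are all hypothetical.

```python
# Toy context-free grammar. Keys of RULES are variables; every other symbol is
# a terminal, i.e. a concrete part whose sequence is stored in SEQUENCES.
# All names and sequences are invented for illustration.
RULES = {
    "S": [["CASSETTE"], ["CASSETTE", "S"]],        # one or more cassettes
    "CASSETTE": [["PRO", "RBS", "GENE", "TER"]],
    "PRO": [["pro1"], ["pro2"]],                   # variable standing for a set of promoters
    "RBS": [["rbs1"]],
    "GENE": [["gene1"], ["gene2"]],
    "TER": [["ter1"]],
}
SEQUENCES = {
    "pro1": "TTGACA",
    "pro2": "TATAAT",
    "rbs1": "AGGAGG",
    "gene1": "ATGAAATAA",
    "gene2": "ATGCCCTAA",
    "ter1": "GCGCCGC",
}


def derive(choices, start="S"):
    """Rewrite the leftmost variable using the rule index given by each choice,
    stopping when only terminals remain."""
    design = [start]
    picks = iter(choices)
    while True:
        idx = next((i for i, sym in enumerate(design) if sym in RULES), None)
        if idx is None:
            return design                       # only terminals left
        options = RULES[design[idx]]
        design[idx:idx + 1] = options[next(picks)]


def to_dna(design):
    """Concatenate the sequences of the terminal parts in order."""
    return "".join(SEQUENCES[part] for part in design)


# Each entry plays the role of one user decision in a point-and-click design:
# S -> CASSETTE, CASSETTE -> PRO RBS GENE TER, PRO -> pro2, and so on.
design = derive([0, 0, 1, 0, 0, 0])
print(design)
print(to_dna(design))
```

Because every step replaces a variable with one of its permitted substitutions, any finished design is a sentence of the grammar by construction, which is how following the design strategy protects against some classes of error.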
Alternatives
The most common alternatives to GenoCAD are Proto, GEC and EuGene.
External links
* GenoCAD.com
* Project page on SourceForge
* Tutorials and FAQs
* Peccoud Lab