The Open Regulatory Annotation Database (also known as ORegAnno) is designed to promote community-based curation of regulatory information. Specifically, the database contains information about regulatory regions, transcription factor binding sites, regulatory variants, and haplotypes.
Overview
Data Management
For each entry, cross-references are maintained to EnsEMBL, dbSNP, Entrez Gene, the NCBI Taxonomy database and PubMed. The information within ORegAnno is regularly mapped and provided as a UCSC Genome Browser track. Furthermore, each entry is associated with its experimental evidence, embedded as an Evidence Ontology within ORegAnno. This allows researchers to analyze regulatory data using their own criteria for judging whether the supporting evidence is adequate.
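Because each entry carries its experimental evidence, a researcher can apply their own evidence criteria when selecting records. The Python sketch below illustrates that idea only. The record layout (`RegulatoryRecord`), the field names, the `filter_by_evidence` helper, the placeholder IDs and the evidence labels are all hypothetical and are not ORegAnno's actual schema, evidence terms or export format.

```python
from dataclasses import dataclass, field


@dataclass
class RegulatoryRecord:
    """Hypothetical, simplified ORegAnno-style entry (NOT the real schema)."""
    record_id: str
    record_type: str      # e.g. "TFBS" or "regulatory region"
    organism: str
    evidence: list        # hypothetical evidence labels attached to the entry
    xrefs: dict = field(default_factory=dict)  # e.g. {"PubMed": "<id>"}


def filter_by_evidence(records, accepted, require_all=False):
    """Keep records whose evidence meets the researcher's own criteria.

    require_all=False: a record passes if ANY of its evidence labels is accepted.
    require_all=True:  a record passes only if EVERY evidence label is accepted.
    Records with no evidence attached are always excluded.
    """
    accepted = set(accepted)
    kept = []
    for rec in records:
        labels = set(rec.evidence)
        if not labels:
            continue  # nothing to judge the record by
        if require_all:
            ok = labels <= accepted  # subset test
        else:
            ok = bool(labels & accepted)  # non-empty intersection
        if ok:
            kept.append(rec)
    return kept


# Usage with placeholder records (hypothetical IDs and evidence labels).
records = [
    RegulatoryRecord("EXAMPLE-1", "TFBS", "Homo sapiens", ["EMSA", "reporter gene assay"]),
    RegulatoryRecord("EXAMPLE-2", "TFBS", "Mus musculus", ["computational prediction"]),
    RegulatoryRecord("EXAMPLE-3", "regulatory region", "Homo sapiens", ["reporter gene assay"]),
]
experimental = filter_by_evidence(records, {"EMSA", "reporter gene assay"})
print([r.record_id for r in experimental])
```

The `require_all` switch shows how the same records can support a lenient or a strict reading of the evidence. Which evidence types count as sufficient remains the researcher's decision.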
Software and data access
The project is open source: all data and all software that is produced in the project can be freely accessed and used.
Database contents
As of December 20, 2006, ORegAnno contained 4220 regulatory sequences (excluding deprecated records): 2190 transcription factor binding sites, 1853 regulatory regions (enhancers, promoters, etc.), 170 regulatory polymorphisms, and 7 regulatory haplotypes for 17 different organisms (predominantly Drosophila melanogaster, Homo sapiens, Mus musculus, Caenorhabditis elegans, and Rattus norvegicus, in that order). These records were obtained by manual curation of 828 publications by 45 ORegAnno users from the gene regulation community. The ORegAnno publication queue contained 4215 publications, of which 858 were closed, 34 were in progress (open status), and 3321 were awaiting annotation (pending status). ORegAnno is continually updated, so current database contents should be obtained from www.oreganno.org.
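The four record categories reported for December 20, 2006 together make up the 4220 regulatory sequences. The short Python tally below uses only the figures given in the text to check that total and show each category's share. The variable names are illustrative.

```python
# Record counts as of December 20, 2006, taken from the text above.
record_counts = {
    "transcription factor binding sites": 2190,
    "regulatory regions": 1853,
    "regulatory polymorphisms": 170,
    "regulatory haplotypes": 7,
}

# The categories should add up to the reported 4220 regulatory sequences.
total = sum(record_counts.values())
print(f"total regulatory sequences: {total}")  # prints "total regulatory sequences: 4220"

# Share of each category in the total, to one decimal place.
for kind, n in record_counts.items():
    print(f"{kind}: {n} ({100 * n / total:.1f}%)")
```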
RegCreative Jamboree 2006
The RegCreative jamboree was prompted by a community initiative to curate, in perpetuity, the genomic sequences that have been experimentally determined to control gene expression. This objective is of fundamental importance to evolutionary analysis and translational research, as regulatory mechanisms are widely implicated in species-specific adaptation and in the etiology of disease. The initiative culminated in the formation of an international consortium of like-minded scientists dedicated to this task. The RegCreative jamboree was the first opportunity for these groups to meet, to accurately assess the current state of knowledge in gene regulation, and to begin developing standards for curating regulatory information.
In total, 44 researchers from 23 institutions in 9 different countries attended the workshop. Funding was also obtained from ENFIN, the BioSapiens Network, FWO Research Foundation, Genome Canada and Genome British Columbia.
The specific outcomes of the RegCreative meeting to date are:
* Prior to the RegCreative Jamboree, attendees were asked to participate in an inter-annotator agreement assessment. Two ORegAnno mirrors were established with identical sets of publications in their annotation queues. In total, 33 redundant annotations from 18 publications were collected (79 annotations for 31 papers and 60 annotations for 21 papers were collected on servers 1 and 2, respectively). This effort served as a baseline for establishing annotator efficiency.
* Hands-on annotation activities occurred during the first 2 days of the 3-day workshop. In total, 39 researchers contributed 184 transcription factor binding sites (TFBS) and 317 regulatory regions from 96 papers. Many of these researchers were also trained on the ORegAnno system, significantly increasing its experienced-user community. By species, the contributions were 339 annotations in Homo sapiens, 42 in Mus musculus, 72 in Drosophila melanogaster, 24 in Ciona intestinalis, 14 in Rattus norvegicus, 6 in Halocynthia roretzi, 2 in Ciona savignyi and 2 in HIV. Within these annotations, one new dataset was added to ORegAnno: 274 human enhancers were programmatically annotated by Maximillian Haessler, Institute Alfred Fessard, from Visel et al., Nucleic Acids Research, 2006. In total, 130 scientific studies were examined in depth. The annotated papers were pre-selected from expert-curated publications in the ORegAnno queue that had full text available through HighWire Press.
* There exists an immediate need for improved data standardization and development of associated ontologies. Specifically, this should include the open access development and integration of transcription factor naming conventions and sequence, cell type, cell line, tissue, and evidence ontologies. The groundwork for addressing and prioritizing these needs was accomplished in several ways during the meeting:
** Transcription factor naming issues were addressed through discussion of integrating transcription factor prediction pipelines that have been supplemented with manual curation, such as DBD or flyTF, as opposed to purely manually curated implementations such as TFcat.
** Marc Halfon, University at Buffalo, led a breakout session to improve the Sequence Ontology using existing ORegAnno and REDfly database conventions, within the framework being developed as part of the Open Biomedical Ontologies. A preliminary version of these improvements can be found on the ORegAnno wiki.
** Learning-based ontology development was widely regarded as an essential feature of the annotation process, so that annotators are not prevented from annotating by the limitations of the controlled vocabulary, and so that these exceptions can be used to further develop the backbone ontologies.
** Ontology development should be decentralized from the ORegAnno annotation framework. Specifically, the ORegAnno evidence ontology is planned to be separated from ORegAnno and opened to broader community development.
** There was a renewed focus on integrating species-specific resources with the annotation framework.
* A specific focus of the workshop was the role of text mining in facilitating regulatory annotation. Sessions led by Dr. Lynette Hirschman, MITRE, and Dr. Martin Krallinger, CNIO, examined where text mining can help. A short-term objective for text-mining-based analyses was formulated around both populating the ORegAnno queue and using the expert-curated portion of the ORegAnno queue to validate text-mining-based publication acquisition. The latter objectives are being led by Dr. Stein Aerts, University of Leuven.
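The jamboree contributions were reported in two independent breakdowns, by species and by annotation type. The Python check below, which uses only the numbers given in the text, confirms that both breakdowns describe the same 501 annotations and ranks the species by contribution. The dictionary names are illustrative.

```python
# Per-species annotation counts from the hands-on jamboree sessions (from the text).
per_species = {
    "Homo sapiens": 339,
    "Mus musculus": 42,
    "Drosophila melanogaster": 72,
    "Ciona intestinalis": 24,
    "Rattus norvegicus": 14,
    "Halocynthia roretzi": 6,
    "Ciona savignyi": 2,
    "HIV": 2,
}
# The same contributions broken down by annotation type (from the text).
by_type = {"TFBS": 184, "regulatory regions": 317}

species_total = sum(per_species.values())
type_total = sum(by_type.values())
print(species_total, type_total)  # prints "501 501"

# Species ordered from most to fewest annotations.
ranking = sorted(per_species.items(), key=lambda item: item[1], reverse=True)
for name, n in ranking:
    print(f"{name}: {n}")
```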
External links
* ORegAnno
* RegCreative Jamboree 2006