
The Open Regulatory Annotation Database (also known as ORegAnno) is designed to promote community-based curation of regulatory information. Specifically, the database contains information about regulatory regions, transcription factor binding sites, regulatory variants, and haplotypes.


Overview


Data Management

For each entry, cross-references are maintained to EnsEMBL, dbSNP, Entrez Gene, the NCBI Taxonomy database, and PubMed. The information within ORegAnno is regularly mapped and provided as a UCSC Genome Browser track. Furthermore, each entry is associated with its experimental evidence, embedded as an Evidence Ontology within ORegAnno. This allows researchers to analyze regulatory data using their own criteria for the suitability of the supporting evidence.
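
The idea of selecting records by supporting evidence can be illustrated with a minimal Python sketch. It is not ORegAnno's actual schema or API: the `RegulatoryRecord` class, its field names, the record identifiers, the cross-reference values, and the evidence labels are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RegulatoryRecord:
    """Hypothetical, simplified stand-in for one annotated entry."""
    record_id: str      # illustrative identifier, not a real accession
    record_type: str    # e.g. "TFBS" or "regulatory region"
    species: str
    # Cross-references keyed by external resource name (placeholder values).
    cross_refs: dict = field(default_factory=dict)
    # Evidence labels attached to the entry (made-up labels, not ontology terms).
    evidence: set = field(default_factory=set)


def filter_by_evidence(records, accepted):
    """Keep records supported by at least one evidence label in `accepted`."""
    accepted = set(accepted)
    return [r for r in records if accepted & r.evidence]


# Placeholder data, invented purely to exercise the filter.
EXAMPLE_RECORDS = [
    RegulatoryRecord(
        record_id="EXAMPLE-1",
        record_type="TFBS",
        species="Homo sapiens",
        cross_refs={"PubMed": "placeholder-pmid"},
        evidence={"reporter gene assay"},
    ),
    RegulatoryRecord(
        record_id="EXAMPLE-2",
        record_type="regulatory region",
        species="Mus musculus",
        cross_refs={"PubMed": "placeholder-pmid"},
        evidence={"computational prediction"},
    ),
]

if __name__ == "__main__":
    # A researcher who trusts only one kind of evidence keeps a subset.
    kept = filter_by_evidence(EXAMPLE_RECORDS, {"reporter gene assay"})
    for record in kept:
        print(record.record_id, record.species)
```

Keeping the evidence labels on each record, rather than pre-filtering the data, leaves the decision about which evidence is acceptable to the person running the analysis.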


Software and data access

The project is open source: all data and all software produced in the project can be freely accessed and used.


Database contents

As of December 20, 2006, ORegAnno contained 4220 regulatory sequences (excluding deprecated records) for 2190 transcription factor binding sites, 1853 regulatory regions (enhancers, promoters, etc.), 170 regulatory polymorphisms, and 7 regulatory haplotypes for 17 different organisms (predominantly Drosophila melanogaster, Homo sapiens, Mus musculus, Caenorhabditis elegans, and Rattus norvegicus, in that order). These records were obtained by manual curation of 828 publications by 45 ORegAnno users from the gene regulation community. The ORegAnno publication queue contained 4215 publications, of which 858 were closed, 34 were in progress (open status), and 3321 were awaiting annotation (pending status). ORegAnno is continually updated, and therefore current database contents should be obtained from www.oreganno.org.


RegCreative Jamboree 2006

The RegCreative jamboree was stimulated by a community initiative to curate in perpetuity the genomic sequences which have been experimentally determined to control gene expression. This objective is of fundamental importance to evolutionary analysis and translational research, as regulatory mechanisms are widely implicated in species-specific adaptation and the etiology of disease. The initiative culminated in the formation of an international consortium of like-minded scientists dedicated to accomplishing this task. The RegCreative jamboree was the first opportunity for these groups to meet, to assess the current state of knowledge in gene regulation, and to begin developing standards by which to curate regulatory information. In total, 44 researchers from 9 different countries and 23 institutions attended the workshop. Funding was also obtained from ENFIN, the BioSapiens Network, FWO Research Foundation, Genome Canada and Genome British Columbia.

The specific outcomes of the RegCreative meeting to date are:

* Prior to the RegCreative Jamboree, attendees were asked to participate in an interannotator agreement assessment. Two ORegAnno mirrors were established with identical sets of publications to be annotated in their queues. In total, 33 redundant annotations from 18 publications were collected (79 annotations for 31 papers and 60 annotations for 21 papers were collected on servers 1 and 2, respectively). This effort was used as a baseline from which to establish annotator efficiency.
* Hands-on annotation activities occurred during the first 2 days of the 3-day workshop. In total, 39 researchers contributed 184 TFBS and 317 regulatory regions from 96 papers. Many of these researchers were also trained on the ORegAnno system, significantly increasing its experienced-user community. By species, these annotations comprised 339 in Homo sapiens, 42 in Mus musculus, 72 in Drosophila melanogaster, 24 in Ciona intestinalis, 14 in Rattus norvegicus, 6 in Halocynthia roretzi, 2 in Ciona savignyi and 2 in HIV. Within these annotations, one new dataset was added to ORegAnno: 274 human enhancers were programmatically annotated by Maximillian Haessler, Institute Alfred Fessard, from Visel et al., Nucleic Acids Research, 2006. In total, 130 scientific studies were examined in depth. The annotated papers were pre-selected from expert-curated publications in the ORegAnno queue that had full text available through HighWire Press.
* There exists an immediate need for improved data standardization and development of associated ontologies. Specifically, this should include the open-access development and integration of transcription factor naming conventions and of sequence, cell type, cell line, tissue, and evidence ontologies. The groundwork for addressing and prioritizing these needs was accomplished in several ways during the meeting:
** Transcription factor naming issues were addressed through discussion of integrating transcription factor prediction pipelines that have been supplemented with manual curation, such as DBD or flyTF, versus solely manually curated implementations such as TFcat.
** Marc Halfon, University at Buffalo, led a breakout session to improve the Sequence Ontology from existing ORegAnno and REDfly database conventions within the framework being developed as part of the Open Biomedical Ontologies. A preliminary version of these improvements can be found on the ORegAnno wiki.
** Learning-based ontology development was widely regarded as an essential feature of the annotation process, such that annotators are not prevented from annotating by the limitations of the controlled vocabulary and these exceptions can be used to further develop the backbone ontologies.
** Ontology development should be decentralized from the ORegAnno annotation framework. Specifically, it is planned that the ORegAnno evidence ontology will be removed and made available for broader community development.
** Renewed focus on integrating species-specific resources with the annotation framework.
* A specific focus of the workshop was addressing the role of text mining in facilitating regulatory annotation. Sessions were led by Dr. Lynette Hirschman, MITRE, and Dr. Martin Krallinger, CNIO, to formulate where text mining can help. A short-term objective of text-mining-based analyses was formulated around both populating the ORegAnno queue and using the expert-curated portion of the ORegAnno queue to validate text-mining-based publication acquisition. The latter objectives are being led by Dr. Stein Aerts, University of Leuven.


References

* Lesurf R, Cotto KC, Wang G, Griffith M, Kasaian K, Jones SJ, Montgomery SB, Griffith OL, Open Regulatory Annotation Consortium (2016). "ORegAnno 3.0: a community-driven resource for curated regulatory annotation". Nucleic Acids Research. 44 (D1): D126-32. doi:10.1093/nar/gkv1203. PMID 26578589. PMC 4702855.


External links


* ORegAnno
* RegCreative Jamboree 2006