BioCompute Object

The BioCompute Object (BCO) Project is a community-driven initiative to build a framework for standardizing and sharing computations and analyses generated from high-throughput sequencing (HTS, also referred to as next-generation sequencing or massively parallel sequencing). The project has since been standardized as IEEE 2791-2020, and the project files are maintained in an open source repository. The July 22nd, 2020 edition of the Federal Register announced that the FDA now supports the use of BioCompute (officially known as IEEE 2791-2020) in regulatory submissions, and the inclusion of the standard in the Data Standards Catalog for the submission of HTS data in NDAs, ANDAs, BLAs, and INDs to CBER, CDER, and CFSAN.

Originally started as a collaborative contract between the George Washington University and the Food and Drug Administration, the project has grown to include over 20 universities, biotechnology companies, public-private partnerships and pharmaceutical companies, including Seven Bridges and Harvard Medical School. The BCO aims to ease the exchange of HTS workflows between various organizations, such as the FDA, pharmaceutical companies, contract research organizations, bioinformatic platform providers, and academic researchers. Due to the sensitive nature of regulatory filings, few direct references to material can be published. However, the project is currently funded to train FDA reviewers and administrators to read and interpret BCOs, and currently has 4 publications either submitted or nearly submitted.


Background

One of the biggest challenges in bioinformatics is documenting and sharing scientific workflows in such a way that the computation and its results can be peer-reviewed or reliably reproduced. Bioinformatic pipelines typically use multiple pieces of software, each of which typically has multiple versions available, multiple input parameters, multiple outputs, and possibly platform-specific configurations. As with experimental parameters in a laboratory protocol, small changes in computational parameters may have a large impact on the scientific validity of the results. The BioCompute Framework provides an object-oriented design from which a BCO can be constructed, digitally signed, and shared; the BCO contains the details of a pipeline and how it was used. The BioCompute concept was originally developed to satisfy FDA regulatory research and review needs for evaluation, validation, and verification of genomics data. However, the BioCompute Framework follows FAIR Data Principles and can be used broadly to provide communication and interoperability between different platforms, industries, scientists and regulators.
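
The sensitivity to small parameter changes can be illustrated with a short Python sketch. It is not the BioCompute signing mechanism, which this article does not describe; it uses a plain SHA-256 digest over a deterministic JSON serialization as a stand-in for a digital signature. The function name `fingerprint`, the tool name, the version, and the parameter are all hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return a SHA-256 hex digest of a record serialized as canonical JSON."""
    # sort_keys and fixed separators make the serialization deterministic,
    # so records with equal content always produce equal digests.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical pipeline descriptions: tool name, version, and parameter are made up.
run_a = {"tool": "example-aligner", "version": "1.2.0",
         "parameters": {"seed_length": 19}}
run_b = {"tool": "example-aligner", "version": "1.2.0",
         "parameters": {"seed_length": 20}}

print(fingerprint(run_a) == fingerprint(dict(run_a)))  # same content: True
print(fingerprint(run_a) == fingerprint(run_b))        # one parameter differs: False
```

A digest of this kind only detects that a record changed. A real digital signature would additionally bind the record to a signer's key.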


Utility

As a standard for genomic data, BioCompute Objects are mostly useful to three groups of users: 1) academic researchers carrying out new genetic experiments, 2) pharma/biotech companies that wish to submit work to the FDA for regulatory review, and 3) clinical settings (hospitals and labs) that offer genetic tests and personalized medicine. The utility to academic researchers is the ability to reproduce experimental data more accurately and with less uncertainty. The utility to entities wishing to submit work to the FDA is a streamlined approach, again with less uncertainty and with the ability to more accurately reproduce work. For clinical settings, it is critical that HTS data and clinical metadata be transmitted accurately, ideally in a standardized form that is readable by any stakeholder, including regulatory partners.


Format

The BioCompute Object is in JSON format and, at a minimum, contains all the software versions and parameters necessary to evaluate or verify a computational pipeline. It may also contain input data as files or links, reference genomes, or executable Docker components. A BioCompute Object can be integrated with HL7 FHIR as a Provenance Resource. Multiple joint implementations that leverage the BCO's report-centric format are also under development, including ones with the Common Workflow Language (CWL) and Research Object (RO). One of the CWL efforts is part of an active government-funded public contract with a cofounder of CWL to pilot a joint BCO-CWL and to generate documentation and examples for it.
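
As a rough illustration of the minimum content described above, the following Python sketch builds a small JSON record that lists software versions and parameters for each pipeline step, then checks that no step omits them. The key names (`steps`, `software`, `version`, `parameters`, `inputs`), the tool names, the file path, and the helper `missing_details` are assumptions made for this sketch; they are not taken from the normative IEEE 2791-2020 schema.

```python
import json

# Illustrative record only: key names are assumptions for this sketch,
# not the normative IEEE 2791-2020 schema.
bco_like = {
    "name": "example variant-calling pipeline",
    "steps": [
        {"software": "example-aligner", "version": "1.2.0",
         "parameters": {"seed_length": 19}},
        {"software": "example-caller", "version": "0.9.1",
         "parameters": {"min_quality": 30}},
    ],
    # Input data may be referenced by link rather than embedded.
    "inputs": [{"uri": "file:///data/sample.fastq"}],
}


def missing_details(record: dict) -> list:
    """List every step that lacks a software name, version, or parameters."""
    problems = []
    for index, step in enumerate(record.get("steps", [])):
        for key in ("software", "version", "parameters"):
            if key not in step:
                problems.append(f"step {index}: missing {key}")
    return problems


text = json.dumps(bco_like, indent=2)  # serialize the record to JSON text
restored = json.loads(text)            # parse it back, as a recipient would
print(missing_details(restored))       # an empty list means nothing is missing
```

A production workflow would validate against the published schema of the standard instead of a hand-written check like this one.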


BCO Consortium

The BioCompute Object working group gives different stakeholders a means to provide input on current practices for the BCO. The working group was formed during preparation for the 2017 HTS Computational Standards for Regulatory Sciences Workshop and was initially made up of the workshop participants. The group has grown continually as a direct result of interaction among stakeholders from all communities interested in standardizing computational HTS data processing. The public-private partnerships formed between universities, private genomic data companies, software platforms, and government and regulatory institutions have been an easy point of entry for new individuals or institutions to join the BCO project and take part in the discussion of best practices for the objects.


Implementations

The simple R package biocompute can create, validate, and export BioCompute Objects. The Genomics Compliance Suite is a Shiny app that offers similar features to regular expressions found in all modern text editors. There are several internally developed open source software packages and web applications that implement the BioCompute specification, three of which have been deployed in a publicly accessible AWS EC2 cloud. These include an instance of the High-performance Integrated Virtual Environment (HIVE), the BioCompute Portal (a form-based web application that can create and edit BioCompute Objects based on the IEEE-2791-2020 standard), and a BioCompute compliant instance of Galaxy.




External links


Official Website
IEEE 2791-2020 open source project