HOME

TheInfoList



OR:

Bioconductor is a
free Free may refer to: Concept * Freedom, having the ability to do something, without having to obey anyone/anything * Freethought, a position that beliefs should be formed only on the basis of logic, reason, and empiricism * Emancipate, to procur ...
,
open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
and open development software project for the analysis and comprehension of
genomic Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dim ...
data generated by
wet lab A wet lab, or experimental lab, is a type of laboratory where it is necessary to handle various types of chemicals and potential "wet" hazards, so the room has to be carefully designed, constructed, and controlled to avoid spillage and contamination ...
experiments in
molecular biology Molecular biology is the branch of biology that seeks to understand the molecular basis of biological activity in and between cells, including biomolecular synthesis, modification, mechanisms, and interactions. The study of chemical and physi ...
. Bioconductor is based primarily on the
statistical Statistics (from German: ''Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industria ...
R programming language R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinform ...
, but does contain contributions in other programming languages. It has two releases each year that follow the semiannual releases of R. At any one time there is a
release version Release may refer to: * Art release, the public distribution of an artistic production, such as a film, album, or song * Legal release, a legal instrument * News release, a communication directed at the news media * Release (ISUP), a code to ident ...
, which corresponds to the released version of R, and a development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are many
genome annotation DNA annotation or genome annotation is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. An annotation (irrespective of the context) is a note added by way of explanati ...
packages available that are mainly, but not solely, oriented towards different types of
microarray A microarray is a multiplex lab-on-a-chip. Its purpose is to simultaneously detect the expression of thousands of genes from a sample (e.g. from a tissue). It is a two-dimensional array on a solid substrate—usually a glass slide or silicon t ...
s. While computational methods continue to be developed to interpret biological data, the Bioconductor project is an open source software repository that hosts a wide range of statistical tools developed in the R programming environment. Utilizing a rich array of statistical and graphical features in R, many Bioconductor packages have been developed to meet various data analysis needs. The use of these packages provides a basic understanding of the R programming / command language. As a result, R and Bioconductor packages, which have a strong computing background, are used by most biologists who will benefit significantly from their ability to analyze datasets. All these results provide biologists with easy access to the analysis of genomic data without requiring programmin
expertise
The project was started in the Fall of 2001 and is overseen by the Bioconductor core team, based primarily at the
Fred Hutchinson Cancer Research Center The Fred Hutchinson Cancer Center, formerly known as the Fred Hutchinson Cancer Research Center and also known as Fred Hutch or The Hutch, is a cancer research institute established in 1975 in Seattle, Washington. History The center grew out o ...
, with other members coming from international institutions.


Packages

Most Bioconductor components are distributed as
R packages R packages are extensions to the R statistical programming language. R packages contain code, data, and documentation in a standardised collection format that can be installed by users of R, typically via a centralised software repository such as ...
, which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel
Affymetrix Affymetrix is now Applied Biosystems, a brand of DNA microarray products sold by Thermo Fisher Scientific that originated with an American biotechnology research and development and manufacturing company of the same name. The Santa Clara, Califor ...
and two or more channel
cDNA In genetics, complementary DNA (cDNA) is DNA synthesized from a single-stranded RNA (e.g., messenger RNA (mRNA) or microRNA (miRNA)) template in a reaction catalyzed by the enzyme reverse transcriptase. cDNA is often used to express a speci ...
/ Oligo microarrays. As the project has matured, the functional scope of the software packages broadened to include the analysis of all types of genomic data, such as SAGE,
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is calle ...
, or SNP data.


Goals

The broad goals of the projects are to: * Provide widespread access to a broad range of powerful
statistical Statistics (from German: ''Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industria ...
and
graphical Graphics () are visual images or designs on some surface, such as a wall, canvas, screen, paper, or stone, to inform, illustrate, or entertain. In contemporary usage, it includes a pictorial representation of data, as in design and manufacture, ...
methods for the analysis of genomic data. * Facilitate the inclusion of biological metadata in the analysis of genomic data, e.g. literature data from
PubMed PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintain the ...
, annotation data from LocusLink/Entrez. * Provide a common
software platform A computing platform or digital platform is an environment in which a piece of software is executed. It may be the hardware or the operating system (OS), even a web browser and associated application programming interfaces, or other underlying s ...
that enables the rapid
development Development or developing may refer to: Arts *Development hell, when a project is stuck in development *Filmmaking, development phase, including finance and budgeting *Development (music), the process thematic material is reshaped *Photographi ...
and
deployment Deployment may refer to: Engineering and software Concepts * Blue-green deployment, a method of installing changes to a web, app, or database server by swapping alternating production and staging servers * Continuous deployment, a software en ...
of plug-able,
scalable Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
, and
interoperable Interoperability is a characteristic of a product or system to work with other products or systems. While the term was initially defined for information technology or systems engineering services to allow for information exchange, a broader defi ...
software. * Further scientific understanding by producing high-quality documentation and reproducible research. * Train researchers on computational and statistical methods for the analysis of genomic data.


Main features

* Documentation and reproducible research. Each Bioconductor package contains at least one vignette, which is a document that provides a textual, task-oriented description of the package's functionality. These vignettes come in several forms. Many are simple "
How-to The Linux Documentation Project (LDP) is a dormant an all-volunteer project that maintains a large collection of GNU and Linux-related documentation and publishes the collection online. It began as a way for hackers to share their documentation wi ...
"s that are designed to demonstrate how a particular task can be accomplished with that package's software. Others provide a more thorough overview of the package or might even discuss general issues related to the package. In the future, the Bioconductor project is looking towards providing vignettes that are not specifically tied to a package, but rather are demonstrating more complex concepts. As with all aspects of the Bioconductor project, users are encouraged to participate in this effort. * Statistical and graphical methods. The Bioconductor project aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data. Analysis packages are available for: pre-processing
Affymetrix Affymetrix is now Applied Biosystems, a brand of DNA microarray products sold by Thermo Fisher Scientific that originated with an American biotechnology research and development and manufacturing company of the same name. The Santa Clara, Califor ...
and Illumina,
cDNA In genetics, complementary DNA (cDNA) is DNA synthesized from a single-stranded RNA (e.g., messenger RNA (mRNA) or microRNA (miRNA)) template in a reaction catalyzed by the enzyme reverse transcriptase. cDNA is often used to express a speci ...
array data; identifying differentially expressed genes; graph theoretical analyses; plotting genomic data. In addition, the R package system itself provides implementations for a broad range of state-of-the-art
statistical Statistics (from German: ''Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industria ...
and
graphical Graphics () are visual images or designs on some surface, such as a wall, canvas, screen, paper, or stone, to inform, illustrate, or entertain. In contemporary usage, it includes a pictorial representation of data, as in design and manufacture, ...
techniques, including
linear Linearity is the property of a mathematical relationship (''function'') that can be graphically represented as a straight line. Linearity is closely related to '' proportionality''. Examples in physics include rectilinear motion, the linear r ...
and
non-linear In mathematics and science, a nonlinear system is a system in which the change of the output is not proportional to the change of the input. Nonlinear problems are of interest to engineers, biologists, physicists, mathematicians, and many other ...
modeling,
cluster analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of ...
,
prediction A prediction (Latin ''præ-'', "before," and ''dicere'', "to say"), or forecast, is a statement about a future event or data. They are often, but not always, based upon experience or knowledge. There is no universal agreement about the exact ...
, resampling,
survival analysis Survival analysis is a branch of statistics for analyzing the expected duration of time until one event occurs, such as death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysi ...
, and
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Exa ...
analysis. *
Genome annotation DNA annotation or genome annotation is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do. An annotation (irrespective of the context) is a note added by way of explanati ...
. The Bioconductor project provides software for associating microarray and other genomic data in real time to biological metadata from web databases such as
GenBank The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations. It is produced and maintained by the National Center for Biotechnology Information (NCBI; a part ...
, LocusLink and
PubMed PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintain the ...
(annotate package). Functions are also provided for incorporating the results of statistical analysis in HTML reports with links to annotation WWW resources. Software tools are available for assembling and processing genomic annotation data, from databases such as
GenBank The GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations. It is produced and maintained by the National Center for Biotechnology Information (NCBI; a part ...
, the
Gene Ontology Consortium The Gene Ontology (GO) is a major bioinformatics initiative to unify the representation of gene and gene product attributes across all species. More specifically, the project aims to: 1) maintain and develop its controlled vocabulary of gene and ge ...
, LocusLink,
UniGene UniGene was a NCBI database of the transcriptome and thus, despite the name, not primarily a database for genes. Each entry is a set of transcripts that appear to stem from the same transcription locus (i.e. gene or expressed pseudogene). Informat ...
, the UCSC Human Genome Project and others with the AnnotationDbi package. Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink,
PubMed PubMed is a free search engine accessing primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintain the ...
). Customized annotation libraries can also be assembled. *
Open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
. The Bioconductor project has a commitment to full open source discipline, with distribution via a
SourceForge.net SourceForge is a web service that offers software consumers a centralized online location to control and manage open-source software projects and research business software. It provides source code repository hosting, bug tracking, mirroring ...
-like platform. All contributions are expected to exist under an
open source license An open-source license is a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified and/or shared under defined terms and conditions. This allows end users and commercial compan ...
such as Artistic 2.0,
GPL2 The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general us ...
, or
BSD The Berkeley Software Distribution or Berkeley Standard Distribution (BSD) is a discontinued operating system based on Research Unix, developed and distributed by the Computer Systems Research Group (CSRG) at the University of California, Berk ...
. There are many different reasons why open-source software is beneficial to the analysis of microarray data and to
computational biology Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has fo ...
in general. The reasons include: ** To provide full access to
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s and their implementation ** To facilitate software improvements through bug fixing and plug-ins ** To encourage good scientific computing and statistical practice by providing appropriate tools and instruction ** To provide a workbench of tools that allow researchers to explore and expand the methods used to analyze biological data ** To ensure that the international
scientific community The scientific community is a diverse network of interacting scientists. It includes many " sub-communities" working on particular scientific fields, and within particular institutions; interdisciplinary and cross-institutional activities are als ...
is the owner of the
software tools A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs, that can ...
needed to carry out research ** To lead and encourage commercial support and development of those tools that are successful ** To promote
reproducible research Reproducibility, also known as replicability and repeatability, is a major principle underpinning the scientific method. For the findings of a study to be reproducible means that results obtained by an experiment or an observational study or in a ...
by providing open and accessible tools with which to carry out that research (reproducible research is distinct from independent verification) * Open development.
Users Ancient Egyptian roles * User (ancient Egyptian official), an ancient Egyptian nomarch (governor) of the Eighth Dynasty * Useramen, an ancient Egyptian vizier also called "User" Other uses * User (computing), a person (or software) using an ...
are encouraged to become developers, either by contributing Bioconductor compliant packages or documentation. Additionally Bioconductor provides a mechanism for linking together different groups with common goals to foster
collaboration Collaboration (from Latin ''com-'' "with" + ''laborare'' "to labor", "to work") is the process of two or more people, entities or organizations working together to complete a task or achieve a goal. Collaboration is similar to cooperation. Most ...
on software, possibly at the level of shared development.


Milestones

Each release of Bioconductor is developed to work best with a chosen version of R. In addition to bugfixes and updates, a new release typically adds packages. The table below maps a Bioconductor release to a R version and shows the number of available Bioconductor software packages for that release.


Resources

* * * *


See also

*
Computational biology Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has fo ...
*
Bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
*
List of open source bioinformatics software This is a list of computer software which is made for bioinformatics and released under open-source software licenses with articles in Wikipedia. See also * List of sequence alignment software * List of open-source healthcare software * List o ...
*
List of sequence alignment software This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins. Database searc ...
*
R (programming language) R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinform ...
*
DNA microarray A DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to ...
*
Affymetrix Affymetrix is now Applied Biosystems, a brand of DNA microarray products sold by Thermo Fisher Scientific that originated with an American biotechnology research and development and manufacturing company of the same name. The Santa Clara, Califor ...
, a microarray technology platform


References



External links

*
The R Project
GNU GNU () is an extensive collection of free software (383 packages as of January 2022), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operat ...
R is a programming language for statistical computing.
Bioconductor Releases
* The community of the
Debian GNU/Linux Debian (), also known as Debian GNU/Linux, is a Linux distribution composed of free and open-source software, developed by the community-supported Debian Project, which was established by Ian Murdock on August 16, 1993. The first version of Deb ...
distribution strives towards a
automated building of BioConductor packages
{{Webarchive, url=https://web.archive.org/web/20070811135011/http://wiki.debian.org/AliothPkgBioc , date=2007-08-11 for their distribution
BioKnoppix
an

are projects extending
Knoppix KNOPPIX ( ) is an operating system based on Debian designed to be run directly from a CD / DVD (Live CD) or a USB flash drive (Live USB), one of the first live operating system distributions (just after Yggdrasil Linux). Knoppix was developed b ...
that have contributed bootable
Debian GNU/Linux Debian (), also known as Debian GNU/Linux, is a Linux distribution composed of free and open-source software, developed by the community-supported Debian Project, which was established by Ian Murdock on August 16, 1993. The first version of Deb ...
CDs providing BioConductor installations. Free bioinformatics software Free R (programming language) software Science software for macOS Science software for Windows Science software for Linux