
Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system.


Functionality

Galaxy is a scientific workflow system. These systems provide a means to build multi-step computational analyses akin to a recipe. They typically provide a graphical user interface for specifying what data to operate on, what steps to take, and what order to do them in.

Galaxy is also a data integration platform for biological data. It supports data uploads from the user's computer, by URL, and directly from many online resources (such as the UCSC Genome Browser, BioMart and InterMine). Galaxy supports a range of widely used biological data formats, and translation between those formats. Galaxy provides a web interface to many text manipulation utilities, enabling researchers to do their own custom reformatting and manipulation without having to do any programming. Galaxy includes interval manipulation utilities for doing set-theoretic operations (e.g. intersection, union, ...) on intervals. Many biological file formats include genomic interval data (a frame of reference, e.g., chromosome or contig name, and start and stop positions), allowing these data to be integrated.

Galaxy was originally written for biological data analysis, particularly genomics. The set of available tools has been greatly expanded over the years and Galaxy is now also used for gene expression, genome assembly, proteomics, epigenomics, transcriptomics and a host of other disciplines in the life sciences. The platform itself is domain agnostic and can be applied, in theory, to any scientific domain, such as cheminformatics. For example, Galaxy servers exist for image analysis, computational chemistry and drug design, cosmology, climate modeling, social science, and linguistics.

Finally, Galaxy also supports data and analysis persistence and publishing. See Reproducibility and Transparency below.
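The set-theoretic interval operations mentioned in this section can be illustrated with a minimal Python sketch. It is not Galaxy's implementation: the `Interval` type, the `intersect` and `union` functions, and the half-open coordinate convention are all assumptions chosen for illustration, and `intersect` assumes neither input set overlaps itself.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval: reference name plus half-open [start, end)."""
    chrom: str
    start: int
    end: int


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Return the regions covered by an interval in `a` and an interval in `b`."""
    result = []
    for x in a:
        for y in b:
            if x.chrom != y.chrom:
                continue  # intervals on different references never overlap
            start, end = max(x.start, y.start), min(x.end, y.end)
            if start < end:
                result.append(Interval(x.chrom, start, end))
    return sorted(result)


def union(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping or touching intervals that share a reference name."""
    merged: list[Interval] = []
    for iv in sorted(intervals):
        if merged and merged[-1].chrom == iv.chrom and iv.start <= merged[-1].end:
            last = merged.pop()
            merged.append(Interval(last.chrom, last.start, max(last.end, iv.end)))
        else:
            merged.append(iv)
    return merged


# Illustrative data only: two small interval sets on two references.
genes = [Interval("chr1", 100, 200), Interval("chr2", 50, 80)]
peaks = [Interval("chr1", 150, 250), Interval("chr2", 90, 120)]
overlap = intersect(genes, peaks)   # regions present in both sets
combined = union(genes + peaks)     # regions present in either set
```

Because every interval carries the same frame of reference (a chromosome or contig name plus start and stop positions), datasets from different sources can be compared this way once they are expressed in a common coordinate system.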


Project Goals

Galaxy is "an open, web-based platform for performing accessible, reproducible, and transparent genomic science."


Accessibility

Computational biology is a specialized domain that often requires knowledge of computer programming. Galaxy aims to give biomedical researchers access to computational biology without also requiring them to understand computer programming. Galaxy does this by stressing a simple user interface over the ability to build complex workflows. This design choice makes it relatively easy to build typical analyses, but more difficult to build complex workflows that include, for example, looping constructs. (See Apache Taverna for an example of a data-driven workflow system that supports looping.)


Reproducibility

Reproducibility is a key goal of science: when scientific results are published, the publications should include enough information that others can repeat the experiment and get the same results. There have been many recent efforts to extend this goal from the bench (the "wet lab") to computational experiments (the "dry lab") as well. This has proved to be a more difficult task than initially expected. Galaxy supports reproducibility by capturing sufficient information about every step in a computational analysis, so that the analysis can be repeated, exactly, at any point in the future. This includes keeping track of all input, intermediate, and final datasets, as well as the parameters provided to, and the order of, each step of the analysis.
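The idea of capturing enough information to repeat an analysis exactly can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Galaxy stores provenance: the `StepRecord` and `Provenance` classes, the `replay` function, the `TOOLS` registry, and the two toy tools are all hypothetical names invented for this example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


def checksum(data: str) -> str:
    """Fingerprint a dataset so a later run can be compared against it."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass
class StepRecord:
    """What was run, with which parameters, on which input, producing which output."""
    tool: str
    params: dict
    input_checksum: str
    output_checksum: str


@dataclass
class Provenance:
    steps: list[StepRecord] = field(default_factory=list)

    def run(self, tool: Callable[..., str], data: str, **params) -> str:
        """Execute one step and append its record, preserving step order."""
        output = tool(data, **params)
        self.steps.append(
            StepRecord(tool.__name__, params, checksum(data), checksum(output))
        )
        return output


# Two toy "tools" standing in for real analysis programs.
def to_upper(text: str) -> str:
    return text.upper()


def take_lines(text: str, count: int) -> str:
    return "\n".join(text.splitlines()[:count])


TOOLS = {"to_upper": to_upper, "take_lines": take_lines}


def replay(records: list[StepRecord], data: str) -> str:
    """Repeat the recorded steps in order and verify each dataset matches."""
    for record in records:
        if checksum(data) != record.input_checksum:
            raise ValueError(f"input to {record.tool} differs from the original run")
        data = TOOLS[record.tool](data, **record.params)
        if checksum(data) != record.output_checksum:
            raise ValueError(f"output of {record.tool} differs from the original run")
    return data


raw = "acgt\nttga\nccat"
log = Provenance()
result = log.run(take_lines, log.run(to_upper, raw), count=2)
replayed = replay(log.steps, raw)  # same steps, same order, same parameters
```

The record holds the three things the paragraph above lists: the datasets (here as checksums), the parameters given to each step, and the order of the steps.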


Transparency

Galaxy supports transparency in scientific research by enabling researchers to share any of their Galaxy Objects either publicly or with specific individuals. Shared items can be examined in detail, rerun at will, and copied and modified to test hypotheses.


Galaxy Objects: Histories, Workflows, Datasets and Pages

Galaxy objects are anything that can be saved, persisted, and shared in Galaxy:

* Histories: Histories are computational analyses (recipes) run with specified input datasets, computational steps and parameters. Histories include all intermediate and output datasets as well.
* Workflows: Workflows are computational analyses that specify all the steps (and parameters) in the analysis, but none of the data. Workflows are used to run the same analysis against multiple sets of input data.
* Datasets: Datasets include any input, intermediate, or output dataset used or produced in an analysis.
* Pages: Histories, workflows and datasets can include user-provided annotation. Galaxy Pages enable the creation of a virtual paper that describes the how and why of the overall experiment. Tight integration of Pages with Histories, Workflows, and Datasets supports this goal.
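The distinction between a workflow (steps and parameters, no data) and a history (steps, parameters, and every dataset) can be made concrete with a small Python sketch. The `Step`, `Workflow`, and `History` classes and the two toy steps below are hypothetical and only loosely model the concepts described above; they are not Galaxy's data model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One analysis step: a name, a function, and its fixed parameters."""
    name: str
    func: Callable[..., list[str]]
    params: dict = field(default_factory=dict)


@dataclass
class History:
    """Result of one run: the input plus every intermediate and output dataset."""
    datasets: list[tuple[str, list[str]]] = field(default_factory=list)


@dataclass
class Workflow:
    """Steps and parameters only; the data is supplied at run time."""
    steps: list[Step]

    def run(self, input_dataset: list[str]) -> History:
        history = History([("input", input_dataset)])
        current = input_dataset
        for step in self.steps:
            current = step.func(current, **step.params)
            history.datasets.append((step.name, current))
        return history


# Toy steps standing in for real tools.
def min_length(reads: list[str], length: int) -> list[str]:
    return [r for r in reads if len(r) >= length]


def reverse(reads: list[str]) -> list[str]:
    return [r[::-1] for r in reads]


workflow = Workflow([Step("filter", min_length, {"length": 4}), Step("reverse", reverse)])

# The same workflow run against two input datasets yields two separate histories.
history_a = workflow.run(["ACGT", "AC", "GGTCA"])
history_b = workflow.run(["TT", "CCGGA"])
```

Each history keeps its own input and intermediate datasets, while the workflow object itself stays free of data and can be reused or shared independently.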


Availability

Galaxy is available:

1. As a free public web server, supported by the Galaxy Project. This server includes many bioinformatics tools that are widely useful in many areas of genomics research. Users can create logins, and save histories, workflows, and datasets on the server. These saved items can also be shared with others.
2. As open-source software that can be downloaded, installed and customized to address specific needs. Galaxy can be installed locally or using a computing cloud.
3. As public web servers hosted by other organizations. Several organizations with their own Galaxy installation have also opted to make those servers available to others.


Implementation

Galaxy is open-source software implemented using the Python programming language. It is developed by the Galaxy team at Penn State, Johns Hopkins University, Oregon Health & Science University, and the Galaxy Community. Galaxy is extensible, as new command line tools can be integrated and shared within the Galaxy ToolShed. An example of extending Galaxy is Galaxy-P from the University of Minnesota Supercomputing Institute, which is customized as a data analysis platform for mass spectrometry-based proteomics.


Community

Galaxy is an open-source project, and its community includes users, organizations that install their own instance, Galaxy developers, and bioinformatics tool developers. The Galaxy project has mailing lists, a community hub, and annual meetings.


See also

* Bioinformatics workflow management systems




External links


* Galaxy Community Hub
* Download and install locally or on the cloud
* Free public Galaxy server, hosted by Galaxy Project
* List of other public Galaxy servers
* Project statistics