UGENE is computer software for bioinformatics. It helps biologists analyze various biological and genetic data, such as sequences, annotations, multiple alignments, phylogenetic trees, NGS assemblies, and others. UGENE integrates dozens of well-known biological tools and algorithms, along with original tools, in the context of genomics, evolutionary biology, virology, and other branches of life science.

UGENE works on personal computer operating systems such as Windows, macOS, or Linux. It is released as free and open-source software under the GNU General Public License (GPL) version 2. Data can be stored both locally and on shared or networked storage.

The graphical user interface (GUI) provides access to pre-built tools, so users with no computer programming experience can use them easily. UGENE also has a command-line interface for executing workflows. With UGENE Workflow Designer, a multi-step analysis can be streamlined into a single workflow. A workflow consists of blocks such as data readers, blocks executing embedded tools and algorithms, and data writers. Blocks can be created from command-line tools or scripts. A set of sample workflows is available in the Workflow Designer to annotate sequences, convert data formats, analyze NGS data, and so on.

To improve performance, UGENE uses multi-core processors (CPUs) and graphics processing units (GPUs) to optimize a few algorithms.


Key features

The software supports the following features:

* Create, edit, and annotate nucleic acid and protein sequences
* Fast search in a sequence
* Multiple sequence alignment: Clustal W and O, MUSCLE, Kalign, MAFFT, T-Coffee
* Create and use shared storage, e.g., a lab database
* Search through online databases: National Center for Biotechnology Information (NCBI), Protein Data Bank (PDB), UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, DAS servers
* Local and NCBI GenBank BLAST search
* Open reading frame finder
* Restriction enzyme finder with integrated REBASE restriction enzymes list
* Integrated Primer3 package for PCR primer design
* Plasmid construction and annotation
* Cloning in silico by designing cloning vectors
* Genome mapping of short reads with Bowtie, BWA, and UGENE Genome Aligner
* Visualize next-generation sequencing data (BAM files) using UGENE Assembly Browser
* Variant calling with SAMtools
* RNA-Seq data analysis with Tuxedo pipeline (TopHat, Cufflinks, etc.)
* ChIP-seq data analysis with Cistrome pipeline (MACS, CEAS, etc.)
* Raw NGS data processing
* HMMER 2 and 3 packages integration
* Chromatogram viewer
* Search for transcription factor binding sites (TFBS) with weight matrix and SITECON algorithms
* Search for direct, inverted, and tandem repeats in DNA sequences
* Local sequence alignment with optimized Smith-Waterman algorithm
* Build (using integrated PHYLIP neighbor joining, MrBayes, or PhyML Maximum Likelihood) and edit phylogenetic trees
* Combine various algorithms into custom workflows with UGENE Workflow Designer
* Contigs assembly with CAP3
* 3D structure viewer for files in Protein Data Bank (PDB) and Molecular Modeling Database (MMDB) formats, anaglyph view support
* Predict protein secondary structure with GOR IV and PSIPRED algorithms
* Construct dot plots for nucleic acid sequences
* mRNA alignment with Spidey
* Search for complex signals with ExpertDiscovery
* Search for a pattern of various algorithms' results in a nucleic acid sequence with UGENE Query Designer
* PCR in silico for primer designing and mapping
* SPAdes de novo assembler
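
The local alignment feature in the list above is based on the Smith-Waterman algorithm. The following is a minimal sketch of the textbook version of that algorithm, not UGENE's optimized implementation. The function name `smith_waterman_score` and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Textbook Smith-Waterman with a linear gap penalty. Only two rows of
    the dynamic-programming matrix are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)  # previous matrix row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the
            # alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every cell is clamped at zero, a poorly matching prefix cannot penalize a well-matching region later in the sequences. This sketch returns only the score; recovering the aligned region would require keeping the full matrix and tracing back from the best cell.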


Sequence View

The Sequence View is used to visualize, analyze, and modify nucleic acid or protein sequences. Depending on the sequence type and the options selected, the following views can be present in the Sequence View window:

* 3D structure view
* Circular view
* Chromatogram view
* Graphs view: GC-content, AG-content, and others
* Dot plot view
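
A GC-content graph of the kind listed above plots the fraction of G and C bases in a window that slides along the sequence. The sketch below shows one plausible way to compute such a profile. The function names and the `window` and `step` parameters are assumptions for illustration and do not describe how UGENE computes its graphs.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq: str, window: int, step: int = 1) -> list[float]:
    """GC fraction of each full window of size `window`, moving by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Windows that would run past the end of the sequence are skipped.
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

An AG-content graph would follow the same pattern with the counted bases changed.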


Alignment Editor

The Alignment Editor allows working with multiple nucleic acid or protein sequences: aligning them, editing the alignment, analyzing it, storing the consensus sequence, building a phylogenetic tree, and so on.
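
A consensus sequence takes the most frequent residue in each column of an alignment. The sketch below illustrates that simple majority rule. The function name `consensus`, the gap symbol, and the alphabetical tie-breaking are assumptions made for the example; alignment editors typically offer several consensus algorithms and thresholds.

```python
from collections import Counter


def consensus(alignment: list[str], gap: str = "-") -> str:
    """Majority-rule consensus of equally long aligned sequences."""
    if not alignment:
        return ""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    result = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(ch.upper() for ch in column if ch != gap)
        if not counts:
            result.append(gap)  # column contains only gaps
            continue
        top = max(counts.values())
        # Break ties alphabetically so the result is deterministic.
        result.append(min(ch for ch, n in counts.items() if n == top))
    return "".join(result)
```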


Phylogenetic Tree Viewer

The Phylogenetic Tree Viewer helps to visualize and edit phylogenetic trees. It is possible to synchronize a tree with the corresponding multiple alignment used to build the tree.


Assembly Browser

The ''Assembly Browser'' project was started in 2010 as an entry for Illumina iDEA Challenge 2011. The browser allows users to visualize and browse large next-generation sequence assemblies (up to hundreds of millions of short reads). It supports SAM, BAM (the binary version of SAM), and ACE formats. Before assembly data can be browsed in UGENE, the input file is automatically converted to a UGENE database file. The advantage of this approach is that the whole assembly can be viewed and navigated, and well-covered regions can be reached rapidly. The drawback is that the conversion may take time for a large file and requires enough disk space to store the database.


Workflow Designer

''UGENE Workflow Designer'' allows creating and running complex computational workflow schemas. The distinguishing feature of Workflow Designer, relative to other bioinformatics workflow management systems, is that workflows are executed on a local computer. This avoids the data transfer issues that arise in tools relying on remote file storage and internet connectivity. The elements that a workflow consists of correspond to the bulk of the algorithms integrated into UGENE. Workflow Designer also allows creating custom workflow elements, which can be based on a command-line tool or a script. Workflows are stored in a special text format, which allows them to be reused and transferred between users. A workflow can be run from the graphical interface or launched from the command line. The graphical interface also allows controlling workflow execution, storing parameters, and so on. There is an embedded library of workflow samples to convert, filter, and annotate data, with several pipelines to analyze NGS data developed in collaboration with NIH NIAID. A wizard is available for each workflow sample.
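
The reader, processing element, and writer structure described above can be illustrated with a small pipeline. This is a conceptual sketch only: it does not use UGENE's workflow text format or any UGENE API, and all names in it (`read_fasta`, `min_length`, `reverse_complement`, `write_fasta`, `run_workflow`) are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (name, sequence)
Element = Callable[[Iterable[Record]], Iterator[Record]]


def read_fasta(text: str) -> Iterator[Record]:
    """Reader: yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def min_length(limit: int) -> Element:
    """Processing element: keep sequences with at least `limit` bases."""
    def element(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if len(r[1]) >= limit)
    return element


def reverse_complement(records: Iterable[Record]) -> Iterator[Record]:
    """Processing element: reverse-complement each DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    for name, seq in records:
        yield name + " revcomp", seq.translate(table)[::-1]


def write_fasta(records: Iterable[Record]) -> str:
    """Writer: format records back into FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def run_workflow(text: str, elements: Iterable[Element]) -> str:
    """Chain the reader, each processing element in order, and the writer."""
    records: Iterable[Record] = read_fasta(text)
    for element in elements:
        records = element(records)
    return write_fasta(records)
```

For example, `run_workflow(fasta_text, [min_length(4), reverse_complement])` drops sequences shorter than four bases and reverse-complements the rest. Because each element only consumes and produces records, elements can be reordered or reused across workflows, which is the property that makes such pipelines easy to share.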


Supported biological data formats

* Sequences and annotations: FASTA (.fa), GenBank (.gb), EMBL (.emb), GFF (.gff)
* Multiple sequence alignments: Clustal (.aln), MSF (.msf), Stockholm (.sto), Nexus (.nex)
* 3D structures: PDB (.pdb), MMDB (.prt)
* Chromatograms: ABIF (.abi), SCF (.scf)
* Short reads: Sequence Alignment/Map (SAM) (.sam), binary version of SAM (.bam), ACE (.ace), FASTQ (.fastq)
* Phylogenetic trees: Newick (.nwk), PHYLIP (.phy)
* Other formats: Bairoch (enzymes info), HMM (HMMER profiles), PWM and PFM (position matrices), SNP and VCF4 (genome variations)
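
The extensions in the list above can be collected into a lookup table, for example to sort a directory of input files by type. The sketch below does only that; the names `FORMATS_BY_EXTENSION` and `guess_format` are hypothetical, and guessing from the file extension alone is an assumption of this example, not a description of how UGENE detects formats.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format table taken from the list above.
FORMATS_BY_EXTENSION = {
    ".fa": "FASTA", ".gb": "GenBank", ".emb": "EMBL", ".gff": "GFF",
    ".aln": "Clustal", ".msf": "MSF", ".sto": "Stockholm", ".nex": "Nexus",
    ".pdb": "PDB", ".prt": "MMDB",
    ".abi": "ABIF", ".scf": "SCF",
    ".sam": "SAM", ".bam": "BAM", ".ace": "ACE", ".fastq": "FASTQ",
    ".nwk": "Newick", ".phy": "PHYLIP",
}


def guess_format(filename: str) -> Optional[str]:
    """Return the format name for a file extension, or None if unknown."""
    return FORMATS_BY_EXTENSION.get(Path(filename).suffix.lower())
```

An extension is only a hint, since files are often misnamed, so a robust tool would also inspect the file content.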


Release cycle

UGENE is primarily developed by Unipro LLC, headquartered in Akademgorodok of Novosibirsk, Russia. Each iteration lasts about 1–2 months, followed by a new release. Development snapshots may also be downloaded. The features to include in each release are mostly initiated by users.


See also

* Sequence alignment software
* Bioinformatics
* Computational biology
* List of open source bioinformatics software


External links

* UniPro
* UGENE podcast
* UGENE forum
* Best Free Project of Russia, Linux Format magazine: all about Linux in Russian

