UGENE is computer software for bioinformatics. It runs on personal computer operating systems such as Windows, macOS, and Linux. It is released as free and open-source software under the GNU General Public License (GPL) version 2.

UGENE helps biologists analyze various biological genetics data, such as sequences, annotations, multiple alignments, phylogenetic trees, NGS assemblies, and others. The data can be stored both locally (on a personal computer) and on shared storage (e.g., a lab database).

UGENE integrates dozens of well-known biological tools and algorithms, along with original tools, in the context of genomics, evolutionary biology, virology, and other branches of life science. It provides a graphical user interface (GUI) for the pre-built tools so that biologists with no computer programming skills can access them more easily.

Using UGENE Workflow Designer, it is possible to streamline a multi-step analysis. A workflow consists of blocks such as data readers, blocks executing embedded tools and algorithms, and data writers. Blocks can be created from command-line tools or scripts. A set of sample workflows is available in the Workflow Designer to annotate sequences, convert data formats, analyze NGS data, etc.

Besides the graphical interface, UGENE also has a command-line interface, from which workflows can also be executed.

To improve performance, UGENE uses multi-core processors (CPUs) and graphics processing units (GPUs) to optimize a few algorithms.

Key features

The software supports the following features:

* Create, edit, and annotate nucleic acid and protein sequences
* Fast search in a sequence
* Multiple sequence alignment: Clustal W and O, MUSCLE, Kalign, MAFFT, T-Coffee
* Create and use shared storage, e.g., a lab database
* Search through online databases: National Center for Biotechnology Information (NCBI), Protein Data Bank (PDB), UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, DAS servers
* Local and NCBI GenBank BLAST search
* Open reading frame finder
* Restriction enzyme finder with integrated REBASE restriction enzymes list
* Integrated Primer3 package for PCR primer design
* Plasmid construction and annotation
* Cloning in silico by designing cloning vectors
* Genome mapping of short reads with Bowtie, BWA, and UGENE Genome Aligner
* Visualize next-generation sequencing data (BAM files) using UGENE Assembly Browser
* Variant calling with SAMtools
* RNA-Seq data analysis with the Tuxedo pipeline (TopHat, Cufflinks, etc.)
* ChIP-seq data analysis with the Cistrome pipeline (MACS, CEAS, etc.)
* Raw NGS data processing
* HMMER 2 and 3 packages integration
* Chromatogram viewer
* Search for transcription factor binding sites (TFBS) with weight matrix and SITECON algorithms
* Search for direct, inverted, and tandem repeats in DNA sequences
* Local sequence alignment with optimized Smith-Waterman algorithm
* Build (using integrated PHYLIP neighbor joining, MrBayes, or PhyML Maximum Likelihood) and edit phylogenetic trees
* Combine various algorithms into custom workflows with UGENE Workflow Designer
* Contigs assembly with CAP3
* 3D structure viewer for files in Protein Data Bank (PDB) and Molecular Modeling Database (MMDB) formats, anaglyph view support
* Predict protein secondary structure with GOR IV and PSIPRED algorithms
* Construct dot plots for nucleic acid sequences
* mRNA alignment with Spidey
* Search for complex signals with ExpertDiscovery
* Search for a pattern of various algorithms' results in a nucleic acid sequence with UGENE Query Designer
* PCR in silico for primer designing and mapping
* SPAdes de novo assembler
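To make the local alignment feature more concrete, the following is a minimal sketch of the textbook Smith-Waterman scoring recurrence. It is not UGENE's optimized implementation; the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for illustration, and a simple linear gap penalty is used.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Textbook dynamic programming with a linear gap penalty.
    Only two rows of the score matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

The sketch returns only the score; recovering the aligned region itself would require keeping the full matrix and tracing back from the best cell.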

Sequence View

The Sequence View is used to visualize, analyze, and modify nucleic acid or protein sequences. Depending on the sequence type and the options selected, the following views can be present in the Sequence View window:

* 3D structure view
* Circular view
* Chromatogram view
* Graphs View: GC-content, AG-content, and others
* Dot plot view
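As an illustration of what a GC-content graph plots, here is a small sketch that computes the GC fraction over a sliding window. The function names, window size, and step are assumptions for this example and do not reflect UGENE's internal code or defaults.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for empty input."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def gc_windows(seq: str, window: int, step: int = 1) -> list[float]:
    """GC fraction for each full window of length `window`, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    # Each value corresponds to one point on a GC-content graph.
    print(gc_windows("ATGCGCGCATATAT", window=4, step=2))
```

An AG-content graph could be sketched the same way by counting `A` and `G` instead of `G` and `C`.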

Alignment Editor

The Alignment Editor allows working with multiple nucleic acid or protein sequences: aligning them, editing the alignment, analyzing it, storing the consensus sequence, building a phylogenetic tree, and so on.
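A consensus sequence can be derived from an alignment column by column. The sketch below uses a plain majority rule that ignores gaps and breaks ties alphabetically; this rule and the function name are assumptions for illustration, and the consensus algorithms offered by the Alignment Editor may differ.

```python
from collections import Counter


def consensus(alignment: list[str], gap: str = "-") -> str:
    """Majority-rule consensus of equally long aligned rows.

    Gaps are ignored when counting; a column made only of gaps yields a gap.
    Ties are broken alphabetically so the result is deterministic.
    """
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    result = []
    for column in zip(*alignment):
        counts = Counter(ch.upper() for ch in column if ch != gap)
        if not counts:
            result.append(gap)
        else:
            symbol, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            result.append(symbol)
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["ACGT", "ACGA", "AC-T"]))
```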

Phylogenetic Tree Viewer

The Phylogenetic Tree Viewer helps to visualize and edit phylogenetic trees. It is possible to synchronize a tree with the corresponding multiple alignment used to build the tree.

Assembly Browser

The Assembly Browser project was started in 2010 as an entry for the Illumina iDEA Challenge 2011. The browser allows users to visualize and browse large next-generation sequence assemblies (up to hundreds of millions of short reads). It supports SAM, BAM (the binary version of SAM), and ACE formats. Before assembly data can be browsed in UGENE, the input file is automatically converted to a UGENE database file. This approach has a trade-off: it allows viewing the whole assembly, navigating in it, and jumping to well-covered regions rapidly, but the conversion may take time for a large file and requires enough disk space to store the database.

Workflow Designer

UGENE Workflow Designer allows creating and running complex computational workflow schemas. The distinguishing feature of Workflow Designer, relative to other bioinformatics workflow management systems, is that workflows are executed on a local computer. This avoids the data transfer issues that can arise with tools that rely on remote file storage and internet connectivity.

The elements that a workflow consists of correspond to the bulk of the algorithms integrated into UGENE. Workflow Designer also allows creating custom workflow elements, which can be based on a command-line tool or a script.

Workflows are stored in a special text format, which allows them to be reused and transferred between users. A workflow can be run using the graphical interface or launched from the command line. The graphical interface also allows controlling the workflow execution, storing the parameters, and so on.

There is an embedded library of workflow samples to convert, filter, and annotate data, with several pipelines to analyze NGS data developed in collaboration with NIH NIAID. A wizard is available for each workflow sample.
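The reader, processing element, and writer structure of a workflow can be sketched as a chain of generators. This is only a conceptual illustration: it does not use UGENE's workflow text format or API, and every name in it (`read_records`, `filter_by_length`, `reverse_complement_block`, `run_workflow`) is hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = tuple[str, str]  # (name, sequence)
Block = Callable[[Iterator[Record]], Iterator[Record]]


def read_records(records: Iterable[Record]) -> Iterator[Record]:
    """Reader block: yields records one at a time."""
    yield from records


def filter_by_length(min_len: int) -> Block:
    """Build a processing block that keeps sequences of at least min_len."""
    def block(stream: Iterator[Record]) -> Iterator[Record]:
        return (rec for rec in stream if len(rec[1]) >= min_len)
    return block


def reverse_complement_block(stream: Iterator[Record]) -> Iterator[Record]:
    """Processing block: reverse-complements each DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    for name, seq in stream:
        yield name + "_rc", seq.translate(table)[::-1]


def run_workflow(source: Iterable[Record], blocks: list[Block],
                 sink: list[Record]) -> None:
    """Connect the reader to each block in order and write results to sink."""
    stream = read_records(source)
    for block in blocks:
        stream = block(stream)
    sink.extend(stream)  # writer block: here, simply collect into a list


if __name__ == "__main__":
    out: list[Record] = []
    run_workflow([("a", "AACG"), ("b", "AC")],
                 [filter_by_length(3), reverse_complement_block], out)
    print(out)
```

Because each block only consumes and produces a stream of records, blocks can be reordered or swapped without changing the others, which is the property that makes such workflows reusable.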

Supported biological data formats

* Sequences and annotations: FASTA (.fa), GenBank (.gb), EMBL (.emb), GFF (.gff)
* Multiple sequence alignments: Clustal (.aln), MSF (.msf), Stockholm (.sto), Nexus (.nex)
* 3D structures: PDB (.pdb), MMDB (.prt)
* Chromatograms: ABIF (.abi), SCF (.scf)
* Short reads: Sequence Alignment/Map (SAM) (.sam), binary version of SAM (.bam), ACE (.ace), FASTQ (.fastq)
* Phylogenetic trees: Newick (.nwk), PHYLIP (.phy)
* Other formats: Bairoch (enzymes info), HMM (HMMER profiles), PWM and PFM (position matrices), SNP and VCF4 (genome variations)
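Of the formats listed above, FASTA is the simplest: a header line starting with `>` followed by one or more lines of sequence. The following minimal parser is a sketch for illustration only; the function name is an assumption, and it is unrelated to how UGENE itself reads files.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs.

    Sequence lines belonging to one record are concatenated.
    Blank lines and any text before the first header are skipped.
    """
    records: list[tuple[str, str]] = []
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


if __name__ == "__main__":
    print(parse_fasta(">s1 example\nACGT\nTT\n>s2\nGG\n"))
```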

Release cycle

UGENE is primarily developed by Unipro LLC, headquartered in Akademgorodok, Novosibirsk, Russia. Each iteration lasts about 1–2 months, followed by a new release. Development snapshots may also be downloaded. The features to include in each release are mostly proposed by users.

External links

* UGENE podcast
* UGENE forum
* Best free project of Russia, Linux Format magazine (all about Linux, in Russian)
