Orange (software)

Orange is an open-source data visualization, machine learning and data mining toolkit. It features a visual programming front-end for explorative rapid qualitative data analysis and interactive data visualization.


Description

Orange is a component-based visual programming software package for data visualization, machine learning, data mining, and data analysis. Orange components are called widgets, and they range from simple data visualization, subset selection, and preprocessing to empirical evaluation of learning algorithms and predictive modeling. Visual programming is implemented through an interface in which workflows are created by linking predefined or user-designed widgets, while advanced users can use Orange as a Python library for data manipulation and widget alteration.
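
The idea of building a workflow by linking widgets can be illustrated with a minimal sketch in plain Python. It does not use Orange's own API: the `Widget` class, the `run_workflow` function and the sample rows are hypothetical and only mimic canvas components whose output feeds the input of the next component.

```python
from typing import Callable, List, Sequence


class Widget:
    """Hypothetical stand-in for a canvas widget: a named processing step."""

    def __init__(self, name: str, func: Callable[[list], list]):
        self.name = name
        self.func = func

    def process(self, data: list) -> list:
        # Each widget receives the previous widget's output and returns its own.
        return self.func(data)


def run_workflow(widgets: Sequence[Widget], data: list) -> list:
    """Pass data through the linked widgets in order (a linear workflow)."""
    for widget in widgets:
        data = widget.process(data)
    return data


# Illustrative rows: (sepal length, species); values are made up for the sketch.
rows = [(5.1, "setosa"), (7.0, "versicolor"), (6.3, "virginica"), (4.9, "setosa")]

workflow: List[Widget] = [
    # "Select rows": keep only rows with sepal length above 5.0
    Widget("Select rows", lambda d: [r for r in d if r[0] > 5.0]),
    # "Select columns": keep only the species label
    Widget("Select columns", lambda d: [r[1] for r in d]),
    # "Sort": order the remaining labels alphabetically
    Widget("Sort", lambda d: sorted(d)),
]

result = run_workflow(workflow, rows)
print(result)
```

A real canvas allows branching and multiple inputs per widget; the linear chain here is the simplest case and was chosen only to keep the sketch short.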


Software

Orange is an open-source software package released under the GPL. Versions prior to 3.0 have core components written in C++ with Python wrappers and are available on GitHub. From version 3.0 onwards, Orange uses common Python open-source libraries for scientific computing, such as numpy, scipy and scikit-learn, while its graphical user interface operates within the cross-platform Qt framework. The default installation includes a number of machine learning, preprocessing and data visualization algorithms in six widget sets (data, visualize, classify, regression, evaluate and unsupervised). Additional functionalities are available as add-ons (bioinformatics, data fusion and text-mining). Orange is supported on macOS, Windows and Linux and can also be installed from the Python Package Index repository (pip install Orange3).


Features

Orange consists of a canvas interface onto which the user places widgets and creates a data analysis workflow. Widgets offer basic functionalities such as reading the data, showing a data table, selecting features, training predictors, comparing learning algorithms, visualizing data elements, etc. The user can interactively explore visualizations or feed the selected subset into other widgets.
* Canvas: graphical front-end for data analysis
* Widgets:
** Data: widgets for data input, data filtering, sampling, imputation, feature manipulation and feature selection
** Visualize: widgets for common visualization (box plot, histograms, scatter plot) and multivariate visualization (mosaic display, sieve diagram)
** Classify: a set of supervised machine learning algorithms for classification
** Regression: a set of supervised machine learning algorithms for regression
** Evaluate: cross-validation, sampling-based procedures, reliability estimation and scoring of prediction methods
** Unsupervised: unsupervised learning algorithms for clustering (k-means, hierarchical clustering) and data projection techniques (multidimensional scaling, principal component analysis, correspondence analysis)
** Add-ons:
*** Associate: widgets for mining frequent itemsets and association rule learning
*** Bioinformatics: widgets for gene set analysis, enrichment, and access to pathway libraries
*** Data fusion: widgets for fusing different data sets, collective matrix factorization, and exploration of latent factors
*** Educational: widgets for teaching machine learning concepts, such as k-means clustering, polynomial regression, stochastic gradient descent, ...
*** Geo: widgets for working with geospatial data
*** Image analytics: widgets for working with images and ImageNet embeddings
*** Network: widgets for graph and network analysis
*** Text mining: widgets for natural language processing and text mining
*** Time series: widgets for time series analysis and modeling
*** Spectroscopy: widgets for analyzing and visualizing (hyper)spectral datasets
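
To illustrate conceptually what the cross-validation listed under the Evaluate widgets involves, the following is a minimal sketch of k-fold cross-validation in plain Python. It is not Orange's implementation: the `k_fold_indices`, `majority_learner` and `cross_validate` names and the toy data are assumptions made for the example, and a trivial majority-class baseline stands in for a real learning algorithm.

```python
from collections import Counter
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k folds of near-equal size (no shuffling).

    Assumes 1 <= k <= n so that no fold is empty.
    """
    return [list(range(i, n, k)) for i in range(k)]


def majority_learner(labels: Sequence[str]) -> Callable[[object], str]:
    """Hypothetical baseline: always predict the most common training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda _x: most_common


def cross_validate(xs: Sequence[object], ys: Sequence[str], k: int) -> float:
    """Return classification accuracy averaged over k held-out folds."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        # Train only on instances outside the current fold.
        train_labels = [y for i, y in enumerate(ys) if i not in held_out]
        model = majority_learner(train_labels)
        # Score the model on the held-out instances.
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy data, made up for the sketch: six instances, five labelled "a", one "b".
xs = [0, 1, 2, 3, 4, 5]
ys = ["a", "a", "a", "a", "b", "a"]
accuracy = cross_validate(xs, ys, k=3)
print(f"mean accuracy: {accuracy:.2f}")
```

Each instance is used for testing exactly once and for training in the remaining folds, which is what makes the averaged score an estimate of performance on unseen data.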


Objectives

The program provides a platform for experiment selection, recommendation systems, and predictive modeling and is used in biomedicine, bioinformatics, genomic research, and teaching. In science, it is used as a platform for testing new machine learning algorithms and for implementing new techniques in genetics and bioinformatics. In education, it was used for teaching machine learning and data mining methods to students of biology, biomedicine, and informatics.


Extensions

Various projects build on Orange either by extending the core components with add-ons or using only the Orange Canvas to exploit the implemented visual programming features and GUI.
* OASYS: ORange SYnchrotron Suite (L. Rebuffi, M. Sanchez del Rio, Proc. SPIE 10388, 103880S (2017). https://doi.org/10.1117/12.2274263)
* scOrange: single cell biostatistics
* Quasar: data analysis in natural sciences


History

* In 1996, the University of Ljubljana and Jožef Stefan Institute started development of ML*, a machine learning framework in C++.
* In 1997, Python bindings were developed for ML*, which together with emerging Python modules formed a joint framework called Orange.
* During the following years, most major algorithms for data mining and machine learning were developed either in C++ (Orange's core) or in Python modules.
* In 2002, the first prototypes of a flexible graphical user interface were designed, using Pmw Python megawidgets.
* In 2003, the graphical user interface was redesigned and re-developed for the Qt framework using PyQt Python bindings. The visual programming framework was defined, and development of widgets (graphical components of the data analysis pipeline) began.
* In 2005, extensions for data analysis in bioinformatics were created.
* In 2008, Mac OS X DMG and Fink-based installation packages were developed.
* In 2009, over 100 widgets were created and maintained.
* From 2009, Orange was in 2.0 beta, and the web site offered installation packages based on a daily compiling cycle.
* In 2012, a new object hierarchy was imposed, replacing the old module-based structure.
* In 2013, the GUI underwent a major redesign.
* In 2015, Orange 3.0 was released.
* In 2016, Orange was at version 3.3, with development following a monthly stable release cycle.


Further reading

* Demšar, Janez and Blaž Zupan, "Orange: Data Mining Fruitful and Fun - A Historical Perspective", Informatica 37, pp. 55–60 (2013).

