
scikit-multiflow (also known as skmultiflow) is a free and open-source machine learning library for multi-output/multi-label and stream data, written in Python.


Overview

scikit-multiflow makes it easy to design and run experiments and to extend existing stream learning algorithms. It features a collection of classification, regression, concept drift detection and anomaly detection algorithms. It also includes a set of data stream generators and evaluators. scikit-multiflow is designed to interoperate with Python's numerical and scientific libraries NumPy and SciPy, and is compatible with Jupyter Notebooks.
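To make the stream-learning workflow concrete, the following plain-Python sketch shows test-then-train (prequential) evaluation, the scheme behind stream evaluators such as the library's EvaluatePrequential: each incoming sample is first used to test the model and only then to train it, so no separate test set is needed. The sketch does not use scikit-multiflow itself. The generator `sea_stream` (loosely modelled on the SEA concepts, where the label depends on whether the sum of two features stays below a threshold), the `OnlinePerceptron` learner and its `predict_one`/`partial_fit_one` methods are hypothetical names chosen for illustration.

```python
import random
from typing import Iterator, List, Tuple

Sample = Tuple[List[float], int]


def sea_stream(n: int, threshold: float, seed: int = 0) -> Iterator[Sample]:
    """Hypothetical SEA-like generator: 3 features drawn from [0, 10];
    the label is 1 when x0 + x1 <= threshold (x2 is irrelevant noise)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(0.0, 10.0) for _ in range(3)]
        yield x, int(x[0] + x[1] <= threshold)


class OnlinePerceptron:
    """Toy incremental classifier (hypothetical): one update per sample,
    no past samples are stored."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_one(self, x: List[float]) -> int:
        score = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return int(score > 0)

    def partial_fit_one(self, x: List[float], y: int) -> None:
        err = y - self.predict_one(x)  # -1, 0 or +1
        if err:  # perceptron rule: update only on mistakes
            self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * err


def prequential_accuracy(stream: Iterator[Sample], learner: OnlinePerceptron) -> float:
    """Test-then-train: every sample is first predicted, then learned from."""
    correct = seen = 0
    for x, y in stream:
        correct += int(learner.predict_one(x) == y)  # test on the unseen sample
        learner.partial_fit_one(x, y)                # then train on it
        seen += 1
    return correct / seen if seen else 0.0


if __name__ == "__main__":
    acc = prequential_accuracy(sea_stream(5000, threshold=8.0, seed=1), OnlinePerceptron(3))
    print(f"prequential accuracy: {acc:.3f}")
```

Because the learner keeps only a weight vector and never stores past samples, its memory use stays constant however long the stream runs, which is the property stream learning methods are designed around.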


Implementation

The scikit-multiflow library is developed under open research principles and is currently distributed under the BSD 3-clause license. scikit-multiflow is mainly written in Python, with some core elements written in Cython for performance. scikit-multiflow integrates with other Python libraries such as Matplotlib for plotting, scikit-learn for incremental learning methods compatible with the stream learning setting, Pandas for data manipulation, and NumPy and SciPy.


Components

scikit-multiflow is composed of the following sub-packages:
* anomaly_detection: anomaly detection methods.
* data: data stream methods, including methods for batch-to-stream conversion and stream generators.
* drift_detection: methods for concept drift detection.
* evaluation: evaluation methods for stream learning.
* lazy: methods in which generalisation of the training data is delayed until a query is received, i.e., neighbour-based methods such as kNN.
* meta: meta learning (also known as ensemble) methods.
* neural_networks: methods based on neural networks.
* prototype: prototype-based learning methods.
* rules: rule-based learning methods.
* transform: methods that perform data transformations.
* trees: tree-based methods, e.g. Hoeffding trees, which are a type of decision tree for data streams.
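The idea that lets a Hoeffding tree learn from a stream can be sketched in a few lines. A leaf accumulates statistics and only splits once the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), guarantees with probability 1 - delta that the attribute with the best observed information gain is truly better than the runner-up. The sketch below is plain Python, not the API of the trees sub-package; the function names `hoeffding_bound` and `should_split` and the chosen delta and tie-threshold values are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding's inequality: after n independent observations of a variable
    with range R, the true mean lies within epsilon of the sample mean with
    probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int, n_classes: int,
                 delta: float = 1e-7, tie_threshold: float = 0.05) -> bool:
    """Split decision for a leaf that has seen n samples (illustrative values).

    Information gain lies in [0, log2(n_classes)], so that is the range R.
    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon has shrunk so far that the two are effectively tied."""
    eps = hoeffding_bound(math.log2(n_classes), delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # The bound shrinks as the leaf sees more samples.
    for n in (100, 1000, 10000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

The tie threshold prevents a leaf from waiting indefinitely when two attributes are almost equally good: once epsilon falls below it, choosing either attribute costs little.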


History

scikit-multiflow started as a collaboration between researchers at Télécom Paris (Institut Polytechnique de Paris) and École Polytechnique. Development is currently carried out by the University of Waikato, Télécom Paris, École Polytechnique and the open research community.


See also

* Massive Online Analysis (MOA)
* MEKA

