Data stream mining (also known as stream learning) is the process of extracting knowledge structures from continuous, rapid data records. A data stream is an ordered sequence of instances that, in many applications of data stream mining, can be read only once or a small number of times using limited computing and storage capabilities.

In many data stream mining applications, the goal is to predict the class or value of new instances in the data stream given some knowledge about the class membership or values of previous instances in the data stream. Machine learning techniques can be used to learn this prediction task from labeled examples in an automated fashion. Often, concepts from the field of incremental learning are applied to cope with structural changes, on-line learning and real-time demands. In many applications, especially those operating within non-stationary environments, the distribution underlying the instances or the rules underlying their labeling may change over time; that is, the goal of the prediction, the class to be predicted or the target value to be predicted, may change over time. This problem is referred to as concept drift. Detecting concept drift is a central issue in data stream mining. Other challenges that arise when applying machine learning to streaming data include partially labeled and delayed-label data, recovery from concept drift, and temporal dependencies.

Examples of data streams include computer network traffic, phone conversations, ATM transactions, web searches, and sensor data. Data stream mining can be considered a subfield of data mining, machine learning, and knowledge discovery.
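
The single-pass prediction setting described above can be illustrated with a short sketch. It is not taken from any particular library: the toy stream, the `OnlineLogisticRegression` class and the `prequential_accuracy` helper are hypothetical names introduced here for illustration. Each instance is read once, used first to test the current model and then to update it, and the learner keeps only a fixed number of weights in memory.

```python
import math
import random
from typing import Iterator, List, Tuple

Instance = Tuple[List[float], int]


def toy_stream(n: int, seed: int = 0) -> Iterator[Instance]:
    """Hypothetical stream: the label is 1 when x0 + x1 > 1, else 0."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, int(x[0] + x[1] > 1.0)


class OnlineLogisticRegression:
    """Incremental learner: memory is O(number of features), one update per instance."""

    def __init__(self, n_features: int, lr: float = 0.5) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: List[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x: List[float]) -> int:
        return int(self.predict_proba(x) >= 0.5)

    def learn_one(self, x: List[float], y: int) -> None:
        # One stochastic gradient step on the log loss for this instance.
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def prequential_accuracy(stream: Iterator[Instance], model: OnlineLogisticRegression) -> float:
    """Test-then-train: predict each instance before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # test on the unseen instance
        model.learn_one(x, y)                  # then train on it and discard it
        total += 1
    return correct / total if total else 0.0


accuracy = prequential_accuracy(toy_stream(5000), OnlineLogisticRegression(2))
print(f"prequential accuracy: {accuracy:.3f}")
```

Because every instance is evaluated before the model has seen it, the reported accuracy needs no separate hold-out set, which is why this test-then-train scheme is commonly used for streams.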


Software for data stream mining

* MOA (Massive Online Analysis): free open-source software for mining data streams with concept drift, written in Java. It provides several machine learning algorithms (classification, regression, clustering, outlier detection and recommender systems). It also contains a prequential evaluation method, the EDDM concept drift methods, a reader for real datasets in ARFF format, and artificial stream generators such as SEA concepts, STAGGER, rotating hyperplane, random tree, and random radial basis functions. MOA supports bi-directional interaction with Weka.
* scikit-multiflow: a machine learning framework for multi-output/multi-label and stream data, implemented in Python. It contains stream generators, stream learning methods for single-target and multi-target problems, concept drift detectors, and evaluation and visualisation methods. This software is discontinued.
* StreamDM: an open-source framework for big data stream mining that uses the Spark Streaming extension of the core Spark API. One advantage of StreamDM over existing frameworks is that it benefits directly from the Spark Streaming API, which handles many of the complex problems of the underlying data sources, such as out-of-order data and recovery from failures.
* RapidMiner: commercial software for knowledge discovery, data mining, and machine learning. It also features data stream mining, learning time-varying concepts, and tracking drifting concepts when used in combination with its data stream mining plugin (formerly the Concept Drift plugin).
* River: a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data.
* GAENARI: a C++ incremental decision tree. It continuously inserts and updates chunked data sets, and supports rebuilding the tree to address concept drift.
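
Two of the ingredients named in the list above, an artificial stream generator and a concept drift detector, can be sketched in a few lines. The code below does not use MOA or any other listed package. `sea_stream` is a simplified, hypothetical generator in the style of SEA concepts (three uniform attributes in [0, 10], label 1 when the first two sum to at most a threshold, with label noise). The detector follows the DDM rule rather than EDDM, as a simpler relative: it tracks the running error rate p and its standard deviation s, and signals drift when p + s exceeds p_min + 3 s_min. The thresholds, the 30-instance warm-up and the fixed classifier are assumptions made for illustration.

```python
import math
import random
from typing import Iterator, List, Tuple


def sea_stream(n: int, drift_at: int, thetas: Tuple[float, float] = (8.0, 9.5),
               noise: float = 0.1, seed: int = 1) -> Iterator[Tuple[List[float], int]]:
    """Simplified SEA-style stream with one abrupt drift at index `drift_at`."""
    rng = random.Random(seed)
    for i in range(n):
        x = [rng.uniform(0.0, 10.0) for _ in range(3)]  # third attribute is irrelevant
        theta = thetas[0] if i < drift_at else thetas[1]
        y = int(x[0] + x[1] <= theta)
        if rng.random() < noise:  # flip a fraction of the labels
            y = 1 - y
        yield x, y


class DDM:
    """DDM-style detector fed with one boolean per instance (True = misclassified)."""

    def __init__(self, min_instances: int = 30,
                 warning_level: float = 2.0, drift_level: float = 3.0) -> None:
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        self.n += 1
        self.errors += int(error)
        if self.n < self.min_instances:
            return "stable"
        p = self.errors / self.n              # running error rate
        s = math.sqrt(p * (1.0 - p) / self.n)  # its standard deviation
        if p + s < self.p_min + self.s_min:   # remember the best point seen so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                      # start over after signalling drift
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


detector = DDM()
detections = []
for i, (x, y) in enumerate(sea_stream(4000, drift_at=2000)):
    y_pred = int(x[0] + x[1] <= 8.0)  # assumed fixed model that matches the first concept
    if detector.update(y_pred != y) == "drift":
        detections.append(i)
print("drift signalled at instances:", detections)
```

In a complete system the drift signal would typically trigger retraining or replacement of the model; here the model is kept fixed so that the rise in error after the change is easy to see. Detectors of this kind can also raise false alarms on noisy streams, so the printed positions should be read as signals rather than as ground truth.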


Events


* International Workshop on Ubiquitous Data Mining, held in conjunction with the International Joint Conference on Artificial Intelligence (IJCAI) in Beijing, China, August 3–5, 2013.
* International Workshop on Knowledge Discovery from Ubiquitous Data Streams, held in conjunction with the 18th European Conference on Machine Learning (ECML) and the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) in Warsaw, Poland, in September 2007.
* ACM Symposium on Applied Computing Data Streams Track, held in conjunction with the 2007 ACM Symposium on Applied Computing (SAC-2007) in Seoul, Korea, in March 2007.
* IEEE International Workshop on Mining Evolving and Streaming Data (IWMESD 2006), held in conjunction with the 2006 IEEE International Conference on Data Mining (ICDM-2006) in Hong Kong in December 2006.
* Fourth International Workshop on Knowledge Discovery from Data Streams (IWKDDS), held in conjunction with the 17th European Conference on Machine Learning (ECML) and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) (ECML/PKDD-2006) in Berlin, Germany, in September 2006.


See also

* Concept drift
* Data mining
* Sequence mining
* Streaming algorithm
* Stream processing
* Wireless sensor network
* Lambda architecture

