Frequent pattern discovery (or FP discovery, FP mining, or Frequent itemset mining) is part of
knowledge discovery in databases,
Massive Online Analysis
Massive Online Analysis (MOA) is a free open-source software project specific for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand.
Description
MOA is an open-source framework ...
, and
data mining; it describes the task of finding the most frequent and relevant
pattern
A pattern is a regularity in the world, in human-made design, or in abstract ideas. As such, the elements of a pattern repeat in a predictable manner. A geometric pattern is a kind of pattern formed of geometric shapes and typically repeated li ...
s in large datasets.
The concept was first introduced for mining transaction databases.
[
Frequent patterns are defined as subsets (itemsets, subsequences, or substructures) that appear in a data set with frequency no less than a user-specified or auto-determined threshold.]
Techniques
Techniques for FP mining include:
* market basket analysis
Affinity analysis falls under the umbrella term of data mining which uncovers meaningful correlations between different entities according to their co-occurrence in a data set. In almost all systems and processes, the application of affinity anal ...
* cross-marketing A marketing co-operation or marketing cooperation is a partnership of at least two companies on the value chain level of marketing with the objective to tap the full potential of a market by bundling specific competences or resources. Other terms f ...
* catalog design
* clustering
* classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood.
Classification is the grouping of related facts into classes.
It may also refer to:
Business, organizat ...
* recommendation systems
A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as platform or engine), is a subclass of information filtering system that provide suggestions for items that are most pertinent to a particular u ...
For the most part, FP discovery can be done using association rule learning
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.Pi ...
with particular algorithms Eclat, FP-growth
Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.Pi ...
and the Apriori algorithm
AprioriRakesh Agrawal and Ramakrishnan SrikanFast algorithms for mining association rules Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487-499, Santiago, Chile, September 1994. is an algorithm for frequent ...
.
Other strategies include:
* Frequent subtree mining
*Structure mining
Structure mining or structured data mining is the process of finding and extracting useful information from semi-structured data sets. Graph mining, sequential pattern mining and molecule mining are special cases of structured data mining.
Descrip ...
*Sequential pattern mining
Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed that the values are discrete, and thus time serie ...
and respective specific techniques.
Implementations exist for various machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
systems or modules like MLlib for Apache Spark
Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of Califor ...
.
References
{{reflist
Data mining
Cluster analysis