Frequent pattern discovery (or FP discovery, FP mining, or Frequent itemset mining) is part of knowledge discovery in databases,

Massive Online Analysis Massive Online Analysis (MOA) is a free open-source software project specific for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand. Description MOA is an open-source framework ...

, and data mining; it describes the task of finding the most frequent and relevant

pattern A pattern is a regularity in the world, in human-made design, or in abstract ideas. As such, the elements of a pattern repeat in a predictable manner. A geometric pattern is a kind of pattern formed of geometric shapes and typically repeated li ...

s in large datasets. The concept was first introduced for mining transaction databases. Frequent patterns are defined as subsets (itemsets, subsequences, or substructures) that appear in a data set with frequency no less than a user-specified or auto-determined threshold.

Techniques

Techniques for FP mining include: *

market basket analysis Affinity analysis falls under the umbrella term of data mining which uncovers meaningful correlations between different entities according to their co-occurrence in a data set. In almost all systems and processes, the application of affinity anal ...

cross-marketing A marketing co-operation or marketing cooperation is a partnership of at least two companies on the value chain level of marketing with the objective to tap the full potential of a market by bundling specific competences or resources. Other terms f ...

* catalog design * clustering *

classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...

recommendation systems A recommender system, or a recommendation system (sometimes replacing 'system' with a synonym such as platform or engine), is a subclass of information filtering system that provide suggestions for items that are most pertinent to a particular u ...

For the most part, FP discovery can be done using

association rule learning Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.Pi ...

with particular algorithms Eclat,

FP-growth Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.Pi ...

and the

Apriori algorithm AprioriRakesh Agrawal and Ramakrishnan SrikanFast algorithms for mining association rules Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487-499, Santiago, Chile, September 1994. is an algorithm for frequent ...

. Other strategies include: * Frequent subtree mining *

Structure mining Structure mining or structured data mining is the process of finding and extracting useful information from semi-structured data sets. Graph mining, sequential pattern mining and molecule mining are special cases of structured data mining. Descrip ...

Sequential pattern mining Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed that the values are discrete, and thus time serie ...

and respective specific techniques. Implementations exist for various

machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...

systems or modules like MLlib for

Apache Spark Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of Califor ...

References

{{reflist Data mining Cluster analysis