Data Stream Clustering

	Data Stream Clustering In computer science, data stream clustering is defined as the clustering of data that arrive continuously such as telephone records, multimedia data, financial transactions etc. Data stream clustering is usually studied as a streaming algorithm and the objective is, given a sequence of points, to construct a good clustering of the stream, using a small amount of memory and time. History Data stream clustering has recently attracted attention for emerging applications that involve large amounts of streaming data. For clustering, k-means is a widely used heuristic but alternate algorithms have also been developed such as k-medoids, CURE and the popular BIRCH. For data streams, one of the first results appeared in 1980 but the model was formalized in 1998. Definition The problem of data stream clustering is defined as: Input: a sequence of ''n'' points in metric space and an integer ''k''. Output: ''k'' centers in the set of the ''n'' points so as to minimize the sum of distan ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Computer Science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical disciplines (including the design and implementation of Computer architecture, hardware and Computer programming, software). Computer science is generally considered an area of research, academic research and distinct from computer programming. Algorithms and data structures are central to computer science. The theory of computation concerns abstract models of computation and general classes of computational problem, problems that can be solved using them. The fields of cryptography and computer security involve studying the means for secure communication and for preventing Vulnerability (computing), security vulnerabilities. Computer graphics (computer science), Computer graphics and computational geometry address the generation of images. Progr ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Local Search (optimization) In computer science, local search is a heuristic method for solving computationally hard optimization problems. Local search can be used on problems that can be formulated as finding a solution maximizing a criterion among a number of candidate solutions. Local search algorithms move from solution to solution in the space of candidate solutions (the ''search space'') by applying local changes, until a solution deemed optimal is found or a time bound is elapsed. Local search algorithms are widely applied to numerous hard computational problems, including problems from computer science (particularly artificial intelligence), mathematics, operations research, engineering, and bioinformatics. Examples of local search algorithms are WalkSAT, the 2-opt algorithm for the Traveling Salesman Problem and the Metropolis–Hastings algorithm. Examples Some problems where local search has been applied are: # The vertex cover problem, in which a solution is a vertex cover of a graph, and th ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	BIRCH A birch is a thin-leaved deciduous hardwood tree of the genus ''Betula'' (), in the family Betulaceae, which also includes alders, hazels, and hornbeams. It is closely related to the beech-oak family Fagaceae. The genus ''Betula'' contains 30 to 60 known taxa of which 11 are on the IUCN 2011 Red List of Threatened Species. They are a typically rather short-lived pioneer species widespread in the Northern Hemisphere, particularly in northern areas of temperate climates and in boreal climates. Description Birch species are generally small to medium-sized trees or shrubs, mostly of northern temperate and boreal climates. The simple leaves are alternate, singly or doubly serrate, feather-veined, petiolate and stipulate. They often appear in pairs, but these pairs are really borne on spur-like, two-leaved, lateral branchlets. The fruit is a small samara, although the wings may be obscure in some species. They differ from the alders (''Alnus'', another genus in the family) in th ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	C2ICM(incremental Clustering) C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C" comes from the same letter as "G". The Semites named it gimel. The sign is possibly adapted from an Egyptian hieroglyph for a staff sling, which may have been the meaning of the name ''gimel''. Another possibility is that it depicted a camel, the Semitic name for which was ''gamal''. Barry B. Powell, a specialist in the history of writing, states "It is hard to imagine how gimel = "camel" can be derived from the picture of a camel (it may show his hump, or his head and neck!)". In the Etruscan language, plosive consonants had no contrastive voicing, so the Greek ' Γ' (Gamma) was adopted into the Etruscan alphabet to represent . Already in the Western Greek alphabet, Gamma first took a '' form in Early Etruscan, then '' in Classical Etru ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Category Utility Category utility is a measure of "category goodness" defined in and . It attempts to maximize both the probability that two objects in the same category have attribute values in common, and the probability that objects from different categories have different attribute values. It was intended to supersede more limited measures of category goodness such as " cue validity" (; ) and "collocation index" . It provides a normative information-theoretic measure of the ''predictive advantage'' gained by the observer who possesses knowledge of the given category structure (i.e., the class labels of instances) over the observer who does ''not'' possess knowledge of the category structure. In this sense the motivation for the category utility measure is similar to the information gain metric used in decision tree learning. In certain presentations, it is also formally equivalent to the mutual information, as discussed below. A review of category utility in its probabilistic incarnation, with ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Decision Tree Learning Decision tree learning is a supervised learning approach used in statistics, data mining and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions about a set of observations. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. Decision trees are among the most popular machine learning algorithms given their intelligibility and simplicity. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data (but the resulting classification tree can be an input for decision making). General Dec ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Hierarchical Clustering In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories: * Agglomerative: This is a " bottom-up" approach: Each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. * Divisive: This is a "top-down" approach: All observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram. The standard algorithm for hierarchical agglomerative clustering (HAC) has a time complexity of \mathcal(n^3) and requires \Omega(n^2) memory, which makes it too slow for even medium data sets. However, for some special cases, optimal efficient agglomerative methods (of c ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Cobweb (clustering) COBWEB is an incremental system for hierarchical conceptual clustering. COBWEB was invented by Professor Douglas H. Fisher, currently at Vanderbilt University. COBWEB incrementally organizes observations into a classification tree Classification chart or classification tree is a synopsis of the classification scheme, designed to illustrate the structure of any particular field. Overview Classification is the process in which ideas and objects are recognized, differentia .... Each node in a classification tree represents a class (concept) and is labeled by a probabilistic concept that summarizes the attribute-value distributions of objects classified under the node. This classification tree can be used to predict missing attributes or the class of a new object. There are four basic operations COBWEB employs in building the classification tree. Which operation is selected depends on the category utility of the classification achieved by applying it. The operations are: * Mergi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Cluster Analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistics, statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small Distance function, distances between cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-object ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Divide-and-conquer Algorithm In computer science, divide and conquer is an algorithm design paradigm. A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem. The divide-and-conquer technique is the basis of efficient algorithms for many problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers (e.g., the Karatsuba algorithm), finding the closest pair of points, syntactic analysis (e.g., top-down parsers), and computing the discrete Fourier transform (FFT). Designing efficient divide-and-conquer algorithms can be difficult. As in mathematical induction, it is often necessary to generalize the problem to make it amenable to a recursive solution. The correctness of a divide-and-conquer algorithm is usually proved by mathematical induction, and its computational cost is o ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]