DBSCAN

DBSCAN: Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a non-parametric, density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (those whose nearest neighbors are too far away). DBSCAN is one of the most commonly used clustering algorithms and among the most cited in the scientific literature. In 2014, the algorithm received the Test of Time Award (an award given to algorithms that have received substantial attention in theory and practice) at the leading data mining conference, ACM SIGKDD. The follow-up paper "DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN" appears in the list of the 8 most downloaded articles of the prestigious ACM Transactions on Database Sy ...
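The grouping rule described in the DBSCAN entry can be sketched in a few lines. This is a minimal illustration, not the authors' reference implementation: the names `dbscan`, `region_query`, `eps` (neighborhood radius) and `min_pts` (density threshold) are assumptions chosen for this example, and the neighbor search is a brute-force scan rather than an index lookup.

```python
from math import dist

NOISE = -1  # label used here for outliers (an assumption of this sketch)

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is dense too, keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these toy points, the two tight squares form two clusters and the isolated point at (50, 50) is labeled as noise. The brute-force `region_query` makes the sketch quadratic in the number of points; practical implementations typically use a spatial index instead.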
Data Clustering: Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances between cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. T ...
OPTICS Algorithm: Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander. Its basic idea is similar to DBSCAN, but it addresses one of DBSCAN's major weaknesses: the problem of detecting meaningful clusters in data of varying density. To do so, the points of the database are (linearly) ordered such that spatially closest points become neighbors in the ordering. Additionally, a special distance is stored for each point that represents the density that must be accepted for a cluster so that both points belong to the same cluster. This is represented as a dendrogram. Basic idea: Like DBSCAN, OPTICS requires two parameters: \varepsilon, which describes the maximum distance (radius) to consider, and ''MinPts'', describing the number of points required to form a cluster. A point is a ''core point'' if at least ''MinPts'' points are found within its \varepsilon-neighborhood N_\vareps ...
SUBCLU: SUBCLU is an algorithm for clustering high-dimensional data by Karin Kailing, Hans-Peter Kriegel and Peer Kröger (Karin Kailing, Hans-Peter Kriegel and Peer Kröger. ''Density-Connected Subspace Clustering for High-Dimensional Data''. In: ''Proc. SIAM Int. Conf. on Data Mining (SDM'04)'', pp. 246-257, 2004). It is a subspace clustering algorithm that builds on the density-based clustering algorithm DBSCAN. SUBCLU can find clusters in axis-parallel subspaces, and uses a bottom-up, greedy strategy to remain efficient. Approach: SUBCLU uses a monotonicity criterion: if a cluster is found in a subspace S, then each subspace T \subseteq S also contains a cluster. However, a cluster C \subseteq DB in subspace S is not necessarily a cluster in T \subseteq S, since clusters are required to be maximal, and more objects might be contained in the cluster in T that contains C. However, a density-connected set in a subspace S is also a density-connected set in T \subseteq S. This ''downward-clo ...
Hans-Peter Kriegel: Hans-Peter Kriegel (born 1 October 1948 in Germany) is a German computer scientist and professor at the Ludwig Maximilian University of Munich, where he leads the Database Systems Group in the Department of Computer Science. He was previously a professor at the University of Würzburg and the University of Bremen, after his habilitation at the Technical University of Dortmund and his doctorate from Karlsruhe Institute of Technology. Research: His most important contributions are the database index structures R*-tree, X-tree and IQ-Tree, the cluster analysis algorithms DBSCAN, OPTICS and SUBCLU and the anomaly detection method Local Outlier Factor (LOF). His research focuses on correlation clustering, high-dimensional data indexing and analysis, spatial data mining and spatial data management as well as multimedia databases. His research group developed a software framework titled ELKI that is designed for the parallel research of index structures, data mining algorithms and their inter ...
Spectral Clustering: In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset. In application to image segmentation, spectral clustering is known as segmentation-based object categorization. Definitions: Given an enumerated set of data points, the similarity matrix may be defined as a symmetric matrix A, where A_{ij} \geq 0 represents a measure of the similarity between data points with indices i and j. The general approach to spectral clustering is to use a standard clustering method (there are many such methods, ''k''-means is discussed below) on relevant eigenvectors of a Laplacian matrix of A. There are many different ways to define a Laplacian which have different mathematical interpretations, ...
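As a hedged illustration of the definitions in the spectral clustering entry, one common choice (among the many the entry alludes to) is the unnormalized graph Laplacian. The symbols D and L are introduced here for illustration and do not appear in the snippet itself:

```latex
% Degree matrix D: diagonal, built from the row sums of the similarity matrix A
D_{ii} = \sum_{j} A_{ij}, \qquad D_{ij} = 0 \ \text{for } i \neq j

% Unnormalized Laplacian (one of several possible definitions)
L = D - A

% For any vector x, L measures how much x varies across similar points:
x^{\top} L x = \tfrac{1}{2} \sum_{i,j} A_{ij} \, (x_i - x_j)^2 \;\geq\; 0
```

Because A_{ij} \geq 0, the quadratic form is non-negative, so L is positive semidefinite; eigenvectors with the smallest eigenvalues vary little between strongly similar points, which is why they are the ones typically handed to the standard clustering step.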
Martin Ester: Martin Ester (born November 5, 1958) is a Canadian-German Full Professor of Computing Science at Simon Fraser University. His research focuses on data mining and machine learning. Career: After earning his M.Sc., Ester worked for Swissair before earning a position at the University of Munich as an Assistant Professor in 1993. Three years later, in 1996, Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu proposed a data clustering algorithm called "Density-based spatial clustering of applications with noise" (DBSCAN). Their proposal won the 2014 KDD Test of Time Award for "outstanding papers from past KDD Conferences beyond the last decade that have had an important impact on the data mining research community." A few years later, Ester moved to Vancouver and accepted a position at Simon Fraser University. In 2009, Ester was selected to become an Associate Editor of the IEEE Transactions on Knowledge and Data Engineering. Between 2010 and 2015, Ester served as the ...
Clustering High-dimensional Data: Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional spaces of data are often encountered in areas such as medicine, where DNA microarray technology can produce many measurements at once, and the clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size of the vocabulary. Problems: Four problems need to be overcome for clustering in high-dimensional data:
* Multiple dimensions are hard to think in, impossible to visualize, and, due to the exponential growth of the number of possible values with each dimension, complete enumeration of all subspaces becomes intractable with increasing dimensionality. This problem is known as the curse of dimensionality.
* The concept of distance becomes less precise as the number of dimensions grows, since the distance between any two points in a given dataset converges. The discrimination of the near ...
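The second problem in the entry above, distances converging as dimensionality grows, can be observed with a small experiment. The sketch below is only an illustration under stated assumptions: points are drawn uniformly from the unit hypercube, and `relative_contrast` is a hypothetical helper name that reports how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import random
from math import dist

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest for random points around a random query."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    query = [rng.random() for _ in range(dim)]
    distances = [
        dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest

low = relative_contrast(2)      # low-dimensional: nearest is far closer than farthest
high = relative_contrast(1000)  # high-dimensional: all distances look alike
```

In two dimensions the farthest point is many times farther away than the nearest one, while in a thousand dimensions the two distances differ by only a small fraction, which is what makes a fixed distance threshold hard to choose for density-based methods.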
Springer-Verlag: Springer Science+Business Media, commonly known as Springer, is a German multinational publishing company of books, e-books and peer-reviewed journals in science, humanities, technical and medical (STM) publishing. Originally founded in 1842 in Berlin, it expanded internationally in the 1960s, and through mergers in the 1990s and a sale to venture capitalists it fused with Wolters Kluwer and eventually became part of Springer Nature in 2015. Springer has major offices in Berlin, Heidelberg, Dordrecht, and New York City. History: Julius Springer founded Springer-Verlag in Berlin in 1842 and his son Ferdinand Springer grew it from a small firm of 4 employees into Germany's then second largest academic publisher with 65 staff in 1872 ("Chronology", Springer Science+Business Media). In 1964, Springer expanded its business internationall ...
Great-circle Distance: The great-circle distance, orthodromic distance, or spherical distance is the distance along a great circle. It is the shortest distance between two points on the surface of a sphere, measured along the surface of the sphere (as opposed to a straight line through the sphere's interior). The distance between two points in Euclidean space is the length of a straight line between them, but on the sphere there are no straight lines. In spaces with curvature, straight lines are replaced by geodesics. Geodesics on the sphere are circles on the sphere whose centers coincide with the center of the sphere, and are called 'great circles'. The determination of the great-circle distance is part of the more general problem of great-circle navigation, which also computes the azimuths at the end points and intermediate way-points. Through any two points on a sphere that are not antipodal points (directly opposite each other), there is a unique great circle. The two points separate the grea ...
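The great-circle distance described above can be computed with the haversine formula, which is one standard way to do it (the entry itself does not name a formula). In this sketch the function name `great_circle_distance`, the degree-based arguments and the default mean Earth radius of 6371 km are all assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

def great_circle_distance(lat1, lon1, lat2, lon2, radius=6371.0):
    """Distance along the sphere's surface between two points given in degrees.

    Uses the haversine formula; the result is in the same unit as `radius`
    (kilometres for the default, which approximates the Earth as a sphere).
    """
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # Clamp to 1.0 so rounding error near antipodal points cannot break asin.
    return 2 * radius * asin(min(1.0, sqrt(a)))

# A quarter of the equator on a unit sphere spans a central angle of 90 degrees.
quarter = great_circle_distance(0.0, 0.0, 0.0, 90.0, radius=1.0)
```

A function like this could serve as the distance measure for density-based clustering of geographic coordinates, where plain Euclidean distance on latitude and longitude would distort neighborhoods.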
Anomaly Detection: In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behaviour. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data. Anomaly detection finds application in many domains including cyber security, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud, to name only a few. Anomalies were initially searched for so that they could be clearly rejected or omitted from the data to aid statistical analysis, for example to compute the mean or standard deviation. They were also removed to improve predictions from models such as linear regression, and more recently their removal aids the performance of machine learning algorithms. However, i ...
Nearest Neighbor Graph: The nearest neighbor graph (NNG) is a directed graph defined for a set of points in a metric space, such as the plane with the Euclidean distance. The NNG has a vertex for each point, and a directed edge from ''p'' to ''q'' whenever ''q'' is a nearest neighbor of ''p'', a point whose distance from ''p'' is minimum among all the given points other than ''p'' itself. In many uses of these graphs, the directions of the edges are ignored and the NNG is defined instead as an undirected graph. However, the nearest neighbor relation is not a symmetric one, i.e., ''p'' from the definition is not necessarily a nearest neighbor for ''q''. In theoretical discussions of algorithms a kind of general position is often assumed, namely, the nearest (k-nearest) neighbor is unique for each object. In implementations of the algorithms it is necessary to bear in mind that this is not always the case. For situations in which it is necessary to make the nearest neighbor for each object unique, the set ' ...
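The definition of the nearest neighbor graph, and the fact that the relation is not symmetric, can be checked on a tiny example. This is a brute-force sketch under stated assumptions: `nearest_neighbor_graph` is a hypothetical helper name, distances are Euclidean, at least two points are required, and ties are broken by taking the lowest index, which is just one way to make the nearest neighbor unique.

```python
from math import dist

def nearest_neighbor_graph(points):
    """Map each point index p to the index q of its nearest other point.

    Each entry p -> q is one directed edge of the graph. Ties are broken by
    lowest index because min() returns the first minimal candidate.
    """
    edges = {}
    for p, point in enumerate(points):
        others = [q for q in range(len(points)) if q != p]
        edges[p] = min(others, key=lambda q: dist(point, points[q]))
    return edges

# Three points on a line at x = 0, 1 and 3.
edges = nearest_neighbor_graph([(0, 0), (1, 0), (3, 0)])
```

Here the point at x = 3 has the point at x = 1 as its nearest neighbor, but the point at x = 1 is closer to x = 0, so the edge from index 2 to index 1 has no reverse edge.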