In statistics, the ''k''-nearest neighbors algorithm (''k''-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover.
It is used for classification and regression. In both cases, the input consists of the ''k'' closest training examples in a data set. The output depends on whether ''k''-NN is used for classification or regression:
:* In ''k-NN classification'', the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its ''k'' nearest neighbors (''k'' is a positive integer, typically small). If ''k'' = 1, then the object is simply assigned to the class of that single nearest neighbor.
:* In ''k-NN regression'', the output is the property value for the object. This value is the average of the values of ''k'' nearest neighbors. If ''k'' = 1, then the output is simply assigned to the value of that single nearest neighbor.
''k''-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance for classification, if the features represent different physical units or come in vastly different scales, then normalizing the training data can improve its accuracy dramatically.
Both for classification and regression, a useful technique can be to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/''d'', where ''d'' is the distance to the neighbor.
The neighbors are taken from a set of objects for which the class (for ''k''-NN classification) or the object property value (for ''k''-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.
A peculiarity of the ''k''-NN algorithm is that it is sensitive to the local structure of the data.
Statistical setting
Suppose we have pairs <math>(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)</math> taking values in <math>\mathbb{R}^d \times \{1,2\}</math>, where ''Y'' is the class label of ''X'', so that <math>X \mid Y = r \sim P_r</math> for <math>r = 1, 2</math> (and probability distributions <math>P_r</math>). Given some norm <math>\|\cdot\|</math> on <math>\mathbb{R}^d</math> and a point <math>x \in \mathbb{R}^d</math>, let <math>(X_{(1)}, Y_{(1)}), \dots, (X_{(n)}, Y_{(n)})</math> be a reordering of the training data such that <math>\|X_{(1)} - x\| \le \dots \le \|X_{(n)} - x\|</math>.
Algorithm
The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.
In the classification phase, ''k'' is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the ''k'' training samples nearest to that query point.
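The classification phase can be sketched in a few lines of Python. The following is a minimal, illustrative implementation (the brute-force distance computation, the function and variable names, and the toy data are choices made here, not part of any standard library):
<syntaxhighlight lang="python">
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, query, k=3):
    """Plurality vote among the k nearest training samples (brute force)."""
    # Euclidean distance from the query to every stored training vector
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # The most common label among those k neighbours wins
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage: two well-separated clusters
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array(['a', 'a', 'b', 'b'])
print(knn_classify(X_train, y_train, np.array([0.95, 0.9]), k=3))  # -> 'b'
</syntaxhighlight>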
A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as for text classification, another metric can be used, such as the overlap metric (or Hamming distance). In the context of gene expression microarray data, for example, ''k''-NN has been employed with correlation coefficients, such as Pearson and Spearman, as a metric. Often, the classification accuracy of ''k''-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood components analysis.
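For illustration, the two metrics mentioned above can be written as follows (the function names and the example values are chosen here purely for exposition):
<syntaxhighlight lang="python">
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two real-valued feature vectors."""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def hamming_distance(a, b):
    """Overlap metric: number of positions at which two discrete sequences differ."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))   # 5.0
print(hamming_distance("karolin", "kathrin"))       # 3
</syntaxhighlight>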
A drawback of the basic "majority voting" classification occurs when the class distribution is skewed. That is, examples of a more frequent class tend to dominate the prediction of the new example, because they tend to be common among the ''k'' nearest neighbors due to their large number.
One way to overcome this problem is to weight the classification, taking into account the distance from the test point to each of its ''k'' nearest neighbors. The class (or value, in regression problems) of each of the ''k'' nearest points is multiplied by a weight proportional to the inverse of the distance from that point to the test point. Another way to overcome skew is by abstraction in data representation. For example, in a self-organizing map (SOM), each node is a representative (a center) of a cluster of similar points, regardless of their density in the original training data. ''k''-NN can then be applied to the SOM.
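A minimal sketch of the distance-weighted vote described above, assuming Euclidean distance and 1/''d'' weights (the small constant added to the denominator to avoid division by zero is a choice made here, not prescribed by the method):
<syntaxhighlight lang="python">
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, query, k=5, eps=1e-12):
    """Each of the k nearest neighbours votes with weight 1/d instead of 1."""
    distances = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (distances[i] + eps)  # closer points count more
    return max(votes, key=votes.get)

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
y = np.array(['a', 'a', 'b'])
print(weighted_knn_classify(X, y, np.array([0.2, 0.4]), k=3))  # -> 'a'
</syntaxhighlight>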
Parameter selection
The best choice of ''k'' depends upon the data; generally, larger values of ''k'' reduce the effect of noise on the classification, but make boundaries between classes less distinct. A good ''k'' can be selected by various heuristic techniques (see hyperparameter optimization). The special case where the class is predicted to be the class of the closest training sample (i.e. when ''k'' = 1) is called the nearest neighbor algorithm.
The accuracy of the ''k''-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance. Much research effort has been put into selecting or scaling features to improve classification. A particularly popular approach is the use of evolutionary algorithms to optimize feature scaling. Another popular approach is to scale features by the mutual information of the training data with the training classes.
In binary (two class) classification problems, it is helpful to choose ''k'' to be an odd number as this avoids tied votes. One popular way of choosing the empirically optimal ''k'' in this setting is via the bootstrap method.
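As an illustrative stand-in for the bootstrap procedure, the sketch below selects among odd values of ''k'' using simple leave-one-out cross-validation (the candidate range and the toy data are arbitrary choices made here):
<syntaxhighlight lang="python">
import numpy as np
from collections import Counter

def loo_error(X, y, k):
    """Leave-one-out error rate of a k-NN classifier on (X, y)."""
    errors = 0
    for i in range(len(X)):
        d = np.linalg.norm(np.delete(X, i, axis=0) - X[i], axis=1)
        labels = np.delete(y, i)
        nearest = np.argsort(d)[:k]
        predicted = Counter(labels[nearest]).most_common(1)[0][0]
        errors += predicted != y[i]
    return errors / len(X)

# Evaluate odd values of k only, which avoids tied votes in a two-class problem
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
best_k = min(range(1, 6, 2), key=lambda k: loo_error(X, y, k))
print(best_k)
</syntaxhighlight>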
The 1-nearest neighbor classifier
The most intuitive nearest neighbour type classifier is the one nearest neighbour classifier that assigns a point ''x'' to the class of its closest neighbour in the feature space, that is <math>C_n^{1nn}(x) = Y_{(1)}</math>.
As the size of the training data set approaches infinity, the one nearest neighbour classifier guarantees an error rate of no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).
The weighted nearest neighbour classifier
The ''k''-nearest neighbour classifier can be viewed as assigning the ''k'' nearest neighbours a weight <math>1/k</math> and all others weight <math>0</math>. This can be generalised to weighted nearest neighbour classifiers. That is, where the ''i''th nearest neighbour is assigned a weight <math>w_{ni}</math>, with <math>\sum_{i=1}^n w_{ni} = 1</math>. An analogous result on the strong consistency of weighted nearest neighbour classifiers also holds.
Let <math>C^{wnn}_n</math> denote the weighted nearest neighbour classifier with weights <math>\{w_{ni}\}_{i=1}^n</math>. Subject to regularity conditions on the class distributions, the excess risk has the following asymptotic expansion
: <math>\mathcal{R}_\mathcal{R}(C^{wnn}_n) - \mathcal{R}_\mathcal{R}(C^{\mathrm{Bayes}}) = \left(B_1 s_n^2 + B_2 t_n^2\right) \{1 + o(1)\},</math>
for constants <math>B_1</math> and <math>B_2</math>, where <math>s_n^2 = \sum_{i=1}^n w_{ni}^2</math> and <math>t_n = n^{-2/d} \sum_{i=1}^n w_{ni} \left\{ i^{1+2/d} - (i-1)^{1+2/d} \right\}</math>.
The optimal weighting scheme <math>\{w^{*}_{ni}\}_{i=1}^n</math>, that balances the two terms in the display above, is given as follows: set <math>k^{*} = \lfloor B n^{\frac{4}{d+4}} \rfloor</math>,
: <math>w^{*}_{ni} = \frac{1}{k^{*}} \left[ 1 + \frac{d}{2} - \frac{d}{2 {k^{*}}^{2/d}} \left\{ i^{1+2/d} - (i-1)^{1+2/d} \right\} \right]</math> for <math>i = 1, 2, \dots, k^{*}</math> and
: <math>w^{*}_{ni} = 0</math> for <math>i = k^{*}+1, \dots, n</math>.
With optimal weights the dominant term in the asymptotic expansion of the excess risk is <math>\mathcal{O}(n^{-\frac{4}{d+4}})</math>. Similar results are true when using a bagged nearest neighbour classifier.
Properties
''k''-NN is a special case of a variable-bandwidth, kernel density "balloon" estimator with a uniform kernel.
The naive version of the algorithm is easy to implement by computing the distances from the test example to all stored examples, but it is computationally intensive for large training sets. Using an approximate nearest neighbor search algorithm makes ''k''-NN computationally tractable even for large data sets. Many nearest neighbor search algorithms have been proposed over the years; these generally seek to reduce the number of distance evaluations actually performed.
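For example, a ''k''-d tree index avoids comparing the query against every stored point. The sketch below uses SciPy's <code>cKDTree</code>; the library choice and the random data are illustrative only:
<syntaxhighlight lang="python">
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))           # 10,000 stored examples in 3-D
y_train = rng.integers(0, 2, size=10_000)   # binary labels

tree = cKDTree(X_train)                          # build the spatial index once
dist, idx = tree.query([0.5, 0.5, 0.5], k=5)     # 5 nearest neighbours of the query
prediction = np.bincount(y_train[idx]).argmax()  # plurality vote among them
print(idx, prediction)
</syntaxhighlight>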
''k''-NN has some strong consistency results. As the amount of data approaches infinity, the two-class ''k''-NN algorithm is guaranteed to yield an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).
Various improvements to the ''k''-NN speed are possible by using proximity graphs.
For multi-class ''k''-NN classification, Cover and Hart (1967) prove an upper bound error rate of
: <math>R^{*} \le R_{k\mathrm{NN}} \le R^{*} \left(2 - \frac{M R^{*}}{M-1}\right)</math>
where <math>R^{*}</math> is the Bayes error rate (which is the minimal error rate possible), <math>R_{k\mathrm{NN}}</math> is the ''k''-NN error rate, and <math>M</math> is the number of classes in the problem. For <math>M = 2</math> and as the Bayesian error rate <math>R^{*}</math> approaches zero, this limit reduces to "not more than twice the Bayesian error rate".
Error rates
There are many results on the error rate of the ''k'' nearest neighbour classifiers.
The ''k''-nearest neighbour classifier is strongly (that is, for any joint distribution on <math>(X, Y)</math>) consistent provided <math>k := k_n</math> diverges and <math>k_n / n</math> converges to zero as <math>n \to \infty</math>.
Let <math>C^{knn}_n</math> denote the ''k'' nearest neighbour classifier based on a training set of size ''n''. Under certain regularity conditions, the excess risk yields the following asymptotic expansion
:: <math>\mathcal{R}_\mathcal{R}(C^{knn}_n) - \mathcal{R}_\mathcal{R}(C^{\mathrm{Bayes}}) = \left\{ B_1 \frac{1}{k} + B_2 \left(\frac{k}{n}\right)^{4/d} \right\} \{1 + o(1)\},</math>
for some constants <math>B_1</math> and <math>B_2</math>.
The choice <math>k^{*} = \lfloor B n^{\frac{4}{d+4}} \rfloor</math> offers a trade off between the two terms in the above display, for which the <math>k^{*}</math>-nearest neighbour error converges to the Bayes error at the optimal (minimax) rate <math>\mathcal{O}(n^{-\frac{4}{d+4}})</math>.
Metric learning
The ''k''-nearest neighbor classification performance can often be significantly improved through (supervised) metric learning. Popular algorithms are neighbourhood components analysis and large margin nearest neighbor. Supervised metric learning algorithms use the label information to learn a new metric or pseudo-metric.
Feature extraction
When the input data to an algorithm is too large to be processed and is suspected to be redundant (e.g. the same measurement in both feet and meters), then the input data will be transformed into a reduced representation set of features (also named a feature vector). Transforming the input data into the set of features is called feature extraction. If the features extracted are carefully chosen, it is expected that the feature set will extract the relevant information from the input data in order to perform the desired task using this reduced representation instead of the full-size input. Feature extraction is performed on raw data prior to applying the ''k''-NN algorithm on the transformed data in feature space.
An example of a typical computer vision computation pipeline for face recognition using ''k''-NN, including feature extraction and dimension reduction pre-processing steps (usually implemented with OpenCV):
# Haar face detection
# Mean-shift tracking analysis
# PCA or Fisher LDA projection into feature space, followed by ''k''-NN classification
Dimension reduction
For high-dimensional data (e.g., with number of dimensions more than 10) dimension reduction is usually performed prior to applying the ''k''-NN algorithm in order to avoid the effects of the curse of dimensionality.
The curse of dimensionality in the ''k''-NN context basically means that Euclidean distance is unhelpful in high dimensions because all vectors are almost equidistant to the search query vector (imagine multiple points lying more or less on a circle with the query point at the center; the distance from the query to all data points in the search space is almost the same).
Feature extraction and dimension reduction can be combined in one step using principal component analysis (PCA), linear discriminant analysis (LDA), or canonical correlation analysis (CCA) techniques as a pre-processing step, followed by clustering by ''k''-NN on feature vectors in reduced-dimension space. This process is also called low-dimensional embedding.
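A compact sketch of this two-step approach, using a plain SVD-based PCA (the number of retained components and the synthetic data are arbitrary choices made here for illustration):
<syntaxhighlight lang="python">
import numpy as np
from collections import Counter

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (SVD-based PCA)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    return (X - mean) @ components.T, mean, components

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 50))            # high-dimensional training data
y_train = rng.integers(0, 3, size=200)
Z_train, mean, components = pca_project(X_train, n_components=5)

query = rng.normal(size=50)
z_query = (query - mean) @ components.T         # apply the same projection to the query
nearest = np.argsort(np.linalg.norm(Z_train - z_query, axis=1))[:7]
print(Counter(y_train[nearest]).most_common(1)[0][0])
</syntaxhighlight>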
For very-high-dimensional datasets (e.g. when performing a similarity search on live video streams, DNA data or high-dimensional time series), running a fast approximate ''k''-NN search using locality sensitive hashing, "random projections", "sketches" or other high-dimensional similarity search techniques from the VLDB toolbox might be the only feasible option.
Decision boundary
Nearest neighbor rules in effect implicitly compute the decision boundary. It is also possible to compute the decision boundary explicitly, and to do so efficiently, so that the computational complexity is a function of the boundary complexity.
Data reduction
Data reduction is one of the most important problems for work with huge data sets. Usually, only some of the data points are needed for accurate classification. Those data are called the ''prototypes'' and can be found as follows:
# Select the ''class-outliers'', that is, training data that are classified incorrectly by ''k''-NN (for a given ''k'')
# Separate the rest of the data into two sets: (i) the prototypes that are used for the classification decisions and (ii) the ''absorbed points'' that can be correctly classified by ''k''-NN using prototypes. The absorbed points can then be removed from the training set.
Selection of class-outliers
A training example surrounded by examples of other classes is called a class outlier. Causes of class outliers include:
* random error
* insufficient training examples of this class (an isolated example appears instead of a cluster)
* missing important features (the classes are separated in other dimensions which we don't know)
* too many training examples of other classes (unbalanced classes) that create a "hostile" background for the given small class
Class outliers with ''k''-NN produce noise. They can be detected and separated for future analysis. Given two natural numbers, ''k>r>0'', a training example is called a ''(k,r)''NN class-outlier if its ''k'' nearest neighbors include more than ''r'' examples of other classes.
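The (''k'',''r'')NN rule just described can be sketched directly with brute-force distances (all names and the toy data are illustrative):
<syntaxhighlight lang="python">
import numpy as np

def class_outliers(X, y, k=3, r=2):
    """Indices of (k,r)NN class-outliers: points whose k nearest neighbours
    include more than r examples of other classes."""
    outliers = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        nearest = np.argsort(d)[:k]
        if np.sum(y[nearest] != y[i]) > r:
            outliers.append(i)
    return outliers

X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [0.05, 0.05]])
y = np.array([0, 0, 0, 1, 1, 1])           # the last point is isolated inside class 0
print(class_outliers(X, y, k=3, r=2))      # -> [5]
</syntaxhighlight>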
Condensed Nearest Neighbor for data reduction
Condensed nearest neighbor (CNN, the ''Hart algorithm'') is an algorithm designed to reduce the data set for ''k''-NN classification. It selects the set of prototypes ''U'' from the training data, such that 1NN with ''U'' can classify the examples almost as accurately as 1NN does with the whole data set.
Given a training set ''X'', CNN works iteratively:
# Scan all elements of ''X'', looking for an element ''x'' whose nearest prototype from ''U'' has a different label than ''x''.
# Remove ''x'' from ''X'' and add it to ''U''.
# Repeat the scan until no more prototypes are added to ''U''.
Use ''U'' instead of ''X'' for classification. The examples that are not prototypes are called "absorbed" points.
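A minimal sketch of this loop, using 1NN with brute-force distances (seeding ''U'' with one arbitrary example is a common but not mandatory choice; prototypes are tracked by index rather than physically removed from ''X''):
<syntaxhighlight lang="python">
import numpy as np

def condensed_nearest_neighbor(X, y):
    """Hart's CNN: grow a prototype set U until 1NN with U labels all of X correctly."""
    prototypes = [0]                        # seed U with an arbitrary example
    changed = True
    while changed:                          # repeat the scan until U stops growing
        changed = False
        for i in range(len(X)):
            if i in prototypes:
                continue
            d = np.linalg.norm(X[prototypes] - X[i], axis=1)
            nearest_prototype = prototypes[int(np.argmin(d))]
            if y[nearest_prototype] != y[i]:   # misclassified by the current prototypes
                prototypes.append(i)           # ...so move it into U
                changed = True
    return prototypes

X = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]])
y = np.array([0, 0, 1, 1])
print(condensed_nearest_neighbor(X, y))    # a small prototype set, e.g. [0, 2]
</syntaxhighlight>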
It is efficient to scan the training examples in order of decreasing border ratio.[Mirkes, Evgeny M., ''KNN and Potential Energy: applet'', University of Leicester, 2011] The border ratio of a training example ''x'' is defined as
: <math>a(x) = \frac{\|x' - y\|}{\|x - y\|}</math>
where <math>\|x - y\|</math> is the distance to the closest example ''y'' having a different color than ''x'', and <math>\|x' - y\|</math> is the distance from ''y'' to its closest example ''x' '' with the same label as ''x''.
The border ratio is in the interval [0,1] because <math>\|x' - y\|</math> never exceeds <math>\|x - y\|</math>. This ordering gives preference to the borders of the classes for inclusion in the set of prototypes ''U''. A point of a different label than ''x'' is called external to ''x''. The calculation of the border ratio is illustrated by the figure on the right. The data points are labeled by colors: the initial point is ''x'' and its label is red. External points are blue and green. The closest to ''x'' external point is ''y''. The closest to ''y'' red point is ''x' ''. The border ratio <math>a(x) = \|x' - y\| / \|x - y\|</math> is the attribute of the initial point ''x''.
Below is an illustration of CNN in a series of figures. There are three classes (red, green and blue). Fig. 1: initially there are 60 points in each class. Fig. 2 shows the 1NN classification map: each pixel is classified by 1NN using all the data. Fig. 3 shows the 5NN classification map. White areas correspond to the unclassified regions, where 5NN voting is tied (for example, if there are two green, two red and one blue points among 5 nearest neighbors). Fig. 4 shows the reduced data set. The crosses are the class-outliers selected by the (3,2)NN rule (all the three nearest neighbors of these instances belong to other classes); the squares are the prototypes, and the empty circles are the absorbed points. The left bottom corner shows the numbers of the class-outliers, prototypes and absorbed points for all three classes. The number of prototypes varies from 15% to 20% for different classes in this example. Fig. 5 shows that the 1NN classification map with the prototypes is very similar to that with the initial data set. The figures were produced using the Mirkes applet.
File:Data3classes.png, Fig. 1. The dataset.
File:Map1NN.png, Fig. 2. The 1NN classification map.
File:Map5NN.png, Fig. 3. The 5NN classification map.
File:ReducedDataSet.png, Fig. 4. The CNN reduced dataset.
File:Map1NNReducedDataSet.png, Fig. 5. The 1NN classification map based on the CNN extracted prototypes.
''k''-NN regression
In ''k''-NN regression, the ''k''-NN algorithm is used for estimating continuous variables. One such algorithm uses a weighted average of the ''k'' nearest neighbors, weighted by the inverse of their distance. This algorithm works as follows:
# Compute the Euclidean or Mahalanobis distance from the query example to the labeled examples.
# Order the labeled examples by increasing distance.
# Find a heuristically optimal number ''k'' of nearest neighbors, based on RMSE. This is done using cross validation.
# Calculate an inverse distance weighted average with the ''k''-nearest multivariate neighbors.
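A hedged sketch of these steps with Euclidean distance and a fixed ''k'' (the cross-validated selection of ''k'' in step 3 is omitted for brevity, and the small constant in the weights is a choice made here to avoid division by zero):
<syntaxhighlight lang="python">
import numpy as np

def knn_regress(X_train, y_train, query, k=3, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest neighbours' values."""
    distances = np.linalg.norm(X_train - query, axis=1)   # step 1: distances
    nearest = np.argsort(distances)[:k]                   # step 2: order and take k
    weights = 1.0 / (distances[nearest] + eps)            # step 4: inverse-distance weights
    return np.sum(weights * y_train[nearest]) / np.sum(weights)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
print(knn_regress(X, y, np.array([1.4]), k=2))   # a value between 1 and 2
</syntaxhighlight>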
''k''-NN outlier
The distance to the ''k''th nearest neighbor can also be seen as a local density estimate and thus is also a popular outlier score in anomaly detection. The larger the distance to the ''k''-NN, the lower the local density, the more likely the query point is an outlier.
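A sketch of this outlier score, taking each point's distance to its ''k''th nearest neighbour within the same data set (the toy data and names are illustrative):
<syntaxhighlight lang="python">
import numpy as np

def knn_outlier_scores(X, k=3):
    """Outlier score of each point: distance to its k-th nearest neighbour."""
    scores = np.empty(len(X))
    for i in range(len(X)):
        d = np.sort(np.linalg.norm(X - X[i], axis=1))
        scores[i] = d[k]          # d[0] is the point itself (distance 0)
    return scores

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(size=(50, 2)), [[8.0, 8.0]]])   # one far-away point appended
print(np.argmax(knn_outlier_scores(X, k=3)))               # -> 50, the appended outlier
</syntaxhighlight>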
Although quite simple, this outlier model, along with another classic data mining method, local outlier factor, works quite well also in comparison to more recent and more complex approaches, according to a large scale experimental analysis.
Validation of results
A confusion matrix or "matching matrix" is often used as a tool to validate the accuracy of ''k''-NN classification. More robust statistical methods such as the likelihood-ratio test can also be applied.
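For instance, with scikit-learn (the library choice, the synthetic data and the train/test split are illustrative only) a confusion matrix of ''k''-NN predictions on held-out points can be computed directly:
<syntaxhighlight lang="python">
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Hold out the last 20 points of each class for validation
train = np.r_[0:30, 50:80]
test = np.r_[30:50, 80:100]

model = KNeighborsClassifier(n_neighbors=5).fit(X[train], y[train])
print(confusion_matrix(y[test], model.predict(X[test])))   # rows: true class, columns: predicted
</syntaxhighlight>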
See also
*
Nearest centroid classifier
*
Closest pair of points problem
References
Further reading
*
* {{cite book |title=Nearest-Neighbor Methods in Learning and Vision |editor=Shakhnarovich, Gregory |editor2=Darrell, Trevor |editor3=Indyk, Piotr |publisher=MIT Press |year=2005 |isbn=978-0262195478}}
Classification algorithms
Search algorithms
Machine learning algorithms
Statistical classification
Nonparametric statistics