HOME

TheInfoList



OR:

In
Euclidean geometry Euclidean geometry is a mathematical system attributed to ancient Greek mathematics, Greek mathematician Euclid, which he described in his textbook on geometry: the ''Euclid's Elements, Elements''. Euclid's approach consists in assuming a small ...
, linear separability is a property of two sets of points. This is most easily visualized in two dimensions (the
Euclidean plane In mathematics, the Euclidean plane is a Euclidean space of dimension two. That is, a geometric setting in which two real quantities are required to determine the position of each point ( element of the plane), which includes affine notions of ...
) by thinking of one set of points as being colored blue and the other set of points as being colored red. These two sets are ''linearly separable'' if there exists at least one line in the plane with all of the blue points on one side of the line and all the red points on the other side. This idea immediately generalizes to higher-dimensional Euclidean spaces if the line is replaced by a
hyperplane In geometry, a hyperplane is a subspace whose dimension is one less than that of its ''ambient space''. For example, if a space is 3-dimensional then its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyper ...
. The problem of determining if a pair of sets is linearly separable and finding a separating hyperplane if they are, arises in several areas. In
statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...
and
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
, classifying certain types of data is a problem for which good algorithms exist that are based on this concept.


Mathematical definition

Let X_ and X_ be two sets of points in an ''n''-dimensional Euclidean space. Then X_ and X_ are ''linearly separable'' if there exist ''n'' + 1 real numbers w_, w_,..,w_, k, such that every point x \in X_ satisfies \sum^_ w_x_ > k and every point x \in X_ satisfies \sum^_ w_x_ < k, where x_ is the i-th component of x. Equivalently, two sets are linearly separable precisely when their respective
convex hull In geometry, the convex hull or convex envelope or convex closure of a shape is the smallest convex set that contains it. The convex hull may be defined either as the intersection of all convex sets containing a given subset of a Euclidean space ...
s are disjoint (colloquially, do not overlap). In simple 2D, it can also be imagined that the set of points under a linear transformation collapses into a line, on which there exists a value, k, greater than which one set of points will fall into, and lesser than which the other set of points fall.


Examples

Three non-
collinear In geometry, collinearity of a set of points is the property of their lying on a single line. A set of points with this property is said to be collinear (sometimes spelled as colinear). In greater generality, the term has been used for aligned ...
points in two classes ('+' and '-') are always linearly separable in two dimensions. This is illustrated by the three examples in the following figure (the all '+' case is not shown, but is similar to the all '-' case): However, not all sets of four points, no three collinear, are linearly separable in two dimensions. The following example would need ''two'' straight lines and thus is not linearly separable: Notice that three points which are collinear and of the form "+ ⋅⋅⋅ — ⋅⋅⋅ +" are also not linearly separable.


Linear separability of Boolean functions in ''n'' variables

A
Boolean function In mathematics, a Boolean function is a function whose arguments and result assume values from a two-element set (usually , or ). Alternative names are switching function, used especially in older computer science literature, and truth function ( ...
in ''n'' variables can be thought of as an assignment of ''0'' or ''1'' to each vertex of a Boolean
hypercube In geometry, a hypercube is an ''n''-dimensional analogue of a square () and a cube (). It is a closed, compact, convex figure whose 1- skeleton consists of groups of opposite parallel line segments aligned in each of the space's dimensions, ...
in ''n'' dimensions. This gives a natural division of the vertices into two sets. The Boolean function is said to be ''linearly separable'' provided these two sets of points are linearly separable. The number of distinct Boolean functions is 2^where ''n'' is the number of variables passed into the function.


Support vector machines

Classifying data is a common task in
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
. Suppose some data points, each belonging to one of two sets, are given and we wish to create a model that will decide which set a ''new'' data point will be in. In the case of
support vector machine In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratorie ...
s, a data point is viewed as a ''p''-dimensional vector (a list of ''p'' numbers), and we want to know whether we can separate such points with a (''p'' − 1)-dimensional
hyperplane In geometry, a hyperplane is a subspace whose dimension is one less than that of its ''ambient space''. For example, if a space is 3-dimensional then its hyperplanes are the 2-dimensional planes, while if the space is 2-dimensional, its hyper ...
. This is called a
linear classifier In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class (or group) it belongs to. A linear classifier achieves this by making a classification decision based on the val ...
. There are many hyperplanes that might classify (separate) the data. One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the two sets. So we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the ''
maximum-margin hyperplane In geometry, the hyperplane separation theorem is a theorem about disjoint convex sets in ''n''-dimensional Euclidean space. There are several rather similar versions. In one version of the theorem, if both these sets are closed and at least one ...
'' and the linear classifier it defines is known as a ''maximum
margin classifier In machine learning, a margin classifier is a classifier which is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier (e.g. perceptron or linear discriminant analysis) is used, the ...
''. More formally, given some training data \mathcal, a set of ''n'' points of the form :\mathcal = \left\_^n where the ''y''''i'' is either 1 or −1, indicating the set to which the point \mathbf_i belongs. Each \mathbf_i is a ''p''-dimensional
real Real may refer to: Currencies * Brazilian real (R$) * Central American Republic real * Mexican real * Portuguese real * Spanish real * Spanish colonial real Music Albums * ''Real'' (L'Arc-en-Ciel album) (2000) * ''Real'' (Bright album) (2010) ...
vector. We want to find the maximum-margin hyperplane that divides the points having y_i=1 from those having y_i=-1. Any hyperplane can be written as the set of points \mathbf satisfying : \mathbf\cdot\mathbf - b=0, where \cdot denotes the
dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an algebra ...
and the (not necessarily normalized)
normal vector In geometry, a normal is an object such as a line, ray, or vector that is perpendicular to a given object. For example, the normal line to a plane curve at a given point is the (infinite) line perpendicular to the tangent line to the curve at ...
to the hyperplane. The parameter \tfrac determines the offset of the hyperplane from the origin along the normal vector . If the training data are linearly separable, we can select two hyperplanes in such a way that they separate the data and there are no points between them, and then try to maximize their distance.


See also

*
Hyperplane separation theorem In geometry, the hyperplane separation theorem is a theorem about disjoint convex sets in ''n''-dimensional Euclidean space. There are several rather similar versions. In one version of the theorem, if both these sets are closed and at least on ...
*
Kirchberger's theorem Kirchberger's theorem is a theorem in discrete geometry, on linear separability. The two-dimensional version of the theorem states that, if a finite set of red and blue points in the Euclidean plane has the property that, for every four points, the ...
*
Perceptron In machine learning, the perceptron (or McCulloch-Pitts neuron) is an algorithm for supervised learning of binary classifiers. A binary classifier is a function which can decide whether or not an input, represented by a vector of numbers, belon ...
*
Vapnik–Chervonenkis dimension Vapnik–Chervonenkis theory, the Vapnik–Chervonenkis (VC) dimension is a measure of the capacity (complexity, expressive power, richness, or flexibility) of a set of functions that can be learned by a statistical binary classification algorithm ...


References

{{reflist Geometry Convex analysis Machine learning