Vapnik–Chervonenkis Dimension





Vapnik–Chervonenkis Dimension
In Vapnik–Chervonenkis theory, the Vapnik–Chervonenkis (VC) dimension is a measure of the size (capacity, complexity, expressive power, richness, or flexibility) of a class of sets. The notion can be extended to classes of binary functions. It is defined as the cardinality of the largest set of points that the algorithm can shatter, which means the algorithm can always learn a perfect classifier for any labeling of at least one configuration of those data points. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis. Informally, the capacity of a classification model is related to how complicated it can be. For example, consider the thresholding of a high-degree polynomial: if the polynomial evaluates above zero, that point is classified as positive, otherwise as negative. A high-degree polynomial can be wiggly, so that it can fit a given set of training points well. But one can expect that the classifier will make errors on other points, because it is too wi ...
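To make "shattering" concrete, here is a minimal Python sketch (the helper names and point sets are illustrative, not from the text). It checks whether 1-D threshold classifiers of the form sign(x − t) can realize every ±1 labeling of a finite point set: a single point can always be shattered, but two points cannot (the labeling +1, −1 is unreachable), so the VC dimension of this class is 1.

from itertools import product

def threshold_classifiers(points):
    # Candidate thresholds: below the smallest point, between neighbours, above the largest.
    xs = sorted(points)
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    return [lambda x, t=t: 1 if x >= t else -1 for t in cuts]

def can_shatter(points):
    # True if some threshold classifier realizes every possible +/-1 labeling of the points.
    clfs = threshold_classifiers(points)
    return all(
        any(all(h(x) == y for x, y in zip(points, labels)) for h in clfs)
        for labels in product([-1, 1], repeat=len(points))
    )

print(can_shatter([0.0]))        # True: one point is shattered
print(can_shatter([0.0, 1.0]))   # False: so the VC dimension of thresholds is 1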


Vapnik–Chervonenkis Theory
Vapnik–Chervonenkis theory (also known as VC theory) was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view.
Introduction
VC theory covers at least four parts (as explained in ''The Nature of Statistical Learning Theory''):
*Theory of consistency of learning processes
**What are (necessary and sufficient) conditions for consistency of a learning process based on the empirical risk minimization principle?
*Nonasymptotic theory of the rate of convergence of learning processes
**How fast is the rate of convergence of the learning process?
*Theory of controlling the generalization ability of learning processes
**How can one control the rate of convergence (the generalization ability) of the learning process?
*Theory of constructing learning machines
**How can one construct algorithms that can control the generalization abil ...
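As a companion to the consistency question above, here is a minimal, hypothetical sketch of the empirical risk minimization principle: among a small finite class of 1-D threshold hypotheses, pick the one with the lowest error on the training sample (all names and numbers below are illustrative).

def empirical_risk(h, sample):
    # Fraction of training examples the hypothesis misclassifies.
    return sum(h(x) != y for x, y in sample) / len(sample)

def erm(hypotheses, sample):
    # Empirical risk minimization: return the hypothesis with the smallest training error.
    return min(hypotheses, key=lambda h: empirical_risk(h, sample))

hypotheses = [lambda x, t=t: 1 if x >= t else 0 for t in (0.0, 0.5, 1.0)]
sample = [(0.2, 0), (0.4, 0), (0.7, 1), (0.9, 1)]
best = erm(hypotheses, sample)
print(empirical_risk(best, sample))  # 0.0: the threshold at 0.5 fits this sample perfectly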



Probabilistic
Probability is a branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an event is to occur."Kendall's Advanced Theory of Statistics, Volume 1: Distribution Theory", Alan Stuart and Keith Ord, 6th ed. (2009). William Feller, ''An Introduction to Probability Theory and Its Applications'', vol. 1, 3rd ed. (1968), Wiley. This number is often expressed as a percentage (%), ranging from 0% to 100%. A simple example is the tossing of a fair (unbiased) coin. Since the coin is fair, the two outcomes ("heads" and "tails") are both equally probable; the probability of "heads" equals the probability of "tails"; and since no other outcomes are possible, the probability of either "heads" or "tails" is 1/2 (which could also be written as 0.5 or 50%). These concepts have been given an axiomatic mathematical formaliza ...
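A quick simulation of the coin example (the toss count is arbitrary): with a fair coin, the empirical frequency of heads approaches the theoretical probability of 1/2 as the number of tosses grows.

import random

random.seed(0)  # fixed seed so the run is reproducible
tosses = [random.choice(["heads", "tails"]) for _ in range(100_000)]
print(tosses.count("heads") / len(tosses))  # close to 0.5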



Sigmoid Function
A sigmoid function is any mathematical function whose graph has a characteristic S-shaped or sigmoid curve. A common example of a sigmoid function is the logistic function, which is defined by the formula :\sigma(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1} = 1 - \sigma(-x). Other sigmoid functions are given in the Examples section. In some fields, most notably in the context of artificial neural networks, the term "sigmoid function" is used as a synonym for "logistic function". Special cases of the sigmoid function include the Gompertz curve (used in modeling systems that saturate at large values of ''x'') and the ogee curve (used in the spillway of some dams). Sigmoid functions have a domain of all real numbers, with return (response) value commonly monotonically increasing, but it could be decreasing. Sigmoid functions most often show a return value (''y'' axis) in the range 0 to 1. Another commonly used range is from −1 to 1. A wide variety of sigmoid functions ...
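A direct translation of the logistic function into Python, checking the symmetry \sigma(x) = 1 - \sigma(-x) stated in the formula above (the test points are arbitrary):

import math

def sigmoid(x):
    # Logistic sigmoid: sigma(x) = 1 / (1 + exp(-x)), an S-shaped map into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 3.5):
    assert abs(sigmoid(x) - (1.0 - sigmoid(-x))) < 1e-12  # sigma(x) = 1 - sigma(-x)
print(sigmoid(0.0))  # 0.5, the midpoint of the curve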



Sign Function
In mathematics, the sign function or signum function (from ''signum'', Latin for "sign") is a function that has the value +1, −1, or 0 according to whether the sign of a given real number is positive or negative, or the given number is itself zero. In mathematical notation the sign function is often represented as \sgn x or \sgn (x). Definition The signum function of a real number x is a piecewise function which is defined as follows: \sgn x := \begin{cases} -1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x > 0. \end{cases} The law of trichotomy states that every real number must be positive, negative or zero. The signum function denotes which unique category a number falls into by mapping it to one of the values +1, −1, or 0, which can then be used in mathematical expressions or further calculations. For example: \begin{align} \sgn(2) &= +1\,, \\ \sgn(\pi) &= +1\,, \\ \sgn(-8) &= -1\,, \\ \sgn(-\tfrac{1}{2}) &= -1\,, \\ \sgn(0) &= 0\,. \end{align} Basic properties Any real number can be expressed as the product of its absolute value and its sig ...
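The piecewise definition maps directly to code; the small check below also illustrates the property that any real number is the product of its absolute value and its sign (the test values are arbitrary):

def sgn(x):
    # Signum of a real number: -1 for negative, 0 for zero, +1 for positive.
    if x < 0:
        return -1
    if x > 0:
        return 1
    return 0

for x in (2, 3.14159, -8, -0.5, 0):
    assert abs(x) * sgn(x) == x  # x = |x| * sgn(x)
print(sgn(2), sgn(-8), sgn(0))  # 1 -1 0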



Directed Acyclic Graph
In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one vertex to another, such that following those directions will never form a closed loop. A directed graph is a DAG if and only if it can be topologically ordered, by arranging the vertices as a linear ordering that is consistent with all edge directions. DAGs have numerous scientific and computational applications, ranging from biology (evolution, family trees, epidemiology) to information science (citation networks) to computation (scheduling). Directed acyclic graphs are also called acyclic directed graphs or acyclic digraphs. Definitions A graph is formed by vertices and by edges connecting pairs of vertices, where the vertices can be any kind of object that is connected in pairs by edges. In the case of a directed graph, each edg ...
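As a small illustration of the topological-ordering characterization, this sketch uses Python's standard-library graphlib on a hypothetical dependency graph (each key lists the vertices it depends on): a DAG yields a topological order, while a directed cycle raises CycleError.

from graphlib import TopologicalSorter, CycleError

edges = {"d": {"b", "c"}, "b": {"a"}, "c": {"a"}, "a": set()}  # hypothetical DAG

try:
    order = list(TopologicalSorter(edges).static_order())
    print("DAG; one topological order:", order)  # e.g. ['a', 'b', 'c', 'd']
except CycleError:
    print("Not a DAG: the graph contains a directed cycle")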



Neural Network
A neural network is a group of interconnected units called neurons that send signals to one another. Neurons can be either biological cells or signal pathways. While individual neurons are simple, many of them together in a network can perform complex tasks. There are two main types of neural networks.
*In neuroscience, a ''biological neural network'' is a physical structure found in brains and complex nervous systems – a population of nerve cells connected by synapses.
*In machine learning, an ''artificial neural network'' is a mathematical model used to approximate nonlinear functions. Artificial neural networks are used to solve artificial intelligence problems.
In biology
In the context of biology, a neural network is a population of biological neurons chemically connected to each other by synapses. A given neuron can be connected to hundreds of thousands of synapses. Each neuron sends and receives electrochemical signals called action potentials to its conne ...
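A minimal sketch (not from the text) of an artificial neural network as a function approximator: one hidden layer of sigmoid units followed by a linear output. The weights are random and untrained, so this only shows the structure of the computation.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 1)), np.zeros((3, 1))  # hidden layer: 3 units, 1 input
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))  # output layer: 1 unit

def forward(x):
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # sigmoid hidden activations
    return W2 @ h + b2                         # linear output

print(forward(np.array([[0.5]])))  # network output for a single scalar input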



Boosting (machine Learning)
In machine learning (ML), boosting is an ensemble metaheuristic for primarily reducing bias (as opposed to variance). It can also improve the stability and accuracy of ML classification and regression algorithms. Hence, it is prevalent in supervised learning for converting weak learners to strong learners. The concept of boosting is based on the question posed by Kearns and Valiant (1988, 1989):Michael Kearns (1988), ''Thoughts on Hypothesis Boosting'', unpublished manuscript (Machine Learning class project, December 1988): "Can a set of weak learners create a single strong learner?" A weak learner is defined as a classifier that is only slightly correlated with the true classification. A strong learner is a classifier that is arbitrarily well-correlated with the true classification. R ...
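A minimal AdaBoost-style sketch of the weak-to-strong idea on a toy 1-D data set, with threshold "stumps" as weak learners; all data, thresholds, and names here are hypothetical, and this is an illustration rather than a production implementation.

import math

X = [0.1, 0.3, 0.5, 0.7, 0.9]
y = [-1, -1, 1, 1, 1]

def stump(t, s):
    # Weak learner: predicts s if x >= t, otherwise -s.
    return lambda x: s * (1 if x >= t else -1)

weak_learners = [stump(t, s) for t in (0.2, 0.4, 0.6, 0.8) for s in (1, -1)]

def adaboost(X, y, learners, rounds=3):
    w = [1.0 / len(X)] * len(X)  # uniform example weights to start
    ensemble = []                # list of (alpha, weak learner)
    for _ in range(rounds):
        # Pick the weak learner with the smallest weighted error.
        h = min(learners, key=lambda h: sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi))
        err = max(sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi), 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak learner
        ensemble.append((alpha, h))
        # Reweight: misclassified examples get more weight in the next round.
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

strong = adaboost(X, y, weak_learners)
print([strong(x) for x in X])  # matches y on this toy sample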



Finite Projective Plane
In mathematics, a projective plane is a geometric structure that extends the concept of a plane. In the ordinary Euclidean plane, two lines typically intersect at a single point, but there are some pairs of lines (namely, parallel lines) that do not intersect. A projective plane can be thought of as an ordinary plane equipped with additional "points at infinity" where parallel lines intersect. Thus ''any'' two distinct lines in a projective plane intersect at exactly one point. Renaissance artists, in developing the techniques of drawing in perspective, laid the groundwork for this mathematical topic. The archetypical example is the real projective plane, also known as the extended Euclidean plane. This example, in slightly different guises, is important in algebraic geometry, topology and projective geometry, where it may be denoted variously by RP2 or P2(R), among other notations. There are many other projective planes, ...
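A concrete finite example (not from the excerpt above): the smallest finite projective plane is the Fano plane of order 2, with 7 points and 7 lines of 3 points each. The sketch below uses one common labelling of its lines and verifies the two incidence properties just described.

from itertools import combinations

fano_lines = [  # one conventional labelling of the Fano plane with points 0..6
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5},
]

# Any two distinct lines meet in exactly one point.
assert all(len(l1 & l2) == 1 for l1, l2 in combinations(fano_lines, 2))
# Dually, any two distinct points lie on exactly one common line.
assert all(sum(p in line and q in line for line in fano_lines) == 1
           for p, q in combinations(range(7), 2))
print("Fano plane incidence properties verified")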



Symmetric Set Difference
In mathematics, the symmetric difference of two sets, also known as the disjunctive union and set sum, is the set of elements which are in either of the sets, but not in their intersection. For example, the symmetric difference of the sets \{1, 2, 3\} and \{3, 4\} is \{1, 2, 4\}. The symmetric difference of the sets ''A'' and ''B'' is commonly denoted by A \,\Delta\, B (alternatively, A \,\vartriangle\, B), A \oplus B, or A \ominus B. It can be viewed as a form of addition modulo 2. The power set of any set becomes an abelian group under the operation of symmetric difference, with the empty set as the neutral element of the group and every element in this group being its own inverse. The power set of any set becomes a Boolean ring, with symmetric difference as the addition of the ring and intersection as the multiplication of the ring. Properties The symmetric difference is equivalent to the union of both relative complements, that is: :A\, \Delta\,B = \left(A \setminus B\right) ...
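Python's built-in sets implement the symmetric difference directly (the ^ operator), which makes the example and the group-theoretic properties above easy to check:

A, B = {1, 2, 3}, {3, 4}
print(A ^ B)                        # {1, 2, 4}: elements in either set but not in both
assert A ^ B == (A - B) | (B - A)   # union of the two relative complements
assert A ^ A == set()               # each set is its own inverse, with the empty set as neutral element
assert (A ^ B) ^ B == A             # so "adding" B twice cancels out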




Sample-complexity Bounds
The sample complexity of a machine learning algorithm represents the number of training samples that it needs in order to successfully learn a target function. More precisely, the sample complexity is the number of training samples that we need to supply to the algorithm, so that the function returned by the algorithm is within an arbitrarily small error of the best possible function, with probability arbitrarily close to 1. There are two variants of sample complexity:
* The weak variant fixes a particular input-output distribution;
* The strong variant takes the worst-case sample complexity over all input-output distributions.
The No free lunch theorem, discussed below, proves that, in general, the strong sample complexity is infinite, i.e. that there is no algorithm that can learn the globally-optimal target function using a finite number of training samples. However, if we are only interested in a particular class of target functions (e.g., only linear functions) then the sam ...
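As a hedged numerical illustration (the bound and the numbers below are standard textbook material, not taken from the excerpt): for a finite hypothesis class in the realizable PAC setting, roughly m \ge \frac{1}{\varepsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right) samples suffice for error at most \varepsilon with probability at least 1 - \delta.

import math

def sample_complexity(num_hypotheses, eps, delta):
    # Realizable-case PAC bound for a finite hypothesis class: (ln|H| + ln(1/delta)) / eps.
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / eps)

print(sample_complexity(num_hypotheses=1000, eps=0.05, delta=0.01))  # 231 samples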