Chernoff's Inequality

	Chernoff's Inequality In probability theory, the Chernoff bound gives exponentially decreasing bounds on tail distributions of sums of independent random variables. Despite being named after Herman Chernoff, the author of the paper it first appeared in, the result is due to Herman Rubin. It is a sharper bound than the first- or second-moment-based tail bounds such as Markov's inequality or Chebyshev's inequality, which only yield power-law bounds on tail decay. However, the Chernoff bound requires the variates to be independent, a condition that is not required by either Markov's inequality or Chebyshev's inequality (although Chebyshev's inequality does require the variates to be pairwise independent). The Chernoff bound is related to the Bernstein inequalities, which were developed earlier, and to Hoeffding's inequality. The generic bound The generic Chernoff bound for a random variable is attained by applying Markov's inequality to . This gives a bound in terms of the moment-generating function ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Probability Theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event. Central subjects in probability theory include discrete and continuous random variables, probability distributions, and stochastic processes (which provide mathematical abstractions of non-deterministic or uncertain processes or measured quantities that may either be single occurrences or evolve over time in a random fashion). Although it is not possible to perfectly predict random events, much can be said about their behavior. Two major results in probability ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Wassily Hoeffding Wassily Hoeffding (June 12, 1914 – February 28, 1991) was a Finnish statistician and probabilist. Hoeffding was one of the founders of nonparametric statistics, in which Hoeffding contributed the idea and basic results on U-statistics. In probability theory, Hoeffding's inequality provides an upper bound on the probability for the sum of random variables to deviate from its expected value. Personal life Hoeffding was born in Mustamäki, Finland, (Gorkovskoye, Russia since 1940), although his place of birth is registered as St. Petersburg on his birth certificate. His father was an economist and a disciple of Peter Struve, the Russian social scientist and public figure. His paternal grandparents were Danish and his father's uncle was the Danish philosopher Harald Høffding. His mother, née Wedensky, had studied medicine. Both grandfathers had been engineers. In 1918 the family left Tsarskoye Selo for Ukraine and, after traveling through scenes of civil war, finally left Russ ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Randomized Algorithm A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random determined by the random bits; thus either the running time, or the output (or both) are random variables. One has to distinguish between algorithms that use the random input so that they always terminate with the correct answer, but where the expected running time is finite (Las Vegas algorithms, for example Quicksort), and algorithms which have a chance of producing an incorrect result (Monte Carlo algorithms, for example the Monte Carlo algorithm for the MFAS problem) or fail to produce a result either by signaling a failure or failing to terminate. In some cases, probabilistic algorithms are the only practical means of solving a problem. In common practice, randomized algor ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Probably Approximately Correct Learning In computational learning theory, probably approximately correct (PAC) learning is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant.L. Valiant. A theory of the learnable.' Communications of the ACM, 27, 1984. In this framework, the learner receives samples and must select a generalization function (called the ''hypothesis'') from a certain class of possible functions. The goal is that, with high probability (the "probably" part), the selected function will have low generalization error (the "approximately correct" part). The learner must be able to learn the concept given any arbitrary approximation ratio, probability of success, or distribution of the samples. The model was later extended to treat noise (misclassified samples). An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning. In particular, the learner is expected to find efficient functions (tim ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Computational Learning Theory In computer science, computational learning theory (or just learning theory) is a subfield of artificial intelligence devoted to studying the design and analysis of machine learning algorithms. Overview Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples, including samples that have not been seen previously by the algorithm. The goal of the supervised learning algorithm is to optimize some measure of performance such as minimizing the number of mistakes made on new samples. In addition to performance bounds, computational learning theory studies the t ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Network Congestion Network congestion in data networking and queueing theory is the reduced quality of service that occurs when a network node or link is carrying more data than it can handle. Typical effects include queueing delay, packet loss or the blocking of new connections. A consequence of congestion is that an incremental increase in offered load leads either only to a small increase or even a decrease in network throughput. Network protocols that use aggressive retransmissions to compensate for packet loss due to congestion can increase congestion, even after the initial load has been reduced to a level that would not normally have induced network congestion. Such networks exhibit two stable states under the same level of load. The stable state with low throughput is known as congestive collapse. Networks use congestion control and congestion avoidance techniques to try to avoid collapse. These include: exponential backoff in protocols such as CSMA/CA in 802.11 and the similar CSMA/CD i ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Sparse Graph In mathematics, a dense graph is a graph in which the number of edges is close to the maximal number of edges (where every pair of vertices is connected by one edge). The opposite, a graph with only a few edges, is a sparse graph. The distinction of what constitutes a dense or sparse graph is ill-defined, and depends on context. The graph density of simple graphs is defined to be the ratio of the number of edges with respect to the maximum possible edges. For undirected simple graphs, the graph density is: :D = \frac = \frac For directed, simple graphs, the maximum possible edges is twice that of undirected graphs (as there are two directions to an edge) so the density is: :D = \frac = \frac where is the number of edges and is the number of vertices in the graph. The maximum number of edges for an undirected graph is = \frac2, so the maximal density is 1 (for complete graphs) and the minimal density is 0 . Upper density ''Upper density'' is an extension of the concept of g ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Routing Routing is the process of selecting a path for traffic in a network or between or across multiple networks. Broadly, routing is performed in many types of networks, including circuit-switched networks, such as the public switched telephone network (PSTN), and computer networks, such as the Internet. In packet switching networks, routing is the higher-level decision making that directs network packets from their source toward their destination through intermediate network nodes by specific packet forwarding mechanisms. Packet forwarding is the transit of network packets from one network interface to another. Intermediate nodes are typically network hardware devices such as routers, gateways, firewalls, or switches. General-purpose computers also forward packets and perform routing, although they have no specially optimized hardware for the task. The routing process usually directs forwarding on the basis of routing tables. Routing tables maintain a record of the routes to ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Packet (information Technology) In telecommunications and computer networking, a network packet is a formatted unit of data carried by a packet-switched network. A packet consists of control information and user data; the latter is also known as the ''payload''. Control information provides data for delivering the payload (e.g., source and destination network addresses, error detection codes, or sequencing information). Typically, control information is found in packet headers and trailers. In packet switching, the bandwidth of the transmission medium is shared between multiple communication sessions, in contrast to circuit switching, in which circuits are preallocated for the duration of one session and data is typically transmitted as a continuous bit stream. Terminology In the seven-layer OSI model of computer networking, ''packet'' strictly refers to a protocol data unit at layer 3, the network layer. A data unit at layer 2, the data link layer, is a ''frame''. In layer 4, the transport layer, the data u ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Set Balancing The set balancing problem in mathematics is the problem of dividing a set to two subsets that have roughly the same characteristics. It arises naturally in design of experiments. There is a group of subjects. Each subject has several features, which are considered binary. For example: each subject can be either young or old; either black or white; either tall or short; etc. The goal is to divide the subjects to two sub-groups: treatment group (T) and control group (C), such that for each feature, the number of subjects that have this feature in T is roughly equal to the number of subjects that have this feature in C. E.g., both groups should have roughly the same number of young people, the same number of black people, the same number of tall people, etc. Matrix representation Formally, the set balancing problem can be described as follows. m is the number of subjects in the general population. n is the number of potential features. The subjects are described by A, an n\times ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Hoeffding's Lemma In probability theory, Hoeffding's lemma is an inequality that bounds the moment-generating function of any bounded random variable. It is named after the Finnish–American mathematical statistician Wassily Hoeffding. The proof of Hoeffding's lemma uses Taylor's theorem and Jensen's inequality. Hoeffding's lemma is itself used in the proof of McDiarmid's inequality. Statement of the lemma Let ''X'' be any real-valued random variable such that a \leq X \leq b almost surely, i.e. with probability one. Then, for all \lambda \in \mathbb, :\mathbb \left e^ \right\leq \exp \Big(\lambda\mathbb \frac \Big), or equivalently, :\mathbb \left e^ \right\leq \exp \Big(\frac \Big). Proof Without loss of generality, by replacing X by X - \mathbb /math>, we can assume \mathbb = 0, so that a \leq 0 \leq b. Since e^ is a convex function of x, we have that for all x \in ,b/math>, :e^\leq \frace^+\frace^ So, :\begin \mathbb\left ^\right&\leq \frace^+\frace^\\ &= \frace^ + \frace ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Convex Function In mathematics, a real-valued function is called convex if the line segment between any two points on the graph of a function, graph of the function lies above the graph between the two points. Equivalently, a function is convex if its epigraph (mathematics), epigraph (the set of points on or above the graph of the function) is a convex set. A twice-differentiable function of a single variable is convex if and only if its second derivative is nonnegative on its entire domain. Well-known examples of convex functions of a single variable include the quadratic function x^2 and the exponential function e^x. In simple terms, a convex function refers to a function whose graph is shaped like a cup \cup, while a concave function's graph is shaped like a cap \cap. Convex functions play an important role in many areas of mathematics. They are especially important in the study of optimization problems where they are distinguished by a number of convenient properties. For instance, a st ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]