Conditional Expectation

	Conditional Expectation In probability theory, the conditional expectation, conditional expected value, or conditional mean of a random variable is its expected value (the value it would take "on average" over an arbitrarily large number of occurrences) given that a certain set of "conditions" is known to occur. If the random variable can take on only a finite number of values, the "conditions" are that the variable can only take on a subset of those values. More formally, in the case when the random variable is defined over a discrete probability space, the "conditions" are a partition of this probability space. Depending on the context, the conditional expectation can be either a random variable or a function. The random variable is denoted E(X\mid Y), analogously to conditional probability. The function form is either denoted E(X\mid Y=y), or a separate function symbol such as f(y) is introduced with the meaning E(X\mid Y) = f(Y). Examples Example 1: Dice rolling Consider the roll of ...
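The partition view described above can be made concrete with a small sketch. The setup here is an assumption chosen for illustration (a fair six-sided die, X the face value, Y whether the face is even) and is not the truncated dice example from the entry; the helper name `conditional_expectation` is likewise hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_expectation(outcomes, prob, X, Y):
    """Return {y: E(X | Y = y)} on a finite probability space.

    The values of Y partition the outcomes; on each block of the
    partition, X is averaged with respect to the probability weights.
    """
    mass = defaultdict(Fraction)      # P(Y = y)
    weighted = defaultdict(Fraction)  # sum of X(w) * P(w) over {Y = y}
    for w in outcomes:
        p = prob(w)
        mass[Y(w)] += p
        weighted[Y(w)] += p * X(w)
    # Blocks of probability zero are skipped: E(X | Y = y) is undefined there.
    return {y: weighted[y] / mass[y] for y in mass if mass[y] > 0}


# Illustrative assumption: a fair die, conditioning on parity.
outcomes = range(1, 7)
prob = lambda w: Fraction(1, 6)
X = lambda w: w            # face value
Y = lambda w: w % 2 == 0   # True when the face is even

f = conditional_expectation(outcomes, prob, X, Y)
print(f[True], f[False])   # E(X | even) = 4, E(X | odd) = 3
```

The dictionary `f` plays the role of the function form f(y) = E(X\mid Y=y); composing it with Y, as in `f[Y(w)]`, gives the random-variable form E(X\mid Y).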
	Probability Theory Probability theory is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event. Central subjects in probability theory include discrete and continuous random variables, probability distributions, and stochastic processes (which provide mathematical abstractions of non-deterministic or uncertain processes or measured quantities that may either be single occurrences or evolve over time in a random fashion). Although it is not possible to perfectly p ...
	Indicator Function In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if A is a subset of some set X, one has \mathbf{1}_A(x)=1 if x\in A, and \mathbf{1}_A(x)=0 otherwise, where \mathbf{1}_A is a common notation for the indicator function. Other common notations are I_A and \chi_A. The indicator function of A is the Iverson bracket of the property of belonging to A; that is, \mathbf{1}_A(x) = [x \in A]. For example, the Dirichlet function is the indicator function of the rational numbers as a subset of the real numbers. Definition The indicator function of a subset A of a set X is a function \mathbf{1}_A \colon X \to \{0,1\} defined as \mathbf{1}_A(x) := \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases} The Iverson bracket provides the equivalent notation [x \in A], to be used instead of \mathbf{1}_A(x). The function \mathbf{1}_A is sometimes denoted by other symbols, such as I_A or \chi_A. ...
	Residual (statistics) In statistics and optimization, errors and residuals are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "true value" (not necessarily observable). The error of an observation is the deviation of the observed value from the true value of a quantity of interest (for example, a population mean). The residual is the difference between the observed value and the estimated value of the quantity of interest (for example, a sample mean). The distinction is most important in regression analysis, where the concepts are sometimes called the regression errors and regression residuals and where they lead to the concept of studentized residuals. In econometrics, "errors" are also called disturbances. Introduction Suppose there is a series of observations from a univariate distribution and we want to estimate the mean of that distribution (the so-called location model). In this case, the errors are ...
	Hilbert Projection Theorem In mathematics, the Hilbert projection theorem is a famous result of convex analysis that says that for every vector x in a Hilbert space H and every nonempty closed convex set C \subseteq H, there exists a unique vector m \in C for which \|c - x\| is minimized over the vectors c \in C; that is, such that \|m - x\| \leq \|c - x\| for every c \in C. Finite dimensional case Some intuition for the theorem can be obtained by considering the first order condition of the optimization problem. Consider a finite dimensional real Hilbert space H with a subspace C and a point x. If m \in C is a minimum point of the function N : C \to \R defined by N(c) := \|c - x\| (which is the same as the minimum point of c \mapsto \|c - x\|^2), then der ...
	Dirac Distribution In mathematics, the Dirac delta distribution (\delta distribution), also known as the unit impulse, is a generalized function or distribution over the real numbers, whose value is zero everywhere except at zero, and whose integral over the entire real line is equal to one. The current understanding of the unit impulse is as a linear functional that maps every continuous function (e.g., f(x)) to its value at zero of its domain (f(0)), or as the weak limit of a sequence of bump functions (e.g., \delta(x) = \lim_{b \to 0} \frac{1}{|b|\sqrt{\pi}} e^{-(x/b)^2}), which are zero over most of the real line, with a tall spike at the origin. Bump functions are thus sometimes called "approximate" or "nascent" delta distributions. The delta function was introduced by physicist Paul Dirac as a tool for the normalization of state vectors. It also has uses in probability theory and signal processing. Its validity was disputed until Laurent Schwartz developed the theory of distributions where it is defined as a linear form acting o ...
	Pushforward Measure In measure theory, a pushforward measure (also known as push forward, push-forward or image measure) is obtained by transferring ("pushing forward") a measure from one measurable space to another using a measurable function. Definition Given measurable spaces (X_1,\Sigma_1) and (X_2,\Sigma_2), a measurable mapping f\colon X_1\to X_2 and a measure \mu\colon\Sigma_1\to[0,+\infty], the pushforward of \mu is defined to be the measure f_*(\mu)\colon\Sigma_2\to[0,+\infty] given by f_*(\mu)(B) = \mu\left(f^{-1}(B)\right) for B \in \Sigma_2. This definition applies mutatis mutandis for a signed or complex measure. The pushforward measure is also denoted as \mu \circ f^{-1}, f_\sharp \mu, f \sharp \mu, or f \# \mu. Main property: change-of-variables formula Theorem: A measurable function g on X_2 is integrable with respect to the pushforward measure f_*(\mu) if and only if the composition g \circ f is integrable with respect to the measure ...
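The definition f_*(\mu)(B) = \mu(f^{-1}(B)) is easiest to see on a finite set, where a measure is just a table of point masses. The following is a minimal sketch under that assumption; the names `pushforward` and `integrate`, and the particular choices of \mu, f and g, are hypothetical and chosen only for illustration. It also checks the standard change-of-variables identity \int g \, d(f_*\mu) = \int (g \circ f) \, d\mu in this finite setting.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, f):
    """Push a finite measure forward along f.

    mu is a dict mapping points of X_1 to their mass. The mass that the
    result assigns to a point y is mu(f^{-1}({y})), i.e. the total mass
    of all points that f sends to y.
    """
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[f(x)] += mass
    return dict(nu)


def integrate(g, measure):
    """Integral of g against a finite measure given as point masses."""
    return sum(g(x) * mass for x, mass in measure.items())


# Illustrative assumption: uniform measure on {1, ..., 6}, f = remainder mod 3.
mu = {x: Fraction(1, 6) for x in range(1, 7)}
f = lambda x: x % 3
g = lambda y: y * y

nu = pushforward(mu, f)                   # each of 0, 1, 2 gets mass 1/3
lhs = integrate(g, nu)                    # integral of g against f_*(mu)
rhs = integrate(lambda x: g(f(x)), mu)    # integral of g∘f against mu
print(nu, lhs, rhs)
```

Exact rational arithmetic (`Fraction`) is used so that the two integrals can be compared with `==` rather than a floating-point tolerance.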
	Multicollinearity In statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation, the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multivariate regression model with collinear predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others. Note that in statements of the assumptions underlying regression analyses such as ordinary least squares, the phrase "no multicollinearity" usually refe ...
	Linear Regression In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable. In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regressio ...
	Random Vector In probability and statistics, a multivariate random variable or random vector is a list of mathematical variables, each of whose values is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system; often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of an unspecified person from within a group would be a random vector. Normally each element of a random vector is a real number. Random vectors are often used as the underlying implementation of various types of aggregate random variables, e.g. a random matrix, random tree, random sequence, stochastic process, etc. More formally, a multivariate random variable is a column vector \mathbf{X} = (X_1,\dots,X_n)^\mathsf{T} (or ...
	Mean Squared Error In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate. In machine learning, specifically empirical risk minimization, MSE may refer to the empirical risk (the average loss on an observed data set), as an estimate of the true MSE (the true risk: the average loss on the actual population distribution). The MSE is a measure of the quality of an estimator. As it is derived from the square of Euclidean distance, it is always a positive value that decreases as the er ...
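The "average squared difference between the estimated values and the actual value" can be sketched in a few lines. The numbers below are made up for illustration, and the function names are hypothetical. The sketch also checks the well-known sample-level identity MSE = variance + bias^2, which is not stated in the entry above but holds exactly for the empirical quantities computed here.

```python
def mean_squared_error(estimates, actual):
    """Average squared difference between the estimates and the actual value."""
    return sum((e - actual) ** 2 for e in estimates) / len(estimates)


def bias_and_variance(estimates, actual):
    """Bias of the estimates relative to `actual`, and their population variance."""
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / len(estimates)
    return mean - actual, variance


# Made-up estimates of a quantity whose actual value is assumed to be 3.0.
estimates = [2.5, 3.5, 3.0, 4.0]
actual = 3.0

mse = mean_squared_error(estimates, actual)
bias, variance = bias_and_variance(estimates, actual)
print(mse, bias, variance)   # 0.375 0.25 0.3125
```

Here the squared errors are 0.25, 0.25, 0 and 1, so their average is 0.375, which equals the variance 0.3125 plus the squared bias 0.0625.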