Conditional Expected Value
In probability theory, the conditional expectation, conditional expected value, or conditional mean of a random variable is its expected value evaluated with respect to the conditional probability distribution. If the random variable can take on only a finite number of values, the "conditions" are that the variable can only take on a subset of those values. More formally, in the case when the random variable is defined over a discrete probability space, the "conditions" are a partition of this probability space.

Depending on the context, the conditional expectation can be either a random variable or a function. The random variable is denoted E(X\mid Y) analogously to conditional probability. The function form is either denoted E(X\mid Y=y), or a separate function symbol such as f(y) is introduced with the meaning E(X\mid Y) = f(Y).

Examples
Example 1: Dice rolling
Consider the roll of a fair die and let ''A'' = 1 if the number is even (i.e., 2, 4, or 6) and ''A'' = 0 otherwise ...
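Since the worked example is truncated, here is a minimal Python sketch under the setup above (the helper name `cond_expectation` is illustrative): it conditions the die value on the indicator ''A'' and averages over the outcomes consistent with each condition.

```python
from fractions import Fraction

# Outcomes of a fair die, each with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]

# A = 1 if the roll is even, A = 0 otherwise (the indicator defined above).
def A(x):
    return 1 if x % 2 == 0 else 0

def cond_expectation(a):
    """E(X | A = a): average of the die values on which A takes the value a."""
    relevant = [x for x in outcomes if A(x) == a]
    return Fraction(sum(relevant), len(relevant))

print(cond_expectation(1))  # E(X | A = 1) = (2 + 4 + 6) / 3 = 4
print(cond_expectation(0))  # E(X | A = 0) = (1 + 3 + 5) / 3 = 3
```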
Probability Theory
Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event. Central subjects in probability theory include discrete and continuous random variables, probability distributions, and stochastic processes (which provide mathematical abstractions of non-deterministic or uncertain processes or measured quantities that may either be single occurrences or evolve over time in a random fashion). Although it is no ...
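For reference, the axioms alluded to above are usually stated in their standard Kolmogorov form; assuming a sample space \Omega with event \sigma-algebra \mathcal{F} and probability measure P, they read:

```latex
P(E) \geq 0 \ \text{for all } E \in \mathcal{F}, \qquad P(\Omega) = 1, \qquad
P\!\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)
\ \text{for pairwise disjoint } E_1, E_2, \ldots \in \mathcal{F}.
```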
Indicator Function
In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if A is a subset of some set X, then the indicator function of A is the function \mathbf{1}_A defined by \mathbf{1}_A(x) = 1 if x \in A, and \mathbf{1}_A(x) = 0 otherwise. Other common notations are \mathbb{1}_A and \chi_A. The indicator function of A is the Iverson bracket of the property of belonging to A; that is, \mathbf{1}_A(x) = [x \in A]. For example, the Dirichlet function is the indicator function of the rational numbers as a subset of the real numbers.

Definition
Given an arbitrary set X, the indicator function of a subset A of X is the function \mathbf{1}_A \colon X \to \{0, 1\} defined by
\mathbf{1}_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}
The Iverson bracket provides the equivalent notation [x \in A], which can be used instead of \mathbf{1}_A ...
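A minimal Python sketch of this definition (the helper name `indicator` is illustrative): it returns the function \mathbf{1}_A for a given subset A.

```python
def indicator(A):
    """Return the indicator function 1_A of the set A: 1 on A, 0 elsewhere."""
    return lambda x: 1 if x in A else 0

# Indicator of the even numbers {2, 4, 6} within the die outcomes 1..6.
one_A = indicator({2, 4, 6})
print([one_A(x) for x in range(1, 7)])  # [0, 1, 0, 1, 0, 1]
```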
Residual (statistics)
In statistics and optimization, errors and residuals are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "true value" (not necessarily observable). The error of an observation is the deviation of the observed value from the true value of a quantity of interest (for example, a population mean). The residual is the difference between the observed value and the ''estimated'' value of the quantity of interest (for example, a sample mean). The distinction is most important in regression analysis, where the concepts are sometimes called the regression errors and regression residuals and where they lead to the concept of studentized residuals. In econometrics, "errors" are also called disturbances.

Introduction
Suppose there is a series of observations from a univariate distribution and we want to estimate the mean of that distribution (the so-called location model). In this case, the errors are ...
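A short Python illustration of the distinction, using simulated data with a known population mean (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 10.0                                # true (usually unobservable) population mean
sample = rng.normal(loc=mu, scale=2.0, size=5)

errors = sample - mu                     # deviations from the true mean
residuals = sample - sample.mean()       # deviations from the estimated (sample) mean

print(errors)
print(residuals)
print(residuals.sum())                   # residuals sum to (numerically) zero; errors need not
```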
Hilbert Projection Theorem
In mathematics, the Hilbert projection theorem is a famous result of convex analysis that says that for every vector x in a Hilbert space H (a real or complex inner product space that is also a complete metric space with respect to the metric induced by the inner product, generalizing the notion of Euclidean space) and every nonempty closed convex set C \subseteq H, there exists a unique vector m \in C for which \|c - x\| is minimized over the vectors c \in C; that is, such that \|m - x\| \leq \|c - x\| for every c \in C.

Finite dimensional case
Some intuition for the theorem can be obtained by considering the first order condition of the optimization problem. Consider a finite dimensional real Hilbert space H with a subspace C and a point x. If m \in C is a minimum or local minimum of the function N : C \to \R defined by N(c) := \|c - x\| (which is the same as the minimum point of c \mapsto \|c - x\|^2), then deri ...
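As a finite-dimensional illustration (a sketch, assuming C is the column span of a matrix and using NumPy's least-squares routine rather than any method named in the source), the minimizer m can be computed and the first-order orthogonality condition checked numerically:

```python
import numpy as np

rng = np.random.default_rng(1)

# C = column span of A, a subspace of R^5 (hence closed and convex).
A = rng.normal(size=(5, 2))
x = rng.normal(size=5)

# m = argmin_{c in C} ||c - x||, computed via least squares.
coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
m = A @ coeffs

# First-order condition: the residual x - m is orthogonal to every direction in C.
print(A.T @ (x - m))          # approximately the zero vector
print(np.linalg.norm(x - m))  # the minimal distance from x to C
```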
Dirac Distribution
In mathematical analysis, the Dirac delta function (or distribution), also known as the unit impulse, is a generalized function on the real numbers, whose value is zero everywhere except at zero, and whose integral over the entire real line is equal to one. Thus it can be represented heuristically as
\delta(x) = \begin{cases} 0, & x \neq 0 \\ \infty, & x = 0 \end{cases}
such that
\int_{-\infty}^{\infty} \delta(x)\,dx = 1.
Since there is no function having this property, modelling the delta "function" rigorously involves the use of limits or, as is common in mathematics, measure theory and the theory of distributions. The delta function was introduced by physicist Paul Dirac, and has since been applied routinely in physics and engineering to model point masses and instantaneous impulses. It is called the delta function because it is a continuous analogue of the Kronecker delta function, which is usually defined on a discrete domain and takes values 0 and 1. The mathematical rigor of the delta function was ...
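One standard way to make this heuristic picture precise, stated here as an illustrative identity rather than a quotation from the truncated text, is to view \delta as the limit of narrowing Gaussian bumps acting under an integral:

```latex
\delta_\varepsilon(x) = \frac{1}{\varepsilon\sqrt{\pi}}\, e^{-x^{2}/\varepsilon^{2}},
\qquad
\lim_{\varepsilon \to 0^{+}} \int_{-\infty}^{\infty} f(x)\,\delta_\varepsilon(x)\,dx = f(0)
\quad \text{for continuous bounded } f.
```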
Pushforward Measure
In measure theory, a pushforward measure (also known as push forward, push-forward or image measure) is obtained by transferring ("pushing forward") a measure from one measurable space to another using a measurable function.

Definition
Given measurable spaces (X_1,\Sigma_1) and (X_2,\Sigma_2), a measurable function f\colon X_1\to X_2 and a measure \mu\colon\Sigma_1\to[0,+\infty], the pushforward of \mu by f is defined to be the measure f_{*}(\mu)\colon\Sigma_2\to[0,+\infty] given by
f_{*}(\mu)(B) = \mu\left(f^{-1}(B)\right) \quad \text{for } B \in \Sigma_2.
This definition applies ''mutatis mutandis'' for a signed or complex measure. The pushforward measure is also denoted as \mu \circ f^{-1}, f_\sharp \mu, f\,\sharp\,\mu, or f\#\mu.

Properties
Change of variable formula
Theorem (Theorem 3.6.1 in ...): A measurable function ''g'' on ''X''2 is integrable with respect to the pushforward measure ''f''∗(''μ'') if and only if the composition g \circ f is integrable with respect to the measure ''μ'' ...
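The change-of-variables identity toward which the truncated theorem statement is heading can be written, in its standard form, as

```latex
\int_{X_2} g \,\mathrm{d}\bigl(f_{*}\mu\bigr) = \int_{X_1} (g \circ f) \,\mathrm{d}\mu ,
```

valid whenever either side exists.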
Multicollinearity
In statistics, multicollinearity or collinearity is a situation where the predictors in a regression model are linearly dependent. Perfect multicollinearity refers to a situation where the predictive variables have an ''exact'' linear relationship. When there is perfect collinearity, the design matrix X has less than full rank, and therefore the moment matrix X^{\mathsf{T}}X cannot be inverted. In this situation, the parameter estimates of the regression are not well-defined, as the system of equations has infinitely many solutions. Imperfect multicollinearity refers to a situation where the predictive variables have a ''nearly'' exact linear relationship.

Contrary to popular belief, neither the Gauss–Markov theorem nor the more common maximum likelihood justification for ordinary least squares relies on any kind of correlation structure between dependent predictors (although perfect collinearity can cause problems with some software). There is no justification for the pra ...
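A small Python sketch of perfect collinearity (illustrative data; the variable x2 is constructed as an exact multiple of x1, so the moment matrix is rank-deficient):

```python
import numpy as np

rng = np.random.default_rng(2)

x1 = rng.normal(size=100)
x2 = 3.0 * x1                      # exact linear relationship: perfect collinearity
X = np.column_stack([np.ones(100), x1, x2])

XtX = X.T @ X                      # the moment matrix X^T X
print(np.linalg.matrix_rank(XtX))  # 2 < 3: less than full rank, not invertible
print(np.linalg.cond(XtX))         # enormous condition number (numerically singular)
```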
Linear Regression
In statistics, linear regression is a model that estimates the relationship between a scalar response (dependent variable) and one or more explanatory variables (regressors or independent variables). A model with exactly one explanatory variable is a ''simple linear regression''; a model with two or more explanatory variables is a multiple linear regression. This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single dependent variable.

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, ...
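A minimal Python sketch of simple linear regression on simulated data (illustrative names and parameters; ordinary least squares via NumPy rather than any particular method named in the source):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: one explanatory variable, so this is simple linear regression.
x = rng.uniform(0.0, 10.0, size=50)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=50)   # true intercept 1.5, slope 2.0

# Design matrix with an intercept column; OLS models the conditional mean E(y | x).
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)  # estimated [intercept, slope], close to [1.5, 2.0]
```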
Random Vector
In probability and statistics, a multivariate random variable or random vector is a list or vector of mathematical variables each of whose value is unknown, either because the value has not yet occurred or because there is imperfect knowledge of its value. The individual variables in a random vector are grouped together because they are all part of a single mathematical system — often they represent different properties of an individual statistical unit. For example, while a given person has a specific age, height and weight, the representation of these features of ''an unspecified person'' from within a group would be a random vector. Normally each element of a random vector is a real number. Random vectors are often used as the underlying implementation of various types of aggregate random variables, e.g. a random matrix, random tree, random sequence, stochastic process, etc.

Formally, a multivariate random variable is a column vector \mathbf{X} = (X_1,\dots,X_n)^{\mathsf{T}} ...
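An illustrative Python sketch of a two-component random vector (the choice of a multivariate normal distribution and the height/weight interpretation are assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(4)

# A 2-component random vector X = (X_1, X_2)^T, here taken to be multivariate normal.
mean = np.array([170.0, 70.0])                             # e.g. height (cm) and weight (kg)
cov = np.array([[80.0, 40.0],
                [40.0, 60.0]])
samples = rng.multivariate_normal(mean, cov, size=10_000)  # one row per draw

print(samples.mean(axis=0))                # estimate of the mean vector
print(np.cov(samples, rowvar=False))       # estimate of the covariance matrix
```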
Mean Squared Error
In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the true value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate. In machine learning, specifically empirical risk minimization, MSE may refer to the ''empirical'' risk (the average loss on an observed data set), as an estimate of the true MSE (the true risk: the average loss on the actual population distribution). The MSE is a measure of the quality of an estimator. As it is derived from the square of Euclidean distance, it is always a positive value that decreases as the erro ...
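A minimal Python sketch of the empirical MSE (the helper name `mse` and the simulated estimates are illustrative):

```python
import numpy as np

def mse(estimates, truth):
    """Empirical mean squared error: average of the squared differences."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - truth) ** 2)

rng = np.random.default_rng(5)
true_value = 4.0
estimates = true_value + rng.normal(scale=0.3, size=1000)  # noisy, unbiased estimates

print(mse(estimates, true_value))  # close to the error variance 0.3**2 = 0.09
```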
Variance
In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by \sigma^2, s^2, \operatorname{Var}(X), V(X), or \mathbb{V}(X).

An advantage of variance as a measure of dispersion is that it is more amenable to algebraic manipulation than other measures of dispersion such as the expected absolute deviation; for example, the variance of a sum of uncorrelated random variables is equal to the sum of their variances. A disadvantage of the variance for practical applications is that, unlike the standard deviation, its units differ from the random variable, which is why the standard devi ...
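For reference, the definition described above written in standard notation:

```latex
\operatorname{Var}(X) = \operatorname{E}\!\left[(X - \operatorname{E}[X])^2\right]
                      = \operatorname{E}\!\left[X^2\right] - \left(\operatorname{E}[X]\right)^2,
\qquad
\sigma = \sqrt{\operatorname{Var}(X)}.
```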