Proofs Involving Ordinary Least Squares
The purpose of this page is to provide supplementary materials for the ordinary least squares article, reducing the load of the main article with mathematics and improving its accessibility, while at the same time retaining the completeness of exposition.

Derivation of the normal equations

Define the ''i''th residual to be
:r_i = y_i - \sum_{j=1}^{n} X_{ij}\beta_j.
Then the objective ''S'' can be rewritten as
:S = \sum_{i=1}^{m} r_i^2.
Given that ''S'' is convex, it is minimized when its gradient vector is zero. (This follows by definition: if the gradient vector is not zero, there is a direction in which we can move to decrease ''S'' further – see maxima and minima.) The elements of the gradient vector are the partial derivatives of ''S'' with respect to the parameters:
:\frac{\partial S}{\partial \beta_j} = 2\sum_{i=1}^{m} r_i \frac{\partial r_i}{\partial \beta_j} \qquad (j = 1, 2, \dots, n).
The derivatives are
:\frac{\partial r_i}{\partial \beta_j} = -X_{ij}.
Substitution of the expressions for the residuals and the derivatives into the gradient equations gives
:\frac{\partial S}{\partial \beta_j} = 2\sum_{i=1}^{m} \left( y_i - \sum_{k=1}^{n} X_{ik}\beta_k \right) ...
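As a concrete companion to the derivation above, the following minimal numerical sketch (not part of the original article; it assumes NumPy and a small synthetic design matrix) solves the normal equations X^T X β = X^T y directly and cross-checks the result against a library least-squares solver.

```python
import numpy as np

# Minimal sketch (illustrative only): solve the normal equations
# (X^T X) beta = X^T y for a small synthetic dataset.
rng = np.random.default_rng(0)
m, n = 100, 3                          # m observations, n parameters
X = rng.normal(size=(m, n))            # design matrix
beta_true = np.array([2.0, -1.0, 0.5]) # hypothetical true coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=m)

# Normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat)
```

In practice a QR or SVD based solver (as `np.linalg.lstsq` uses) is preferred to forming X^T X explicitly, since it is numerically more stable; the direct solve is shown only to mirror the derivation.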



Ordinary Least Squares
In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model (with fixed level-one effects of a linear function of a set of explanatory variables). It does so by the principle of least squares: minimizing the sum of the squares of the differences between the observed values of the dependent variable in the input dataset and the output of the (linear) function of the independent variables. Geometrically, this is seen as the sum of the squared distances, parallel to the axis of the dependent variable, between each data point in the set and the corresponding point on the regression surface—the smaller the differences, the better the model fits the data. The resulting estimator can be expressed by a simple formula, especially in the case of a simple linear regression, in which there is a single regressor on the right side of the regression equation. The OLS estimator is consiste ...
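In matrix notation, the "simple formula" referred to above is the closed-form OLS estimator (a standard result, stated here for reference, with design matrix \mathbf{X} and response vector \mathbf{y}):
:\hat{\boldsymbol\beta} = \left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}\mathbf{y},
which in the simple linear regression case reduces to
:\hat{\beta} = \frac{\sum_{i}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sum_{i}\left(x_i - \bar{x}\right)^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}.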



Projection Matrix
In statistics, the projection matrix (\mathbf{P}), sometimes also called the influence matrix or hat matrix (\mathbf{H}), maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation. Definition If the vector of response values is denoted by \mathbf{y} and the vector of fitted values by \hat{\mathbf{y}},
:\hat{\mathbf{y}} = \mathbf{P}\mathbf{y}.
As \hat{\mathbf{y}} is usually pronounced "y-hat", the projection matrix \mathbf{P} is also named ''hat matrix'' as it "puts a hat on \mathbf{y}". The element in the ''i''th row and ''j''th column of \mathbf{P} is equal to the covariance between the ''j''th response value and the ''i''th fitted value, divided by the variance of the former:
:p_{ij} = \frac{\operatorname{Cov}\left[\hat{y}_i, y_j\right]}{\operatorname{Var}\left[y_j\right]}
Application for residuals The formula for the ...
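For ordinary least squares in particular, the projection matrix has an explicit closed form (a standard result, added here for concreteness, with \mathbf{X} denoting the design matrix):
:\mathbf{P} = \mathbf{X}\left(\mathbf{X}^\mathsf{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathsf{T}, \qquad \hat{\mathbf{y}} = \mathbf{P}\mathbf{y},
and \mathbf{P} is symmetric and idempotent (\mathbf{P}^2 = \mathbf{P}), so the residual-forming matrix \mathbf{I} - \mathbf{P} is also a projection.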


Independent Random Variables
Independent or Independents may refer to:

Arts, entertainment, and media

Artist groups
* Independents (artist group), a group of modernist painters based in the New Hope, Pennsylvania, area of the United States during the early 1930s
* Independents (Oporto artist group), a Portuguese artist group historically linked to abstract art and to Fernando Lanhas, the central figure of Portuguese abstractionism

Music

Groups, labels, and genres
* Independent music, a number of genres associated with independent labels
* Independent record label, a record label not associated with a major label
* Independent Albums, American albums chart

Albums
* ''Independent'' (Ai album), 2012
* ''Independent'' (Faze album), 2006
* ''Independent'' (Sacred Reich album), 1993

Songs
* "Independent" (song), a 2007 song by Webbie
* "Independent", a 2002 song by Ayumi Hamasaki from ''H''

News and media organizations
* ''The Independent'', a British online newspaper
* ''The Malta Independent'', a Mal ...



Chi-squared Distribution
In probability theory and statistics, the chi-squared distribution (also chi-square or \chi^2-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in construction of confidence intervals. This distribution is sometimes called the central chi-squared distribution, a special case of the more general noncentral chi-squared distribution. The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests a ...
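Written out, the defining property reads as follows (a standard statement, included here for reference): if Z_1, \dots, Z_k are independent standard normal random variables, then
:Q = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k.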


Expected Value Of
Expected may refer to:
* Expectation (epistemic)
* Expected value
* Expected shortfall
* Expected utility hypothesis
* Expected return
* Expected loss, the sum of the values of all possible losses, each multiplied by the probability of that loss occurring. In bank lending (homes, autos, credit cards, commercial lending, etc.) the expected loss on a loan varies over time for a num ...

See also
* Unexpected (other)
* Expected value (other)


Hessian Matrix
In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term "functional determinants". Definitions and properties Suppose f : \R^n \to \R is a function taking as input a vector \mathbf{x} \in \R^n and outputting a scalar f(\mathbf{x}) \in \R. If all second-order partial derivatives of f exist, then the Hessian matrix \mathbf{H} of f is a square n \times n matrix, usually defined and arranged as follows:
:\mathbf{H}_f = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix},
or, by stating an equation for the coefficients using indices i and j,
:(\mathbf{H}_f)_{ij} = \frac{\partial^2 f}{\partial x_i\,\partial x_j} ...
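As an example relevant to the least-squares derivation above, the Hessian of the objective is particularly simple (a standard computation, assuming the same notation S(\boldsymbol\beta) = \lVert \mathbf{y} - \mathbf{X}\boldsymbol\beta \rVert^2):
:\mathbf{H}_S = \frac{\partial^2 S}{\partial \boldsymbol\beta\,\partial \boldsymbol\beta^\mathsf{T}} = 2\,\mathbf{X}^\mathsf{T}\mathbf{X},
which is positive semidefinite; this confirms that ''S'' is convex and that the stationary point found from the normal equations is a minimum.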



Maximum Likelihood Estimation
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference. If the likelihood function is differentiable, the derivative test for finding maxima can be applied. In some cases, the first-order conditions of the likelihood function can be solved analytically; for instance, the ordinary least squares estimator for a linear regression model maximizes the likelihood when all observed outcomes are assumed to have Normal distributions with the same variance. From the perspective of Bayesian inference, M ...
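For the linear regression case mentioned above, the equivalence can be sketched explicitly (a standard argument, assuming i.i.d. normal errors with common variance \sigma^2): the log-likelihood of the sample is
:\ell(\boldsymbol\beta, \sigma^2) = -\frac{m}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}\left(y_i - \sum_{j=1}^{n} X_{ij}\beta_j\right)^2,
so, for any fixed \sigma^2, maximizing \ell over \boldsymbol\beta is the same as minimizing the sum of squared residuals ''S'', and the maximum likelihood estimator of \boldsymbol\beta coincides with the OLS estimator.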



Central Limit Theorem
In probability theory, the central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions. This theorem has seen many changes during the formal development of probability theory. Previous versions of the theorem date back to 1811, but in its modern general form, this fundamental result in probability theory was precisely stated as late as 1920, thereby serving as a bridge between classical and modern probability theory. If X_1, X_2, \dots, X_n, \dots are random samples drawn from a population with overall mean \mu and finite variance and if \bar{X}_n is the sample mean of t ...
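Since the excerpt is truncated, the classical (Lindeberg–Lévy) form of the statement being introduced is given here for reference (a standard result):
:\sqrt{n}\left(\bar{X}_n - \mu\right)\ \xrightarrow{d}\ \mathcal{N}\!\left(0, \sigma^2\right),
where \sigma^2 is the (finite) population variance.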




Continuous Mapping Theorem
In probability theory, the continuous mapping theorem states that continuous functions preserve limits even if their arguments are sequences of random variables. A continuous function, in Heine's definition, is such a function that maps convergent sequences into convergent sequences: if ''xn'' → ''x'' then ''g''(''xn'') → ''g''(''x''). The ''continuous mapping theorem'' states that this will also be true if we replace the deterministic sequence ''xn'' with a sequence of random variables ''Xn'', and replace the standard notion of convergence of real numbers "→" with one of the types of convergence of random variables. This theorem was first proved by Henry Mann and Abraham Wald in 1943, and it is therefore sometimes called the Mann–Wald theorem. Meanwhile, Denis Sargan refers to it as the general transformation theorem. Statement Let ''Xn'', ''X'' be random elements defined on a metric space ''S''. Suppose a function ''g'': ''S'' → ''S′'' (where ''S′'' is another metric space) has the set of discontinu ...
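Concretely, the conclusion of the theorem can be written as follows (a standard statement, added for reference), provided the set of discontinuity points of ''g'' has probability zero under the distribution of ''X'':
:X_n\ \xrightarrow{d}\ X \;\Rightarrow\; g(X_n)\ \xrightarrow{d}\ g(X), \qquad X_n\ \xrightarrow{p}\ X \;\Rightarrow\; g(X_n)\ \xrightarrow{p}\ g(X), \qquad X_n\ \xrightarrow{\text{a.s.}}\ X \;\Rightarrow\; g(X_n)\ \xrightarrow{\text{a.s.}}\ g(X).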


Slutsky's Theorem
In probability theory, Slutsky's theorem extends some properties of algebraic operations on convergent sequences of real numbers to sequences of random variables. The theorem was named after Eugen Slutsky. Slutsky's theorem is also attributed to Harald Cramér. Statement Let X_n, Y_n be sequences of scalar/vector/matrix random elements. If X_n converges in distribution to a random element X and Y_n converges in probability to a constant c, then
* X_n + Y_n \ \xrightarrow{d}\ X + c ;
* X_n Y_n \ \xrightarrow{d}\ Xc ;
* X_n / Y_n \ \xrightarrow{d}\ X/c,   provided that ''c'' is invertible,
where \xrightarrow{d} denotes convergence in distribution. Notes: 1. The requirement that ''Yn'' converges to a constant is important: if it were to converge to a non-degenerate random variable, the theorem would no longer be valid. For example, let X_n \sim \mathrm{Uniform}(0,1) and Y_n = -X_n. The sum X_n + Y_n = 0 for all values of ''n''. Moreover, Y_n \, \xrightarrow{d} \, \mathrm{Uniform}(-1,0), but X_n + Y_n does not converge ...



Law Of Large Numbers
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed. The LLN is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. Importantly, the law applies (as the name indicates) only when a ''large number'' of observations are considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced ...
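In symbols, the weak form of the law states (a standard statement, added for reference) that for independent, identically distributed random variables X_1, X_2, \dots with expected value \mu,
:\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i\ \xrightarrow{p}\ \mu \qquad \text{as } n \to \infty.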



Maximum Likelihood Approach
In mathematical analysis, the maxima and minima (the respective plurals of maximum and minimum) of a function, known collectively as extrema (the plural of extremum), are the largest and smallest value of the function, either within a given range (the ''local'' or ''relative'' extrema), or on the entire domain (the ''global'' or ''absolute'' extrema). Pierre de Fermat was one of the first mathematicians to propose a general technique, adequality, for finding the maxima and minima of functions. As defined in set theory, the maximum and minimum of a set are the greatest and least elements in the set, respectively. Unbounded infinite sets, such as the set of real numbers, have no minimum or maximum. Definition A real-valued function ''f'' defined on a domain ''X'' has a global (or absolute) maximum point at ''x''∗, if ''f''(''x''∗) ≥ ''f''(''x'') for all ''x'' in ''X''. Similarly, the function has a global (or absolute) minimum point at ''x''∗, if ''f''(''x''∗) ≤ ''f''(''x'') for all ''x'' in ''X''. The value of the function at a m ...