Robust Regression
   HOME
*





Robust Regression
In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise (i.e. are not robust to assumption violations). Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on regression estimates. For example, least squares estimates for regression models are highly sensitive to outliers: an outlier with twice the error magnitude of a typical observation contributes four (two squared) times as much to the squared error loss, and therefore has more leverage over the regression estimates. The Huber loss function is a robust alternative to standard square error loss that reduces ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Robust Statistics
Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly. Introduction Robust statistics seek to provide methods that emulate popular statistical methods, but which are not unduly affected by outliers or other small departures from Statistical assumption, model assumptions. In statistics, classical estimation methods rely heavily on assumpti ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Stromberg2004
Stromberg, Strömberg, Strømberg, Stroemberg, or ''variant'' may refer to: Places Germany * Stromberg, Oelde, a town in Oelde * Stromberg (landscape), a region in Baden-Württemberg ** Stromberg-Heuchelberg Nature Park * Stromberg (Siebengebirge), a mountain peak now better known as Petersberg * Stromberg (Verbandsgemeinde), a collective municipality in the district of Bad Kreuznach in Rheinland-Pfalz ** Stromberg (Hunsrück), a town and the seat of the collective municipality People * Stromberg (surname), people with the surname of any spelling variant Corporate * Stromberg, a carburetor brand name used by Zenith Carburetters and by Bendix Corporation * Strömberg (company), a Finnish manufacturer of electronic products * Stromberg Guitars, an American company producing guitars, mainly for jazz musicians, between 1906 and 1955 * Stromberg-Carlson, an American manufacturer of telephone equipment, radios and television * Stromberg-Voisinet, manufacturer of musical instrument ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Mixture Distribution
In probability and statistics, a mixture distribution is the probability distribution of a random variable that is derived from a collection of other random variables as follows: first, a random variable is selected by chance from the collection according to given probabilities of selection, and then the value of the selected random variable is realized. The underlying random variables may be random real numbers, or they may be random vectors (each having the same dimension), in which case the mixture distribution is a multivariate distribution. In cases where each of the underlying random variables is continuous, the outcome variable will also be continuous and its probability density function is sometimes referred to as a mixture density. The cumulative distribution function (and the probability density function if it exists) can be expressed as a convex combination (i.e. a weighted sum, with non-negative weights that sum to 1) of other distribution functions and density function ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Student's T-distribution
In probability and statistics, Student's ''t''-distribution (or simply the ''t''-distribution) is any member of a family of continuous probability distributions that arise when estimating the mean of a normally distributed population in situations where the sample size is small and the population's standard deviation is unknown. It was developed by English statistician William Sealy Gosset under the pseudonym "Student". The ''t''-distribution plays a role in a number of widely used statistical analyses, including Student's ''t''-test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and in linear regression analysis. Student's ''t''-distribution also arises in the Bayesian analysis of data from a normal family. If we take a sample of n observations from a normal distribution, then the ''t''-distribution with \nu=n-1 degrees of freedom can be de ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Theil–Sen Estimator
In non-parametric statistics, the Theil–Sen estimator is a method for robustly fitting a line to sample points in the plane (simple linear regression) by choosing the median of the slopes of all lines through pairs of points. It has also been called Sen's slope estimator, slope selection, the single median method, the Kendall robust line-fit method, and the Kendall–Theil robust line. It is named after Henri Theil and Pranab K. Sen, who published papers on this method in 1950 and 1968 respectively,; and after Maurice Kendall because of its relation to the Kendall tau rank correlation coefficient. This estimator can be computed efficiently, and is insensitive to outliers. It can be significantly more accurate than non-robust simple linear regression (least squares) for skewed and heteroskedastic data, and competes well against least squares even for normally distributed data in terms of statistical power. It has been called "the most popular nonparametric technique for es ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Least Trimmed Squares
Least trimmed squares (LTS), or least trimmed sum of squares, is a robust statistical method that fits a function to a set of data whilst not being unduly affected by the presence of outliers. It is one of a number of methods for robust regression. Description of method Instead of the standard least squares method, which minimises the sum of squared residuals over ''n'' points, the LTS method attempts to minimise the sum of squared residuals over a subset, k, of those points. The unused n - k points do not influence the fit. In a standard least squares problem, the estimated parameter values β are defined to be those values that minimise the objective function ''S''(β) of squared residuals: :S = \sum_^n r_i(\beta)^2, where the residuals are defined as the differences between the values of the dependent variables (observations) and the model values: :r_i(\beta) = y_i - f(x_i, \beta), and where ''n'' is the overall number of data points. For a least trimmed squares analysis, ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Peter Rousseeuw
Peter J. Rousseeuw (born 13 October 1956) is a statistician known for his work on robust statistics and cluster analysis. He obtained his PhD in 1981 at the Vrije Universiteit Brussel, following research carried out at the ETH in Zurich, which led to a book on influence functions. Later he was professor at the Delft University of Technology, The Netherlands, at the University of Fribourg, Switzerland, and at the University of Antwerp, Belgium. Next he was a senior researcher at Renaissance Technologies. He then returned to Belgium as professor at KU Leuven, until becoming emeritus in 2022. His former PhD students include Annick Leroy, Hendrik Lopuhaä, Geert Molenberghs, Christophe Croux, Mia Hubert, Stefan Van Aelst, Tim Verdonck and Jakob Raymaekers. Research Rousseeuw has constructed and published many useful techniques. He proposed the Least Trimmed Squares method and S-estimators for robust regression, which can resist outliers in the data. He also introduced the Minimu ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Leverage (statistics)
In statistics and in particular in regression analysis, leverage is a measure of how far away the independent variable values of an observation are from those of the other observations. ''High-leverage points'', if any, are outliers with respect to the independent variables. That is, high-leverage points have no neighboring points in \mathbb^ space, where '''' is the number of independent variables in a regression model. This makes the fitted model likely to pass close to a high leverage observation. Hence high-leverage points have the potential to cause large changes in the parameter estimates when they are deleted i.e., to be influential points. Although an influential point will typically have high leverage, a high leverage point is not necessarily an influential point. The leverage is typically defined as the diagonal elements of the hat matrix. Definition and interpretations Consider the linear regression model _i = \boldsymbol_i^\boldsymbol+_i, i=1,\, 2,\ldots,\, n. That is ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




M-estimator
In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation. 48 samples of robust M-estimators can be found in a recent review study. More generally, an M-estimator may be defined to be a zero of an estimating function. This estimating function is often the derivative of another statistical function. For example, a maximum-likelihood estimate is the point where the derivative of the likelihood function with respect to the parameter is zero; thus, a maximum-likelihood estimator is a critical point of the score function. In many applications, such M-estimators can be thought of as estimating characteristics of the popula ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Least Absolute Deviations
Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute residuals (LAR), or least absolute values (LAV), is a statistical optimality criterion and a statistical optimization technique based minimizing the ''sum of absolute deviations'' (sum of absolute residuals or sum of absolute errors) or the ''L''1 norm of such values. It is analogous to the least squares technique, except that it is based on ''absolute values'' instead of squared values. It attempts to find a function which closely approximates a set of data by minimizing residuals between points generated by the function and corresponding data points. The LAD estimate also arises as the maximum likelihood estimate if the errors have a Laplace distribution. It was introduced in 1757 by Roger Joseph Boscovich. Formulation Suppose that the data set consists of the points (''x''''i'', ''y''''i'') with ''i'' = 1, 2, ..., ''n''. We want to find a function ''f'' such that f(x_i)\approx y_i. ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


S-PLUS
S-PLUS is a commercial implementation of the S programming language sold by TIBCO Software Inc. It features object-oriented programming capabilities and advanced analytical algorithms. Due to the increasing popularity of the open source S successor R, TIBCO Software released thTIBCO Enterprise Runtime for R (TERR)as an alternative R interpreter. Historical timeline 1988: S-PLUS is first produced by a Seattle-based start-up company called Statistical Sciences, Inc. The founder and sole owner is R. Douglas Martin, professor of statistics at the University of Washington, Seattle. 1993: Statistical Sciences acquires the exclusive license to distribute S and merges with MathSoft, becoming the firm's Data Analysis Products Division (DAPD). 1995: S-PLUS 3.3 for Windows 95/NT. Matrix library, command history, Trellis graphics 1996: S-PLUS 3.4 for UNIX. Trellis graphics, (non-linear mixed effects) library, hexagonal binning, cluster methods. 1997: S-PLUS 4 for Windows. New GUI, i ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]