Working–Hotelling Procedure

In statistics, particularly regression analysis, the Working–Hotelling procedure, named after Holbrook Working and Harold Hotelling, is a method of simultaneous estimation in linear regression models. One of the first developments in simultaneous inference, it was devised by Working and Hotelling for the simple linear regression model in 1929.Miller (1966), p. 1 It provides a confidence region for multiple mean responses; that is, it gives the upper and lower bounds of more than one value of a dependent variable at several levels of the independent variables at a certain confidence level. The resulting confidence bands are known as the Working–Hotelling–Scheffé confidence bands.

Like the closely related Scheffé's method in the analysis of variance, which considers all possible contrasts, the Working–Hotelling procedure considers all possible values of the independent variables; that is, in a particular regression model, the probability that all the Working–Hotelling confidence intervals cover the true value of the mean response is the confidence coefficient. Consequently, when only a small subset of the possible values of the independent variable is considered, the procedure is more conservative and yields wider intervals than competitors like the Bonferroni correction at the same level of confidence, but it outperforms the Bonferroni correction as more values are considered.


Statement


Simple linear regression

Consider a simple linear regression model Y = \beta_0 + \beta_1 X + \varepsilon, where Y is the response variable and X the explanatory variable, and let b_0 and b_1 be the least-squares estimates of \beta_0 and \beta_1 respectively. Then the least-squares estimate of the mean response E(Y_i) at the level X = x_i is \hat{y}_i = b_0 + b_1 x_i. It can then be shown, assuming that the errors independently and identically follow the normal distribution, that a 1 - \alpha confidence interval of the mean response at a certain level of X is as follows:

: \hat{y}_i \in \left[ b_0 + b_1 x_i \pm t_{\alpha/2;\,n-2} \sqrt{\mathrm{MSE} \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^n (x_j - \bar{x})^2} \right)} \right],

where \mathrm{MSE} = \frac{1}{n-2} \sum_{j=1}^n e_j^2 is the mean squared error and t_{\alpha/2;\,n-2} denotes the upper \tfrac{\alpha}{2}-th percentile of Student's t-distribution with n - 2 degrees of freedom.

However, as multiple mean responses are estimated, the confidence level declines rapidly. To fix the confidence coefficient at 1 - \alpha, the Working–Hotelling approach employs an F-statistic:Miller (2014)Neter, Wasserman and Kutner, pp. 163–165

: \hat{y}_i \in \left[ b_0 + b_1 x_i \pm W \sqrt{\mathrm{MSE} \left( \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^n (x_j - \bar{x})^2} \right)} \right],

where W^2 = 2 F_{\alpha;\,2,\,n-2} and F_{\alpha;\,2,\,n-2} denotes the upper \alpha-th percentile of the F-distribution with (2, n-2) degrees of freedom. The resulting confidence level is 1 - \alpha over ''all'' values of X, i.e. x_i \in \mathbb{R}.


Multiple linear regression

The Working–Hotelling confidence bands can be easily generalised to multiple linear regression. Consider a general linear model as defined in the linear regressions article, that is,

: \mathbf{Y} = \mathbf{X}\boldsymbol\beta + \boldsymbol\varepsilon,

where

: \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} \mathbf{x}^{\mathsf{T}}_1 \\ \mathbf{x}^{\mathsf{T}}_2 \\ \vdots \\ \mathbf{x}^{\mathsf{T}}_n \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1p} \\ x_{21} & \cdots & x_{2p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \boldsymbol\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \boldsymbol\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.

Again, it can be shown that the least-squares estimate of the mean response E(Y_i) = \mathbf{x}^{\mathsf{T}}_i \boldsymbol\beta is \hat{y}_i = \mathbf{x}^{\mathsf{T}}_i \mathbf{b}, where \mathbf{b} consists of the least-squares estimates of the entries in \boldsymbol\beta, i.e. \mathbf{b} = (\mathbf{X}^{\mathsf{T}} \mathbf{X})^{-1} \mathbf{X}^{\mathsf{T}}\mathbf{Y}. Likewise, it can be shown that a 1 - \alpha confidence interval for a single mean response estimate is as follows:

: \hat{y}_i \in \left[ \mathbf{x}^{\mathsf{T}}_i \mathbf{b} \pm t_{\alpha/2;\,n-p} \sqrt{\mathrm{MSE} \cdot \mathbf{x}^{\mathsf{T}}_i (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1} \mathbf{x}_i} \right],

where \mathrm{MSE} is the observed value of the mean squared error, (\mathbf{Y}^{\mathsf{T}} \mathbf{Y} - \mathbf{b}^{\mathsf{T}} \mathbf{X}^{\mathsf{T}} \mathbf{Y})/(n-p). The Working–Hotelling approach to multiple estimations is similar to that of simple linear regression, with only a change in the degrees of freedom:

: \hat{y}_i \in \left[ \mathbf{x}^{\mathsf{T}}_i \mathbf{b} \pm W \sqrt{\mathrm{MSE} \cdot \mathbf{x}^{\mathsf{T}}_i (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1} \mathbf{x}_i} \right],

where W^2 = p F_{\alpha;\,p,\,n-p}.
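A matrix-form sketch in Python (NumPy and SciPy assumed; the design matrix and responses are synthetic, and the multiplier W² = pF(α; p, n−p) is the multiple-regression form of the simple-regression multiplier):

```python
import numpy as np
from scipy import stats

def wh_band_multi(X, y, X0, alpha=0.05):
    """Working–Hotelling band for the mean responses x0' b in the
    general linear model y = X beta + eps (X includes any intercept
    column). Uses the multiplier W^2 = p F(alpha; p, n - p)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                       # least-squares estimate of beta
    mse = np.sum((y - X @ b) ** 2) / (n - p)    # residual mean square
    fit = X0 @ b
    # Standard error of each mean response: sqrt(MSE * x0' (X'X)^-1 x0)
    se = np.sqrt(mse * np.einsum('ij,jk,ik->i', X0, XtX_inv, X0))
    W = np.sqrt(p * stats.f.ppf(1 - alpha, p, n - p))
    return fit, W * se

# Illustrative data: intercept plus two regressors
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
fit, hw = wh_band_multi(X, y, X[:5])
assert np.all(hw > 0)
```

The `einsum` call evaluates the quadratic form x0ᵀ(XᵀX)⁻¹x0 for each requested point in one pass, avoiding an explicit loop.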


Graphical representation

In the simple linear regression case, Working–Hotelling–Scheffé confidence bands, drawn by connecting the upper and lower limits of the mean response at every level, take the shape of hyperbolas. In drawing, they are sometimes approximated by the Graybill–Bowden confidence bands, which are linear and hence easier to graph:

: \beta_0 + \beta_1(x_i-\bar{x}) \in \left[ b_0 + b_1(x_i-\bar{x}) \pm m_{\alpha;\,2,\,n-2} \sqrt{\mathrm{MSE}} \left(\frac{1}{\sqrt{n}} + \frac{|x_i - \bar{x}|}{\sqrt{\sum_{j=1}^n (x_j - \bar{x})^2}} \right) \right],

where m_{\alpha;\,2,\,n-2} denotes the upper \alpha-th percentile of the Studentized maximum modulus distribution with two means and n - 2 degrees of freedom.
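The hyperbolic shape can be checked numerically: the squared half-width of the Working–Hotelling band is affine in (x − x̄)², which is the defining property of a hyperbola's branches around the fitted line. A short Python sketch (the values of n, MSE and Sxx are assumed for illustration only):

```python
import numpy as np
from scipy import stats

# Illustrative (assumed) quantities for a simple regression fit
n, mse, sxx, alpha = 20, 1.0, 50.0, 0.05
W = np.sqrt(2 * stats.f.ppf(1 - alpha, 2, n - 2))

# Half-width of the band at distance d from x-bar
d = np.linspace(0, 5, 6)
hw = W * np.sqrt(mse * (1 / n + d ** 2 / sxx))

# hw^2 = W^2*mse/n + (W^2*mse/sxx) * d^2, i.e. affine in d^2,
# so the upper and lower limits trace the branches of a hyperbola.
slope, intercept = np.polyfit(d ** 2, hw ** 2, 1)
assert abs(slope - W ** 2 * mse / sxx) < 1e-9
assert abs(intercept - W ** 2 * mse / n) < 1e-9
```

This also makes the qualitative behaviour explicit: the band is narrowest at x̄ and flares outward symmetrically on both sides.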


Numerical example

The same height-and-mass data as in the ordinary least squares article are utilised in this example. A simple linear regression model is fit to this data, yielding b_0 = -39.06 and b_1 = 61.27. The goal is to estimate the mean mass of women given their heights at the 95% confidence level. With n = 15, the multiplier was found to be W = \sqrt{2 F_{0.05;\,2,\,13}} = 2.758828. It was also found that \bar{x} = 1.651, \sum_{j=1}^n e_j^2 = 7.490558, \mathrm{MSE} = 0.5761968 and \sum_{j=1}^n (y_j - \bar{y})^2 = 693.3726, from which \sum_{j=1}^n (x_j - \bar{x})^2 = (693.3726 - 7.490558)/61.27^2 \approx 0.1827. Then, to estimate the mean mass of all women of a particular height, the following Working–Hotelling–Scheffé band has been derived:

: \hat{y}_i \in \left[ -39.06 + 61.27 x_i \pm 2.758828 \sqrt{0.5761968 \left( \frac{1}{15} + \frac{(x_i - 1.651)^2}{0.1827} \right)} \right].


Comparison with other methods

The Working–Hotelling approach may give tighter or looser confidence limits compared to the Bonferroni correction. In general, for small families of statements, the Bonferroni bounds may be tighter, but as the number of estimated values increases, the Working–Hotelling procedure yields narrower limits. This is because the confidence level of the Working–Hotelling–Scheffé bounds is exactly 1 - \alpha when ''all'' values of the independent variables, i.e. x_i \in \mathbb{R}, are considered. Alternatively, from an algebraic perspective, the critical value W = \sqrt{2 F_{\alpha;\,2,\,n-2}} remains constant as the number of estimates increases, whereas the corresponding Bonferroni critical values, t_{\alpha/(2g);\,n-2}, diverge as the number g of estimates increases. Therefore, the Working–Hotelling method is more suited for large-scale comparisons, whereas Bonferroni is preferred if only a few mean responses are to be estimated. In practice, both methods are usually computed first and the narrower interval chosen.Neter, Wasserman and Kutner, pp. 244–245

Another alternative to the Working–Hotelling–Scheffé band is the Gavarian band, which is used when a confidence band that maintains equal widths at all levels is needed.Miller (1966), pp. 123–127

The Working–Hotelling procedure is based on the same principles as Scheffé's method, which gives family confidence intervals for all possible contrasts.Westfall, Tobias and Wolfinger, pp. 277–280 Their proofs are almost identical. This is because both methods estimate linear combinations of the mean response at all factor levels. However, the Working–Hotelling procedure does not deal with contrasts but with different levels of the independent variable, so there is no requirement that the coefficients of the parameters sum to zero. Therefore, it has one more degree of freedom.


See also

* Multiple comparisons


Footnotes


Bibliography

* Miller (1966)
* Miller (2014)
* Neter, Wasserman and Kutner
* Westfall, Tobias and Wolfinger