T-statistic
In statistics, the ''t''-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error. It is used in hypothesis testing via Student's ''t''-test: the ''t''-statistic is used in a ''t''-test to determine whether to support or reject the null hypothesis. It is very similar to the z-score, with the difference that the ''t''-statistic is used when the sample size is small or the population standard deviation is unknown. For example, the ''t''-statistic is used in estimating the population mean from a sampling distribution of sample means if the population standard deviation is unknown. It is also used along with the p-value when running hypothesis tests, where the p-value gives the probability of obtaining results at least as extreme as those observed if the null hypothesis were true.
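For illustration, a minimal Python sketch of the one-sample ''t''-statistic just described, assuming NumPy and SciPy are available; the sample values and hypothesized mean are made up for the example:

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3])  # hypothetical measurements
mu_0 = 5.0                                                    # hypothesized population mean

n = sample.size
# Departure of the sample mean from mu_0, divided by its standard error.
t_stat = (sample.mean() - mu_0) / (sample.std(ddof=1) / np.sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# SciPy's built-in one-sample t-test should agree with the hand computation.
t_check, p_check = stats.ttest_1samp(sample, popmean=mu_0)
print(t_stat, p_value, t_check, p_check)
```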


Definition and features

Let \hat\beta be an estimator of parameter ''β'' in some statistical model. Then a ''t''-statistic for this parameter is any quantity of the form
: t_{\hat\beta} = \frac{\hat\beta - \beta_0}{\operatorname{s.e.}(\hat\beta)},
where ''β''0 is a non-random, known constant, which may or may not match the actual unknown parameter value ''β'', and \operatorname{s.e.}(\hat\beta) is the standard error of the estimator \hat\beta for ''β''.

By default, statistical packages report the ''t''-statistic with ''β''0 = 0 (these ''t''-statistics are used to test the significance of the corresponding regressor). However, when the ''t''-statistic is needed to test a hypothesis of the form ''β'' = ''β''0, a non-zero ''β''0 may be used.

If \hat\beta is an ordinary least squares estimator in the classical linear regression model (that is, with normally distributed and homoscedastic error terms), and if the true value of the parameter ''β'' is equal to ''β''0, then the sampling distribution of the ''t''-statistic is the Student's ''t''-distribution with ''n'' − ''k'' degrees of freedom, where ''n'' is the number of observations and ''k'' is the number of regressors (including the intercept).

In the majority of models, the estimator \hat\beta is consistent for ''β'' and is distributed asymptotically normally. If the true value of the parameter ''β'' is equal to ''β''0, and the quantity \operatorname{s.e.}(\hat\beta) correctly estimates the asymptotic variance of this estimator, then the ''t''-statistic will asymptotically have the standard normal distribution.

In some models the distribution of the ''t''-statistic is different from the normal distribution, even asymptotically. For example, when a time series with a unit root is regressed in the augmented Dickey–Fuller test, the test ''t''-statistic will asymptotically have one of the Dickey–Fuller distributions (depending on the test setting).
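As a rough sketch of the regression case above, the following Python code (assuming NumPy only, with invented data) computes the ''t''-statistics of OLS coefficients by hand, using ''β''0 = 0 and ''n'' − ''k'' degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)   # true intercept 2.0, slope 0.5

X = np.column_stack([np.ones(n), x])                # design matrix with intercept
k = X.shape[1]                                      # number of regressors (incl. intercept)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS estimates
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - k)        # residual variance estimate
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)      # estimated covariance of beta_hat
se_beta = np.sqrt(np.diag(cov_beta))                # standard errors

beta_0 = 0.0                                        # default null value used by packages
t_stats = (beta_hat - beta_0) / se_beta             # ~ t with n - k d.o.f. under the null
print(t_stats)
```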


Use

Most frequently, ''t''-statistics are used in Student's ''t''-tests, a form of statistical hypothesis testing, and in the computation of certain confidence intervals.

The key property of the ''t''-statistic is that it is a pivotal quantity – while defined in terms of the sample mean, its sampling distribution does not depend on the population parameters, and thus it can be used regardless of what these may be.

One can also divide a residual by the sample standard deviation:
: g(x, X) = \frac{x - \overline{X}}{s}
to compute an estimate of the number of standard deviations a given sample is from the mean, as a sample version of a z-score, the z-score requiring the population parameters.
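A small sketch of how the pivotal ''t''-statistic yields a confidence interval for the mean, assuming NumPy and SciPy, with a hypothetical sample:

```python
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.6, 12.3, 11.9, 12.4])
n = sample.size
mean, s = sample.mean(), sample.std(ddof=1)

conf = 0.95
t_crit = stats.t.ppf((1 + conf) / 2, df=n - 1)      # critical value of the t pivot

# Interval whose coverage does not depend on the unknown population parameters.
half_width = t_crit * s / np.sqrt(n)
print(mean - half_width, mean + half_width)         # 95% confidence interval for the mean
```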


Prediction

Given a normal distribution N(\mu, \sigma^2) with unknown mean and variance, the ''t''-statistic of a future observation X_{n+1}, after one has made ''n'' observations, is an ancillary statistic – a pivotal quantity (it does not depend on the values of ''μ'' and ''σ''2) that is a statistic (computed from observations). This allows one to compute a frequentist prediction interval (a predictive confidence interval), via the following ''t''-distribution:
: \frac{X_{n+1} - \overline{X}_n}{s_n \sqrt{1 + n^{-1}}} \sim T^{n-1}.
Solving for X_{n+1} yields the prediction distribution
: \overline{X}_n + s_n \sqrt{1 + n^{-1}} \cdot T^{n-1},
from which one may compute predictive confidence intervals – given a probability ''p'', one may compute intervals such that 100''p''% of the time, the next observation X_{n+1} will fall in that interval.
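A minimal sketch of this prediction interval, assuming NumPy and SciPy, with made-up observations:

```python
import numpy as np
from scipy import stats

obs = np.array([9.8, 10.2, 10.0, 9.9, 10.4, 10.1])   # the n observations made so far
n = obs.size
mean_n, s_n = obs.mean(), obs.std(ddof=1)

p = 0.95
t_crit = stats.t.ppf((1 + p) / 2, df=n - 1)

# X_{n+1} falls in mean_n +/- t_crit * s_n * sqrt(1 + 1/n) with probability p.
half_width = t_crit * s_n * np.sqrt(1 + 1 / n)
print(mean_n - half_width, mean_n + half_width)
```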


History

The term "''t''-statistic" is abbreviated from "hypothesis test statistic". In statistics, the t-distribution was first derived as a
posterior distribution The posterior probability is a type of conditional probability that results from updating the prior probability with information summarized by the likelihood via an application of Bayes' rule. From an epistemological perspective, the posterior p ...
in 1876 by
Helmert Friedrich Robert Helmert (31 July 1843 – 15 June 1917) was a German geodesist and statistician with important contributions to the theory of errors. Career Helmert was born in Freiberg, Kingdom of Saxony. After schooling in Freiberg and D ...
and Lüroth. The t-distribution also appeared in a more general form as Pearson Type IV distribution in
Karl Pearson Karl Pearson (; born Carl Pearson; 27 March 1857 – 27 April 1936) was an English mathematician and biostatistician. He has been credited with establishing the discipline of mathematical statistics. He founded the world's first university st ...
's 1895 paper. However, the T-Distribution, also known as
Student's T Distribution In probability and statistics, Student's ''t''-distribution (or simply the ''t''-distribution) is any member of a family of continuous probability distributions that arise when estimating the mean of a normally distributed population in situa ...
gets its name from
William Sealy Gosset William Sealy Gosset (13 June 1876 – 16 October 1937) was an English statistician, chemist and brewer who served as Head Brewer of Guinness and Head Experimental Brewer of Guinness and was a pioneer of modern statistics. He pioneered small sa ...
who was first to published the result in English in his 1908 paper titled "The Probable Error of a Mean" (in
Biometrika ''Biometrika'' is a peer-reviewed scientific journal published by Oxford University Press for thBiometrika Trust The editor-in-chief is Paul Fearnhead (Lancaster University). The principal focus of this journal is theoretical statistics. It was es ...
) using his pseudonym "Student" because his employer preferred their staff to use pen names when publishing scientific papers instead of their real name, so he used the name "Student" to hide his identity. Gosset worked at the
Guinness Brewery St. James's Gate Brewery is a brewery founded in 1759 in Dublin, Ireland, by Arthur Guinness. The company is now a part of Diageo, a company formed from the merger of Guinness and Grand Metropolitan in 1997. The main product of the brewery is ...
in
Dublin Dublin (; , or ) is the capital and largest city of Republic of Ireland, Ireland. On a bay at the mouth of the River Liffey, it is in the Provinces of Ireland, province of Leinster, bordered on the south by the Dublin Mountains, a part of th ...
,
Ireland Ireland ( ; ga, Éire ; Ulster Scots dialect, Ulster-Scots: ) is an island in the Atlantic Ocean, North Atlantic Ocean, in Northwestern Europe, north-western Europe. It is separated from Great Britain to its east by the North Channel (Grea ...
, and was interested in the problems of small samples – for example, the chemical properties of barley where sample sizes might be as few as 3. Hence a second version of the etymology of the term Student is that Guinness did not want their competitors to know that they were using the t-test to determine the quality of raw material. Although it was William Gosset after whom the term "Student" is penned, it was actually through the work of
Ronald Fisher Sir Ronald Aylmer Fisher (17 February 1890 – 29 July 1962) was a British polymath who was active as a mathematician, statistician, biologist, geneticist, and academic. For his work in statistics, he has been described as "a genius who a ...
that the distribution became well known as "Student's distribution" and "
Student's t-test A ''t''-test is any statistical hypothesis test in which the test statistic follows a Student's ''t''-distribution under the null hypothesis. It is most commonly applied when the test statistic would follow a normal distribution if the value of ...
"


Related concepts

* ''z''-score (standardization): If the population parameters are known, then rather than computing the ''t''-statistic, one can compute the ''z''-score; analogously, rather than using a ''t''-test, one uses a ''z''-test. This is rare outside of standardized testing.
* Studentized residual: In regression analysis, the standard errors of the estimators at different data points vary (compare the middle versus endpoints of a simple linear regression), and thus one must divide the different residuals by different estimates for the error, yielding what are called studentized residuals, as sketched below.
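For illustration, a NumPy-only sketch (with fabricated data) of internally studentized residuals, dividing each residual by its own estimated standard deviation via the leverage values of the hat matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
residuals = y - X @ beta_hat

H = X @ np.linalg.inv(X.T @ X) @ X.T                 # hat matrix
leverage = np.diag(H)
sigma_hat = np.sqrt(residuals @ residuals / (n - X.shape[1]))

# Each residual gets its own error estimate, which differs between
# the middle and the endpoints of the x range (high leverage).
studentized = residuals / (sigma_hat * np.sqrt(1 - leverage))
print(studentized)
```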


See also

* ''F''-test
* ''t''2-statistic
* Student's ''t''-distribution
* Student's ''t''-test
* Hypothesis testing
* Folded-t and half-t distributions
* Chi-squared distribution

