Truncated Dependent Variable

In statistics, truncation limits values from above or below, producing a truncated sample. A random variable y is said to be truncated from below if, for some threshold value c, the exact value of y is known for all cases y > c, but unknown for all cases y ≤ c. Similarly, truncation from above means the exact value of y is known in cases where y < c, but unknown when y ≥ c.

Truncation is similar to but distinct from the concept of statistical censoring. A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept. With statistical censoring, a note would be recorded documenting which bound (upper or lower) had been exceeded and the value of that bound. With truncated sampling, no note is recorded.
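The difference between the two concepts can be illustrated with a small simulation. The following is a minimal sketch, not a standard routine: the helper names `truncate_below` and `censor_below`, the threshold value and the standard normal latent sample are all assumptions chosen for illustration.

```python
import random

def truncate_below(values, c):
    """Keep only values strictly above c; omitted cases leave no trace."""
    return [y for y in values if y > c]

def censor_below(values, c):
    """Keep every case; a value at or below c is recorded as (c, True)."""
    return [(y, False) if y > c else (c, True) for y in values]

rng = random.Random(0)                       # fixed seed, illustrative only
latent = [rng.gauss(0.0, 1.0) for _ in range(1000)]
c = -0.5                                     # hypothetical threshold

truncated = truncate_below(latent, c)
censored = censor_below(latent, c)

print(len(truncated))                        # size of the truncated sample
print(len(censored))                         # always the full sample size
print(sum(flag for _, flag in censored))     # cases known only to lie at or below c
```

The censored sample still has one record per case, so the number of values beyond the bound can be counted. The truncated sample carries no such information.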


Applications

Usually the values that insurance adjusters receive are either left-truncated, right-censored, or both. For example, if policyholders are subject to a policy limit ''u'', then any loss amounts that are actually above ''u'' are reported to the insurance company as being exactly ''u'' because ''u'' is the amount the insurance company pays. The insurer knows that the actual loss is greater than ''u'' but does not know what it is.

On the other hand, left truncation occurs when policyholders are subject to a deductible. If policyholders are subject to a deductible ''d'', any loss amount that is less than ''d'' will not even be reported to the insurance company. If a policy has both a limit of ''u'' and a deductible of ''d'', any loss amount that is greater than ''u'' will be reported to the insurance company as a loss of ''u'' − ''d'' because that is the amount the insurance company has to pay.

Therefore, insurance loss data are left-truncated, because the insurance company never learns of losses below the deductible ''d'', for which policyholders make no claim. The data are also right-censored when the loss is greater than ''u'', because ''u'' is the most the insurance company will pay: the insurer knows only that the loss exceeds ''u'', not its exact amount.
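The combined effect of a deductible and a policy limit can be sketched in a few lines. The function name `reported_payment` and the example figures are hypothetical. The rule follows the convention stated above, in which a loss above ''u'' is reported as ''u'' − ''d''.

```python
from typing import Optional

def reported_payment(loss: float, d: float, u: float) -> Optional[float]:
    """Amount the insurer sees for a loss, or None if no claim is made.

    d is the deductible and u the policy limit (illustrative convention:
    a loss above u is reported as u - d).
    """
    if loss < d:
        return None            # left truncation: the loss is never reported
    return min(loss, u) - d    # right censoring: capped at u - d above the limit

losses = [50.0, 400.0, 2500.0, 9000.0]
print([reported_payment(x, d=100.0, u=2000.0) for x in losses])
# → [None, 300.0, 1900.0, 1900.0]
```

The two largest losses are indistinguishable in the insurer's data, and the smallest does not appear at all.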


Probability distributions

Truncation can be applied to any probability distribution. This will usually lead to a new distribution, not one within the same family. Thus, if a random variable ''X'' has ''F''(''x'') as its distribution function, the new random variable ''Y'' defined as having the distribution of ''X'' truncated to the semi-open interval (''a'', ''b''] has the distribution function
:F_Y(y)=\frac{F(y)-F(a)}{F(b)-F(a)}
for ''y'' in the interval (''a'', ''b''], and 0 or 1 otherwise (0 below the interval, 1 above it). If truncation were to the closed interval [''a'', ''b''], the distribution function would be
:F_Y(y)=\frac{F(y)-F(a^-)}{F(b)-F(a^-)}
for ''y'' in the interval [''a'', ''b''], and 0 or 1 otherwise, where F(a^-) denotes the limit of ''F''(''x'') as ''x'' approaches ''a'' from below.
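As an illustration of the semi-open case, the truncated distribution function can be built from any distribution function ''F''. This is a minimal sketch: the helper names `normal_cdf` and `truncated_cdf` are hypothetical, and the normal distribution and the bounds are arbitrary choices. For a continuous ''F'' such as the normal, F(a^-) equals ''F''(''a''), so the closed-interval formula gives the same values.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncated_cdf(F, a, b):
    """Return F_Y for X truncated to (a, b], given the distribution function F of X."""
    Fa, Fb = F(a), F(b)
    if Fb <= Fa:
        raise ValueError("the interval (a, b] must have positive probability")

    def F_Y(y):
        if y <= a:
            return 0.0                      # below the interval
        if y > b:
            return 1.0                      # above the interval
        return (F(y) - Fa) / (Fb - Fa)      # rescaled inside the interval

    return F_Y

F_Y = truncated_cdf(normal_cdf, a=-1.0, b=2.0)
print(F_Y(-1.0), F_Y(2.0), F_Y(3.0))        # 0.0 1.0 1.0
```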


Data analysis

The analysis of data where observations are treated as being from truncated versions of standard distributions can be undertaken using maximum likelihood, where the likelihood is derived from the distribution or density of the truncated distribution. This involves taking account of the normalising factor in the modified density function, which depends on the parameters of the original distribution.

In practice, if the fraction truncated is very small, the effect of truncation might be ignored when analysing data. For example, it is common to use a normal distribution to model data whose values can only be positive but for which the typical range of values is well away from zero. In such cases, a truncated or censored version of the normal distribution may formally be preferable (although there would be alternatives), but the more complicated analysis would change the results very little. However, software is readily available for maximum-likelihood estimation of even moderately complicated models, such as regression models, for truncated data.

In econometrics, ''truncated dependent variables'' are variables for which observations cannot be made for certain values in some range. Regression models with such dependent variables require special care that properly recognizes the truncated nature of the variable. Estimation of such truncated regression models can be done in parametric, or semi- and non-parametric frameworks (Park, B. U.; Simar, L.; Zelenyuk, V. (2008). "Local Likelihood Estimation of Truncated Regression and its Partial Derivatives: Theory and Application". ''Journal of Econometrics'' 146 (1): 185–198. doi:10.1016/j.jeconom.2008.08.007. https://hal.archives-ouvertes.fr/hal-00520650/file/PEER_stage2_10.1016%252Fj.jeconom.2008.08.007.pdf).
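The maximum-likelihood approach can be sketched for the simplest case, a normal sample truncated from below at a known threshold ''c''. Everything in this example is an assumption made for illustration: the simulated data, the true parameter values, the helper names and the coarse grid search, which stands in for a proper numerical optimizer and is not the method of any particular software package. The log-likelihood includes the term for P(''X'' > ''c''), which is the factor that depends on the parameters of the original distribution.

```python
import math
import random

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def make_loglik(ys, c):
    """Log-likelihood (up to a constant) of normal(mu, sigma) truncated below at c."""
    n, s1, s2 = len(ys), sum(ys), sum(y * y for y in ys)

    def loglik(mu, sigma):
        ss = s2 - 2.0 * mu * s1 + n * mu * mu            # sum of (y - mu)^2
        tail = 1.0 - normal_cdf((c - mu) / sigma)        # P(X > c)
        return -n * math.log(sigma) - ss / (2.0 * sigma ** 2) - n * math.log(tail)

    return loglik

# Simulated data: latent normal(1, 2) draws, observed only when above c = 0.
rng = random.Random(1)
c = 0.0
ys = [y for y in (rng.gauss(1.0, 2.0) for _ in range(20000)) if y > c]

loglik = make_loglik(ys, c)
mus = [-1.0 + 0.05 * i for i in range(81)]               # candidate means
sigmas = [1.0 + 0.05 * j for j in range(41)]             # candidate standard deviations
mu_hat, sigma_hat = max(((m, s) for m in mus for s in sigmas),
                        key=lambda p: loglik(*p))

naive_mean = sum(ys) / len(ys)                           # ignores the truncation
print(naive_mean, mu_hat, sigma_hat)
```

In this setup about 30% of the latent sample is truncated, so the naive sample mean is pulled well above the latent mean of 1, while the grid maximum of the truncated likelihood should land close to the values used in the simulation.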


See also

* Censoring (statistics)
* Trimmed estimator
* Truncated distribution
* Truncated mean

