In statistics, semiparametric regression includes regression models that combine parametric and nonparametric models. They are often used in situations where the fully nonparametric model may not perform well or when the researcher wants to use a parametric model but the functional form with respect to a subset of the regressors or the density of the errors is not known. Semiparametric regression models are a particular type of semiparametric modelling and, since semiparametric models contain a parametric component, they rely on parametric assumptions and so may be misspecified and inconsistent, just like a fully parametric model.


Methods

Many different semiparametric regression methods have been proposed and developed. The most popular methods are the partially linear, index and varying coefficient models.


Partially linear models

A partially linear model is given by

: Y_i = X'_i \beta + g\left(Z_i\right) + u_i, \quad i = 1,\ldots,n,

where Y_i is the dependent variable, X_i is a p \times 1 vector of explanatory variables, \beta is a p \times 1 vector of unknown parameters and Z_i \in \mathbb{R}^q. The parametric part of the partially linear model is given by the parameter vector \beta while the nonparametric part is the unknown function g\left(Z_i\right). The data are assumed to be i.i.d. with E\left(u_i \mid X_i, Z_i\right) = 0, and the model allows for a conditionally heteroskedastic error process E\left(u_i^2 \mid x, z\right) = \sigma^2\left(x, z\right) of unknown form. This type of model was proposed by Robinson (1988) and extended to handle categorical covariates by Racine and Li (2007).

The method is implemented by obtaining a \sqrt{n}-consistent estimator of \beta and then deriving an estimator of g\left(Z_i\right) from the nonparametric regression of Y_i - X'_i \hat{\beta} on z using an appropriate nonparametric regression method (see Li and Racine (2007) for an in-depth look at nonparametric regression methods).
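As an illustrative sketch of this two-step idea (not Robinson's exact procedure: the Gaussian kernel, the fixed bandwidth, and the simulated data below are all assumptions chosen for the example), one can partial out the conditional means E(Y|Z) and E(X|Z) with kernel regression, estimate \beta by least squares on the residuals, and then recover g from a nonparametric regression of Y - X'\hat{\beta} on z:

```python
import numpy as np

def nw(z_train, t_train, z_eval, h):
    # Nadaraya-Watson kernel regression of t on z with a Gaussian kernel.
    d = (z_eval[:, None] - z_train[None, :]) / h
    w = np.exp(-0.5 * d**2)
    return (w @ t_train) / w.sum(axis=1)

def robinson_plm(y, x, z, h=0.1):
    # Double-residual sketch: partial out E(Y|Z) and each E(X_j|Z),
    # then run OLS of the Y-residuals on the X-residuals.
    ey = nw(z, y, z, h)
    ex = np.column_stack([nw(z, x[:, j], z, h) for j in range(x.shape[1])])
    beta = np.linalg.lstsq(x - ex, y - ey, rcond=None)[0]
    # Second step: estimate g by kernel regression of Y - X'beta on z.
    ghat = lambda z0: nw(z, y - x @ beta, np.atleast_1d(z0), h)
    return beta, ghat

# Simulated data (illustrative): y = 2*x1 - x2 + sin(2*pi*z) + noise.
rng = np.random.default_rng(0)
n = 500
z = rng.uniform(0, 1, n)
x = rng.normal(size=(n, 2))
y = x @ np.array([2.0, -1.0]) + np.sin(2 * np.pi * z) + 0.1 * rng.normal(size=n)
beta, ghat = robinson_plm(y, x, z)  # beta close to [2, -1]
```

In practice the bandwidth h would be chosen by cross-validation rather than fixed a priori.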


Index models

A single index model takes the form

: Y = g\left(X'\beta_0\right) + u,

where Y, X and \beta_0 are defined as earlier and the error term u satisfies E\left(u \mid X\right) = 0. The single index model takes its name from the parametric part of the model, X'\beta, which is a ''scalar'' single index. The nonparametric part is the unknown function g\left(\cdot\right).


Ichimura's method

The single index model method developed by Ichimura (1993) is as follows. Consider the situation in which y is continuous. Given a known form for the function g\left(\cdot\right), \beta_0 could be estimated using the nonlinear least squares method to minimize the function

: \sum_i \left(Y_i - g\left(X'_i \beta\right)\right)^2.

Since the functional form of g\left(\cdot\right) is not known, it must be estimated. For a given value of \beta, an estimate of the function

: G\left(X'_i \beta\right) = E\left(Y_i \mid X'_i \beta\right) = E\left[g\left(X'_i \beta_0\right) \mid X'_i \beta\right]

can be obtained using kernel methods. Ichimura (1993) proposes estimating g\left(X'_i \beta\right) with

: \hat{G}_{-i}\left(X'_i \beta\right),

the leave-one-out nonparametric kernel estimator of G\left(X'_i \beta\right).
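The resulting semiparametric least squares objective can be illustrated with a minimal sketch. Everything below is an assumption made for the example: \beta is normalized to unit length with two regressors, a crude grid search over directions stands in for the numerical optimizer that would be used in practice, and the leave-one-out Nadaraya-Watson estimator plays the role of \hat{G}_{-i}:

```python
import numpy as np

def loo_nw(s, y, h):
    # Leave-one-out Nadaraya-Watson estimates of E(Y | S = s_i).
    d = (s[:, None] - s[None, :]) / h
    w = np.exp(-0.5 * d**2)
    np.fill_diagonal(w, 0.0)  # drop observation i when estimating at s_i
    return (w @ y) / w.sum(axis=1)

def ichimura_sls(y, x, h=0.1, n_grid=200):
    # Grid-search sketch of the semiparametric least squares criterion:
    # minimize sum_i (Y_i - Ghat_{-i}(X_i' b))^2 over unit-norm b.
    best, best_obj = None, np.inf
    for th in np.linspace(0.0, np.pi, n_grid, endpoint=False):
        b = np.array([np.cos(th), np.sin(th)])
        obj = np.sum((y - loo_nw(x @ b, y, h))**2)
        if obj < best_obj:
            best, best_obj = b, obj
    return best

# Simulated single index data (illustrative): y = sin(x'b0) + noise.
rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=(n, 2))
b0 = np.array([0.6, 0.8])  # true unit-norm index direction
y = np.sin(x @ b0) + 0.1 * rng.normal(size=n)
bhat = ichimura_sls(y, x)  # direction close to b0 (up to sign)
```

Because g\left(\cdot\right) is left unrestricted, the index direction is only identified up to scale (and sign), which is why a normalization such as \|\beta\| = 1 is imposed here.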


Klein and Spady's estimator

If the dependent variable y is binary and X_i and u_i are assumed to be independent, Klein and Spady (1993) propose a technique for estimating \beta using maximum likelihood methods. The log-likelihood function is given by

: L\left(\beta\right) = \sum_i \left(1 - Y_i\right)\ln\left(1 - \hat{g}_{-i}\left(X'_i \beta\right)\right) + \sum_i Y_i \ln\left(\hat{g}_{-i}\left(X'_i \beta\right)\right),

where \hat{g}_{-i}\left(X'_i \beta\right) is the leave-one-out estimator.
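A minimal sketch of maximizing this log-likelihood, under the same illustrative assumptions as before (unit-norm \beta with two regressors, a Gaussian kernel with a fixed bandwidth, and a grid search in place of a proper optimizer; the simulated logistic link is unknown to the estimator):

```python
import numpy as np

def loo_nw(s, y, h):
    # Leave-one-out kernel estimate of P(Y = 1 | S = s_i).
    d = (s[:, None] - s[None, :]) / h
    w = np.exp(-0.5 * d**2)
    np.fill_diagonal(w, 0.0)
    return (w @ y) / w.sum(axis=1)

def klein_spady(y, x, h=0.3, n_grid=200):
    # Grid-search sketch: maximize the semiparametric log-likelihood
    # sum_i (1-Y_i) ln(1 - ghat_{-i}) + Y_i ln(ghat_{-i}) over unit-norm b.
    best, best_ll = None, -np.inf
    for th in np.linspace(0.0, np.pi, n_grid, endpoint=False):
        b = np.array([np.cos(th), np.sin(th)])
        p = np.clip(loo_nw(x @ b, y, h), 1e-6, 1 - 1e-6)  # keep logs finite
        ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        if ll > best_ll:
            best, best_ll = b, ll
    return best

# Simulated binary-choice data (illustrative).
rng = np.random.default_rng(2)
n = 600
x = rng.normal(size=(n, 2))
b0 = np.array([0.8, 0.6])  # true unit-norm index direction
p = 1.0 / (1.0 + np.exp(-2.0 * (x @ b0)))  # link function, not used by the estimator
y = (rng.uniform(size=n) < p).astype(float)
bhat = klein_spady(y, x)  # direction close to b0 (up to sign)
```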


Smooth coefficient/varying coefficient models

Hastie and Tibshirani (1993) propose a smooth coefficient model given by

: Y_i = \alpha\left(Z_i\right) + X'_i \beta\left(Z_i\right) + u_i = \left(1, X'_i\right)\begin{pmatrix} \alpha\left(Z_i\right) \\ \beta\left(Z_i\right) \end{pmatrix} + u_i = W'_i \gamma\left(Z_i\right) + u_i,

where X_i is a k \times 1 vector and \beta\left(z\right) is a vector of unspecified smooth functions of z. \gamma\left(\cdot\right) may be expressed as

: \gamma\left(Z_i\right) = \left(E\left[W_i W'_i \mid Z_i\right]\right)^{-1} E\left[W_i Y_i \mid Z_i\right].
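The expression for \gamma\left(\cdot\right) suggests a simple sample analogue: replace the conditional expectations with kernel-weighted averages around a point z_0, i.e. a kernel-weighted least squares fit. A sketch under stated assumptions (the function name, Gaussian kernel, bandwidth, and simulated data are all illustrative, not a prescribed implementation):

```python
import numpy as np

def smooth_coef(y, x, z, z0, h=0.1):
    # Local (kernel-weighted) least squares estimate of gamma(z0) in
    # y_i = w_i' gamma(z_i) + u_i with w_i = (1, x_i')'.
    w = np.column_stack([np.ones_like(z), x])
    k = np.exp(-0.5 * ((z - z0) / h)**2)      # Gaussian kernel weights
    a = (w * k[:, None]).T @ w                 # sample analogue of E[W W' | Z=z0]
    b = (w * k[:, None]).T @ y                 # sample analogue of E[W Y | Z=z0]
    return np.linalg.solve(a, b)

# Simulated data (illustrative): alpha(z) = z^2, beta(z) = 1 + z.
rng = np.random.default_rng(3)
n = 800
z = rng.uniform(0, 1, n)
x = rng.normal(size=(n, 1))
y = z**2 + (1 + z) * x[:, 0] + 0.1 * rng.normal(size=n)
g = smooth_coef(y, x, z, z0=0.5)  # roughly (alpha(0.5), beta(0.5)) = (0.25, 1.5)
```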


See also

* Nonparametric regression
* Effective degrees of freedom

