The Generalized Additive Model for Location, Scale and Shape (GAMLSS) is an approach to statistical modelling and learning. GAMLSS is a modern distribution-based approach to (semiparametric) regression. A parametric distribution is assumed for the response (target) variable, but the parameters of this distribution can vary according to explanatory variables using linear, nonlinear or smooth functions. In machine learning parlance, GAMLSS is a form of supervised machine learning. In particular, the GAMLSS statistical framework enables flexible regression and smoothing models to be fitted to the data. The GAMLSS model assumes that the response variable follows a parametric distribution, which may be heavy- or light-tailed and positively or negatively skewed. In addition, all the parameters of the distribution, namely location (e.g., mean), scale (e.g., variance) and shape (skewness and kurtosis), can be modeled as linear, nonlinear or smooth functions of explanatory variables.


Overview of the model

The generalized additive model for location, scale and shape (GAMLSS) is a statistical model developed by Rigby and Stasinopoulos (and later expanded) to overcome some of the limitations associated with the popular generalized linear models (GLMs) and generalized additive models (GAMs). For an overview of these limitations see Nelder and Wedderburn (1972) and Hastie and Tibshirani's book. In GAMLSS the exponential family distribution assumption for the response variable, ''y'', (essential in GLMs and GAMs) is relaxed and replaced by a general distribution family, including highly skew and/or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modeling not only of the mean (or location) but also of the other parameters of the distribution of ''y'', as linear and/or nonlinear, parametric and/or additive non-parametric functions of explanatory variables and/or random effects.

GAMLSS is especially suited for modelling a leptokurtic or platykurtic and/or positively or negatively skewed response variable. For count-type response data it deals with over-dispersion by using proper over-dispersed discrete distributions. Heterogeneity is also dealt with by modeling the scale or shape parameters using explanatory variables. There are several packages written in R related to GAMLSS models, and tutorials for using and interpreting GAMLSS.

A GAMLSS model assumes independent observations y_i for i = 1, 2, \dots, n with probability (density) function f(y_i \mid \mu_i, \sigma_i, \nu_i, \tau_i) conditional on (\mu_i, \sigma_i, \nu_i, \tau_i), a vector of four distribution parameters, each of which can be a function of the explanatory variables. The first two population distribution parameters \mu_i and \sigma_i are usually characterized as location and scale parameters, while the remaining parameter(s), if any, are characterized as shape parameters, e.g. skewness and kurtosis parameters, although the model may be applied more generally to the parameters of any population distribution with up to four distribution parameters, and can be generalized to more than four distribution parameters.

: \begin{aligned}
g_1(\mu) &= \eta_1 = X_1 \beta_1 + \sum_{j=1}^{J_1} h_{j1}(x_{j1}) \\
g_2(\sigma) &= \eta_2 = X_2 \beta_2 + \sum_{j=1}^{J_2} h_{j2}(x_{j2}) \\
g_3(\nu) &= \eta_3 = X_3 \beta_3 + \sum_{j=1}^{J_3} h_{j3}(x_{j3}) \\
g_4(\tau) &= \eta_4 = X_4 \beta_4 + \sum_{j=1}^{J_4} h_{j4}(x_{j4})
\end{aligned}

where the g_k are known monotonic link functions, \mu, \sigma, \nu, \tau and \eta_k are vectors of length n, \beta_k^\top = (\beta_{1k}, \beta_{2k}, \ldots, \beta_{J'_k k}) is a parameter vector of length J'_k, X_k is a fixed known design matrix of order n \times J'_k and h_{jk} is a smooth non-parametric function of explanatory variable x_{jk}, j = 1, 2, \ldots, J_k and k = 1, 2, 3, 4.

For centile estimation, the WHO Multicentre Growth Reference Study Group has recommended GAMLSS and the Box–Cox power exponential (BCPE) distribution for the construction of the WHO Child Growth Standards (WHO Multicentre Growth Reference Study Group (2006), ''WHO Child Growth Standards: Length/height-for-age, weight-for-age, weight-for-length, weight-for-height and body mass index-for-age: Methods and development''. Geneva: World Health Organization).
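
As a rough illustration of the idea that each distribution parameter gets its own predictor, the following Python sketch fits the simplest parametric special case: a normal response with \mu = a_\mu + b_\mu x (identity link) and \log \sigma = a_\sigma + b_\sigma x (log link), with no smooth terms h_{jk}. The function names `wls_line` and `fit_normal_location_scale`, the toy data, and the alternating scheme (weighted least squares for the location, one Fisher-scoring step for the scale) are assumptions made for this example; this is not the algorithm or API of the R packages.

```python
import math


def wls_line(x, z, w):
    """Weighted least-squares fit of z ~ a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    b = sxz / sxx
    return zbar - b * xbar, b


def fit_normal_location_scale(x, y, n_iter=50):
    """Maximum-likelihood fit of mu = a_mu + b_mu*x, log(sigma) = a_sig + b_sig*x."""
    n = len(y)
    ones = [1.0] * n
    # Start from an ordinary least-squares mean and a constant scale.
    a_mu, b_mu = wls_line(x, y, ones)
    rss = sum((yi - a_mu - b_mu * xi) ** 2 for xi, yi in zip(x, y))
    a_sig, b_sig = 0.5 * math.log(rss / n), 0.0
    for _ in range(n_iter):
        eta2 = [a_sig + b_sig * xi for xi in x]  # predictor for log(sigma)
        sigma2 = [math.exp(2.0 * e) for e in eta2]
        # Location step: weighted least squares with weights 1 / sigma_i^2.
        a_mu, b_mu = wls_line(x, y, [1.0 / s2 for s2 in sigma2])
        # Scale step: one Fisher-scoring update on the log(sigma) scale
        # (score r^2/sigma^2 - 1, expected information 2).
        z = [e + 0.5 * ((yi - a_mu - b_mu * xi) ** 2 / s2 - 1.0)
             for e, xi, yi, s2 in zip(eta2, x, y, sigma2)]
        a_sig, b_sig = wls_line(x, z, ones)
    return (a_mu, b_mu), (a_sig, b_sig)


# Toy data (hypothetical): the spread of y grows with x.
x = [0.0, 0.0, 1.0, 1.0]
y = [-1.0, 1.0, 3.0 - math.e, 3.0 + math.e]
beta_mu, beta_sigma = fit_normal_location_scale(x, y)
print("location coefficients:", beta_mu)
print("log-scale coefficients:", beta_sigma)
```

On this toy data the fitted mean should be close to 3x and the fitted log standard deviation close to x, so the scale, and not only the location, changes with the explanatory variable. A full GAMLSS fit would add the shape parameters \nu and \tau and the smooth terms.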


What distributions can be used

The form of the distribution assumed for the response variable ''y'' is very general. For example, an implementation of GAMLSS in R has around 100 different distributions available. Such implementations also allow the use of truncated distributions and censored (or interval) response variables.
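
To make the last point concrete, the sketch below shows how truncation and censoring change the likelihood contribution of an observation, using a normal distribution purely as an assumed example. The function names are hypothetical and do not correspond to the R implementation.

```python
import math


def normal_pdf(y, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mu=0.0, sigma=1.0):
    """Cumulative distribution function of the same normal distribution."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def left_truncated_pdf(y, lower, mu=0.0, sigma=1.0):
    """Density of y conditioned on y > lower: f(y) / (1 - F(lower))."""
    if y <= lower:
        return 0.0
    return normal_pdf(y, mu, sigma) / (1.0 - normal_cdf(lower, mu, sigma))


def interval_censored_likelihood(lo, hi, mu=0.0, sigma=1.0):
    """Likelihood contribution when only lo < y <= hi is observed: F(hi) - F(lo)."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


# A standard normal truncated at 0 has twice the untruncated density above 0.
print(left_truncated_pdf(0.5, lower=0.0), 2.0 * normal_pdf(0.5))
# A right-censored observation at 0 (only y > 0 is known) contributes P(y > 0).
print(interval_censored_likelihood(0.0, math.inf))
```

A truncated distribution renormalizes the density over the allowed range, whereas a censored observation replaces the density by the probability of the observed interval. In both cases the distribution parameters can still depend on explanatory variables.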


Further reading

* Cole, T. J., Stanojevic, S., Stocks, J., Coates, A. L., Hankinson, J. L., Wade, A. M. (2009), "Age- and size-related reference ranges: A case study of spirometry through childhood and adulthood", ''Statistics in Medicine'', 28(5), 880–89
* Fenske, N., Fahrmeir, L., Rzehak, P., Hohle, M. (25 September 2008), "Detection of risk factors for obesity in early childhood with quantile regression methods for longitudinal data", ''Department of Statistics: Technical Reports'', No.3
* Hudson, I. L., Kim, S. W., Keatley, M. R. (2010), "Climatic Influences on the Flowering Phenology of Four Eucalypts: A GAMLSS Approach". In ''Phenological Research'', Irene L. Hudson and Marie R. Keatley (eds), Springer Netherland
* Hudson, I. L., Rea, A., Dalrymple, M. L., Eilers, P. H. C. (2008), "Climate impacts on sudden infant death syndrome: a GAMLSS approach", ''Proceedings of the 23rd international workshop on statistical modelling'', pp. 277–280
* Serinaldi, F., Villarini, G., Smith, J. A., Krajewski, W. F. (2008), "Change-Point and Trend Analysis on Annual Maximum Discharge in Continental United States", ''American Geophysical Union Fall Meeting 2008'', abstract #H21A-0803


External links


* GAMLSS official website: gamlss.org
* GAMLSS manual (downloadable)
* Distribution tables in GAMLSS
* The GAMLSS packages reference card (downloadable)
* The booklet for the Utrecht short course on GAMLSS (downloadable)
* R packages for GAMLSS on CRAN