Stan is a probabilistic programming language for statistical inference written in C++ (Stan Development Team, 2015, ''Stan Modeling Language User's Guide and Reference Manual'', Version 2.9.0). The Stan language is used to specify a (Bayesian) statistical model with an imperative program calculating the log probability density function.
Stan is licensed under the New BSD License. Stan is named in honour of Stanislaw Ulam, pioneer of the Monte Carlo method.
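As a rough illustration of what "an imperative program calculating the log probability density function" means, the following Python sketch accumulates a log density one statement at a time. It is not Stan code and does not use any Stan interface. The model (a normal likelihood with unknown mean `mu`, a fixed `sigma`, and a standard normal prior on `mu`), the data values, and all function names are assumptions made for this example.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_density(mu, data, sigma=1.0):
    """Joint log density of mu and the data, built up statement by statement."""
    target = 0.0
    target += normal_lpdf(mu, 0.0, 1.0)       # prior term: mu ~ normal(0, 1)
    for y in data:                             # likelihood terms: y ~ normal(mu, sigma)
        target += normal_lpdf(y, mu, sigma)
    return target

data = [0.8, 1.1, 1.4]  # made-up observations for illustration
```

Inference algorithms only need to evaluate a function such as `log_density` (and its gradient) at candidate parameter values. For this conjugate example the log density peaks at `sum(data) / (len(data) + 1)`, which is 0.825.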
Stan was created by a development team of 34 members that includes Andrew Gelman, Bob Carpenter, Matt Hoffman, and Daniel Lee.
Interfaces
The Stan language itself can be accessed through several interfaces:
* CmdStan – a command-line executable for the shell,
* CmdStanR and rstan – R software libraries,
* CmdStanPy and PyStan – libraries for the Python programming language,
* MatlabStan – integration with the MATLAB numerical computing environment,
* Stan.jl – integration with the Julia programming language,
* StataStan – integration with Stata.
In addition, higher-level interfaces are provided by packages that use Stan as a backend, primarily in the R language:
* ''rstanarm'' provides a drop-in replacement for frequentist models provided by base R and ''lme4'' using the R formula syntax;
* ''brms'' provides a wide array of linear and nonlinear models using the R formula syntax;
* ''prophet'' provides automated procedures for time series forecasting.
Algorithms
Stan implements gradient-based Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference, stochastic, gradient-based variational Bayesian methods for approximate Bayesian inference, and gradient-based optimization for penalized maximum likelihood estimation.
* MCMC algorithms:
** Hamiltonian Monte Carlo (HMC)
** No-U-Turn sampler (NUTS), a variant of HMC and Stan's default MCMC engine
* Variational inference algorithms:
** Automatic Differentiation Variational Inference
* Optimization algorithms:
** Limited-memory BFGS (L-BFGS), Stan's default optimization algorithm
** Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS)
** Laplace's method for classical standard error estimates and approximate Bayesian posteriors
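The gradient-based MCMC idea behind HMC can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Stan's implementation: the target is a one-dimensional standard normal, the gradient is written by hand, and the step size and number of leapfrog steps are fixed values chosen for the example (NUTS, by contrast, selects the trajectory length adaptively). All function names are hypothetical.

```python
import math
import random

def log_p(q):
    """Log density of the target up to a constant (assumed: standard normal)."""
    return -0.5 * q * q

def grad_log_p(q):
    """Gradient of log_p, written by hand for this toy target."""
    return -q

def leapfrog(q, p, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p + 0.5 * step_size * grad_log_p(q)       # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                     # full step for position
        if i < n_steps - 1:
            p = p + step_size * grad_log_p(q)     # full step for momentum
    p = p + 0.5 * step_size * grad_log_p(q)       # final half step for momentum
    return q, p

def hmc_sample(n_draws, step_size=0.2, n_steps=10, seed=0):
    """Draw samples with fixed-length HMC and a Metropolis correction."""
    rng = random.Random(seed)
    q = 0.0
    draws = []
    for _ in range(n_draws):
        p0 = rng.gauss(0.0, 1.0)                  # resample the momentum
        q_new, p_new = leapfrog(q, p0, step_size, n_steps)
        # Hamiltonian = potential energy (-log_p) + kinetic energy (p^2 / 2)
        h_old = -log_p(q) + 0.5 * p0 * p0
        h_new = -log_p(q_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new                             # accept the proposal
        draws.append(q)
    return draws

draws = hmc_sample(2000)
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
```

With the standard normal target assumed here, `mean` should come out close to 0 and `variance` close to 1. The accept/reject step corrects for the discretization error of the integrator.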
Automatic differentiation
Stan implements reverse-mode automatic differentiation to calculate gradients of the model's log probability density, which are required by HMC, NUTS, L-BFGS, BFGS, and variational inference. The automatic differentiation within Stan can be used outside of the probabilistic programming language.
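The principle of reverse-mode automatic differentiation can be shown with a toy Python sketch. It is only an illustration of the technique: Stan's own implementation is written in C++ and is far more elaborate, and the `Var` class, the `log` and `backward` functions, and the example function below are all assumptions made for this sketch. Each operation records its inputs and local partial derivatives, and a single backward sweep then applies the chain rule from the output to every input.

```python
import math

class Var:
    """A scalar node that records how it was computed."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents    # tuple of (parent_node, local_partial_derivative)
        self.grad = 0.0           # adjoint, filled in by backward()

    def __add__(self, other):
        other = _wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = _wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __neg__(self):
        return Var(-self.value, ((self, -1.0),))

    def __sub__(self, other):
        return self + (-_wrap(other))

def _wrap(x):
    """Promote plain numbers to constant nodes."""
    return x if isinstance(x, Var) else Var(float(x))

def log(x):
    """Natural logarithm with local derivative 1 / x."""
    return Var(math.log(x.value), ((x, 1.0 / x.value),))

def backward(output):
    """Propagate adjoints from the output to all inputs in one reverse sweep."""
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)    # post-order: inputs come before their consumers
    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers are processed before their inputs
        for parent, local in node.parents:
            parent.grad += local * node.grad

# Example: normal log density (up to a constant) in mean mu and precision tau.
mu, tau = Var(1.0), Var(2.0)
y = 3.0
f = 0.5 * log(tau) - 0.5 * tau * (mu - y) * (mu - y)
backward(f)
```

After the sweep, `mu.grad` and `tau.grad` hold the partial derivatives of `f`. Analytically these are `-tau * (mu - y)` and `0.5 / tau - 0.5 * (mu - y) ** 2`, which evaluate to 4.0 and -1.75 at the chosen point. One backward sweep yields the gradient with respect to all inputs, which is why the reverse mode suits functions with many parameters and a single scalar output such as a log density.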
Usage
Stan is used in fields including social science, pharmaceutical statistics, market research, and medical imaging.
Further reading
* Gelman, Andrew, Daniel Lee, and Jiqiang Guo (2015). "Stan: A probabilistic programming language for Bayesian inference and optimization". ''Journal of Educational and Behavioral Statistics''.
* Hoffman, Matthew D., Bob Carpenter, and Andrew Gelman (2012). "Stan, scalable software for Bayesian modeling". ''Proceedings of the NIPS Workshop on Probabilistic Programming''.
External links
* Stan web site
* Stan source, a Git repository hosted on GitHub