Stan is a
probabilistic programming language for
statistical inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of ...
written in
C++.
[Stan Development Team. 2015]
Stan Modeling Language User's Guide and Reference Manual, Version 2.9.0
/ref> The Stan language is used to specify a (Bayesian) statistical model
A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of Sample (statistics), sample data (and similar data from a larger Statistical population, population). A statistical model repre ...
with an imperative program calculating the log probability density function
In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a Function (mathematics), function whose value at any given sample (or point) in the sample space (the s ...
.
Stan is licensed under the New BSD License. Stan is named in honour of Stanislaw Ulam Stanislav and variants may refer to:
People
*Stanislav (given name), a Slavic given name with many spelling variations (Stanislaus, Stanislas, Stanisław, etc.)
Places
* Stanislav, Kherson Oblast, a coastal village in Ukraine
* Stanislaus County, ...
, pioneer of the Monte Carlo method
Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be ...
.
Stan was created by a development team consisting of 52 members that includes Andrew Gelman, Bob Carpenter, Daniel Lee, Ben Goodrich, and others.
Example
A simple linear regression model can be described as , where . This can also be expressed as . The latter form can be written in Stan as the following:
data
parameters
model
Interfaces
The Stan language itself can be accessed through several interfaces:
* CmdStan – a command-line executable for the shell
Shell may refer to:
Architecture and design
* Shell (structure), a thin structure
** Concrete shell, a thin shell of concrete, usually with no interior columns or exterior buttresses
Science Biology
* Seashell, a hard outer layer of a marine ani ...
,
* CmdStanR and rstan – R software libraries,
* CmdStanPy and PyStan – libraries for the Python programming language
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically type-checked and garbage-collected. It supports multiple prog ...
,
* CmdStan.rb - library for the Ruby programming language,
* MatlabStan – integration with the MATLAB
MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...
numerical computing environment,
* Stan.jl – integration with the Julia programming language,
* StataStan – integration with Stata
Stata (, , alternatively , occasionally stylized as STATA) is a general-purpose Statistics, statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers ...
.
* Stan Playground - online a
In addition, higher-level interfaces are provided with packages using Stan as backend, primarily in the R (programming language), R language:
* ''rstanarm'' provides a drop-in replacement for frequentist models provided by base R and ''lme4'' using the R formula syntax;
* ''brms'' provides a wide array of linear and nonlinear models using the R formula syntax;
* ''prophet'' provides automated procedures for time series
In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. ...
forecasting
Forecasting is the process of making predictions based on past and present data. Later these can be compared with what actually happens. For example, a company might Estimation, estimate their revenue in the next year, then compare it against the ...
.
Algorithms
Stan implements gradient-based Markov chain Monte Carlo
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that ...
(MCMC) algorithms for Bayesian inference, stochastic, gradient-based variational Bayesian methods
Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables (usually ...
for approximate Bayesian inference, and gradient-based optimization
Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...
for penalized maximum likelihood estimation.
* MCMC algorithms:
** Hamiltonian Monte Carlo (HMC)
** No-U-Turn sampler (NUTS), a variant of HMC and Stan's default MCMC engine
* Variational inference algorithms:
** Automatic Differentiation Variational Inference
** Pathfinder: Parallel quasi-Newton variational inference[
]
* Optimization algorithms:
** Limited-memory BFGS (L-BFGS) (Stan's default optimization algorithm)
** Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS)
** Laplace's approximation
Laplace's approximation provides an analytical expression for a posterior probability distribution by fitting a Gaussian distribution with a mean equal to the MAP solution and precision equal to the observed Fisher information. The approximat ...
for classical standard error estimates and approximate Bayesian posteriors
Automatic differentiation
Stan implements reverse-mode automatic differentiation
In mathematics and computer algebra, automatic differentiation (auto-differentiation, autodiff, or AD), also called algorithmic differentiation, computational differentiation, and differentiation arithmetic Hend Dawood and Nefertiti Megahed (2023) ...
to calculate gradients of the model, which is required by HMC, NUTS, L-BFGS, BFGS, and variational inference. The automatic differentiation within Stan can be used outside of the probabilistic programming language.
Usage
Stan is used in fields including social science, pharmaceutical statistics, market research
Market research is an organized effort to gather information about target markets and customers. It involves understanding who they are and what they need. It is an important component of business strategy and a major factor in maintaining com ...
, and medical imaging
Medical imaging is the technique and process of imaging the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues (physiology). Medical imaging seeks to revea ...
.
See also
* PyMC is a probabilistic programming language in Python
* ArviZ a Python library for Exploratory Analysis of Bayesian Models
References
Further reading
*
* Gelman, Andrew, Daniel Lee, and Jiqiang Guo (2015).
Stan: A probabilistic programming language for Bayesian inference and optimization
Journal of Educational and Behavioral Statistics.
* Hoffman, Matthew D., Bob Carpenter, and Andrew Gelman (2012)
Stan, scalable software for Bayesian modeling
, Proceedings of the NIPS Workshop on Probabilistic Programming.
External links
Stan web site
Stan source
a Git
Git () is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively.
Design goals of Git include speed, data integrity, and suppor ...
repository hosted on GitHub
GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
{{Statistical software
Computational statistics
Free Bayesian statistics software
Monte Carlo software
Numerical programming languages
Domain-specific programming languages
Probabilistic software