HOME

TheInfoList



OR:

Stan is a probabilistic programming language for
statistical inference Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of ...
written in C++.Stan Development Team. 2015
Stan Modeling Language User's Guide and Reference Manual, Version 2.9.0
/ref> The Stan language is used to specify a (Bayesian)
statistical model A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of Sample (statistics), sample data (and similar data from a larger Statistical population, population). A statistical model repre ...
with an imperative program calculating the log
probability density function In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable, is a Function (mathematics), function whose value at any given sample (or point) in the sample space (the s ...
. Stan is licensed under the New BSD License. Stan is named in honour of
Stanislaw Ulam Stanislav and variants may refer to: People *Stanislav (given name), a Slavic given name with many spelling variations (Stanislaus, Stanislas, Stanisław, etc.) Places * Stanislav, Kherson Oblast, a coastal village in Ukraine * Stanislaus County, ...
, pioneer of the
Monte Carlo method Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be ...
. Stan was created by a development team consisting of 52 members that includes Andrew Gelman, Bob Carpenter, Daniel Lee, Ben Goodrich, and others.


Example

A simple linear regression model can be described as y_n = \alpha + \beta x_n + \epsilon_n, where \epsilon_n \sim \text (0, \sigma). This can also be expressed as y_n \sim \text(\alpha + \beta X_n, \sigma). The latter form can be written in Stan as the following: data parameters model


Interfaces

The Stan language itself can be accessed through several interfaces: * CmdStan – a command-line executable for the
shell Shell may refer to: Architecture and design * Shell (structure), a thin structure ** Concrete shell, a thin shell of concrete, usually with no interior columns or exterior buttresses Science Biology * Seashell, a hard outer layer of a marine ani ...
, * CmdStanR and rstan – R software libraries, * CmdStanPy and PyStan – libraries for the
Python programming language Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically type-checked and garbage-collected. It supports multiple prog ...
, * CmdStan.rb - library for the Ruby programming language, * MatlabStan – integration with the
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...
numerical computing environment, * Stan.jl – integration with the Julia programming language, * StataStan – integration with
Stata Stata (, , alternatively , occasionally stylized as STATA) is a general-purpose Statistics, statistical software package developed by StataCorp for data manipulation, visualization, statistics, and automated reporting. It is used by researchers ...
. * Stan Playground - online a

In addition, higher-level interfaces are provided with packages using Stan as backend, primarily in the R (programming language), R language: * ''rstanarm'' provides a drop-in replacement for frequentist models provided by base R and ''lme4'' using the R formula syntax; * ''brms'' provides a wide array of linear and nonlinear models using the R formula syntax; * ''prophet'' provides automated procedures for
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. ...
forecasting Forecasting is the process of making predictions based on past and present data. Later these can be compared with what actually happens. For example, a company might Estimation, estimate their revenue in the next year, then compare it against the ...
.


Algorithms

Stan implements gradient-based
Markov chain Monte Carlo In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that ...
(MCMC) algorithms for Bayesian inference, stochastic, gradient-based
variational Bayesian methods Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables (usually ...
for approximate Bayesian inference, and gradient-based
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...
for penalized maximum likelihood estimation. * MCMC algorithms: ** Hamiltonian Monte Carlo (HMC) ** No-U-Turn sampler (NUTS), a variant of HMC and Stan's default MCMC engine * Variational inference algorithms: ** Automatic Differentiation Variational Inference ** Pathfinder: Parallel quasi-Newton variational inference * Optimization algorithms: ** Limited-memory BFGS (L-BFGS) (Stan's default optimization algorithm) ** Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS) **
Laplace's approximation Laplace's approximation provides an analytical expression for a posterior probability distribution by fitting a Gaussian distribution with a mean equal to the MAP solution and precision equal to the observed Fisher information. The approximat ...
for classical standard error estimates and approximate Bayesian posteriors


Automatic differentiation

Stan implements reverse-mode
automatic differentiation In mathematics and computer algebra, automatic differentiation (auto-differentiation, autodiff, or AD), also called algorithmic differentiation, computational differentiation, and differentiation arithmetic Hend Dawood and Nefertiti Megahed (2023) ...
to calculate gradients of the model, which is required by HMC, NUTS, L-BFGS, BFGS, and variational inference. The automatic differentiation within Stan can be used outside of the probabilistic programming language.


Usage

Stan is used in fields including social science, pharmaceutical statistics,
market research Market research is an organized effort to gather information about target markets and customers. It involves understanding who they are and what they need. It is an important component of business strategy and a major factor in maintaining com ...
, and
medical imaging Medical imaging is the technique and process of imaging the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues (physiology). Medical imaging seeks to revea ...
.


See also

* PyMC is a probabilistic programming language in Python * ArviZ a Python library for Exploratory Analysis of Bayesian Models


References


Further reading

* * Gelman, Andrew, Daniel Lee, and Jiqiang Guo (2015).
Stan: A probabilistic programming language for Bayesian inference and optimization
Journal of Educational and Behavioral Statistics. * Hoffman, Matthew D., Bob Carpenter, and Andrew Gelman (2012)
Stan, scalable software for Bayesian modeling
, Proceedings of the NIPS Workshop on Probabilistic Programming.


External links


Stan web site

Stan source
a
Git Git () is a distributed version control system that tracks versions of files. It is often used to control source code by programmers who are developing software collaboratively. Design goals of Git include speed, data integrity, and suppor ...
repository hosted on
GitHub GitHub () is a Proprietary software, proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control and GitHub itself provides access control, bug trackin ...
{{Statistical software Computational statistics Free Bayesian statistics software Monte Carlo software Numerical programming languages Domain-specific programming languages Probabilistic software