PyMC (formerly known as PyMC3) is a
probabilistic programming language written in
Python. It can be used for Bayesian statistical modeling and probabilistic machine learning.
PyMC performs inference based on advanced Markov chain Monte Carlo and/or variational fitting algorithms.
It is a rewrite from scratch of the previous version of the PyMC software.
Unlike PyMC2, which used Fortran extensions for performing computations, PyMC relies on PyTensor, a Python library that allows defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays.
From version 3.8 PyMC relies on
ArviZ to handle plotting, diagnostics, and statistical checks. PyMC and
Stan are the two most popular
probabilistic programming
tools.
PyMC is an open source project that is developed by the community and has been fiscally sponsored by NumFOCUS.
PyMC has been used to solve inference problems in several scientific domains, including astronomy, epidemiology, molecular biology, crystallography, chemistry, ecology, and psychology.
Previous versions of PyMC were also used widely, for example in climate science, public health, neuroscience, and parasitology.
After
Theano announced plans to discontinue development in 2017, the PyMC team evaluated
TensorFlow Probability as a computational backend, but decided in 2020 to
fork
Theano under the name Aesara.
Large parts of the Theano codebase have been refactored, and compilation through JAX and Numba was added.
The PyMC team has released the revised computational backend under the name PyTensor and continues the development of PyMC.
Inference engines
PyMC implements non-gradient-based and gradient-based Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference and stochastic, gradient-based variational Bayesian methods for approximate Bayesian inference.
* MCMC-based algorithms:
** No-U-Turn sampler (NUTS), a variant of Hamiltonian Monte Carlo and PyMC's default engine for continuous variables
** Metropolis–Hastings, PyMC's default engine for discrete variables
** Sequential Monte Carlo for static posteriors
** Sequential Monte Carlo for approximate Bayesian computation
* Variational inference algorithms:
** Black-box Variational Inference
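To illustrate the non-gradient-based family listed above, the following is a minimal random-walk Metropolis sketch written with the Python standard library only. It is not PyMC's implementation: the target density (a standard normal used as a stand-in posterior), the function names `log_target` and `metropolis`, and the step size are all assumptions made for the example.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of a standard normal (stand-in posterior)."""
    return -0.5 * x * x


def metropolis(log_p, n_samples, start=0.0, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    Returns the list of samples and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x = start
    log_px = log_p(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = x + rng.gauss(0.0, step)
        log_pp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_pp - log_px)):
            x, log_px = proposal, log_pp
            accepted += 1
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples, accepted / n_samples


samples, acceptance_rate = metropolis(log_target, n_samples=20000)

# Discard an initial warm-up segment before computing summaries.
kept = samples[2000:]
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"acceptance rate: {acceptance_rate:.2f}, mean: {mean:.3f}, variance: {var:.3f}")
```

Only evaluations of the log-density are needed, which is why this family of samplers also applies to discrete variables where gradients are unavailable. Gradient-based methods such as NUTS use derivative information to propose more distant states.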
See also
* Stan, a probabilistic programming language for statistical inference written in C++
* ArviZ, a Python library for exploratory analysis of Bayesian models
* Bambi, a high-level Bayesian model-building interface based on PyMC
External links
* PyMC website
* PyMC source, a Git repository hosted on GitHub
* PyTensor, a Python library for defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays