Differentiable programming is a
programming paradigm
Programming paradigms are a way to classify programming languages based on their features. Languages can be classified into multiple paradigms.
Some paradigms are concerned mainly with implications for the execution model of the language, suc ...
in which a numeric computer program can be
differentiated throughout via
automatic differentiation
In mathematics and computer algebra, automatic differentiation (AD), also called algorithmic differentiation, computational differentiation, auto-differentiation, or simply autodiff, is a set of techniques to evaluate the derivative of a function ...
.
This allows for
gradient-based optimization of parameters in the program, often via
gradient descent
In mathematics, gradient descent (also often called steepest descent) is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the ...
, as well as other learning approaches that are based on higher order derivative information. Differentiable programming has found use in a wide variety of areas, particularly
scientific computing
Computational science, also known as scientific computing or scientific computation (SC), is a field in mathematics that uses advanced computing capabilities to understand and solve complex problems. It is an area of science that spans many disc ...
and
artificial intelligence
Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech re ...
.
One of the early proposals to adopt such a framework in a systematic fashion to improve upon learning algorithms was made by the
Advanced Concepts Team
The Advanced Concepts Team is a group of scientists, researchers and young graduates that perform multidisciplinary research within the European Space Agency. Located at the European Space Research and Technology Centre, in the Netherlands, th ...
at the
European Space Agency
, owners =
, headquarters = Paris, Île-de-France, France
, coordinates =
, spaceport = Guiana Space Centre
, seal = File:ESA emblem seal.png
, seal_size = 130px
, image = Views in the Main Control Room (1205 ...
in early 2016.
Approaches
Most differentiable programming frameworks work by constructing a graph containing the control flow and
data structures
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
in the program.
Attempts generally fall into two groups:
* Static,
compiled
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
graph-based approaches such as
TensorFlow
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. "It is machine learnin ...
,
[TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.] Theano
In Greek mythology, Theano (; Ancient Greek: Θεανώ) may refer to the following personages:
*Theano, wife of Metapontus, king of Icaria. Metapontus demanded that she bear him children, or leave the kingdom. She presented the children of Melan ...
, and
MXNet. They tend to allow for good
compiler optimization
In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g. those involving
loops or
recursion
Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
), as well as making it harder for users to reason effectively about their programs.
A proof of concept compiler toolchain called Myia uses a subset of Python as a front end and supports higher-order functions, recursion, and higher-order derivatives.
*
Operator overloading
In computer programming, operator overloading, sometimes termed ''operator ad hoc polymorphism'', is a specific case of polymorphism, where different operators have different implementations depending on their arguments. Operator overloading is ...
, dynamic graph based approaches such as
PyTorch
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is free and open ...
and
AutoGrad. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they lead to
interpreter overhead (particularly when composing many small operations), poorer scalability, and reduced benefit from compiler optimization.
A package for the
Julia
Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
programming languag
Zygoteworks directly on Julia's
intermediate representation
An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
, allowing it to still be
optimized by Julia's just-in-time compiler.
A limitation of earlier approaches is that they are only able to differentiate code written in a suitable manner for the framework, limiting their interoperability with other programs. Newer approaches resolve this issue by constructing the graph from the language's syntax or IR, allowing arbitrary code to be differentiated.
Applications
Differentiable programming has been applied in areas such as combining
deep learning
Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised.
De ...
with
physics engines
A physics engine is computer software that provides an approximate simulation of certain physical systems, such as rigid body dynamics (including collision detection), soft body dynamics, and fluid dynamics, of use in the domains of computer gr ...
in
robotics
Robotics is an interdisciplinary branch of computer science and engineering. Robotics involves design, construction, operation, and use of robots. The goal of robotics is to design machines that can help and assist humans. Robotics integrat ...
, solving electronic structure problems with differentiable
density functional theory
Density-functional theory (DFT) is a computational quantum mechanical modelling method used in physics, chemistry and materials science to investigate the electronic structure (or nuclear structure) (principally the ground state) of many-body ...
,
differentiable
ray tracing,
image processing
An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
,
and
probabilistic programming
Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically.
It represents an attempt to unify probabilistic modeling and traditional general pur ...
.
See also
*
Differentiable function
In mathematics, a differentiable function of one real variable is a function whose derivative exists at each point in its domain. In other words, the graph of a differentiable function has a non-vertical tangent line at each interior point in its ...
*
Machine learning
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence.
Machine ...
Notes
References
{{Differentiable computing
Differential calculus
Programming paradigms