In the field of
mathematical optimization
Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
, stochastic programming is a framework for
modeling optimization problems that involve
uncertainty
Uncertainty refers to epistemic situations involving imperfect or unknown information. It applies to predictions of future events, to physical measurements that are already made, or to the unknown. Uncertainty arises in partially observable or ...
. A stochastic program is an optimization problem in which some or all problem parameters are uncertain, but follow known
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon i ...
s. This framework contrasts with deterministic optimization, in which all problem parameters are assumed to be known exactly. The goal of stochastic programming is to find a decision which both optimizes some criteria chosen by the decision maker, and appropriately accounts for the uncertainty of the problem parameters. Because many real-world decisions involve uncertainty, stochastic programming has found applications in a broad range of areas ranging from
finance
Finance is the study and discipline of money, currency and capital assets. It is related to, but not synonymous with economics, the study of production, distribution, and consumption of money, assets, goods and services (the discipline of fina ...
to
transportation
Transport (in British English), or transportation (in American English), is the intentional movement of humans, animals, and goods from one location to another. Modes of transport include air, land (rail and road), water, cable, pipeline, ...
to energy optimization.
Two-stage problems
The basic idea of two-stage stochastic programming is that (optimal) decisions should be based on data available at the time the decisions are made and cannot depend on future observations. The two-stage formulation is widely used in stochastic programming. The general formulation of a two-stage stochastic programming problem is given by:
where
is the optimal value of the second-stage problem
The classical two-stage linear stochastic programming problems can be formulated as
where
is the optimal value of the second-stage problem
In such formulation
is the first-stage decision variable vector,
is the second-stage decision variable vector, and
contains the data of the second-stage problem. In this formulation, at the first stage we have to make a "here-and-now" decision
before the realization of the uncertain data
, viewed as a random vector, is known. At the second stage, after a realization of
becomes available, we optimize our behavior by solving an appropriate optimization problem.
At the first stage we optimize (minimize in the above formulation) the cost
of the first-stage decision plus the expected cost of the (optimal) second-stage decision. We can view the second-stage problem simply as an optimization problem which describes our supposedly optimal behavior when the uncertain data is revealed, or we can consider its solution as a recourse action where the term
compensates for a possible inconsistency of the system
and
is the cost of this recourse action.
The considered two-stage problem is ''linear'' because the objective functions and the constraints are linear. Conceptually this is not essential and one can consider more general two-stage stochastic programs. For example, if the first-stage problem is integer, one could add integrality constraints to the first-stage problem so that the feasible set is discrete. Non-linear objectives and constraints could also be incorporated if needed.
Distributional assumption
The formulation of the above two-stage problem assumes that the second-stage data
is modeled as a random vector with a ''known'' probability distribution. This would be justified in many situations. For example, the distribution of
could be inferred from historical data if one assumes that the distribution does not significantly change over the considered period of time. Also, the empirical distribution of the sample could be used as an approximation to the distribution of the future values of
. If one has a prior model for
, one could obtain a posteriori distribution by a Bayesian update.
Discretization
To solve the two-stage stochastic problem numerically, one often needs to assume that the random vector
has a finite number of possible realizations, called ''scenarios'', say
, with respective probability masses
. Then the expectation in the first-stage problem's objective function can be written as the summation:
and, moreover, the two-stage problem can be formulated as one large linear programming problem (this is called the deterministic equivalent of the original problem, see section ).
When
has an infinite (or very large) number of possible realizations the standard approach is then to represent this distribution by scenarios. This approach raises three questions, namely:
# How to construct scenarios, see ;
# How to solve the deterministic equivalent. Optimizers such as
CPLEX, and
GLPK
The GNU Linear Programming Kit (GLPK) is a software package intended for solving large-scale linear programming (LP), mixed integer programming (MIP), and other related problems. It is a set of routines written in ANSI C and organized in the form ...
can solve large linear/nonlinear problems. The NEOS Server,
hosted at the
University of Wisconsin, Madison
A university () is an institution of higher (or tertiary) education and research which awards academic degrees in several academic disciplines. Universities typically offer both undergraduate and postgraduate programs. In the United States, t ...
, allows free access to many modern solvers. The structure of a deterministic equivalent is particularly amenable to apply decomposition methods, such as
Benders' decomposition or scenario decomposition;
# How to measure quality of the obtained solution with respect to the "true" optimum.
These questions are not independent. For example, the number of scenarios constructed will affect both the tractability of the deterministic equivalent and the quality of the obtained solutions.
Stochastic linear programming
A stochastic
linear program
Linear programming (LP), also called linear optimization, is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements are represented by linear relationships. Linear programming is ...
is a specific instance of the classical two-stage stochastic program. A stochastic LP is built from a collection of multi-period linear programs (LPs), each having the same structure but somewhat different data. The
two-period LP, representing the
scenario, may be regarded as having the following form:
The vectors
and
contain the first-period variables, whose values must be chosen immediately. The vector
contains all of the variables for subsequent periods. The constraints
involve only first-period variables and are the same in every scenario. The other constraints involve variables of later periods and differ in some respects from scenario to scenario, reflecting uncertainty about the future.
Note that solving the
two-period LP is equivalent to assuming the
scenario in the second period with no uncertainty. In order to incorporate uncertainties in the second stage, one should assign probabilities to different scenarios and solve the corresponding deterministic equivalent.
Deterministic equivalent of a stochastic problem
With a finite number of scenarios, two-stage stochastic linear programs can be modelled as large linear programming problems. This formulation is often called the deterministic equivalent linear program, or abbreviated to deterministic equivalent. (Strictly speaking a deterministic equivalent is any mathematical program that can be used to compute the optimal first-stage decision, so these will exist for continuous probability distributions as well, when one can represent the second-stage cost in some closed form.)
For example, to form the deterministic equivalent to the above stochastic linear program, we assign a probability
to each scenario
. Then we can minimize the expected value of the objective, subject to the constraints from all scenarios:
We have a different vector
of later-period variables for each scenario
. The first-period variables
and
are the same in every scenario, however, because we must make a decision for the first period before we know which scenario will be realized. As a result, the constraints involving just
and
need only be specified once, while the remaining constraints must be given separately for each scenario.
Scenario construction
In practice it might be possible to construct scenarios by eliciting experts' opinions on the future. The number of constructed scenarios should be relatively modest so that the obtained deterministic equivalent can be solved with reasonable computational effort. It is often claimed that a solution that is optimal using only a few scenarios provides more adaptable plans than one that assumes a single scenario only. In some cases such a claim could be verified by a simulation. In theory some measures of guarantee that an obtained solution solves the original problem with reasonable accuracy. Typically in applications only the ''first stage'' optimal solution
has a practical value since almost always a "true" realization of the random data will be different from the set of constructed (generated) scenarios.
Suppose
contains
independent random components, each of which has three possible realizations (for example, future realizations of each random parameters are classified as low, medium and high), then the total number of scenarios is
. Such ''exponential growth'' of the number of scenarios makes model development using expert opinion very difficult even for reasonable size
. The situation becomes even worse if some random components of
have continuous distributions.
Monte Carlo sampling and Sample Average Approximation (SAA) Method
A common approach to reduce the scenario set to a manageable size is by using Monte Carlo simulation. Suppose the total number of scenarios is very large or even infinite. Suppose further that we can generate a sample
of
replications of the random vector
. Usually the sample is assumed to be
independent and identically distributed (i.i.d sample). Given a sample, the expectation function