Dynamic Discrete Choice

Dynamic discrete choice (DDC) models, also known as discrete choice models of dynamic programming, model an agent's choices over discrete options that have future implications. Rather than assuming that observed choices are the result of static utility maximization, observed choices in DDC models are assumed to result from an agent's maximization of the present value of utility, generalizing the utility theory upon which discrete choice models are based. The goal of DDC methods is to estimate the structural parameters of the agent's decision process. Once these parameters are known, the researcher can use the estimates to simulate how the agent would behave in a counterfactual state of the world (for example, how a prospective college student's enrollment decision would change in response to a tuition increase).


Mathematical representation

Agent n's maximization problem can be written mathematically as follows:

: V\left(x_{n0}\right)=\max_{\{d_{nt}\}_{t=0}^{T}} \mathbb{E}\left(\sum_{t=0}^{T} \sum_{i=1}^{J} \beta^{t} \, 1\left(d_{nt}=i\right) U_{nit}\left(x_{nt}, \varepsilon_{nit}\right)\right),

where
* x_{nt} are state variables, with x_{n0} the agent's initial condition
* d_{nt} represents n's decision from among J discrete alternatives
* \beta \in \left(0,1\right) is the discount factor
* U_{nit} is the flow utility n receives from choosing alternative i in period t, and depends on both the state x_{nt} and unobserved factors \varepsilon_{nit}
* T is the time horizon
* The expectation \mathbb{E}\left(\cdot\right) is taken over both the x_{nt}'s and \varepsilon_{nit}'s in U_{nit}. That is, the agent is uncertain about future transitions in the states, and is also uncertain about future realizations of unobserved factors.
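
To make the objective concrete, here is a minimal simulation sketch of one realization of the discounted utility sum. The flow utility, state transition, and the deliberately naive myopic decision rule are all illustrative assumptions, not part of any standard model; an actual DDC agent chooses d_{nt} to maximize the expected discounted sum, which requires dynamic programming rather than period-by-period maximization.

```python
# A hedged sketch: simulate one realization of the discounted objective
# sum_t beta^t * U(x_t, d_t, eps_t). The utility, transition, and the
# myopic policy used here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
J, T, beta = 3, 10, 0.95                 # alternatives, horizon, discount factor
alpha = np.array([0.5, -0.2, 0.1])       # assumed per-alternative utility weights

def flow_utility(x, i, eps_i):
    """Assumed flow utility: linear in the state plus an unobserved shock."""
    return alpha[i] * x + eps_i

x, total = 1.0, 0.0                      # initial condition x_0 and running objective
for t in range(T):
    eps = rng.gumbel(size=J)             # unobserved factors, one per alternative
    utils = [flow_utility(x, i, eps[i]) for i in range(J)]
    d = int(np.argmax(utils))            # myopic choice; a DDC agent is forward-looking
    total += beta**t * utils[d]
    x += rng.normal(scale=0.1)           # assumed stochastic state transition
print("one realization of the discounted utility sum:", total)
```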


Simplifying assumptions and notation

It is standard to impose the following simplifying assumptions and notation on the dynamic decision problem:

;1. Flow utility is additively separable and linear in parameters
The flow utility can be written as an additive sum, consisting of deterministic and stochastic elements. The deterministic component can be written as a linear function of the structural parameters:
: \begin{align} U_{nit}\left(x_{nt},\varepsilon_{nit}\right) &= u_{nit} + \varepsilon_{nit} \\ &= X_{nt}\alpha_{i} + \varepsilon_{nit} \end{align}

;2. The optimization problem can be written as a Bellman equation
Define by V_{nt}(x_{nt}) the ''ex ante'' value function for individual n in period t just before \varepsilon_{nt} is revealed:
: V_{nt}(x_{nt}) = \mathbb{E} \max_i \left\{ u_{nit}(x_{nt}) + \varepsilon_{nit} + \beta \int_{x_{n,t+1}} V_{n,t+1}(x_{n,t+1}) \, dF\left(x_{n,t+1} \mid x_{nt}\right) \right\}
where the expectation operator \mathbb{E} is over the \varepsilon's, and where dF\left(x_{n,t+1} \mid x_{nt}\right) represents the probability distribution over x_{n,t+1} conditional on x_{nt}. The expectation over state transitions is accomplished by taking the integral over this probability distribution.

It is possible to decompose V_{nt}(x_{nt}) into deterministic and stochastic components:
: V_{nt}(x_{nt}) = \mathbb{E} \max_i \left\{ v_{nit}(x_{nt}) + \varepsilon_{nit} \right\}
where v_{nit} is the value to choosing alternative i at time t and is written as
: v_{nit}(x_{nt}) = u_{nit}\left(x_{nt}\right) + \beta \int_{x_{n,t+1}} \mathbb{E} \max_j \left\{ v_{nj,t+1}(x_{n,t+1}) + \varepsilon_{nj,t+1} \right\} \, dF(x_{n,t+1} \mid x_{nt})
where now the expectation \mathbb{E} is taken over the \varepsilon_{nj,t+1}.

;3. The optimization problem follows a Markov decision process
The states x_{nt} follow a Markov chain. That is, attainment of state x_{nt} depends only on the state x_{n,t-1} and not on x_{n,t-2} or any prior state.
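
Under assumptions 1–3, the value functions can be computed by backward recursion on a discretized state space. The sketch below is a minimal illustration with assumed utilities and transition probabilities; the E max over the shocks is approximated by Monte Carlo so that no particular distribution for the \varepsilon's is needed yet.

```python
# A minimal backward-recursion sketch of assumptions 1-3 on a small grid.
# The conditional values v and ex ante values V follow the decomposition
# above; the E max over epsilon is approximated by Monte Carlo.
# Grid size, utilities, and the transition matrix P are all assumptions.
import numpy as np

rng = np.random.default_rng(0)
nX, J, T, beta = 5, 2, 20, 0.9
u = rng.normal(size=(nX, J))                  # assumed deterministic flow utilities u_i(x)
P = np.full((J, nX, nX), 1.0 / nX)            # assumed transition probs dF(x'|x, i)

V = np.zeros((T + 1, nX))                     # ex ante value, V[T] = 0 terminal condition
draws = rng.gumbel(size=(10000, J))           # shock draws for the Monte Carlo E max
for t in range(T - 1, -1, -1):
    # conditional values: v_i(x) = u_i(x) + beta * sum_x' V_{t+1}(x') dF(x'|x, i)
    v = u + beta * np.stack([P[i] @ V[t + 1] for i in range(J)], axis=1)
    # ex ante value: V_t(x) = E max_i { v_i(x) + eps_i }
    V[t] = np.mean(np.max(v[None, :, :] + draws[:, None, :], axis=2), axis=0)
print("ex ante values in period 0:", V[0])
```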


Conditional value functions and choice probabilities

The value function in the previous section is called the conditional value function, because it is the value function conditional on choosing alternative i in period t. Writing the conditional value function in this way is useful in constructing formulas for the choice probabilities.

To write down the choice probabilities, the researcher must make an assumption about the distribution of the \varepsilon_{nit}'s. As in static discrete choice models, this distribution can be assumed to be iid Type I extreme value, generalized extreme value, multinomial probit, or mixed logit.

For the case where \varepsilon_{nit} is multinomial logit (i.e. drawn iid from the Type I extreme value distribution), the formulas for the choice probabilities would be:
: P_{nit} = \frac{\exp\left(v_{nit}(x_{nt})\right)}{\sum_{j=1}^{J} \exp\left(v_{njt}(x_{nt})\right)}
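
A small numerical sketch of this formula follows: under the Type I extreme value assumption the choice probabilities are the softmax of the conditional values, computed here with a max shift for numerical stability. The values v are assumed numbers for illustration.

```python
# Logit choice probabilities P_i = exp(v_i) / sum_j exp(v_j),
# computed stably by subtracting the maximum before exponentiating.
import numpy as np

def logit_probs(v):
    """Softmax of conditional values; invariant to the max shift."""
    z = np.exp(v - v.max())
    return z / z.sum()

v = np.array([1.2, 0.4, -0.3])        # assumed conditional values v_i(x) at one state
print(logit_probs(v))                 # -> approximately [0.60, 0.27, 0.13]
```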


Estimation

Estimation of dynamic discrete choice models is particularly challenging, due to the fact that the researcher must solve the backwards recursion problem for each guess of the structural parameters. The most common methods used to estimate the structural parameters are maximum likelihood estimation and the method of simulated moments.

Aside from estimation methods, there are also solution methods. Because of the complexity of the problem, different solution methods can be employed. These can be divided into full-solution methods and non-solution methods.


Full-solution methods

The foremost example of a full-solution method is the nested fixed point (NFXP) algorithm developed by John Rust in 1987. The NFXP algorithm is described in great detail in its documentation manual. A more recent work by Che-Lin Su and Kenneth Judd in 2012 implements another approach (dismissed as intractable by Rust in 1987), which uses constrained optimization of the likelihood function, a special case of mathematical programming with equilibrium constraints (MPEC). Specifically, the likelihood function is maximized subject to the constraints imposed by the model, and expressed in terms of the additional variables that describe the model's structure. This approach requires powerful optimization software such as Artelys Knitro because of the high dimensionality of the optimization problem. Once it is solved, both the structural parameters that maximize the likelihood and the solution of the model are found.

In a later article, Rust and coauthors show that the speed advantage of MPEC over NFXP is not significant. Yet, because the computations required by MPEC do not rely on the structure of the model, its implementation is much less labor intensive. Despite numerous contenders, the NFXP maximum likelihood estimator remains the leading estimation method for Markov decision models.


Non-solution methods

An alternative to full-solution methods is the class of non-solution methods. Here, the researcher can estimate the structural parameters without having to fully solve the backwards recursion problem for each parameter guess. Non-solution methods are typically faster but require more assumptions; in many cases the additional assumptions are realistic. The leading non-solution method is conditional choice probabilities, developed by V. Joseph Hotz and Robert A. Miller.


Examples


Bus engine replacement model

The bus engine replacement model developed in Rust's seminal 1987 paper is one of the first dynamic stochastic models of discrete choice estimated using real data, and it continues to serve as a classical example of problems of this type.

The model is a simple regenerative optimal stopping stochastic dynamic problem faced by the decision maker, Harold Zurcher, superintendent of maintenance at the Madison Metropolitan Bus Company in Madison, Wisconsin. For every bus in operation in each time period, Harold Zurcher has to decide whether to replace the engine and bear the associated replacement cost, or to continue operating the bus at an ever-rising cost of operation, which includes insurance and the cost of lost ridership in the case of a breakdown.

Let x_t denote the odometer reading (mileage) at period t, c(x_t,\theta) the cost of operating the bus, which depends on the vector of parameters \theta, RC the cost of replacing the engine, and \beta the discount factor. Then the per-period utility is given by
: U(x_t,\xi_t,d,\theta)= \begin{cases} -c(x_t,\theta)+\xi_{t,\text{keep}}, & \text{if } d=\text{keep} \\ -RC-c(0,\theta)+\xi_{t,\text{replace}}, & \text{if } d=\text{replace} \end{cases} = u(x_t,d,\theta) + \begin{cases} \xi_{t,\text{keep}}, & \text{if } d=\text{keep} \\ \xi_{t,\text{replace}}, & \text{if } d=\text{replace} \end{cases}
where d denotes the decision (keep or replace), and \xi_{t,\text{keep}} and \xi_{t,\text{replace}} represent the components of the utility observed by Harold Zurcher, but not by John Rust. It is assumed that \xi_{t,\text{keep}} and \xi_{t,\text{replace}} are independent and identically distributed with the Type I extreme value distribution, and that \xi_{t+1} is independent of \xi_t conditional on x_{t+1}.

Then the optimal decisions satisfy the Bellman equation
: V(x,\xi,\theta) = \max_{d\in\{\text{keep},\text{replace}\}} \left\{ u(x,d,\theta)+\xi_{d} + \beta \int\int V(x',\xi',\theta) \, q(d\xi'\mid x',\theta) \, p(dx'\mid x,d,\theta) \right\}
where p(dx'\mid x,d,\theta) and q(d\xi'\mid x',\theta) are respectively transition densities for the observed and unobserved state variables. Time indices in the Bellman equation are dropped because the model is formulated in an infinite horizon setting; the unknown optimal policy is therefore stationary, i.e. independent of time.

Given the distributional assumption on q(d\xi'\mid x',\theta), the probability of a particular choice d is given by
: P(d\mid x,\theta) = \frac{\exp\{u(x,d,\theta)+\beta EV(x,d,\theta)\}}{\sum_{d'\in\{\text{keep},\text{replace}\}} \exp\{u(x,d',\theta)+\beta EV(x,d',\theta)\}}
where EV(x,d,\theta) is a unique solution to the functional equation
: EV(x,d,\theta)= \int \log\left( \sum_{d'\in\{\text{keep},\text{replace}\}} \exp\{u(x',d',\theta)+\beta EV(x',d',\theta)\}\right) p(dx'\mid x,d,\theta).
It can be shown that the latter functional equation defines a contraction mapping if the state space x_t is bounded, so there will be a unique solution EV(x,d,\theta) for any \theta, and further the implicit function theorem holds, so EV(x,d,\theta) is also a smooth function of \theta for each (x,d).
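
The sketch below solves this functional equation by successive approximations on a small discretized mileage grid and then forms the logit choice probabilities. The grid size, linear cost function, mileage transition, and parameter values are illustrative assumptions, not Rust's specification or estimates (Rust, for instance, uses a \beta very close to 1, which makes the contraction converge much more slowly).

```python
# A hedged sketch of the fixed point EV for the bus engine problem on a
# coarse mileage grid. Cost function, transitions, and parameters are
# assumptions for illustration; replacement resets the state to x = 0.
import numpy as np

nX, beta, RC, theta1 = 90, 0.95, 10.0, 0.05
x = np.arange(nX)
c = theta1 * x                                   # assumed linear operating cost c(x, theta)

# assumed mileage transition under "keep": advance 0 or 1 grid points
P = np.zeros((nX, nX))
for s in range(nX - 1):
    P[s, s] = P[s, s + 1] = 0.5
P[-1, -1] = 1.0

EV = np.zeros(nX)                                # EV(x, keep) on the grid
for _ in range(5000):
    v_keep = -c + beta * EV                      # u(x, keep) + beta * EV(x, keep)
    v_repl = -RC - c[0] + beta * EV[0]           # replacement resets mileage to 0
    m = np.maximum(v_keep, v_repl)
    logsum = m + np.log(np.exp(v_keep - m) + np.exp(v_repl - m))
    EV_new = P @ logsum                          # integrate over next-period mileage
    if np.max(np.abs(EV_new - EV)) < 1e-10:      # contraction guarantees convergence
        break
    EV = EV_new

p_keep = 1.0 / (1.0 + np.exp(v_repl - v_keep))   # logit probability of keeping
print("P(keep) at low vs high mileage:", p_keep[0], p_keep[-1])
```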


Estimation with nested fixed point algorithm

The contraction mapping above can be solved numerically for the fixed point EV(x,d,\theta) that yields choice probabilities P(d\mid x,\theta) for any given value of \theta. The log-likelihood function can then be formulated as
: L(\theta) = \sum_{i=1}^{N} \sum_{t=2}^{T_i} \log(P(d_{it}\mid x_{it},\theta)) + \log(p(x_{it}\mid x_{i,t-1},d_{i,t-1},\theta)),
where x_{it} and d_{it} represent data on state variables (odometer readings) and decisions (keep or replace) for i=1,\dots,N individual buses, each in t=1,\dots,T_i periods.

The joint algorithm for solving the fixed point problem given a particular value of the parameter \theta and maximizing the log-likelihood L(\theta) with respect to \theta was named by John Rust the ''nested fixed point algorithm'' (NFXP). Rust's implementation of the nested fixed point algorithm is highly optimized for this problem, using Newton–Kantorovich iterations to calculate P(d\mid x,\theta) and quasi-Newton methods, such as the Berndt–Hall–Hall–Hausman (BHHH) algorithm, for likelihood maximization.
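
A minimal sketch of the nested structure follows, reusing the toy grid from the previous sketch: the outer optimizer searches over \theta = (\theta_1, RC), and every trial value triggers an inner fixed-point solve before the choice part of the log-likelihood is evaluated. The data are a handful of made-up observations, and derivative-free Nelder–Mead stands in for Rust's Newton–Kantorovich/BHHH machinery.

```python
# A hedged sketch of NFXP's nesting: outer search over theta, inner
# fixed-point solve per trial theta. All numbers are toy assumptions.
import numpy as np
from scipy.optimize import minimize

nX, beta = 90, 0.95
P = np.zeros((nX, nX))
for s in range(nX - 1):
    P[s, s] = P[s, s + 1] = 0.5                  # assumed mileage transition (keep)
P[-1, -1] = 1.0

def solve_values(theta):
    """Inner loop: successive approximations to the fixed point EV."""
    theta1, RC = theta
    c = theta1 * np.arange(nX)
    EV = np.zeros(nX)
    for _ in range(2000):
        v_keep = -c + beta * EV
        v_repl = -RC - c[0] + beta * EV[0]
        m = np.maximum(v_keep, v_repl)
        logsum = m + np.log(np.exp(v_keep - m) + np.exp(v_repl - m))
        EV_new = P @ logsum
        if np.max(np.abs(EV_new - EV)) < 1e-10:
            break
        EV = EV_new
    return v_keep, v_repl

def neg_loglik(theta, x_obs, d_obs):
    """Outer objective: the choice part of the log-likelihood above."""
    v_keep, v_repl = solve_values(theta)
    p_keep = 1.0 / (1.0 + np.exp(v_repl - v_keep[x_obs]))
    p = np.where(d_obs == 0, p_keep, 1.0 - p_keep)
    return -np.sum(np.log(np.clip(p, 1e-12, 1.0)))

x_obs = np.array([3, 10, 25, 40, 60])            # toy odometer states
d_obs = np.array([0, 0, 0, 1, 1])                # toy decisions (0=keep, 1=replace)
result = minimize(neg_loglik, x0=np.array([0.05, 10.0]), args=(x_obs, d_obs),
                  method="Nelder-Mead", bounds=[(1e-3, 1.0), (0.1, 50.0)])
print("estimated (theta1, RC):", result.x)
```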


Estimation with MPEC

In the nested fixed point algorithm, P(d\mid x,\theta) is recalculated for each guess of the parameters \theta. The MPEC method instead solves the constrained optimization problem:
: \begin{align} \max_{\theta, EV} & \qquad L(\theta) \\ \text{subject to} & \qquad EV(x,d,\theta)= \int \log\left( \sum_{d'\in\{\text{keep},\text{replace}\}} \exp\{u(x',d',\theta)+\beta EV(x',d',\theta)\}\right) p(dx'\mid x,d,\theta) \end{align}
This method is faster to compute than non-optimized implementations of the nested fixed point algorithm, and takes about as long as highly optimized implementations.
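
A hedged sketch of the same toy problem in MPEC form follows: the parameter vector \theta and the EV values on the grid are optimized jointly, with the functional equation imposed as an equality constraint rather than solved exactly at every step. SciPy's SLSQP stands in here for an industrial solver such as Knitro; all numerical choices are illustrative assumptions.

```python
# MPEC sketch: decision variables z = (theta1, RC, EV[0..nX-1]); the
# Bellman equation enters as an equality constraint, likelihood as objective.
import numpy as np
from scipy.optimize import minimize

nX, beta = 10, 0.95
P = np.zeros((nX, nX))
for s in range(nX - 1):
    P[s, s] = P[s, s + 1] = 0.5                  # assumed mileage transition (keep)
P[-1, -1] = 1.0
x_obs = np.array([1, 3, 5, 7, 9])                # toy data
d_obs = np.array([0, 0, 0, 1, 1])

def values(z):
    theta1, RC, EV = z[0], z[1], z[2:]
    c = theta1 * np.arange(nX)
    return -c + beta * EV, -RC - c[0] + beta * EV[0]

def bellman_residual(z):
    """Equality constraint: EV must satisfy the functional equation."""
    EV = z[2:]
    v_keep, v_repl = values(z)
    m = np.maximum(v_keep, v_repl)
    logsum = m + np.log(np.exp(v_keep - m) + np.exp(v_repl - m))
    return EV - P @ logsum

def neg_loglik(z):
    v_keep, v_repl = values(z)
    p_keep = 1.0 / (1.0 + np.exp(v_repl - v_keep[x_obs]))
    p = np.where(d_obs == 0, p_keep, 1.0 - p_keep)
    return -np.sum(np.log(np.clip(p, 1e-12, 1.0)))

z0 = np.concatenate([[0.1, 5.0], np.zeros(nX)])
result = minimize(neg_loglik, z0, method="SLSQP",
                  constraints={"type": "eq", "fun": bellman_residual})
print("estimated (theta1, RC):", result.x[:2])
```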


Estimation with non-solution methods

The conditional choice probabilities method of Hotz and Miller can be applied in this setting. Hotz, Miller, Sanders, and Smith proposed a computationally simpler version of the method, and tested it on a study of the bus engine replacement problem. The method works by estimating conditional choice probabilities using simulation, then backing out the implied differences in value functions.
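
A minimal sketch of the inversion step in the logit case: with iid Type I extreme value shocks, differences in conditional values are recovered in closed form from the conditional choice probabilities, v_i - v_j = \log(P_i) - \log(P_j), with no fixed-point computation. The probabilities below are made-up numbers standing in for frequencies estimated (or simulated) from data.

```python
# Hotz-Miller inversion under the logit assumption: value differences
# follow directly from log choice probabilities.
import numpy as np

# assumed CCPs for three alternatives in some state, e.g. estimated
# nonparametrically from observed choice frequencies
p_hat = np.array([0.6, 0.3, 0.1])

# back out conditional-value differences relative to alternative 0
v_diff = np.log(p_hat) - np.log(p_hat[0])
print(v_diff)            # -> [0.0, log(0.3/0.6), log(0.1/0.6)] = [0, -0.69, -1.79]
```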


See also

* Inverse reinforcement learning


References


Further reading

* Rust, John (1994). "Structural Estimation of Markov Decision Processes". In ''Handbook of Econometrics'', vol. 4, pp. 3081–3143. Elsevier. doi:10.1016/s1573-4412(05)80020-0. ISBN 978-0-444-88766-5.