Dynamic Discrete Choice

Dynamic discrete choice (DDC) models, also known as discrete choice models of dynamic programming, model an agent's choices over discrete options that have future implications. Rather than assuming that observed choices are the result of static utility maximization, observed choices in DDC models are assumed to result from an agent's maximization of the present value of utility, generalizing the utility theory upon which discrete choice models are based. The goal of DDC methods is to estimate the structural parameters of the agent's decision process. Once these parameters are known, the researcher can use the estimates to simulate how the agent would behave in a counterfactual state of the world (for example, how a prospective college student's enrollment decision would change in response to a tuition increase).


Mathematical representation

Agent n's maximization problem can be written mathematically as follows:

: V\left(x_{n0}\right)=\max_{\{d_{nt}\}_{t=0}^{T}} \mathbb{E}\left(\sum_{t=0}^{T} \sum_{i=1}^{J} \beta^{t} \, 1\left(d_{nt}=i\right) U_{nit}\left(x_{nt}, \varepsilon_{nit}\right)\right),

where
* x_{nt} are state variables, with x_{n0} the agent's initial condition
* d_{nt} represents n's decision from among J discrete alternatives
* \beta \in \left(0,1\right) is the discount factor
* U_{nit} is the flow utility n receives from choosing alternative i in period t, and depends on both the state x_{nt} and unobserved factors \varepsilon_{nit}
* T is the time horizon
* The expectation \mathbb{E}\left(\cdot\right) is taken over both the x_{nt}'s and \varepsilon_{nit}'s in U_{nit}. That is, the agent is uncertain about future transitions in the states, and is also uncertain about future realizations of unobserved factors.
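
To make the objective concrete, here is a minimal simulation sketch of one realization of the discounted utility sum. The flow utility, state transition, and the deliberately naive myopic decision rule are all illustrative assumptions, not part of any standard model; an actual DDC agent chooses d_{nt} to maximize the expected discounted sum, which requires dynamic programming rather than period-by-period maximization.

```python
# A hedged sketch: simulate one realization of the discounted objective
# sum_t beta^t * U(x_t, d_t, eps_t). The utility, transition, and the
# myopic policy used here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
J, T, beta = 3, 10, 0.95                 # alternatives, horizon, discount factor
alpha = np.array([0.5, -0.2, 0.1])       # assumed per-alternative utility weights

def flow_utility(x, i, eps_i):
    """Assumed flow utility: linear in the state plus an unobserved shock."""
    return alpha[i] * x + eps_i

x, total = 1.0, 0.0                      # initial condition x_0 and running objective
for t in range(T):
    eps = rng.gumbel(size=J)             # unobserved factors, one per alternative
    utils = [flow_utility(x, i, eps[i]) for i in range(J)]
    d = int(np.argmax(utils))            # myopic choice; a DDC agent is forward-looking
    total += beta**t * utils[d]
    x += rng.normal(scale=0.1)           # assumed stochastic state transition
print("one realization of the discounted utility sum:", total)
```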


Simplifying assumptions and notation

It is standard to impose the following simplifying assumptions and notation on the dynamic decision problem:

;1. Flow utility is additively separable and linear in parameters
The flow utility can be written as an additive sum, consisting of deterministic and stochastic elements. The deterministic component can be written as a linear function of the structural parameters:
: \begin{align} U_{nit}\left(x_{nt},\varepsilon_{nit}\right) &= u_{nit} + \varepsilon_{nit} \\ &= X_{nt}\alpha_{i} + \varepsilon_{nit} \end{align}

;2. The optimization problem can be written as a Bellman equation
Define by V_{nt}(x_{nt}) the ''ex ante'' value function for individual n in period t just before \varepsilon_{nt} is revealed:
: V_{nt}(x_{nt}) = \mathbb{E} \max_i \left\{ u_{nit}(x_{nt}) + \varepsilon_{nit} + \beta \int_{x_{n,t+1}} V_{n,t+1}(x_{n,t+1}) \, dF\left(x_{n,t+1} \mid x_{nt}\right) \right\}
where the expectation operator \mathbb{E} is over the \varepsilon's, and where dF\left(x_{n,t+1} \mid x_{nt}\right) represents the probability distribution over x_{n,t+1} conditional on x_{nt}. The expectation over state transitions is accomplished by taking the integral over this probability distribution.

It is possible to decompose V_{nt}(x_{nt}) into deterministic and stochastic components:
: V_{nt}(x_{nt}) = \mathbb{E} \max_i \left\{ v_{nit}(x_{nt}) + \varepsilon_{nit} \right\}
where v_{nit} is the value to choosing alternative i at time t and is written as
: v_{nit}(x_{nt}) = u_{nit}\left(x_{nt}\right) + \beta \int_{x_{n,t+1}} \mathbb{E} \max_j \left\{ v_{nj,t+1}(x_{n,t+1}) + \varepsilon_{nj,t+1} \right\} \, dF(x_{n,t+1} \mid x_{nt})
where now the expectation \mathbb{E} is taken over the \varepsilon_{nj,t+1}.

;3. The optimization problem follows a Markov decision process
The states x_{nt} follow a Markov chain. That is, attainment of state x_{nt} depends only on the state x_{n,t-1} and not on x_{n,t-2} or any prior state.
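
Under assumptions 1–3, the value functions can be computed by backward recursion on a discretized state space. The sketch below is a minimal illustration with assumed utilities and transition probabilities; the E max over the shocks is approximated by Monte Carlo so that no particular distribution for the \varepsilon's is needed yet.

```python
# A minimal backward-recursion sketch of assumptions 1-3 on a small grid.
# The conditional values v and ex ante values V follow the decomposition
# above; the E max over epsilon is approximated by Monte Carlo.
# Grid size, utilities, and the transition matrix P are all assumptions.
import numpy as np

rng = np.random.default_rng(0)
nX, J, T, beta = 5, 2, 20, 0.9
u = rng.normal(size=(nX, J))                  # assumed deterministic flow utilities u_i(x)
P = np.full((J, nX, nX), 1.0 / nX)            # assumed transition probs dF(x'|x, i)

V = np.zeros((T + 1, nX))                     # ex ante value, V[T] = 0 terminal condition
draws = rng.gumbel(size=(10000, J))           # shock draws for the Monte Carlo E max
for t in range(T - 1, -1, -1):
    # conditional values: v_i(x) = u_i(x) + beta * sum_x' V_{t+1}(x') dF(x'|x, i)
    v = u + beta * np.stack([P[i] @ V[t + 1] for i in range(J)], axis=1)
    # ex ante value: V_t(x) = E max_i { v_i(x) + eps_i }
    V[t] = np.mean(np.max(v[None, :, :] + draws[:, None, :], axis=2), axis=0)
print("ex ante values in period 0:", V[0])
```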


Conditional value functions and choice probabilities

The value function in the previous section is called the conditional value function, because it is the value function conditional on choosing alternative i in period t. Writing the conditional value function in this way is useful in constructing formulas for the choice probabilities.

To write down the choice probabilities, the researcher must make an assumption about the distribution of the \varepsilon_{nit}'s. As in static discrete choice models, this distribution can be assumed to be iid Type I extreme value, generalized extreme value, multinomial probit, or mixed logit.

For the case where \varepsilon_{nit} is multinomial logit (i.e. drawn iid from the Type I extreme value distribution), the formulas for the choice probabilities would be:
: P_{nit} = \frac{\exp\left(v_{nit}(x_{nt})\right)}{\sum_{j=1}^{J} \exp\left(v_{njt}(x_{nt})\right)}
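
A small numerical sketch of this formula follows: under the Type I extreme value assumption the choice probabilities are the softmax of the conditional values, computed here with a max shift for numerical stability. The values v are assumed numbers for illustration.

```python
# Logit choice probabilities P_i = exp(v_i) / sum_j exp(v_j),
# computed stably by subtracting the maximum before exponentiating.
import numpy as np

def logit_probs(v):
    """Softmax of conditional values; invariant to the max shift."""
    z = np.exp(v - v.max())
    return z / z.sum()

v = np.array([1.2, 0.4, -0.3])        # assumed conditional values v_i(x) at one state
print(logit_probs(v))                 # -> approximately [0.60, 0.27, 0.13]
```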


Estimation

Estimation of dynamic discrete choice models is particularly challenging, due to the fact that the researcher must solve the backwards recursion problem for each guess of the structural parameters. The most common methods used to estimate the structural parameters are maximum likelihood estimation and the method of simulated moments.

Aside from estimation methods, there are also solution methods. Because of the complexity of the problem, different solution methods can be employed. These can be divided into full-solution methods and non-solution methods.


Full-solution methods

The foremost example of a full-solution method is the nested fixed point (NFXP) algorithm developed by John Rust in 1987. The NFXP algorithm is described in great detail in its documentation manual. A more recent work by Che-Lin Su and Kenneth Judd in 2012 implements another approach (dismissed as intractable by Rust in 1987), which uses constrained optimization of the likelihood function, a special case of mathematical programming with equilibrium constraints (MPEC). Specifically, the likelihood function is maximized subject to the constraints imposed by the model, and expressed in terms of the additional variables that describe the model's structure. This approach requires powerful optimization software such as Artelys Knitro because of the high dimensionality of the optimization problem. Once it is solved, both the structural parameters that maximize the likelihood and the solution of the model are found.

In a later article, Rust and coauthors show that the speed advantage of MPEC over NFXP is not significant. Yet, because the computations required by MPEC do not rely on the structure of the model, its implementation is much less labor intensive. Despite numerous contenders, the NFXP maximum likelihood estimator remains the leading estimation method for Markov decision models.


Non-solution methods

An alternative to full-solution methods is the class of non-solution methods. Here, the researcher can estimate the structural parameters without having to fully solve the backwards recursion problem for each parameter guess. Non-solution methods are typically faster but require more assumptions; in many cases the additional assumptions are realistic. The leading non-solution method is conditional choice probabilities, developed by V. Joseph Hotz and Robert A. Miller.


Examples


Bus engine replacement model

The bus engine replacement model developed in Rust's seminal 1987 paper is one of the first dynamic stochastic models of discrete choice estimated using real data, and it continues to serve as a classical example of problems of this type.

The model is a simple regenerative optimal stopping stochastic dynamic problem faced by the decision maker, Harold Zurcher, superintendent of maintenance at the Madison Metropolitan Bus Company in Madison, Wisconsin. For every bus in operation in each time period, Harold Zurcher has to decide whether to replace the engine and bear the associated replacement cost, or to continue operating the bus at an ever-rising cost of operation, which includes insurance and the cost of lost ridership in the case of a breakdown.

Let x_t denote the odometer reading (mileage) at period t, c(x_t,\theta) the cost of operating the bus, which depends on the vector of parameters \theta, RC the cost of replacing the engine, and \beta the discount factor. Then the per-period utility is given by
: U(x_t,\xi_t,d,\theta)= \begin{cases} -c(x_t,\theta)+\xi_{t,\text{keep}}, & \text{if } d=\text{keep} \\ -RC-c(0,\theta)+\xi_{t,\text{replace}}, & \text{if } d=\text{replace} \end{cases} = u(x_t,d,\theta) + \begin{cases} \xi_{t,\text{keep}}, & \text{if } d=\text{keep} \\ \xi_{t,\text{replace}}, & \text{if } d=\text{replace} \end{cases}
where d denotes the decision (keep or replace), and \xi_{t,\text{keep}} and \xi_{t,\text{replace}} represent the components of the utility observed by Harold Zurcher, but not by John Rust. It is assumed that \xi_{t,\text{keep}} and \xi_{t,\text{replace}} are independent and identically distributed with the Type I extreme value distribution, and that \xi_{t+1} is independent of \xi_t conditional on x_{t+1}.

Then the optimal decisions satisfy the Bellman equation
: V(x,\xi,\theta) = \max_{d\in\{\text{keep},\text{replace}\}} \left\{ u(x,d,\theta)+\xi_{d} + \beta \int\int V(x',\xi',\theta) \, q(d\xi'\mid x',\theta) \, p(dx'\mid x,d,\theta) \right\}
where p(dx'\mid x,d,\theta) and q(d\xi'\mid x',\theta) are respectively transition densities for the observed and unobserved state variables. Time indices in the Bellman equation are dropped because the model is formulated in an infinite horizon setting; the unknown optimal policy is therefore stationary, i.e. independent of time.

Given the distributional assumption on q(d\xi'\mid x',\theta), the probability of a particular choice d is given by
: P(d\mid x,\theta) = \frac{\exp\{u(x,d,\theta)+\beta EV(x,d,\theta)\}}{\sum_{d'\in\{\text{keep},\text{replace}\}} \exp\{u(x,d',\theta)+\beta EV(x,d',\theta)\}}
where EV(x,d,\theta) is a unique solution to the functional equation
: EV(x,d,\theta)= \int \log\left( \sum_{d'\in\{\text{keep},\text{replace}\}} \exp\{u(x',d',\theta)+\beta EV(x',d',\theta)\}\right) p(dx'\mid x,d,\theta).
It can be shown that the latter functional equation defines a contraction mapping if the state space x_t is bounded, so there will be a unique solution EV(x,d,\theta) for any \theta, and further the implicit function theorem holds, so EV(x,d,\theta) is also a smooth function of \theta for each (x,d).
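
The sketch below solves this functional equation by successive approximations on a small discretized mileage grid and then forms the logit choice probabilities. The grid size, linear cost function, mileage transition, and parameter values are illustrative assumptions, not Rust's specification or estimates (Rust, for instance, uses a \beta very close to 1, which makes the contraction converge much more slowly).

```python
# A hedged sketch of the fixed point EV for the bus engine problem on a
# coarse mileage grid. Cost function, transitions, and parameters are
# assumptions for illustration; replacement resets the state to x = 0.
import numpy as np

nX, beta, RC, theta1 = 90, 0.95, 10.0, 0.05
x = np.arange(nX)
c = theta1 * x                                   # assumed linear operating cost c(x, theta)

# assumed mileage transition under "keep": advance 0 or 1 grid points
P = np.zeros((nX, nX))
for s in range(nX - 1):
    P[s, s] = P[s, s + 1] = 0.5
P[-1, -1] = 1.0

EV = np.zeros(nX)                                # EV(x, keep) on the grid
for _ in range(5000):
    v_keep = -c + beta * EV                      # u(x, keep) + beta * EV(x, keep)
    v_repl = -RC - c[0] + beta * EV[0]           # replacement resets mileage to 0
    m = np.maximum(v_keep, v_repl)
    logsum = m + np.log(np.exp(v_keep - m) + np.exp(v_repl - m))
    EV_new = P @ logsum                          # integrate over next-period mileage
    if np.max(np.abs(EV_new - EV)) < 1e-10:      # contraction guarantees convergence
        break
    EV = EV_new

p_keep = 1.0 / (1.0 + np.exp(v_repl - v_keep))   # logit probability of keeping
print("P(keep) at low vs high mileage:", p_keep[0], p_keep[-1])
```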


Estimation with nested fixed point algorithm

The contraction mapping above can be solved numerically for the fixed point EV(x,d,\theta) that yields choice probabilities P(d\mid x,\theta) for any given value of \theta. The log-likelihood function can then be formulated as
: L(\theta) = \sum_{i=1}^{N} \sum_{t=2}^{T_i} \log(P(d_{it}\mid x_{it},\theta)) + \log(p(x_{it}\mid x_{i,t-1},d_{i,t-1},\theta)),
where x_{it} and d_{it} represent data on state variables (odometer readings) and decisions (keep or replace) for i=1,\dots,N individual buses, each in t=1,\dots,T_i periods.

The joint algorithm for solving the fixed point problem given a particular value of the parameter \theta and maximizing the log-likelihood L(\theta) with respect to \theta was named by John Rust the ''nested fixed point algorithm'' (NFXP). Rust's implementation of the nested fixed point algorithm is highly optimized for this problem, using Newton–Kantorovich iterations to calculate P(d\mid x,\theta) and quasi-Newton methods, such as the Berndt–Hall–Hall–Hausman (BHHH) algorithm, for likelihood maximization.
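
A minimal sketch of the nested structure follows, reusing the toy grid from the previous sketch: the outer optimizer searches over \theta = (\theta_1, RC), and every trial value triggers an inner fixed-point solve before the choice part of the log-likelihood is evaluated. The data are a handful of made-up observations, and derivative-free Nelder–Mead stands in for Rust's Newton–Kantorovich/BHHH machinery.

```python
# A hedged sketch of NFXP's nesting: outer search over theta, inner
# fixed-point solve per trial theta. All numbers are toy assumptions.
import numpy as np
from scipy.optimize import minimize

nX, beta = 90, 0.95
P = np.zeros((nX, nX))
for s in range(nX - 1):
    P[s, s] = P[s, s + 1] = 0.5                  # assumed mileage transition (keep)
P[-1, -1] = 1.0

def solve_values(theta):
    """Inner loop: successive approximations to the fixed point EV."""
    theta1, RC = theta
    c = theta1 * np.arange(nX)
    EV = np.zeros(nX)
    for _ in range(2000):
        v_keep = -c + beta * EV
        v_repl = -RC - c[0] + beta * EV[0]
        m = np.maximum(v_keep, v_repl)
        logsum = m + np.log(np.exp(v_keep - m) + np.exp(v_repl - m))
        EV_new = P @ logsum
        if np.max(np.abs(EV_new - EV)) < 1e-10:
            break
        EV = EV_new
    return v_keep, v_repl

def neg_loglik(theta, x_obs, d_obs):
    """Outer objective: the choice part of the log-likelihood above."""
    v_keep, v_repl = solve_values(theta)
    p_keep = 1.0 / (1.0 + np.exp(v_repl - v_keep[x_obs]))
    p = np.where(d_obs == 0, p_keep, 1.0 - p_keep)
    return -np.sum(np.log(np.clip(p, 1e-12, 1.0)))

x_obs = np.array([3, 10, 25, 40, 60])            # toy odometer states
d_obs = np.array([0, 0, 0, 1, 1])                # toy decisions (0=keep, 1=replace)
result = minimize(neg_loglik, x0=np.array([0.05, 10.0]), args=(x_obs, d_obs),
                  method="Nelder-Mead", bounds=[(1e-3, 1.0), (0.1, 50.0)])
print("estimated (theta1, RC):", result.x)
```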


Estimation with MPEC

In the nested fixed point algorithm, P(d\mid x,\theta) is recalculated for each guess of the parameters \theta. The MPEC method instead solves the constrained optimization problem:
: \begin{align} \max_{\theta, EV} & \qquad L(\theta) \\ \text{subject to} & \qquad EV(x,d,\theta)= \int \log\left( \sum_{d'\in\{\text{keep},\text{replace}\}} \exp\{u(x',d',\theta)+\beta EV(x',d',\theta)\}\right) p(dx'\mid x,d,\theta) \end{align}
This method is faster to compute than non-optimized implementations of the nested fixed point algorithm, and takes about as long as highly optimized implementations.
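
A hedged sketch of the same toy problem in MPEC form follows: the parameter vector \theta and the EV values on the grid are optimized jointly, with the functional equation imposed as an equality constraint rather than solved exactly at every step. SciPy's SLSQP stands in here for an industrial solver such as Knitro; all numerical choices are illustrative assumptions.

```python
# MPEC sketch: decision variables z = (theta1, RC, EV[0..nX-1]); the
# Bellman equation enters as an equality constraint, likelihood as objective.
import numpy as np
from scipy.optimize import minimize

nX, beta = 10, 0.95
P = np.zeros((nX, nX))
for s in range(nX - 1):
    P[s, s] = P[s, s + 1] = 0.5                  # assumed mileage transition (keep)
P[-1, -1] = 1.0
x_obs = np.array([1, 3, 5, 7, 9])                # toy data
d_obs = np.array([0, 0, 0, 1, 1])

def values(z):
    theta1, RC, EV = z[0], z[1], z[2:]
    c = theta1 * np.arange(nX)
    return -c + beta * EV, -RC - c[0] + beta * EV[0]

def bellman_residual(z):
    """Equality constraint: EV must satisfy the functional equation."""
    EV = z[2:]
    v_keep, v_repl = values(z)
    m = np.maximum(v_keep, v_repl)
    logsum = m + np.log(np.exp(v_keep - m) + np.exp(v_repl - m))
    return EV - P @ logsum

def neg_loglik(z):
    v_keep, v_repl = values(z)
    p_keep = 1.0 / (1.0 + np.exp(v_repl - v_keep[x_obs]))
    p = np.where(d_obs == 0, p_keep, 1.0 - p_keep)
    return -np.sum(np.log(np.clip(p, 1e-12, 1.0)))

z0 = np.concatenate([[0.1, 5.0], np.zeros(nX)])
result = minimize(neg_loglik, z0, method="SLSQP",
                  constraints={"type": "eq", "fun": bellman_residual})
print("estimated (theta1, RC):", result.x[:2])
```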


Estimation with non-solution methods

The conditional choice probabilities method of Hotz and Miller can be applied in this setting. Hotz, Miller, Sanders, and Smith proposed a computationally simpler version of the method, and tested it on a study of the bus engine replacement problem. The method works by estimating conditional choice probabilities using simulation, then backing out the implied differences in value functions.
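
A minimal sketch of the inversion step in the logit case: with iid Type I extreme value shocks, differences in conditional values are recovered in closed form from the conditional choice probabilities, v_i - v_j = \log(P_i) - \log(P_j), with no fixed-point computation. The probabilities below are made-up numbers standing in for frequencies estimated (or simulated) from data.

```python
# Hotz-Miller inversion under the logit assumption: value differences
# follow directly from log choice probabilities.
import numpy as np

# assumed CCPs for three alternatives in some state, e.g. estimated
# nonparametrically from observed choice frequencies
p_hat = np.array([0.6, 0.3, 0.1])

# back out conditional-value differences relative to alternative 0
v_diff = np.log(p_hat) - np.log(p_hat[0])
print(v_diff)            # -> [0.0, log(0.3/0.6), log(0.1/0.6)] = [0, -0.69, -1.79]
```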


See also

* Inverse reinforcement learning


References


Further reading

* Rust, John (1994). "Structural Estimation of Markov Decision Processes". In ''Handbook of Econometrics'', vol. 4, pp. 3081–3143. Elsevier. doi:10.1016/s1573-4412(05)80020-0. ISBN 978-0-444-88766-5.