The Lee–Carter model is a numerical algorithm used in mortality forecasting and life expectancy forecasting. The input to the model is a matrix of age-specific mortality rates ordered monotonically by time, usually with ages in columns and years in rows. The output is a forecasted matrix of mortality rates in the same format as the input.
The model uses singular value decomposition (SVD) to find:
* A univariate time series vector $k_t$ that captures 80–90% of the mortality trend (here the subscript $t$ refers to time),
* A vector $b_x$ that describes the relative mortality at each age (here the subscript $x$ refers to age), and
* A scaling constant (referred to here as $s_1$ but unnamed in the literature).
Surprisingly, $k_t$ is usually close to linear, implying that gains in life expectancy are fairly constant year after year in most populations. Prior to computing the SVD, age-specific mortality rates are first transformed into $A_{x,t}$ by taking their logarithms and then centering them by subtracting their age-specific means over time. The age-specific mean over time is denoted by $a_x$. The subscript $x,t$ refers to the fact that $A_{x,t}$ spans both age and time.
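As a minimal sketch of this transformation, the Python function below (the name `center_log_rates` is hypothetical) computes $a_x$ and $A_{x,t}$ using only the standard library. It assumes the matrix is stored with ages as rows and years as columns, which is the transpose of the usual input layout described above, so that `m[x][t]` lines up with the subscripts of $m_{x,t}$. All rates must be positive for the logarithm to exist.

```python
import math


def center_log_rates(m):
    """Return (a, A) for an age-by-year matrix of positive death rates m[x][t].

    a[x]    -- mean over time of ln(m[x][t]), i.e. a_x
    A[x][t] -- ln(m[x][t]) - a[x], i.e. the centered log rates A_{x,t}
    """
    log_m = [[math.log(rate) for rate in row] for row in m]
    a = [sum(row) / len(row) for row in log_m]
    A = [[value - a_x for value in row] for row, a_x in zip(log_m, a)]
    return a, A


# Hypothetical rates for two ages (rows) over three years (columns).
rates = [[0.010, 0.009, 0.008],
         [0.050, 0.046, 0.043]]
a, A = center_log_rates(rates)
```

Each row of `A` sums to zero by construction, which is what centering means here.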
Many researchers adjust the $k_t$ vector by fitting it to empirical life expectancies for each year, using the $a_x$ and $b_x$ generated with SVD. When adjusted using this approach, changes to $k_t$ are usually small.
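The text does not say how the fit to life expectancy is carried out. One possible approach, sketched below under loudly stated assumptions, is to solve for the single value of $k_t$ in each year whose implied death rates reproduce the observed life expectancy at birth, using bisection. The helper names (`life_expectancy`, `fit_k_to_e0`), the life-table approximation (single-year ages, a constant death rate within each year, an open-ended last age) and the search bracket are all illustrative assumptions, not the procedure of any particular study.

```python
import math


def life_expectancy(m):
    """Period life expectancy at birth from single-year death rates m[0], m[1], ...

    Simplifying assumptions: the death rate is constant within each one-year
    age interval, and the last age is open-ended (person-years = survivors / rate).
    """
    survivors, person_years = 1.0, 0.0
    for x, rate in enumerate(m):
        if x == len(m) - 1:
            person_years += survivors / rate          # open-ended last age group
        else:
            deaths_share = -math.expm1(-rate)         # 1 - exp(-rate), computed stably
            person_years += survivors * deaths_share / rate
            survivors *= 1.0 - deaths_share
    return person_years


def fit_k_to_e0(e0_target, a, b, s1=1.0, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisection for the k whose rates exp(a_x + s1 * k * b_x) reproduce e0_target.

    Assumes every b_x > 0 (so life expectancy falls as k rises) and that the
    solution lies inside [lo, hi].
    """
    def e0_at(k):
        return life_expectancy([math.exp(a_x + s1 * k * b_x) for a_x, b_x in zip(a, b)])

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if e0_at(mid) > e0_target:
            lo = mid      # implied mortality too low, so k must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical a_x and b_x for four ages; the "observed" e0 is generated with k = 1.5.
a = [-4.0, -5.0, -3.0, -1.0]
b = [0.3, 0.3, 0.2, 0.2]
observed_e0 = life_expectancy([math.exp(a_x + 1.5 * b_x) for a_x, b_x in zip(a, b)])
k_adjusted = fit_k_to_e0(observed_e0, a, b)   # close to 1.5
```

Bisection works here because, when every $b_x$ is positive, raising $k$ raises every death rate and so lowers life expectancy monotonically.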
To forecast mortality, $k_t$ (either adjusted or not) is projected into $n$ future years using an ARIMA model. The corresponding forecasted $A_{x,t+n}$ is recovered by multiplying $k_{t+n}$ by $b_x$ and by the first diagonal element of $S$ (where $U S V^* = \operatorname{svd}(A_{x,t})$). The actual mortality rates are recovered by adding back $a_x$ and taking exponentials of the result.
Because $k_t$ is approximately linear, it is generally modeled as a random walk with drift (a linear trend). Life expectancy and other life table measures can be calculated from this forecasted matrix after adding back the means and taking exponentials to yield regular mortality rates.
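A random walk with drift is the ARIMA(0,1,0) model with a constant, $k_t = k_{t-1} + d + e_t$. Under that model the usual estimate of the drift $d$ is the mean of the year-to-year changes, which telescopes to $(k_T - k_1)/(T-1)$, and the point forecast $n$ years ahead is $k_T + n d$. The Python sketch below (the function name `drift_forecast` is hypothetical) implements only these point forecasts and assumes $k_t$ has already been estimated and is stored in time order.

```python
def drift_forecast(k, horizon):
    """Point forecasts of k for the next `horizon` years under a random walk with drift."""
    if len(k) < 2:
        raise ValueError("need at least two observations of k_t")
    drift = (k[-1] - k[0]) / (len(k) - 1)   # equals the mean of the first differences
    return [k[-1] + n * drift for n in range(1, horizon + 1)]


print(drift_forecast([10.0, 8.0, 6.5, 4.0], 2))   # drift -2.0 -> [2.0, 0.0]
```

Because only the first and last observations enter the drift estimate, the point forecast ignores how $k_t$ wandered in between; that information only matters for the width of the forecast band.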
In most implementations, confidence intervals for the forecasts are generated by simulating many mortality forecasts using Monte Carlo methods. The band between the 5th and 95th percentiles of the simulated results is then taken as the forecast interval (a 90% band). These simulations are done by extending $k_t$ into the future using randomization based on the standard error of $k_t$ derived from the input data.
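The following Python sketch illustrates that simulation for $k_t$ alone. It assumes the random walk with drift described earlier, uses the standard deviation of the year-to-year changes in $k_t$ as the scale of normally distributed innovations (one reading of the "standard error" mentioned above), and ignores uncertainty in the estimated drift, which fuller treatments may include. The function name `simulate_k_band` and its parameters are hypothetical.

```python
import random
import statistics


def simulate_k_band(k, horizon, n_paths=1000, seed=0):
    """Return a (5th percentile, 95th percentile) pair of k for each future year.

    Simulates random walks with drift starting from the last observed k.
    Innovations are normal with the standard deviation of the observed
    year-to-year changes in k; uncertainty in the drift itself is ignored.
    Needs at least three observations of k.
    """
    changes = [later - earlier for earlier, later in zip(k, k[1:])]
    drift = statistics.mean(changes)
    sigma = statistics.stdev(changes)
    rng = random.Random(seed)            # fixed seed for reproducible bands

    paths = []
    for _ in range(n_paths):
        level, path = k[-1], []
        for _ in range(horizon):
            level += drift + rng.gauss(0.0, sigma)
            path.append(level)
        paths.append(path)

    bands = []
    for h in range(horizon):
        cuts = statistics.quantiles([path[h] for path in paths], n=20)  # 5%, 10%, ..., 95%
        bands.append((cuts[0], cuts[-1]))
    return bands


bands = simulate_k_band([10.0, 8.0, 6.5, 4.0, 2.5], horizon=5)
```

Each pair of bounds can then be pushed through $m_{x,t+n} = \exp(a_x + s_1 k_{t+n} b_x)$ to obtain a mortality band. Because that mapping is monotone in $k_{t+n}$ for each age, percentiles of $k$ map to percentiles of each rate, with the bounds swapped wherever $b_x < 0$.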
Algorithm
The algorithm seeks to find the least squares solution to the equation:
$$\ln(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t}$$
where $m_{x,t}$ is a matrix of mortality rates for each age $x$ in each year $t$.
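The link between this least-squares problem and the SVD steps that follow can be written out. The derivation below relies on the Eckart–Young theorem (the best rank-one approximation of a matrix in the Frobenius norm is its largest singular value times the corresponding left and right singular vectors) and treats $A_{x,t}$ as an age-by-year matrix, with $u_1$ and $v_1$ the first columns of $U$ and $V$. How $s_1$ is split between $b_x$ and $k_t$ is a normalization choice; the steps below keep it as a separate constant. The last line shows that the resulting $k_t$ sums to zero over time, which is consistent with defining $a_x$ as the time average of $\ln(m_{x,t})$.

```latex
% Least-squares problem, with A_{x,t} = \ln(m_{x,t}) - a_x
\min_{b,\,k} \sum_{x,t} \bigl(A_{x,t} - b_x k_t\bigr)^2
% Eckart--Young: the minimizing rank-one product is the leading SVD term
b_x k_t = s_1\, u_{x,1}\, v_{t,1}
% Each row of A sums to zero over t (A\mathbf{1} = 0) and A^{\top} u_1 = s_1 v_1, so
\sum_{t} v_{t,1} = \frac{\mathbf{1}^{\top} A^{\top} u_1}{s_1}
                 = \frac{(A\mathbf{1})^{\top} u_1}{s_1} = 0
```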
# Compute $a_x$, the average over time of $\ln(m_{x,t})$ for each age: $a_x = \frac{1}{T}\sum_{t=1}^{T} \ln(m_{x,t})$
# Compute $A_{x,t}$, which will be used in the SVD: $A_{x,t} = \ln(m_{x,t}) - a_x$
# Compute the singular value decomposition of $A_{x,t}$, treated as an age-by-year matrix: $U S V^* = \operatorname{svd}(A_{x,t})$
# Derive $b_x$, $s_1$, and $k_t$ from $U$, $S$, and $V^*$: $b_x = u_{x,1}$ (the first column of $U$), $s_1 = S_{1,1}$ (the largest singular value, used as the scaling constant), and $k_t = v_{t,1}$ (the first column of $V$, i.e. the first row of $V^*$), so that $A_{x,t} \approx s_1 b_x k_t$
# Forecast $k_t$ using a standard univariate ARIMA model to $n$ additional years: $k_{t+n} = \operatorname{ARIMA}(k_t, n)$
# Use the forecasted $k_{t+n}$, with the original $b_x$, $s_1$, and $a_x$, to calculate the forecasted mortality rate for each age: $m_{x,t+n} = \exp(a_x + s_1 k_{t+n} b_x)$
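The steps above can be sketched in Python with only the standard library. This is a hedged illustration, not a reference implementation: the function names are hypothetical, and the full SVD of step 3 is replaced by power iteration, which recovers only the largest singular value $s_1$ and its singular vectors (all that step 4 uses). Step 5 is left out, so any univariate forecast of $k_t$, such as a random walk with drift, can be passed to `forecast_rates` as `k_future`. As in the steps, the matrix is stored with ages as rows and years as columns.

```python
import math
import random


def leading_singular_triplet(A, max_iter=1000, tol=1e-12, seed=0):
    """Approximate (s1, u1, v1): the largest singular value of A and its singular vectors.

    Power iteration replaces a full SVD here, since only the first component
    is used. Assumes A is not all zeros and s1 is clearly larger than s2.
    """
    n_rows, n_cols = len(A), len(A[0])
    rng = random.Random(seed)
    # Random start: an all-ones vector would be orthogonal to v1 when the rows of A are centered.
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(max_iter):
        u = [sum(A[x][t] * v[t] for t in range(n_cols)) for x in range(n_rows)]   # A v
        norm_u = math.sqrt(sum(value * value for value in u))
        u = [value / norm_u for value in u]
        w = [sum(A[x][t] * u[x] for x in range(n_rows)) for t in range(n_cols)]   # A^T u
        s1 = math.sqrt(sum(value * value for value in w))
        new_v = [value / s1 for value in w]
        done = max(abs(p - q) for p, q in zip(new_v, v)) < tol
        v = new_v
        if done:
            break
    return s1, u, v


def lee_carter_fit(A):
    """Steps 3-4: b_x, s_1 and k_t from the centered, age-by-year matrix A_{x,t}."""
    s1, u, v = leading_singular_triplet(A)
    # Singular vector signs are arbitrary; flip both so that b_x is mostly positive.
    if sum(u) < 0:
        u, v = [-p for p in u], [-p for p in v]
    return u, s1, v                      # b_x, s_1, k_t


def explained_share(A, s1):
    """Share of the total sum of squares of A captured by s_1 * b_x * k_t."""
    return s1 * s1 / sum(value * value for row in A for value in row)


def forecast_rates(a, b, s1, k_future):
    """Step 6: m_{x,t+n} = exp(a_x + s_1 * k_{t+n} * b_x), ages as rows."""
    return [[math.exp(a_x + s1 * k_n * b_x) for k_n in k_future] for a_x, b_x in zip(a, b)]


# Hypothetical, exactly rank-one centered matrix: two ages by four years.
b_true, k_true = [0.6, 0.8], [-1.5, -0.5, 0.5, 1.5]
A = [[bx * kt for kt in k_true] for bx in b_true]
b, s1, k = lee_carter_fit(A)
```

With real data, `A` would come from step 2, and `explained_share(A, s1)` gives the fraction of the variation in the centered log rates accounted for by the single component $s_1 b_x k_t$, which is one way to quantify the 80–90% figure quoted earlier. A numerical library's SVD routine would normally replace the power iteration.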
Discussion
Without applying SVD or some other method of dimension reduction, the table of mortality data is a highly correlated multivariate data series, and the complexity of such multidimensional time series makes them difficult to forecast. SVD and related eigenvector methods have become widely used for dimension reduction in many different fields; Google's PageRank algorithm, for example, ranks web pages using the leading eigenvector of a link matrix.
The Lee–Carter model was introduced by Ronald D. Lee and Lawrence Carter in 1992 with the article "Modeling and Forecasting U.S. Mortality" (Journal of the American Statistical Association 87 (September): 659–671). The model grew out of their work in the late 1980s and early 1990s attempting to use inverse projection to infer rates in historical demography. The model has been used by the United States Social Security Administration, the US Census Bureau, and the United Nations, and it has become one of the most widely used mortality forecasting techniques in the world.
There have been extensions to the Lee–Carter model, most notably to account for missing years, correlated male and female populations, and large-scale coherence in populations that share a mortality regime (western Europe, for example). Many related papers can be found on Professor Ronald Lee's website.
Implementations
There are surprisingly few software packages for forecasting with the Lee–Carter model.
* LCFIT is a web-based package with interactive forms.
* Professor Rob J. Hyndman provides an R package for demography that includes routines for creating and forecasting a Lee–Carter model.
* Alternatives in R include the StMoMo package of Villegas, Millossovich and Kaishev (2015).
* Professor German Rodriguez provides … using Stata.
* Using Matlab, Professor Eric Jondeau and Professor Michael Rockinger have put together the Longevity Toolbox for parameter estimation.