In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its opposite (in specific domains, variously called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized. The loss function could include terms from several levels of the hierarchy.
In statistics, typically a loss function is used for parameter estimation, and the event in question is some function of the difference between estimated and true values for an instance of data. The concept, as old as Laplace, was reintroduced in statistics by Abraham Wald in the middle of the 20th century. In the context of economics, for example, this is usually economic cost or regret. In classification, it is the penalty for an incorrect classification of an example. In actuarial science, it is used in an insurance context to model benefits paid over premiums, particularly since the works of Harald Cramér in the 1920s. In optimal control, the loss is the penalty for failing to achieve a desired value. In financial risk management, the function is mapped to a monetary loss.
Examples
Regret
Leonard J. Savage argued that when using non-Bayesian methods such as minimax, the loss function should be based on the idea of ''regret'', i.e., the loss associated with a decision should be the difference between the consequences of the best decision that could have been made had the underlying circumstances been known and the decision that was in fact taken before they were known.
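The definition of regret can be made concrete with a small table. The following is a minimal sketch, assuming a hypothetical payoff table (the decisions, states, and numbers are invented for illustration) in which a higher payoff is better; it computes each decision's regret in each state and then applies the minimax criterion to the regrets.

```python
# Hypothetical payoff table: each decision maps to its payoff in states 0, 1, 2.
# The numbers are made up for illustration; higher payoff is better.
payoffs = {
    "expand": [50, 10, -20],
    "hold": [20, 15, 5],
    "shrink": [5, 5, 5],
}


def regret_table(payoffs):
    """Regret = best payoff achievable in a state minus the payoff obtained."""
    n_states = len(next(iter(payoffs.values())))
    best = [max(row[s] for row in payoffs.values()) for s in range(n_states)]
    return {
        decision: [best[s] - row[s] for s in range(n_states)]
        for decision, row in payoffs.items()
    }


def minimax_regret(payoffs):
    """Pick the decision whose worst-case regret is smallest."""
    regrets = regret_table(payoffs)
    return min(regrets, key=lambda d: max(regrets[d]))


regrets = regret_table(payoffs)
choice = minimax_regret(payoffs)
print(regrets)
print(choice)
```

In this made-up table, the regret of a decision is zero in any state where it happens to be the best available choice.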
Quadratic loss function
The use of a quadratic loss function is common, for example when using least squares techniques. It is often more mathematically tractable than other loss functions because of the properties of variances, as well as being symmetric: an error above the target causes the same loss as the same magnitude of error below the target. If the target is ''t'', then a quadratic loss function is
:$\lambda(x) = C (t - x)^2$
for some constant ''C''; the value of the constant makes no difference to a decision, and can be ignored by setting it equal to 1. This is also known as the squared error loss (SEL).
Many common statistics, including t-tests, regression models, design of experiments, and much else, use least squares methods applied using linear regression theory, which is based on the quadratic loss function.
The quadratic loss function is also used in linear-quadratic optimal control problems. In these problems, even in the absence of uncertainty, it may not be possible to achieve the desired values of all target variables. Often loss is expressed as a quadratic form in the deviations of the variables of interest from their desired values; this approach is tractable because it results in linear first-order conditions. In the context of stochastic control, the expected value of the quadratic form is used.
0-1 loss function
In statistics and decision theory, a frequently used loss function is the ''0-1 loss function''
:$L(\hat{y}, y) = I(\hat{y} \ne y),$
where ''I'' is the indicator function.
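A 0-1 loss charges 1 for a wrong answer and 0 for a correct one, so its average over a set of examples is the fraction that were wrong. The sketch below illustrates this under assumed, hypothetical names (`zero_one_loss`, `mean_zero_one_loss`) and made-up labels.

```python
def zero_one_loss(predicted, actual):
    """Return 1 if the prediction differs from the actual value, else 0."""
    return 1 if predicted != actual else 0


def mean_zero_one_loss(predictions, actuals):
    """Average 0-1 loss, i.e. the fraction of mismatched pairs."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    losses = [zero_one_loss(p, a) for p, a in zip(predictions, actuals)]
    return sum(losses) / len(losses)


# Made-up labels for illustration: one of the four predictions is wrong.
predictions = ["cat", "dog", "dog", "cat"]
actuals = ["cat", "dog", "cat", "cat"]
error_rate = mean_zero_one_loss(predictions, actuals)
print(error_rate)
```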
Constructing loss and objective functions
In many applications, objective functions, including loss functions as a particular case, are determined by the problem formulation. In other situations, the decision maker's preference must be elicited and represented by a scalar-valued function (also called a utility function) in a form suitable for optimization, a problem that Ragnar Frisch highlighted in his Nobel Prize lecture.
The existing methods for constructing objective functions are collected in the proceedings of two dedicated conferences.
In particular, Andranik Tangian showed that the most usable objective functions, quadratic and additive, are determined by a few indifference points. He used this property in the models for constructing these objective functions from either ordinal or cardinal data that were elicited through computer-assisted interviews with decision makers. Among other things, he constructed objective functions to optimally distribute budgets for 16 Westphalian universities and the European subsidies for equalizing unemployment rates among 271 German regions.
Expected loss
In some contexts, the value of the loss function itself is a random quantity because it depends on the outcome of a random variable ''X''.
Statistics
Both frequentist and Bayesian statistical theory involve making a decision based on the expected value of the loss function; however, this quantity is defined differently under the two paradigms.
Frequentist expected loss
We first define the expected loss in the frequentist context. It is obtained by taking the expected value with respect to the probability distribution, $P_\theta$, of the observed data, ''X''. This is also referred to as the risk function of the decision rule ''δ'' and the parameter ''θ''. Here the decision rule depends on the outcome of ''X''. The risk function is given by:
:$R(\theta, \delta) = \operatorname{E}_\theta L\big(\theta, \delta(X)\big) = \int_X L\big(\theta, \delta(x)\big) \, \mathrm{d}P_\theta(x).$
Here, ''θ'' is a fixed but possibly unknown state of nature, ''X'' is a vector of observations stochastically drawn from a population, $\operatorname{E}_\theta$ is the expectation over all population values of ''X'', $\mathrm{d}P_\theta$ is a probability measure over the event space of ''X'' (parametrized by ''θ'') and the integral is evaluated over the entire support of ''X''.
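When ''X'' takes only finitely many values, the expectation that defines the frequentist risk reduces to a finite sum and can be evaluated exactly. The sketch below assumes, purely for illustration, that ''X'' is a binomial count of successes in ''n'' trials, that the decision rule is the sample proportion, and that the loss is squared error; all function names are hypothetical.

```python
from math import comb


def squared_error(theta, estimate):
    """Squared error loss between the true parameter and an estimate."""
    return (theta - estimate) ** 2


def sample_proportion(x, n):
    """Decision rule: estimate theta by the observed success fraction."""
    return x / n


def binomial_risk(theta, n, rule, loss):
    """Exact risk: sum over x of loss(theta, rule(x)) * P_theta(X = x)."""
    return sum(
        loss(theta, rule(x, n)) * comb(n, x) * theta**x * (1 - theta) ** (n - x)
        for x in range(n + 1)
    )


# Assumed values for illustration: theta = 0.3 and n = 10 trials.
risk = binomial_risk(0.3, 10, sample_proportion, squared_error)
print(risk)
```

For this particular rule and loss the risk equals the variance of the sample proportion, ''θ''(1 − ''θ'')/''n'', which gives a closed form to compare the sum against.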
Bayesian expected loss
In a Bayesian approach, the expectation is calculated using the posterior distribution $\pi^*$ of the parameter ''θ'':
:$\rho(\pi^*, a) = \int_\Theta L(\theta, a) \, \mathrm{d}\pi^*(\theta).$
One then should choose the action $a^*$ which minimises the expected loss. Although this will result in choosing the same action as would be chosen using the frequentist risk, the emphasis of the Bayesian approach is that one is only interested in choosing the optimal action under the actual observed data, whereas choosing the actual frequentist optimal decision rule, which is a function of all possible observations, is a much more difficult problem.
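For a posterior concentrated on a few parameter values, the posterior expected loss is a weighted sum, and the minimising action can be found by direct comparison. The following is a minimal sketch assuming a hypothetical three-point posterior, a hand-picked list of candidate actions, and squared error loss; none of these come from the text.

```python
def posterior_expected_loss(action, posterior, loss):
    """Weighted average of the loss over the posterior's support."""
    return sum(prob * loss(theta, action) for theta, prob in posterior.items())


def bayes_action(actions, posterior, loss):
    """Return the candidate action with the smallest posterior expected loss."""
    return min(actions, key=lambda a: posterior_expected_loss(a, posterior, loss))


def squared_error(theta, action):
    return (theta - action) ** 2


# Hypothetical discrete posterior: parameter value -> posterior probability.
posterior = {0.2: 0.5, 0.5: 0.3, 0.8: 0.2}
# Candidate actions; 0.41 is the posterior mean of the distribution above.
actions = [0.2, 0.41, 0.5, 0.8]

best = bayes_action(actions, posterior, squared_error)
best_loss = posterior_expected_loss(best, posterior, squared_error)
print(best, best_loss)
```

Under squared error loss the minimiser over all real actions is the posterior mean, which is why 0.41 was included among the candidates here.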
Examples in statistics
* For a scalar parameter ''θ'', a decision function whose output $\hat\theta$ is an estimate of ''θ'', and a quadratic loss function (squared error loss) $L(\theta, \hat\theta) = (\theta - \hat\theta)^2$, the risk function becomes the mean squared error of the estimate, $R(\theta, \hat\theta) = \operatorname{E}_\theta\big[(\theta - \hat\theta)^2\big]$. An estimator found by minimizing the mean squared error estimates the posterior distribution's mean.
* In density estimation, the unknown parameter is the probability density itself. The loss function is typically chosen to be a norm in an appropriate function space. For example, for the $L^2$ norm, $L(f, \hat{f}) = \|f - \hat{f}\|_2^2$, the risk function becomes the mean integrated squared error $R(f, \hat{f}) = \operatorname{E}\big(\|f - \hat{f}\|^2\big)$.
Economic choice under uncertainty
In economics, decision-making under uncertainty is often modelled using the von Neumann–Morgenstern utility function of the uncertain variable of interest, such as end-of-period wealth. Since the value of this variable is uncertain, so is the value of the utility function; it is the expected value of utility that is maximized.
Decision rules
A decision rule makes a choice using an optimality criterion. Some commonly used criteria are:
* Minimax: Choose the decision rule with the lowest worst loss, that is, minimize the worst-case (maximum possible) loss: $\underset{\delta}{\operatorname{arg\,min}} \ \max_{\theta \in \Theta} \ R(\theta, \delta).$
* Invariance: Choose the decision rule which satisfies an invariance requirement.
* Choose the decision rule with the lowest average loss (i.e. minimize the expected value of the loss function): $\underset{\delta}{\operatorname{arg\,min}} \ \operatorname{E}_{\theta \in \Theta}[R(\theta, \delta)] = \underset{\delta}{\operatorname{arg\,min}} \ \int_{\theta \in \Theta} R(\theta, \delta) \, p(\theta) \, \mathrm{d}\theta.$
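The minimax and lowest-average-loss criteria can disagree on the same problem. The sketch below illustrates this with a hypothetical risk table over two states of nature and an assumed weighting of those states; the rule names, risks, and weights are invented for the example.

```python
# Hypothetical risk table: risk of each decision rule under each state of nature.
risk = {
    "rule_A": {"theta_1": 1.0, "theta_2": 6.0},
    "rule_B": {"theta_1": 3.0, "theta_2": 4.0},
    "rule_C": {"theta_1": 5.0, "theta_2": 5.0},
}

# Assumed weights over the states, used only by the average-loss criterion.
weights = {"theta_1": 0.9, "theta_2": 0.1}


def minimax_rule(risk):
    """Choose the rule whose worst-case risk is smallest."""
    return min(risk, key=lambda rule: max(risk[rule].values()))


def lowest_average_rule(risk, weights):
    """Choose the rule whose weighted average risk is smallest."""
    def average(rule):
        return sum(weights[state] * r for state, r in risk[rule].items())
    return min(risk, key=average)


minimax_choice = minimax_rule(risk)
average_choice = lowest_average_rule(risk, weights)
print(minimax_choice, average_choice)
```

Here the minimax criterion avoids the rule with the largest worst-case risk, while the average criterion accepts that worst case because the assumed weights make it unlikely.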
Selecting a loss function
Sound statistical practice requires selecting an estimator consistent with the actual acceptable variation experienced in the context of a particular applied problem. Thus, in the applied use of loss functions, selecting which statistical method to use to model an applied problem depends on knowing the losses that will be experienced from being wrong under the problem's particular circumstances.
A common example involves estimating "location". Under typical statistical assumptions, the mean or average is the statistic for estimating location that minimizes the expected loss experienced under the squared-error loss function, while the median is the estimator that minimizes expected loss experienced under the absolute-difference loss function. Still different estimators would be optimal under other, less common circumstances.
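The link between the loss function and the location estimate can be checked numerically on a sample. The following sketch, using made-up data and a simple grid search (an assumption chosen for transparency, not an efficient method), finds the value that minimizes the total squared loss and the value that minimizes the total absolute loss, and compares them with the sample mean and median.

```python
from statistics import mean, median

# Made-up sample with one large value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]


def total_squared_loss(center, xs):
    return sum((x - center) ** 2 for x in xs)


def total_absolute_loss(center, xs):
    return sum(abs(x - center) for x in xs)


# Candidate locations on a grid from 0.0 to 100.0 in steps of 0.1.
candidates = [i / 10 for i in range(0, 1001)]

best_squared = min(candidates, key=lambda c: total_squared_loss(c, data))
best_absolute = min(candidates, key=lambda c: total_absolute_loss(c, data))

print(best_squared, mean(data))
print(best_absolute, median(data))
```

The grid is chosen so that it contains both the sample mean and the sample median of this data set; on a coarser grid the minimizers would only approximate them.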
In economics, when an agent is risk neutral, the objective function is simply expressed as the expected value of a monetary quantity, such as profit, income, or end-of-period wealth. For risk-averse or risk-loving agents, loss is measured as the negative of a utility function, and the objective function to be optimized is the expected value of utility.
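The difference between the two objectives can be seen on a pair of lotteries. The sketch below assumes two hypothetical lotteries over end-of-period wealth and uses the natural logarithm as an illustrative concave utility for a risk-averse agent; both the numbers and the choice of utility are assumptions.

```python
from math import log

# Hypothetical lotteries: lists of (probability, end-of-period wealth).
lotteries = {
    "safe": [(1.0, 100.0)],
    "risky": [(0.5, 250.0), (0.5, 10.0)],
}


def expected_value(lottery):
    """Objective of a risk-neutral agent: expected monetary outcome."""
    return sum(p * wealth for p, wealth in lottery)


def expected_utility(lottery, utility):
    """Objective of a risk-averse or risk-loving agent: expected utility."""
    return sum(p * utility(wealth) for p, wealth in lottery)


risk_neutral_choice = max(lotteries, key=lambda k: expected_value(lotteries[k]))
# Log utility is an assumed example of a concave (risk-averse) utility.
risk_averse_choice = max(
    lotteries, key=lambda k: expected_utility(lotteries[k], log)
)
print(risk_neutral_choice, risk_averse_choice)
```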
Other measures of cost are possible, for example mortality or morbidity in the field of public health or safety engineering.
For most optimization algorithms, it is desirable to have a loss function that is globally continuous and differentiable.
Two very commonly used loss functions are the squared loss, $L(a) = a^2$, and the absolute loss, $L(a) = |a|$. However, the absolute loss has the disadvantage that it is not differentiable at $a = 0$. The squared loss has the disadvantage that it tends to be dominated by outliers: when summing over a set of ''a''-values (as in $\sum_{i=1}^n L(a_i)$), the final sum tends to be the result of a few particularly large ''a''-values, rather than an expression of the average ''a''-value.
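The effect of an outlier on the two sums can be quantified by asking what share of the total loss comes from the single largest term. The sketch below does this for a made-up set of ''a''-values; the helper name `largest_share` is hypothetical.

```python
# Made-up a-values with one outlier.
residuals = [1.0, -1.0, 2.0, -2.0, 10.0]


def squared_loss(a):
    return a * a


def absolute_loss(a):
    return abs(a)


def largest_share(values, loss):
    """Fraction of the summed loss contributed by the single largest term."""
    losses = [loss(a) for a in values]
    return max(losses) / sum(losses)


share_squared = largest_share(residuals, squared_loss)    # 100 / 110
share_absolute = largest_share(residuals, absolute_loss)  # 10 / 16
print(share_squared, share_absolute)
```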
The choice of a loss function is not arbitrary. It is very restrictive and sometimes the loss function may be characterized by its desirable properties. Among the choice principles are, for example, the requirement of completeness of the class of symmetric statistics in the case of i.i.d. observations, the principle of complete information, and some others.
W. Edwards Deming and Nassim Nicholas Taleb argue that empirical reality, not nice mathematical properties, should be the sole basis for selecting loss functions, and that real losses often are not mathematically nice: they may fail to be differentiable, continuous, symmetric, etc. For example, a person who arrives before a plane gate closure can still make the plane, but a person who arrives after cannot, a discontinuity and asymmetry which makes arriving slightly late much more costly than arriving slightly early. In drug dosing, the cost of too little drug may be lack of efficacy, while the cost of too much may be tolerable toxicity, another example of asymmetry. Traffic, pipes, beams, ecologies, climates, etc. may tolerate increased load or stress with little noticeable change up to a point, then become backed up or break catastrophically. These situations, Deming and Taleb argue, are common in real-life problems, perhaps more common than classical smooth, continuous, symmetric, differentiable cases.
See also
* Bayesian regret
* Loss functions for classification
* Discounted maximum loss
* Hinge loss
* Scoring rule
* Statistical risk