No free lunch in search and optimization

In computational complexity and optimization, the no free lunch theorem is a result stating that for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. The name alludes to the saying "there ain't no such thing as a free lunch": no method offers a "short cut". This is under the assumption that the search space is described only by a probability density function. It does not apply to the case where the search space has underlying structure (e.g., is a differentiable function) that can be exploited more efficiently (e.g., by Newton's method in optimization) than random search, or even has closed-form solutions (e.g., the extrema of a quadratic polynomial) that can be determined without search at all. For such probabilistic assumptions, the outputs of all procedures solving a particular type of problem are statistically identical. A colourful way of describing such a circumstance, introduced by David Wolpert and William G. Macready in connection with the problems of search and optimization, is to say that there is no free lunch. Wolpert had previously derived no free lunch theorems for machine learning (statistical inference). Before Wolpert's article was published, Cullen Schaffer independently proved a restricted version of one of Wolpert's theorems and used it to critique the then-current state of machine learning research on the problem of induction.

In the "no free lunch" metaphor, each "restaurant" (problem-solving procedure) has a "menu" associating each "lunch plate" (problem) with a "price" (the performance of the procedure in solving the problem). The menus of the restaurants are identical except in one regard: the prices are shuffled from one restaurant to the next. For an omnivore who is as likely to order each plate as any other, the average cost of lunch does not depend on the choice of restaurant. But a vegan who goes to lunch regularly with a carnivore who seeks economy might pay a high average cost for lunch. To methodically reduce the average cost, one must use advance knowledge of a) what one will order and b) what the order will cost at various restaurants. That is, improvement of performance in problem-solving hinges on using prior information to match procedures to problems.

In formal terms, there is no free lunch when the probability distribution on problem instances is such that all problem solvers have identically distributed results. In the case of search, a problem instance is a particular objective function, and a result is a sequence of values obtained by evaluating candidate solutions in the domain of the function. For typical interpretations of results, search is an optimization process. There is no free lunch in search if and only if the distribution on objective functions is invariant under permutation of the space of candidate solutions.Streeter, M. (2003) "Two Broad Classes of Functions for Which a No Free Lunch Result Does Not Hold." ''Genetic and Evolutionary Computation – GECCO 2003'', pp. 1418–1430.Igel, C., and Toussaint, M. (2004) "A No-Free-Lunch Theorem for Non-Uniform Distributions of Target Functions." ''Journal of Mathematical Modelling and Algorithms'' 3: 313–322.English, T. (2004) "No More Lunch: Analysis of Sequential Search." ''Proceedings of the 2004 IEEE Congress on Evolutionary Computation'', pp. 227–234. This condition does not hold precisely in practice, but an "(almost) no free lunch" theorem suggests that it holds approximately.Droste, S., Jansen, T., and Wegener, I. (2002) "Optimization with Randomized Search Heuristics: The (A)NFL Theorem, Realistic Scenarios, and Difficult Functions." ''Theoretical Computer Science'' 287(1): 131–144.


Overview

Some computational problems are solved by searching for good solutions in a space of candidate solutions. A description of how to repeatedly select candidate solutions for evaluation is called a search algorithm. On a particular problem, different search algorithms may obtain different results, but over all problems they are indistinguishable. It follows that if an algorithm achieves superior results on some problems, it must pay with inferiority on other problems. In this sense there is no free lunch in search. Alternatively, following Schaffer, search performance is ''conserved''. Usually search is interpreted as optimization, and this leads to the observation that there is no free lunch in optimization. "The 'no free lunch' theorem of Wolpert and Macready," as stated in plain language by Wolpert and Macready themselves, is that "any two algorithms are equivalent when their performance is averaged across all possible problems."Wolpert, D.H., and Macready, W.G. (2005) "Coevolutionary Free Lunches." ''IEEE Transactions on Evolutionary Computation'' 9(6): 721–735. The "no free lunch" results indicate that matching algorithms to problems gives higher average performance than applying a fixed algorithm to all problems.

Igel and Toussaint, and English, have established a general condition under which there is no free lunch. While the condition is physically possible, it does not hold precisely. Droste, Jansen, and Wegener have proved a theorem they interpret as indicating that there is "(almost) no free lunch" in practice.

To make matters more concrete, consider an optimization practitioner confronted with a problem. Given some knowledge of how the problem arose, the practitioner may be able to exploit that knowledge in selecting an algorithm that will perform well on the problem. If the practitioner does not understand how to exploit the knowledge, or simply has no knowledge, then he or she faces the question of whether some algorithm generally outperforms others on real-world problems. The authors of the "(almost) no free lunch" theorem say that the answer is essentially no, but admit some reservations as to whether the theorem addresses practice.


Theorems

A "problem" is, more formally, an
objective function In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cos ...
that associates
candidate solution In mathematical optimization, a feasible region, feasible set, search space, or solution space is the set of all possible points (sets of values of the choice variables) of an optimization problem that satisfy the problem's constraints, potent ...
s with goodness values. A
search algorithm In computer science, a search algorithm is an algorithm designed to solve a search problem. Search algorithms work to retrieve information stored within particular data structure, or calculated in the search space of a problem domain, with eith ...
takes an objective function as input and evaluates candidate solutions one-by-one. The output of the algorithm is the
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is calle ...
of observed goodness values. Wolpert and Macready stipulate that an algorithm never reevaluates a candidate solution, and that algorithm performance is measured on outputs. For simplicity, we disallow randomness in algorithms. Under these conditions, when a search algorithm is run on every possible input, it generates each possible output exactly once. Because performance is measured on the outputs, the algorithms are indistinguishable in how often they achieve particular levels of performance. Some measures of performance indicate how well search algorithms do at
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
of the objective function. Indeed, there seems to be no interesting application of search algorithms in the class under consideration but to optimization problems. A common performance measure is the least index of the least value in the output sequence. This is the number of evaluations required to minimize the objective function. For some algorithms, the time required to find the minimum is proportional to the number of evaluations. The original no free lunch (NFL) theorems assume that all objective functions are equally likely to be input to search algorithms. It has since been established that there is NFL if and only if, loosely speaking, "shuffling" objective functions has no impact on their probabilities. Although this condition for NFL is physically possible, it has been argued that it certainly does not hold precisely. The obvious interpretation of "not NFL" is "free lunch," but this is misleading. NFL is a matter of degree, not an all-or-nothing proposition. If the condition for NFL holds approximately, then all algorithms yield approximately the same results over all objective functions. "Not NFL" implies only that algorithms are inequivalent overall by ''some'' measure of performance. For a performance measure of interest, algorithms may remain equivalent, or nearly so.
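As a concrete illustration of the enumeration argument above, the following minimal sketch in Python (an invented toy example, not from Wolpert and Macready's paper, and restricted for brevity to non-adaptive algorithms with fixed visit orders) enumerates every objective function on a four-point search space and compares two search algorithms by the performance measure just described. The visit orders ''alg_a'' and ''alg_b'' are arbitrary illustrative choices.

 from itertools import product
 from collections import Counter
 
 # Toy NFL setting: X = {0,1,2,3}, Y = {0,1}; all |Y|^|X| = 16 objective
 # functions are enumerated and weighted equally.
 X_SIZE = 4
 Y = (0, 1)
 
 def run(order, f):
     """Deterministic, non-reevaluating search: visit the points of X in a
     fixed order and return the sequence of observed goodness values."""
     return tuple(f[x] for x in order)
 
 def evals_to_min(trace):
     """Performance measure from the text: least index of the least value,
     i.e. the number of evaluations needed to minimize the objective."""
     return trace.index(min(trace)) + 1
 
 # Two hypothetical search algorithms: a left-to-right scan and an
 # arbitrary fixed permutation of the search space.
 alg_a = (0, 1, 2, 3)
 alg_b = (2, 0, 3, 1)
 
 functions = list(product(Y, repeat=X_SIZE))
 hist_a = Counter(evals_to_min(run(alg_a, f)) for f in functions)
 hist_b = Counter(evals_to_min(run(alg_b, f)) for f in functions)
 
 assert hist_a == hist_b   # identical performance histograms over all f
 print(hist_a)             # Counter({1: 9, 2: 4, 3: 2, 4: 1})

The NFL theorems also cover adaptive algorithms, whose next evaluation point depends on the values observed so far; the fixed orders here merely keep the enumeration easy to follow.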


Kolmogorov randomness

Almost all elements of the set of all possible functions (in the set-theoretic sense of "function") are Kolmogorov random, and hence the NFL theorems apply to a set of functions almost all of which cannot be expressed more compactly than as a lookup table containing a distinct (and random) entry for each point in the search space. Functions that can be expressed more compactly (for example, by a mathematical expression of reasonable size) are by definition not Kolmogorov random. Further, within the set of all possible objective functions, levels of goodness are equally represented among candidate solutions, hence good solutions are scattered throughout the space of candidates. Accordingly, a search algorithm will rarely evaluate more than a small fraction of the candidates before locating a very good solution.

Almost all objective functions are of such high Kolmogorov complexity that they cannot be stored in a particular computer. More precisely, if we model a given physical computer as a register machine with a memory on the order of the memories of modern computers, then most objective functions cannot be stored in its memory. There is more information in the typical objective function or algorithm than Seth Lloyd estimates the observable universe is capable of registering. For instance, if each candidate solution is encoded as a sequence of 300 0's and 1's, and the goodness values are 0 and 1, then most objective functions have Kolmogorov complexity of at least 2^300 bits, and this is greater than Lloyd's bound of 10^90 ≈ 2^299 bits. It follows that the original "no free lunch" theorem does not apply to what can be stored in a physical computer; instead, the so-called "tightened" no free lunch theorems need to be applied. It has also been shown that NFL results apply to incomputable functions.
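The arithmetic in this storage argument is easy to check. The back-of-envelope sketch below assumes, as in the text, 300-bit candidate encodings and one goodness bit per candidate, and compares the size of an incompressible lookup table with Lloyd's estimate of roughly 10^90 bits for the information capacity of the observable universe.

 import math
 
 # Assumed encoding from the text: candidate solutions are 300-bit
 # strings, goodness values are single bits.
 n_candidates = 2 ** 300                 # points in the search space
 lookup_table_bits = n_candidates        # one goodness bit per candidate
 
 lloyd_bits = 10 ** 90                   # Lloyd's capacity estimate
 
 print(math.log2(lloyd_bits))            # ~298.97, so 10^90 is about 2^299
 print(lookup_table_bits > lloyd_bits)   # True: the table cannot be stored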


Formal synopsis

''Y''^''X'' is the set of all objective functions ''f'': ''X'' → ''Y'', where ''X'' is a finite solution space and ''Y'' is a finite poset. The set of all permutations of ''X'' is ''J''. A random variable ''F'' is distributed on ''Y''^''X''. For all ''j'' in ''J'', ''F'' ∘ ''j'' is a random variable distributed on ''Y''^''X'', with P(''F'' ∘ ''j'' = ''f'') = P(''F'' = ''f'' ∘ ''j''⁻¹) for all ''f'' in ''Y''^''X''.

Let ''a''(''f'') denote the output of search algorithm ''a'' on input ''f''. If ''a''(''F'') and ''b''(''F'') are identically distributed for all search algorithms ''a'' and ''b'', then ''F'' has an ''NFL distribution''. This condition holds if and only if ''F'' and ''F'' ∘ ''j'' are identically distributed for all ''j'' in ''J''. In other words, there is no free lunch for search algorithms if and only if the distribution of objective functions is invariant under permutation of the solution space. Set-theoretic NFL theorems have recently been generalized to sets ''X'' and ''Y'' of arbitrary cardinality.
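The permutation-invariance condition can be checked by brute force on a tiny instance. The following sketch is a hypothetical example (the distribution and both algorithms are invented for illustration): it builds a non-uniform random objective ''F'' whose probabilities depend only on the multiset of values a function takes, which makes the distribution permutation-invariant, and verifies that a non-adaptive scan and an adaptive algorithm have identically distributed outputs. The theorem guarantees this for all algorithms; the code checks two representatives.

 from itertools import product
 from collections import Counter
 from fractions import Fraction
 
 # X = {0,1,2}, Y = {0,1}. P(F = f) is proportional to 1 + (number of
 # zeros f takes); this depends only on the multiset of values of f,
 # so the distribution is invariant under every permutation of X.
 X, Y = (0, 1, 2), (0, 1)
 funcs = list(product(Y, repeat=len(X)))
 weights = {f: Fraction(1 + f.count(0)) for f in funcs}
 total = sum(weights.values())
 prob = {f: w / total for f, w in weights.items()}
 
 def alg_scan(f):
     """Non-adaptive: evaluate points 0, 1, 2 in order."""
     return (f[0], f[1], f[2])
 
 def alg_adaptive(f):
     """Adaptive: evaluate point 1 first, then branch on the value seen."""
     first = f[1]
     second, third = (f[0], f[2]) if first == 0 else (f[2], f[0])
     return (first, second, third)
 
 def output_distribution(alg):
     """Distribution of the algorithm's output sequence a(F)."""
     d = Counter()
     for f in funcs:
         d[alg(f)] += prob[f]
     return d
 
 # Identically distributed outputs: F is an NFL distribution for these two.
 assert output_distribution(alg_scan) == output_distribution(alg_adaptive)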


Origin

Wolpert and Macready give two principal NFL theorems: the first regarding objective functions that do not change while search is in progress, and the second regarding objective functions that may change.

:''Theorem 1'': For any pair of algorithms a_1 and a_2,

::\sum_f P(d_m^y \mid f, m, a_1) = \sum_f P(d_m^y \mid f, m, a_2),

where d_m^y denotes the ordered set of size m of the cost values y \in Y associated with input values x \in X, f : X \to Y is the function being optimized, and P(d_m^y \mid f, m, a) is the conditional probability of obtaining a given sequence of cost values from algorithm a run m times on function f. In essence, this says that when all functions ''f'' are equally likely, the probability of observing an arbitrary sequence of ''m'' values in the course of search does not depend upon the search algorithm. The second theorem establishes a "more subtle" NFL result for time-varying objective functions.
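To see why the sum cannot depend on the algorithm, consider the case m = 1; this is a standard first step of the argument, sketched here for orientation rather than quoted from Wolpert and Macready. A deterministic algorithm a picks its first point x_1 \in X before seeing any information, and P(d_1^y \mid f, 1, a) is 1 if f produces the given value at x_1 and 0 otherwise. Hence, for each y \in Y,

::\sum_f P\bigl(d_1^y = (y) \mid f, 1, a\bigr) = \bigl|\{ f : f(x_1) = y \}\bigr| = |Y|^{|X|-1},

which does not depend on a. Induction on m extends this: conditioned on the sequence of values observed so far, the next point an algorithm queries is fixed, and the number of functions consistent with any extended sequence is again independent of the algorithm.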


Interpretations of results

A conventional, but not entirely accurate, interpretation of the NFL results is that "a general-purpose universal optimization strategy is theoretically impossible, and the only way one strategy can outperform another is if it is specialized to the specific problem under consideration". Several comments are in order:

*''A general-purpose almost-universal optimizer exists theoretically.'' Each search algorithm performs well on almost all objective functions. So if one is not concerned with the "relatively small" differences between search algorithms, e.g., because computer time is cheap, then one need not worry about no free lunch.
*''An algorithm may outperform another on a problem when neither is specialized to the problem.'' Indeed, it may be that both algorithms are among the worst for the problem. More generally, Wolpert and Macready have developed a measure of the degree of "alignment" between an algorithm and a distribution over problems (strictly speaking, an inner product). To say that one algorithm matches a distribution better than another is not to say that either has been consciously specialized to the distribution; an algorithm may have good alignment just by luck.
*''In practice, some algorithms reevaluate candidate solutions.'' The reason to consider performance only on never-before-evaluated candidates is to ensure that in comparing algorithms one compares apples to apples. Moreover, any superiority on a particular problem of an algorithm that never reevaluates candidates over one that does may have nothing to do with specialization to the problem.
*''For almost all objective functions, specialization is essentially accidental.'' Incompressible, or Kolmogorov random, objective functions have no regularity for an algorithm to exploit, as far as the universal Turing machine used to define Kolmogorov randomness is concerned. So presume that there is one, clearly superior choice of universal Turing machine. Then, given an objective function that is incompressible for that Turing machine, there is no basis for choosing between two algorithms if both are compressible, as measured using that Turing machine. If a chosen algorithm performs better than most, the result is happenstance.

A Kolmogorov random function has no representation smaller than a lookup table that contains a (random) value corresponding to each point in the search space; ''any'' function that can be expressed more compactly is, by definition, not Kolmogorov random. In practice, only highly compressible (far from random) objective functions fit in the storage of computers, and it is not the case that each algorithm performs well on almost all compressible functions. There is generally a performance advantage in incorporating prior knowledge of the problem into the algorithm.

While the NFL results constitute, in a strict sense, full employment theorems for optimization professionals, it is important to bear the larger context in mind. For one thing, humans often have little prior knowledge to work with. For another, incorporating prior knowledge does not give much of a performance gain on some problems. Finally, human time is very expensive relative to computer time. There are many cases in which a company would choose to optimize a function slowly with an unmodified computer program rather than rapidly with a human-modified program. The NFL results do not indicate that it is futile to take "pot shots" at problems with unspecialized algorithms.
No one has determined the fraction of practical problems for which an algorithm yields good results rapidly. And there is a practical free lunch, not at all in conflict with theory. Running an implementation of an algorithm on a computer costs very little relative to the cost of human time and the benefit of a good solution. If an algorithm succeeds in finding a satisfactory solution in an acceptable amount of time, a small investment has yielded a big payoff. If the algorithm fails, then little is lost.


Coevolution

Wolpert and Macready have proved that there are free lunches in coevolutionary optimization.Wolpert, D.H., and Macready, W.G. (2005) "Coevolutionary Free Lunches." ''IEEE Transactions on Evolutionary Computation'' 9(6): 721–735. Their analysis "covers 'self-play' problems. In these problems, the set of players work together to produce a champion, who then engages one or more antagonists in a subsequent multiplayer game." That is, the objective is to obtain a good player, but without an objective function. The goodness of each player (candidate solution) is assessed by observing how well it plays against others. An algorithm attempts to use players and their quality of play to obtain better players. The player deemed best of all by the algorithm is the champion. Wolpert and Macready have demonstrated that some coevolutionary algorithms are generally superior to other algorithms in quality of champions obtained. Generating a champion through self-play is of interest in evolutionary computation and game theory. The results are inapplicable to coevolution of biological species, which does not yield champions.


See also

* Evolutionary informatics
* Inductive bias
* Occam's razor
* Simplicity
* Ugly duckling theorem




External links

* http://www.no-free-lunch.org
* Radcliffe and Surry, 1995, "Fundamental Limitations on Search Algorithms: Evolutionary Computing in Perspective" (an early published paper on NFL, appearing soon after Schaffer's ICML paper, which in turn was based on Wolpert's preprint; available in various formats)
* NFL publications by Thomas English
* Old list of publications by David Wolpert, William Macready, and Mario Koeppen on optimization and search