Holland's Schema Theorem
   HOME

TheInfoList



OR:

Holland's schema theorem, also called the fundamental theorem of genetic algorithms, is an inequality that results from coarse-graining an equation for
evolutionary dynamics Evolutionary dynamics is the study of the mathematical principles according to which biological organisms as well as cultural ideas evolve and evolved. This is mostly achieved through the mathematical discipline of population genetics, along with ...
. The Schema Theorem says that short, low-order schemata with above-average fitness increase exponentially in frequency in successive generations. The theorem was proposed by John Holland in the 1970s. It was initially widely taken to be the foundation for explanations of the power of
genetic algorithm In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to g ...
s. However, this interpretation of its implications has been criticized in several publications reviewed in, where the Schema Theorem is shown to be a special case of the
Price equation In the theory of evolution and natural selection, the Price equation (also known as Price's equation or Price's theorem) describes how a trait or allele changes in frequency over time. The equation uses a covariance between a trait and fitness, t ...
with the schema indicator function as the macroscopic measurement. A
schema Schema may refer to: Science and technology * SCHEMA (bioinformatics), an algorithm used in protein engineering * Schema (genetic algorithms), a set of programs or bit strings that have some genotypic similarity * Schema.org, a web markup vocab ...
is a template that identifies a
subset In mathematics, a Set (mathematics), set ''A'' is a subset of a set ''B'' if all Element (mathematics), elements of ''A'' are also elements of ''B''; ''B'' is then a superset of ''A''. It is possible for ''A'' and ''B'' to be equal; if they a ...
of strings with similarities at certain string positions. Schemata are a special case of
cylinder set In mathematics, the cylinder sets form a basis of the product topology on a product of sets; they are also a generating family of the cylinder σ-algebra. General definition Given a collection S of sets, consider the Cartesian product X = \prod ...
s, and hence form a
topological space In mathematics, a topological space is, roughly speaking, a Geometry, geometrical space in which Closeness (mathematics), closeness is defined but cannot necessarily be measured by a numeric Distance (mathematics), distance. More specifically, a to ...
.


Description

Consider binary strings of length 6. The schema 1*10*1 describes the set of all strings of length 6 with 1's at positions 1, 3 and 6 and a 0 at position 4. The * is a wildcard symbol, which means that positions 2 and 5 can have a value of either 1 or 0. The ''order of a schema'' o(H) is defined as the number of fixed positions in the template, while the '' defining length'' \delta(H) is the distance between the first and last specific positions. The order of 1*10*1 is 4 and its defining length is 5. The ''fitness of a schema'' is the average fitness of all strings matching the schema. The fitness of a string is a measure of the value of the encoded problem solution, as computed by a problem-specific evaluation function. Using the established methods and
genetic operator A genetic operator is an Operator (programming), operator used in evolutionary algorithms (EA) to guide the algorithm towards a solution to a given problem. There are three main types of operators (Mutation (evolutionary algorithm) , mutation, Cro ...
s of
genetic algorithms In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to g ...
, the schema theorem states that short, low-order schemata with above-average fitness increase exponentially in successive generations. Expressed as an equation: :\operatorname(m(H,t+1)) \geq -p Here m(H,t) is the number of strings belonging to schema H at generation t, f(H) is the ''observed'' average fitness of schema H and a_t is the ''observed'' average fitness at generation t. The probability of disruption p is the probability that crossover or mutation will destroy the schema H. Under the assumption that p_m \ll 1, it can be expressed as: :p = p_c + o(H) p_m where o(H) is the order of the schema, l is the length of the code, p_m is the probability of mutation and p_c is the probability of crossover. So a schema with a shorter defining length \delta(H) is less likely to be disrupted.
An often misunderstood point is why the Schema Theorem is an ''inequality'' rather than an equality. The answer is in fact simple: the Theorem neglects the small, yet non-zero, probability that a string belonging to the schema H will be created "from scratch" by mutation of a single string (or recombination of two strings) that did ''not'' belong to H in the previous generation. Moreover, the expression for p is clearly pessimistic: depending on the mating partner, recombination may not disrupt the scheme even when a cross point is selected between the first and the last fixed position of H.


Limitation

The schema theorem holds under the assumption of a genetic algorithm that maintains an infinitely large population, but does not always carry over to (finite) practice: due to
sampling error In statistics, sampling errors are incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. Since the sample does not include all members of the population, statistics of the sample ...
in the initial population, genetic algorithms may converge on schemata that have no selective advantage. This happens in particular in multimodal optimization, where a function can have multiple peaks: the population may drift to prefer one of the peaks, ignoring the others. The reason that the Schema Theorem cannot explain the power of genetic algorithms is that it holds for all problem instances, and cannot distinguish between problems in which genetic algorithms perform poorly, and problems for which genetic algorithms perform well.


References

{{reflist * J. Holland,
Adaptation in Natural and Artificial Systems
', The MIT Press; Reprint edition 1992 (originally published in 1975). * J. Holland, ''Hidden Order: How Adaptation Builds Complexity'', Helix Books; 1996. Genetic algorithms Theorems in discrete mathematics