HOME

TheInfoList



OR:

In artificial intelligence, genetic programming (GP) is a technique of evolving programs, starting from a population of unfit (usually random) programs, fit for a particular task by applying operations analogous to natural genetic processes to the population of programs. The operations are: selection of the fittest programs for reproduction (crossover) and mutation according to a predefined fitness measure, usually proficiency at the desired task. The crossover operation involves swapping random parts of selected pairs (parents) to produce new and different offspring that become part of the new generation of programs. Mutation involves substitution of some random part of a program with some other random part of a program. Some programs not selected for reproduction are copied from the current generation to the new generation. Then the selection and other operations are recursively applied to the new generation of programs. Typically, members of each new generation are on average more fit than the members of the previous generation, and the best-of-generation program is often better than the best-of-generation programs from previous generations. Termination of the evolution usually occurs when some individual program reaches a predefined proficiency or fitness level. It may and often does happen that a particular run of the algorithm results in premature convergence to some local maximum which is not a globally optimal or even good solution. Multiple runs (dozens to hundreds) are usually necessary to produce a very good result. It may also be necessary to have a large starting population size and variability of the individuals to avoid pathologies.


History

The first record of the proposal to evolve programs is probably that of
Alan Turing Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical co ...
in 1950. There was a gap of 25 years before the publication of John Holland's 'Adaptation in Natural and Artificial Systems' laid out the theoretical and empirical foundations of the science. In 1981, Richard Forsyth demonstrated the successful evolution of small programs, represented as trees, to perform classification of crime scene evidence for the UK Home Office. Although the idea of evolving programs, initially in the computer language Lisp, was current amongst John Holland’s students, it was not until they organised the first
Genetic Algorithms In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to gene ...
(GA) conference in Pittsburgh that Nichael Cramer published evolved programs in two specially designed languages, which included the first statement of modern "tree-based" Genetic Programming (that is, procedural languages organized in tree-based structures and operated on by suitably defined GA-operators). In 1988,
John Koza John R. Koza is a computer scientist and a former adjunct professor at Stanford University, most notable for his work in pioneering the use of genetic programming for the optimization of complex problems. Koza co-founded Scientific Games Corporati ...
(also a PhD student of John Holland) patented his invention of a GA for program evolution. This was followed by publication in the International Joint Conference on Artificial Intelligence IJCAI-89. Koza followed this with 205 publications on “Genetic Programming” (GP), name coined by David Goldberg, also a PhD student of John Holland. However, it is the series of 4 books by Koza, starting in 1992 with accompanying videos, that really established GP. Subsequently, there was an enormous expansion of the number of publications with the Genetic Programming Bibliography, surpassing 10,000 entries. In 2010, Koza listed 77 results where Genetic Programming was human competitive. In 1996, Koza started the annual Genetic Programming conference which was followed in 1998 by the annual EuroGP conference, and the first book in a GP series edited by Koza. 1998 also saw the first GP textbook. GP continued to flourish, leading to the first specialist GP journal and three years later (2003) the annual Genetic Programming Theory and Practice (GPTP) workshop was established by Rick Riolo. Genetic Programming papers continue to be published at a diversity of conferences and associated journals. Today there are nineteen GP books including several for students.


Foundational work in GP

Early work that set the stage for current genetic programming research topics and applications is diverse, and includes software synthesis and repair, predictive modeling, data mining, financial modeling, soft sensors, design, and image processing. Applications in some areas, such as design, often make use of intermediate representations, such as Fred Gruau’s cellular encoding. Industrial uptake has been significant in several areas including finance, the chemical industry, bioinformatics and the steel industry.


Methods


Program representation

GP evolves computer programs, traditionally represented in memory as
tree structure A tree structure, tree diagram, or tree model is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, although the chart is gener ...
s.Nichael L. Crame
"A Representation for the Adaptive Generation of Simple Sequential Programs"
.
Trees can be easily evaluated in a recursive manner. Every tree node has an operator function and every terminal node has an operand, making mathematical expressions easy to evolve and evaluate. Thus traditionally GP favors the use of
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s that naturally embody tree structures (for example, Lisp; other
functional programming languages In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that m ...
are also suitable). Non-tree representations have been suggested and successfully implemented, such as
linear genetic programming :''"Linear genetic programming" is unrelated to "linear programming".'' Linear genetic programming (LGP) is a particular subset of genetic programming wherein computer programs in a population are represented as a sequence of instructions from i ...
which suits the more traditional
imperative languages In computer science, imperative programming is a programming paradigm of software that uses statements that change a program's state. In much the same way that the imperative mood in natural languages expresses commands, an imperative program c ...
ee, for example, Banzhaf ''et al.'' (1998) The commercial GP software ''Discipulus'' uses automatic induction of binary machine code ("AIM") to achieve better performance. ''µGP'' uses
directed multigraph In mathematics, and more specifically in graph theory, a multigraph is a graph which is permitted to have multiple edges (also called ''parallel edges''), that is, edges that have the same end nodes. Thus two vertices may be connected by mor ...
s to generate programs that fully exploit the syntax of a given
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
.
Multi expression programming Multi Expression Programming (MEP) is an evolutionary algorithm for generating mathematical functions describing a given set of data. MEP is a Genetic Programming variant encoding multiple solutions in the same chromosome. MEP representation is no ...
uses
Three-address code In computer science, three-address code (often abbreviated to TAC or 3AC) is an intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each TAC instruction has at most three operands and is ty ...
for encoding solutions. Other program representations on which significant research and development have been conducted include programs for stack-based virtual machines, and sequences of integers that are mapped to arbitrary programming languages via grammars.
Cartesian genetic programming Cartesian genetic programming is a form of genetic programming that uses a graph representation to encode computer programs. It grew from a method of evolving digital circuits developed by Julian F. Miller and Peter Thomson in 1997. The term ‘Ca ...
is another form of GP, which uses a graph representation instead of the usual tree based representation to encode computer programs. Most representations have structurally noneffective code ( introns). Such non-coding genes may seem to be useless because they have no effect on the performance of any one individual. However, they alter the probabilities of generating different offspring under the variation operators, and thus alter the individual's
variational properties In evolutionary biology, the variational properties of an organism are those properties relating to the production of variation among its offspring. In a broader sense variational properties include phenotypic plasticity. Wagner, G. P. and Altenbe ...
. Experiments seem to show faster convergence when using program representations that allow such non-coding genes, compared to program representations that do not have any non-coding genes.


Selection

Selection is a process whereby certain individuals are selected from the current generation that would serve as parents for the next generation. The individuals are selected probabilistically such that the better performing individuals have a higher chance of getting selected. The most commonly used selection method in GP is
tournament selection Tournament selection is a method of selecting an individual from a population of individuals in a genetic algorithm. Tournament selection involves running several "tournaments" among a few individuals (or "chromosomes") chosen at random from the po ...
, although other methods such as
fitness proportionate selection Fitness proportionate selection, also known as roulette wheel selection, is a genetic operator used in genetic algorithms for selecting potentially useful solutions for recombination. In fitness proportionate selection, as in all selection method ...
, lexicase selection, and others have been demonstrated to perform better for many GP problems. Elitism, which involves seeding the next generation with the best individual (or best ''n'' individuals) from the current generation, is a technique sometimes employed to avoid regression.


Crossover

In Genetic Programming two fit individuals are chosen from the population to be parents for one or two children. In tree genetic programming, these parents are represented as inverted lisp like trees, with their root nodes at the top. In subtree crossover in each parent a subtree is randomly chosen. (Highlighted with yellow in the animation.) In the root donating parent (in the animation on the left) the chosen subtree is removed and replaced with a copy of the randomly chosen subtree from the other parent, to give a new child tree. Sometimes two child crossover is used, in which case the removed subtree (in the animation on the left) is not simply deleted but is copied to a copy of the second parent (here on the right) replacing (in the copy) its randomly chosen subtree. Thus this type of subtree crossover takes two fit trees and generates two child trees.


Mutation

There are many types of mutation in genetic programming. They start from a fit syntactically correct parent and aim to randomly create a syntactically correct child. In the animation a subtree is randomly chosen (highlighted by yellow). It is removed and replaced by a randomly generated subtree. Other mutation operators select a leaf (external node) of the tree and replace it with a randomly chosen leaf. Another mutation is to select at random a function (internal node) and replace it with another function with the same arity (number of inputs). Hoist mutation randomly chooses a subtree and replaces it with a subtree within itself. Thus hoist mutation is guaranteed to make the child smaller. Leaf and same arity function replacement ensure the child is the same size as the parent. Whereas subtree mutation (in the animation) may, depending upon the function and terminal sets, have a bias to either increase or decrease the tree size. Other subtree based mutations try to carefully control the size of the replacement subtree and thus the size of the child tree. Similarly there are many types of linear genetic programming mutation, each of which tries to ensure the mutated child is still syntactically correct.


Applications

GP has been successfully used as an
automatic programming In computer science, the term automatic programming identifies a type of computer programming in which some mechanism generates a computer program to allow human programmers to write the code at a higher abstraction level. There has been little ...
tool, a machine learning tool and an automatic problem-solving engine. GP is especially useful in the domains where the exact form of the solution is not known in advance or an approximate solution is acceptable (possibly because finding the exact solution is very difficult). Some of the applications of GP are curve fitting, data modeling,
symbolic regression Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity. No particular model is provided as a start ...
,
feature selection In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construc ...
, classification, etc. John R. Koza mentions 76 instances where Genetic Programming has been able to produce results that are competitive with human-produced results (called Human-competitive results). Since 2004, the annual Genetic and Evolutionary Computation Conference (
GECCO The Genetic and Evolutionary Computation Conference (GECCO) is the premier conference in the area of genetic and evolutionary computation. GECCO has been held every year since 1999, when it was first established as a recombination of the Internat ...
) holds Human Competitive Awards (called Humies) competition, where cash awards are presented to human-competitive results produced by any form of genetic and evolutionary computation. GP has won many awards in this competition over the years. In 2022, it was shown that genetic programming of
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
pipelines could be leveraged to improve the autonomous detection of falls from
EEG Electroencephalography (EEG) is a method to record an electrogram of the spontaneous electrical activity of the brain. The biosignals detected by EEG have been shown to represent the postsynaptic potentials of pyramidal neurons in the neocortex ...
brainwave Neural oscillations, or brainwaves, are rhythmic or repetitive patterns of neural activity in the central nervous system. Neural tissue can generate oscillatory activity in many ways, driven either by mechanisms within individual neurons or by ...
data.


Meta-genetic programming

Meta-genetic programming is the proposed
meta-learning Meta-learning is a branch of metacognition concerned with learning about one's own learning and learning processes. The term comes from the meta prefix's modern meaning of an abstract recursion, or "X about X", similar to its use in metaknowled ...
technique of evolving a genetic programming system using genetic programming itself. It suggests that chromosomes, crossover, and mutation were themselves evolved, therefore like their real life counterparts should be allowed to change on their own rather than being determined by a human programmer. Meta-GP was formally proposed by
Jürgen Schmidhuber Jürgen Schmidhuber (born 17 January 1963) is a German computer scientist most noted for his work in the field of artificial intelligence, deep learning and artificial neural networks. He is a co-director of the Dalle Molle Institute for Artifi ...
in 1987.
Doug Lenat Douglas Bruce Lenat (born 1950) is the CEO of Cycorp, Inc. of Austin, Texas, and has been a prominent researcher in artificial intelligence; he was awarded the biannual IJCAI Computers and Thought Award in 1976 for creating the machine learning p ...
's Eurisko is an earlier effort that may be the same technique. It is a recursive but terminating algorithm, allowing it to avoid infinite recursion. In the "autoconstructive evolution" approach to meta-genetic programming, the methods for the production and variation of offspring are encoded within the evolving programs themselves, and programs are executed to produce new programs to be added to the population. Critics of this idea often say this approach is overly broad in scope. However, it might be possible to constrain the fitness criterion onto a general class of results, and so obtain an evolved GP that would more efficiently produce results for sub-classes. This might take the form of a meta evolved GP for producing human walking algorithms which is then used to evolve human running, jumping, etc. The fitness criterion applied to the meta GP would simply be one of efficiency.


See also

*
Bio-inspired computing Bio-inspired computing, short for biologically inspired computing, is a field of study which seeks to solve computer science problems using models of biology. It relates to connectionism, social behavior, and emergence. Within computer science, b ...
*
Cartesian genetic programming Cartesian genetic programming is a form of genetic programming that uses a graph representation to encode computer programs. It grew from a method of evolving digital circuits developed by Julian F. Miller and Peter Thomson in 1997. The term ‘Ca ...
* Covariance Matrix Adaptation Evolution Strategy (CMA-ES) *
Fitness approximation Fitness approximationY. JinA comprehensive survey of fitness approximation in evolutionary computation ''Soft Computing'', 9:3–12, 2005 aims to approximate the objective or fitness functions in evolutionary optimization by building up machine l ...
*
Gene expression programming In computer programming, gene expression programming (GEP) is an evolutionary algorithm that creates computer programs or models. These computer programs are complex tree structures that learn and adapt by changing their sizes, shapes, and compos ...
* Genetic improvement *
Genetic representation In computer programming, genetic representation is a way of presenting solutions/individuals in evolutionary computation methods. Genetic representation can encode appearance, behavior, physical qualities of individuals. Designing a good genetic re ...
* Grammatical evolution *
Inductive programming Inductive programming (IP) is a special area of automatic programming, covering research from artificial intelligence and programming, which addresses learning of typically declarative (logic or functional) and often recursive programs from inc ...
*
Linear genetic programming :''"Linear genetic programming" is unrelated to "linear programming".'' Linear genetic programming (LGP) is a particular subset of genetic programming wherein computer programs in a population are represented as a sequence of instructions from i ...
*
Multi expression programming Multi Expression Programming (MEP) is an evolutionary algorithm for generating mathematical functions describing a given set of data. MEP is a Genetic Programming variant encoding multiple solutions in the same chromosome. MEP representation is no ...
* Propagation of schema


References


External links


Aymen S Saket & Mark C Sinclair

''Genetic Programming and Evolvable Machines''
a journal
''Evo2 for genetic programming''

GP bibliography


* Riccardo Poli, William B. Langdon,Nicholas F. McPhee, John R. Koza,
A Field Guide to Genetic Programming
(2008)
Genetic Programming, a community maintained resource
{{DEFAULTSORT:Genetic Programming Genetic algorithms