Ray Solomonoff (July 25, 1926 – December 7, 2009) was the inventor of algorithmic probability and of his General Theory of Inductive Inference (also known as Universal Inductive Inference; see Samuel Rathmanner and Marcus Hutter, "A philosophical treatise of universal induction", Entropy, 13(6):1076–1136, 2011), and was a founder of algorithmic information theory. He was an originator of the branch of artificial intelligence based on machine learning, prediction and probability. He circulated the first report on non-semantic machine learning in 1956 ("An Inductive Inference Machine", Dartmouth College, N.H., version of Aug. 14, 1956).

Solomonoff first described algorithmic probability in 1960, publishing the theorem that launched Kolmogorov complexity and algorithmic information theory. He first described these results at a conference at Caltech in 1960, and in a report, Feb. 1960, "A Preliminary Report on a General Theory of Inductive Inference." He clarified these ideas more fully in his 1964 publications, "A Formal Theory of Inductive Inference," Part I and Part II (Solomonoff, R., "A Formal Theory of Inductive Inference, Part II", ''Information and Control'', Vol 7, No. 2, pp 224–254, June 1964).

Algorithmic probability is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. It is a machine-independent method of assigning a probability value to each hypothesis (algorithm/program) that explains a given observation, with the simplest hypothesis (the shortest program) having the highest probability and increasingly complex hypotheses receiving increasingly small probabilities.

Solomonoff founded the theory of universal inductive inference, which is based on solid philosophical foundations and has its root in Kolmogorov complexity and algorithmic information theory. The theory uses algorithmic probability in a Bayesian framework. The universal prior is taken over the class of all computable measures; no hypothesis will have a zero probability. This enables Bayes' rule (of causation) to be used to predict the most likely next event in a series of events, and how likely it will be.

Although he is best known for algorithmic probability and his general theory of inductive inference, he made many other important discoveries throughout his life, most of them directed toward his goal in artificial intelligence: to develop a machine that could solve hard problems using probabilistic methods.

Life history through 1964

Ray Solomonoff was born on July 25, 1926, in Cleveland, Ohio, son of Jewish Russian immigrants Phillip Julius and Sarah Mashman Solomonoff. He attended Glenville High School, graduating in 1944. In 1944 he joined the United States Navy as Instructor in Electronics. From 1947 to 1951 he attended the University of Chicago, studying under professors such as Rudolf Carnap and Enrico Fermi, and graduated with an M.S. in Physics in 1951.

From his earliest years he was motivated by the pure joy of mathematical discovery and by the desire to explore where no one had gone before. At the age of 16, in 1942, he began to search for a general method to solve mathematical problems.

In 1952 he met Marvin Minsky, John McCarthy and others interested in machine intelligence. In 1956 Minsky, McCarthy and others organized the Dartmouth Summer Research Conference on Artificial Intelligence, where Solomonoff was one of the original 10 invitees; he, McCarthy, and Minsky were the only ones to stay all summer. It was for this group that Artificial Intelligence was first named as a science. Computers at the time could solve very specific mathematical problems, but not much else. Solomonoff wanted to pursue a bigger question: how to make machines more generally intelligent, and how computers could use probability for this purpose.

Work history through 1964

He wrote three papers, two with Anatol Rapoport, in 1950–52, that are regarded as the earliest statistical analysis of networks.

He was one of the 10 attendees at the 1956 Dartmouth Summer Research Project on Artificial Intelligence. He wrote and circulated a report among the attendees: "An Inductive Inference Machine". It viewed machine learning as probabilistic, with an emphasis on the importance of training sequences, and on the use of parts of previous solutions to problems in constructing trial solutions for new problems. He published a version of his findings in 1957. These were the first papers to be written on probabilistic machine learning.

In the late 1950s, he invented probabilistic languages and their associated grammars. A probabilistic language assigns a probability value to every possible string. Generalizing the concept of probabilistic grammars led him to his discovery in 1960 of Algorithmic Probability and the General Theory of Inductive Inference.

Prior to the 1960s, the usual method of calculating probability was based on frequency: taking the ratio of favorable results to the total number of trials. In his 1960 publication, and, more completely, in his 1964 publications, Solomonoff seriously revised this definition of probability. He called this new form of probability "Algorithmic Probability" and showed how to use it for prediction in his theory of inductive inference. As part of this work, he produced the philosophical foundation for the use of Bayes' rule of causation for prediction.

The basic theorem of what was later called Kolmogorov Complexity was part of his General Theory. Writing in 1960, he begins: "Consider a very long sequence of symbols ... We shall consider such a sequence of symbols to be 'simple' and have a high a priori probability, if there exists a very brief description of this sequence – using, of course, some sort of stipulated description method. More exactly, if we use only the symbols 0 and 1 to express our description, we will assign the probability 2^−''N'' to a sequence of symbols if its shortest possible binary description contains ''N'' digits."

The probability is with reference to a particular universal Turing machine. Solomonoff showed, and in 1964 proved, that the choice of machine, while it could add a constant factor, would not change the probability ratios very much. In this sense the probabilities are machine independent.

In 1965, the Russian mathematician Kolmogorov independently published similar ideas. When he became aware of Solomonoff's work, he acknowledged Solomonoff, and for several years Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was more concerned with the randomness of a sequence. Algorithmic Probability and Universal (Solomonoff) Induction became associated with Solomonoff, who was focused on prediction, that is, the extrapolation of a sequence.

Later in the same 1960 publication Solomonoff describes his extension of the single-shortest-code theory. This is Algorithmic Probability. He states: "It would seem that if there are several different methods of describing a sequence, each of these methods should be given ''some'' weight in determining the probability of that sequence." He then shows how this idea can be used to generate the universal a priori probability distribution and how it enables the use of Bayes' rule in inductive inference. Inductive inference, by adding up the predictions of all models describing a particular sequence, using suitable weights based on the lengths of those models, gets the probability distribution for the extension of that sequence. This method of prediction has since become known as Solomonoff induction.

He enlarged his theory, publishing a number of reports leading up to the publications in 1964. The 1964 papers give a more detailed description of Algorithmic Probability and Solomonoff Induction, presenting five different models, including the model popularly called the Universal Distribution.
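The idea of weighting every description of a sequence by 2 raised to minus its length, and then adding up the predictions of all consistent descriptions, can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the "machine" below is not a universal Turing machine (each program is simply a bit pattern that is repeated forever), the description length of 2n+1 bits for an n-bit pattern is an assumed self-delimiting code chosen for illustration, and the sum is cut off at a maximum pattern length because the true quantity is incomputable. All function names are hypothetical.

```python
from itertools import product

def programs(max_len):
    """Yield every non-empty bit pattern of at most max_len bits."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def run(pattern, n):
    """Toy machine (NOT universal): first n symbols of the pattern repeated forever."""
    reps = n // len(pattern) + 1
    return (pattern * reps)[:n]

def weight(pattern):
    """2^-(description length), with an assumed code of 2*len(pattern)+1 bits."""
    return 2.0 ** -(2 * len(pattern) + 1)

def toy_probability(x, max_len=8):
    """Add up the weights of all toy programs whose output starts with x."""
    return sum(weight(p) for p in programs(max_len) if run(p, len(x)) == x)

def predict_next(x, max_len=8):
    """Probability of each next symbol, from the weighted sum over consistent programs."""
    m0 = toy_probability(x + "0", max_len)
    m1 = toy_probability(x + "1", max_len)
    total = m0 + m1
    if total == 0:
        raise ValueError("no toy program of at most max_len bits is consistent with x")
    return {"0": m0 / total, "1": m1 / total}

if __name__ == "__main__":
    print(predict_next("010101"))
```

With the default limit, `predict_next("010101")` puts almost all of the weight on "0", because the two-bit pattern "01" is by far the shortest description consistent with the observed sequence, while longer patterns that also fit still contribute a small amount.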

Work history from 1964 to 1984

Other scientists who had been at the 1956 Dartmouth Summer Conference (such as Newell and Simon) were developing the branch of Artificial Intelligence that used fact-based machines governed by if-then rules. Solomonoff was developing the branch of Artificial Intelligence that focused on probability and prediction; his specific view of A.I. described machines that were governed by the Algorithmic Probability distribution. The machine generates theories together with their associated probabilities to solve problems, and as new problems and theories develop, it updates the probability distribution on the theories.

In 1968 he found a proof for the efficacy of Algorithmic Probability, but mainly because of lack of general interest at that time, did not publish it until 10 years later. In his report, he published the proof for the convergence theorem.

In the years following his discovery of Algorithmic Probability he focused on how to use this probability and Solomonoff Induction in actual prediction and problem solving for A.I. He also wanted to understand the deeper implications of this probability system.

One important aspect of Algorithmic Probability is that it is complete and incomputable. In the 1968 report he shows that Algorithmic Probability is ''complete''; that is, if there is any describable regularity in a body of data, Algorithmic Probability will eventually discover that regularity, requiring a relatively small sample of that data. Algorithmic Probability is the only probability system known to be complete in this way. As a necessary consequence of its completeness it is ''incomputable''. The incomputability arises because some algorithms (a subset of those that are partially recursive) can never be evaluated fully because it would take too long. But these programs will at least be recognized as possible solutions. On the other hand, any ''computable'' system is ''incomplete''. There will always be descriptions outside that system's search space, which will never be acknowledged or considered, even in an infinite amount of time. Computable prediction models hide this fact by ignoring such algorithms.

In many of his papers he described how to search for solutions to problems, and in the 1970s and early 1980s he developed what he felt was the best way to update the machine.

The use of probability in A.I., however, did not have a completely smooth path. In the early years of A.I., the relevance of probability was problematic. Many in the A.I. community felt probability was not usable in their work. The area of pattern recognition did use a form of probability, but because there was no broadly based theory of how to incorporate probability in any A.I. field, most fields did not use it at all.

There were, however, researchers such as Judea Pearl and Peter Cheeseman who argued that probability could be used in artificial intelligence.

About 1984, at an annual meeting of the American Association for Artificial Intelligence (AAAI), it was decided that probability was in no way relevant to A.I. A protest group formed, and the next year there was a workshop at the AAAI meeting devoted to "Probability and Uncertainty in AI." This yearly workshop has continued to the present day.

As part of the protest at the first workshop, Solomonoff gave a paper on how to apply the universal distribution to problems in A.I. This was an early version of the system he continued to develop from that time on.

In that report, he described the search technique he had developed. In search problems, the best order of search is by increasing T_i/P_i, where T_i is the time needed to test trial i and P_i is the probability of success of that trial. He called this the "Conceptual Jump Size" of the problem. Levin's search technique approximates this order, and so Solomonoff, who had studied Levin's work, called this search technique Lsearch.
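The T_i/P_i ordering can be checked on a small example. The following Python sketch assumes that exactly one of a handful of candidate trials succeeds (so the P_i sum to 1) and that trials are tested one after another. The trial names, times and probabilities are made up for illustration, and the code is a sketch of the ordering rule only, not Solomonoff's or Levin's actual procedure.

```python
from itertools import permutations

# Hypothetical trials: (name, T_i = time to test, P_i = probability of success).
# Assumption: exactly one trial succeeds, so the probabilities sum to 1.
TRIALS = [("a", 10.0, 0.5), ("b", 1.0, 0.2), ("c", 4.0, 0.3)]

def order_by_ratio(trials):
    """Sort trials by increasing T_i / P_i."""
    return sorted(trials, key=lambda trial: trial[1] / trial[2])

def expected_time(ordered):
    """Expected time until the successful trial has been tested."""
    elapsed = 0.0
    expected = 0.0
    for _name, t, p in ordered:
        elapsed += t             # total time spent once this trial finishes
        expected += p * elapsed  # this trial is the successful one with probability p
    return expected

best = order_by_ratio(TRIALS)
# Brute-force comparison against every ordering of this tiny example.
brute_force = min(expected_time(order) for order in permutations(TRIALS))

if __name__ == "__main__":
    print([name for name, _, _ in best])
    print(expected_time(best), brute_force)
```

For these made-up numbers the ratio ordering is b, c, a, and no other ordering of the three trials has a smaller expected time: a cheap trial with modest probability is worth testing before an expensive trial with higher probability.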

Work history — the later years

In other papers he explored how to limit the time needed to search for solutions, writing on resource-bounded search. The search space is limited by available time or computation cost rather than by cutting out search space as is done in some other prediction methods, such as Minimum Description Length.

Throughout his career Solomonoff was concerned with the potential benefits and dangers of A.I., discussing them in many of his published reports. In 1985 he analyzed a likely evolution of A.I., giving a formula predicting when it would reach the "Infinity Point". This work is part of the history of thought about a possible technological singularity.

Originally algorithmic induction methods extrapolated ordered sequences of strings. Methods were needed for dealing with other kinds of data. A 1999 report generalizes the Universal Distribution and associated convergence theorems to unordered sets of strings, and a 2008 report to unordered pairs of strings.

In 1997, 2003 and 2006 he showed that incomputability and subjectivity are both necessary and desirable characteristics of any high-performance induction system.

In 1970 he formed his own one-man company, Oxbridge Research, and continued his research there except for periods at other institutions such as MIT, University of Saarland in Germany and the Dalle Molle Institute for Artificial Intelligence in Lugano, Switzerland. In 2003 he was the first recipient of the Kolmogorov Award by The Computer Learning Research Center (CLRC) at the Royal Holloway, University of London, where he gave the inaugural Kolmogorov Lecture. Solomonoff was most recently a visiting professor at the CLRC.

In 2006 he spoke at AI@50, "Dartmouth Artificial Intelligence Conference: the Next Fifty Years", commemorating the fiftieth anniversary of the original Dartmouth summer study group. Solomonoff was one of five original participants to attend.

In Feb. 2008, he gave the keynote address at the conference "Current Trends in the Theory and Application of Computer Science" (CTTACS), held at Notre Dame University in Lebanon. He followed this with a short series of lectures, and began research on new applications of Algorithmic Probability.

Algorithmic Probability and Solomonoff Induction have many advantages for Artificial Intelligence. Algorithmic Probability gives extremely accurate probability estimates. These estimates can be revised by a reliable method so that they continue to be acceptable. It utilizes search time in a very efficient way. In addition to probability estimates, Algorithmic Probability "has for AI another important value: its multiplicity of models gives us many different ways to understand our data".

A description of Solomonoff's life and work prior to 1997 is in "The Discovery of Algorithmic Probability", Journal of Computer and System Sciences, Vol 55, No. 1, pp 73–88, August 1997. The paper, as well as most of the others mentioned here, is available on the publications page of his website.

An article published in the year of his death said of Solomonoff: "A very conventional scientist understands his science using a single 'current paradigm' – the way of understanding that is most in vogue at the present time. A more creative scientist understands his science in very many ways, and can more easily create new theories, new ways of understanding, when the 'current paradigm' no longer fits the current data" ("Algorithmic Probability, Theory and Applications," in Information Theory and Statistical Learning, eds. Frank Emmert-Streib and Matthias Dehmer, Springer Science and Business Media, 2009, p. 11).

External links

* Ray Solomonoff's Homepage
* For a detailed description of Algorithmic Probability, see "Algorithmic Probability" by Hutter, Legg and Vitanyi in Scholarpedia.
* Ray Solomonoff (1926–2009) 85th memorial conference, Melbourne, Australia, Nov/Dec 2011, and Proceedings, "Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence", Springer, LNAI/LNCS 7070
* Pioneer of machine learning celebrated, 14 December 2011