History of natural language processing

The history of natural language processing describes the advances of natural language processing (see also the outline of natural language processing). There is some overlap with the history of machine translation, the history of speech recognition, and the history of artificial intelligence.


Research and development

The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine.

The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. The other proposal, by Peter Troyanskii, a Russian, was more detailed. It included both the bilingual dictionary and a method for dealing with grammatical roles between languages, based on Esperanto.
In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence", which proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge, sufficiently well that the judge is unable to distinguish reliably, on the basis of the conversational content alone, between the program and a real human.
In 1957, Noam Chomsky's ''Syntactic Structures'' revolutionized linguistics with 'universal grammar', a rule-based system of syntactic structures.

The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three to five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed.

A notably successful NLP system developed in the 1960s was
SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies.

In 1969 Roger Schank introduced the conceptual dependency theory for natural language understanding. This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner.
In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of ''phrase structure rules'', ATNs used an equivalent set of finite state automata that were called recursively. ATNs and their more general format, called "generalized ATNs", continued to be used for a number of years.
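The mechanism is easy to illustrate. Below is a minimal sketch, in Python, of a recursive transition network in the spirit of ATNs: each network is a small finite-state automaton whose arcs either consume a word of a given lexical category or recursively invoke another network. The grammar, lexicon, and function names here are invented for illustration; real ATNs also carried registers and arbitrary tests on arcs.

    # Toy recursive-transition-network recognizer. Each network is a
    # finite-state automaton; an arc label naming another network is a
    # recursive call, any other label must match a word's lexical category.
    LEXICON = {"the": "DET", "dog": "N", "cat": "N", "saw": "V"}

    NETWORKS = {
        "S":  {"start": [("NP", "s1")], "s1": [("VP", "end")]},
        "NP": {"start": [("DET", "np1")], "np1": [("N", "end")]},
        "VP": {"start": [("V", "vp1")], "vp1": [("NP", "end"), (None, "end")]},
    }

    def reachable_ends(net, state, words, pos):
        """All input positions at which `net` can reach its accepting state."""
        if state == "end":
            return {pos}
        results = set()
        for label, nxt in NETWORKS[net].get(state, []):
            if label is None:                        # jump (epsilon) arc
                results |= reachable_ends(net, nxt, words, pos)
            elif label in NETWORKS:                  # recursive sub-network call
                for mid in reachable_ends(label, "start", words, pos):
                    results |= reachable_ends(net, nxt, words, mid)
            elif pos < len(words) and LEXICON.get(words[pos]) == label:
                results |= reachable_ends(net, nxt, words, pos + 1)
        return results

    words = "the dog saw the cat".split()
    print(len(words) in reachable_ends("S", "start", words, 0))  # True

The recursion on sub-networks is what distinguishes this from a single flat automaton: the "NP" network is defined once and reused wherever a noun phrase may occur.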
During the 1970s many programmers began to write 'conceptual ontologies', which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert, 1981). During this time, many chatterbots were written, including PARRY, Racter, and Jabberwacky.

Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in NLP with the introduction of
machine learning algorithms for language processing. This was due both to the steady increase in computational power resulting from Moore's law and to the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. Chomskyan linguistics encourages the investigation of "corner cases" that stress the limits of its theoretical models (comparable to pathological phenomena in mathematics), typically created using thought experiments, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in corpus linguistics. The creation and use of such corpora of real-world data is a fundamental part of machine-learning algorithms for NLP. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called "poverty of the stimulus" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing. As a result, the Chomskyan paradigm discouraged the application of such models to language processing.
Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks.
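To make the idea concrete, the sketch below shows one simple form of cache language model in Python: a unigram distribution over recently seen words (the cache) interpolated with a static background distribution, so that words a speaker has just used receive a boosted probability. The interpolation weight, cache size, and counts are invented for illustration.

    from collections import Counter

    def cache_lm_prob(word, history, background_counts, lam=0.3, cache_size=100):
        """P(word) = lam * P_cache(word) + (1 - lam) * P_background(word),
        where the cache is a unigram model over the most recent words."""
        cache = Counter(history[-cache_size:])
        p_cache = cache[word] / max(1, sum(cache.values()))
        p_background = background_counts.get(word, 0) / sum(background_counts.values())
        return lam * p_cache + (1 - lam) * p_background

    background = {"the": 700, "of": 300, "speech": 5, "recognition": 5}
    history = "speech recognition systems process speech data".split()
    # "speech" occurred twice recently, so its probability is far above
    # its background rate:
    print(cache_lm_prob("speech", history, background))

A recognizer choosing between acoustically similar hypotheses can therefore prefer a word the speaker has already used, a soft decision that a set of hard if-then rules does not express naturally.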
Many of the notable early successes occurred in the field of machine translation, due especially to work at IBM Research, where successively more complicated statistical models were developed.
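The flavor of that work can be conveyed by a compact sketch of word-translation-probability training in the style of IBM Model 1, estimated with the expectation-maximization algorithm over a toy parallel corpus. This is a simplified illustration rather than IBM's actual system; the corpus and iteration count are invented.

    from collections import defaultdict

    # Toy French-English parallel corpus (invented for illustration).
    corpus = [
        ("la maison".split(),  "the house".split()),
        ("la fleur".split(),   "the flower".split()),
        ("une maison".split(), "a house".split()),
    ]

    # Initialize translation probabilities t(f|e) uniformly.
    french_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(french_vocab))

    for _ in range(20):                      # EM iterations
        count = defaultdict(float)           # expected counts c(f, e)
        total = defaultdict(float)           # expected counts c(e)
        for fs, es in corpus:
            for f in fs:                     # E-step: fractional alignment counts
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    count[(f, e)] += t[(f, e)] / z
                    total[e] += t[(f, e)] / z
        for (f, e), c in count.items():      # M-step: re-estimate t(f|e)
            t[(f, e)] = c / total[e]

    print(round(t[("maison", "house")], 2))  # approaches 1.0
    print(round(t[("la", "the")], 2))        # approaches 1.0

Crucially, nothing in the loop is specific to either language: the translation probabilities are learned entirely from the aligned sentence pairs.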
These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, most other systems depended on corpora specifically developed for the tasks implemented by these systems, which was (and often continues to be) a major limitation in their success. As a result, a great deal of research has gone into methods of learning more effectively from limited amounts of data.
Recent research has increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms are able to learn from data that has not been hand-annotated with the desired answers, or from a combination of annotated and non-annotated data. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results.
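One common semi-supervised technique of this kind is self-training: fit a model on the small annotated set, use it to label the unannotated pool, and retrain on whatever it labels confidently. The sketch below assumes scikit-learn is available and uses invented toy data; it illustrates the generic loop, not any particular published system.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    # A tiny labeled set plus a larger unlabeled pool (invented data).
    X_lab = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
    y_lab = np.array([0, 0, 1, 1])
    X_unl = rng.normal(0.5, 0.6, size=(200, 2))

    model = LogisticRegression().fit(X_lab, y_lab)
    for _ in range(5):                              # self-training rounds
        proba = model.predict_proba(X_unl)
        confident = proba.max(axis=1) > 0.9         # keep only confident guesses
        if not confident.any():
            break
        X_aug = np.vstack([X_lab, X_unl[confident]])
        y_aug = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
        model = LogisticRegression().fit(X_aug, y_aug)

    print(model.predict([[0.1, 0.0], [1.0, 0.9]]))  # expected: [0 1]

The pseudo-labels are of course noisier than human annotation, which is why such methods typically trail supervised learning for the same amount of data, as noted above.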


