Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others.


Sub-fields and related areas

Traditionally, computational linguistics emerged as an area of artificial intelligence performed by computer scientists who had specialized in the application of computers to the processing of natural language. With the formation of the Association for Computational Linguistics (ACL) and the establishment of independent conference series, the field consolidated during the 1970s and 1980s. The Association for Computational Linguistics also provides its own definition of computational linguistics.

As of 2020, the term "computational linguistics" is taken to be a near-synonym of natural language processing (NLP) and language technology. These terms put a stronger emphasis on practical applications than on theoretical inquiry. In practice, they have largely replaced the term "computational linguistics" in the NLP/ACL community, although strictly speaking they refer only to the sub-field of applied computational linguistics.

Computational linguistics has both theoretical and applied components. Theoretical computational linguistics focuses on issues in theoretical linguistics and cognitive science. Applied computational linguistics focuses on the practical outcome of modeling human language use.

Theoretical computational linguistics includes the development of formal theories of grammar (parsing) and semantics, often grounded in formal logics and symbolic (knowledge-based) approaches. Areas of research studied by theoretical computational linguistics include:
* Computational complexity of natural language, largely modeled on automata theory, with the application of context-sensitive grammars and linearly bounded Turing machines.
* Computational semantics, which comprises defining suitable logics for linguistic meaning representation, automatically constructing such representations and reasoning with them.

Applied computational linguistics has been dominated by statistical methods, such as neural networks and machine learning, since about 1990. The tutorial by Socher et al. (2012) at ACL 2012 was an early introduction of deep learning to the community and met with both interest and (at the time) scepticism from most participants. Until then, neural learning had largely been rejected because of its lack of statistical interpretability. By 2015, deep learning had evolved into the major framework of NLP.

For the tasks addressed by applied computational linguistics, see the article on natural language processing. These include classical problems such as the design of part-of-speech (POS) taggers and parsers for natural languages, and tasks such as machine translation (MT), the sub-division of computational linguistics dealing with having computers translate between languages. As one of the earliest and most difficult applications of computational linguistics, MT draws on many subfields and on both theoretical and applied aspects. Traditionally, automatic language translation has been considered a notoriously hard branch of computational linguistics.

Aside from the dichotomy between theoretical and applied computational linguistics, the field can be divided into major areas according to other criteria, including:
* the medium of the language being processed, whether spoken or textual: speech recognition and speech synthesis deal with how spoken language can be understood or created using computers.
* the task being performed, e.g., whether analyzing language (recognition) or synthesizing language (generation): parsing and generation are sub-divisions of computational linguistics dealing respectively with taking language apart and putting it together.

Traditionally, applications of computers to research problems in other branches of linguistics have been described as tasks within computational linguistics. Among other aspects, this includes:
* Computer-aided corpus linguistics, which has been used since the 1970s as a way to make detailed advances in the field of discourse analysis.
* Simulation and study of language evolution in historical linguistics/glottochronology.


Origins

Computational linguistics is often grouped within the field of artificial intelligence, but it emerged before artificial intelligence developed as a field. Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since computers can make arithmetic (systematic) calculations much faster and more accurately than humans, it was thought to be only a matter of time before they could also begin to process language. Computational and quantitative methods have also been used historically in the attempted reconstruction of earlier forms of modern languages and in sub-grouping modern languages into language families. Earlier methods, such as lexicostatistics and glottochronology, have proven to be premature and inaccurate. However, recent interdisciplinary studies that borrow concepts from biological studies, especially gene mapping, have produced more sophisticated analytical tools and more reliable results.

When machine translation (also known as mechanical translation) failed to yield accurate translations right away, automated processing of human languages was recognized as far more complex than had originally been assumed. Computational linguistics was born as the name of the new field of study devoted to developing algorithms and software for intelligently processing language data. The term "computational linguistics" itself was first coined by David Hays, a founding member of both the Association for Computational Linguistics (ACL) and the International Committee on Computational Linguistics (ICCL).

To translate one language into another, it was observed that one had to understand the grammar of both languages, including both morphology (the grammar of word forms) and syntax (the grammar of sentence structure). To understand syntax, one also had to understand the semantics and the lexicon (or "vocabulary"), and even something of the pragmatics of language use. Thus, what started as an effort to translate between languages evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers.

Nowadays, research within the scope of computational linguistics is done at computational linguistics departments, computational linguistics laboratories, computer science departments, and linguistics departments. Some research in the field aims to create working speech or text processing systems, while other research aims to create systems that allow human-machine interaction. Programs meant for human-machine communication are called conversational agents.


Approaches

Just as computational linguistics can be performed by experts in a variety of fields and through a wide assortment of departments, so too can its research broach a diverse range of topics. The following sections discuss some of the literature available across the entire field, broken into four main areas of discourse: developmental linguistics, structural linguistics, linguistic production, and linguistic comprehension.


Developmental approaches

Language is a cognitive skill that develops throughout the life of an individual. This developmental process has been examined using several techniques, and a computational approach is one of them. Human language development does impose some constraints that make it harder to apply a computational method to understanding it. For instance, during language acquisition, children are largely exposed only to positive evidence. This means that during the linguistic development of an individual, evidence is provided only for what is a correct form, and no evidence is provided for what is not correct. This is insufficient information for a simple hypothesis-testing procedure for information as complex as language (Braine, M.D.S., 1971, "On two types of models of the internalization of grammars", in D.I. Slobin (Ed.), The ontogenesis of grammar: A theoretical perspective, New York: Academic Press), and so it places certain boundaries on a computational approach to modeling language development and acquisition in an individual.

Attempts have been made to model the developmental process of language acquisition in children from a computational angle, leading to both statistical grammars and connectionist models (Powers, D.M.W. & Turk, C.C.R., 1989, Machine Learning of Natural Language, Springer-Verlag). Work in this realm has also been proposed as a method to explain the evolution of language through history. Using models, it has been shown that languages can be learned with a combination of simple input presented incrementally as the child develops better memory and a longer attention span. This was simultaneously posed as a reason for the long developmental period of human children. Both conclusions were drawn because of the strength of the artificial neural network which the project created.

The ability of infants to develop language has also been modeled using robots in order to test linguistic theories. To let the robots learn as children might, a model was created based on an affordance model, in which mappings between actions, perceptions, and effects were created and linked to spoken words. Crucially, these robots were able to acquire functioning word-to-meaning mappings without needing grammatical structure, vastly simplifying the learning process and shedding light on information which furthers the current understanding of linguistic development. It is important to note that this information could only have been empirically tested using a computational approach.

As our understanding of the linguistic development of an individual within a lifetime is continually improved using neural networks and learning robotic systems, it is also important to keep in mind that languages themselves change and develop through time. Computational approaches to understanding this phenomenon have yielded notable results. Using the Price equation and Pólya urn dynamics, researchers have created a system which not only predicts future linguistic evolution but also gives insight into the evolutionary history of modern-day languages. This modeling effort achieved, through computational linguistics, what would otherwise have been impossible. The understanding of linguistic development in humans, as well as throughout evolutionary time, has been substantially improved by advances in computational linguistics. The ability to model and modify systems at will affords science an ethical method of testing hypotheses that would otherwise be intractable.
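The Pólya urn dynamics mentioned above can be illustrated with a minimal simulation. This is a generic textbook Pólya urn, not the researchers' model: treating competing linguistic variants (for instance, two hypothetical word forms) as ball colours is an illustrative assumption, as is the function name `polya_urn`. Each step draws a variant with probability proportional to its current count and then adds one more copy of it, so early chance advantages are reinforced.

```python
import random


def polya_urn(initial_counts, steps, seed=None):
    """Simulate a basic Pólya urn.

    initial_counts: dict mapping a variant (e.g. a hypothetical word form)
    to its starting number of "balls". At each step one ball is drawn with
    probability proportional to the counts and returned together with one
    extra ball of the same variant (rich-get-richer reinforcement).
    Returns the final counts and, for every step, each variant's share.
    """
    rng = random.Random(seed)
    counts = dict(initial_counts)
    variants = list(counts)
    history = []
    for _ in range(steps):
        # Draw one variant with probability proportional to its current count.
        drawn = rng.choices(variants, weights=[counts[v] for v in variants])[0]
        counts[drawn] += 1  # reinforce the drawn variant
        total = sum(counts.values())
        history.append({v: counts[v] / total for v in variants})
    return counts, history


if __name__ == "__main__":
    final, shares = polya_urn({"variant_a": 1, "variant_b": 1}, steps=1000, seed=42)
    print(final)
    print(shares[-1])
```

In the symmetric case shown (one ball per variant, one ball added per draw), each variant's long-run share converges to a random limit that is uniformly distributed between 0 and 1, so runs with different seeds can settle on very different outcomes. The Price equation part of the cited work is not modelled here.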


Structural approaches

To create better computational models of language, an understanding of language's structure is crucial. To this end, the English language has been meticulously studied using computational approaches to better understand how the language works on a structural level. One of the most important prerequisites for studying linguistic structure is the availability of large linguistic corpora, or samples. These give computational linguists the raw data necessary to run their models and gain a better understanding of the underlying structures present in the vast amount of data contained in any single language. One of the most cited English linguistic corpora is the Penn Treebank. Derived from widely different sources, such as IBM computer manuals and transcribed telephone conversations, this corpus contains over 4.5 million words of American English. It has been annotated primarily with part-of-speech tags and syntactic bracketing and has yielded substantial empirical observations related to language structure.

Theoretical approaches to the structure of languages have also been developed. These works give computational linguistics a framework within which to work out hypotheses that will further the understanding of language in a myriad of ways. One of the original theoretical theses on the internalization of grammar and structure of language proposed two types of models. In these models, learned rules or patterns increase in strength with the frequency with which they are encountered. The work also raised a question for computational linguists to answer: how does an infant learn a specific and non-normal grammar (Chomsky normal form) without learning an overgeneralized version and getting stuck? Theoretical efforts like these set the direction for research early in the lifetime of a field of study, and are crucial to the growth of the field.

Structural information about languages allows for the discovery and implementation of similarity recognition between pairs of text utterances. For instance, it has recently been shown that, based on the structural information present in patterns of human discourse, conceptual recurrence plots can be used to model and visualize trends in data and to create reliable measures of similarity between natural textual utterances. This technique is a strong tool for further probing the structure of human discourse. Without the computational approach to this question, the vastly complex information present in discourse data would have remained inaccessible to scientists.

Information regarding the structural data of a language is available for English as well as other languages, such as Japanese. Using computational methods, Japanese sentence corpora were analyzed and a pattern of log-normality was found in relation to sentence length. Though the exact cause of this log-normality remains unknown, it is precisely this sort of information which computational linguistics is designed to uncover. This information could lead to further important discoveries regarding the underlying structure of Japanese and could have any number of effects on the understanding of Japanese as a language. Computational linguistics allows such additions to the scientific knowledge base to be made quickly and with very little room for doubt. Without a computational approach to the structure of linguistic data, much of the information that is available now would still be hidden under the vastness of data within any single language. Computational linguistics allows scientists to parse huge amounts of data reliably and efficiently, creating the possibility for discoveries unlike any seen in most other approaches.
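Checking sentence lengths for log-normality, as in the Japanese corpus study above, can be sketched with the standard maximum-likelihood estimates: if lengths L are log-normally distributed, then ln L is normally distributed, so the parameters μ and σ are the mean and population standard deviation of the log-lengths. The sketch assumes sentences are already segmented and measured in tokens or characters; the function names and the sample lengths are hypothetical, and this is not presented as the method of the cited study.

```python
import math
import statistics


def fit_lognormal(lengths):
    """Maximum-likelihood fit of a log-normal distribution.

    lengths: positive sentence lengths (e.g. token or character counts).
    Returns (mu, sigma): the mean and population standard deviation of
    ln(length), which are the MLEs of the log-normal parameters.
    """
    if any(x <= 0 for x in lengths):
        raise ValueError("lengths must be positive")
    logs = [math.log(x) for x in lengths]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)  # divide by n, as the MLE requires
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Density of the fitted log-normal distribution at length x > 0."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma**2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )


if __name__ == "__main__":
    # Hypothetical sentence lengths; a real analysis would use a corpus.
    lengths = [12, 18, 25, 9, 30, 41, 22, 15, 27, 19]
    mu, sigma = fit_lognormal(lengths)
    print(f"mu={mu:.3f} sigma={sigma:.3f} median length={math.exp(mu):.1f}")
```

Fitting the parameters does not by itself establish log-normality; a fuller analysis would also test how well the log-lengths match a normal distribution, for example with a Q-Q plot or a goodness-of-fit test.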


Production approaches

The production of language is equally as complex in the information it provides and the necessary skills which a fluent producer must have. That is to say, comprehension is only half the problem of communication. The other half is how a system produces language, and computational linguistics has made interesting discoveries in this area. In a now famous paper published in 1950
Alan Turing Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical co ...
proposed the possibility that machines might one day have the ability to "think". As a
thought experiment A thought experiment is a hypothetical situation in which a hypothesis, theory, or principle is laid out for the purpose of thinking through its consequences. History The ancient Greek ''deiknymi'' (), or thought experiment, "was the most anc ...
for what might define the concept of thought in machines, he proposed an "imitation test" in which a human subject has two text-only conversations, one with a fellow human and another with a machine attempting to respond like a human. Turing proposes that if the subject cannot tell the difference between the human and the machine, it may be concluded that the machine is capable of thought. Today this test is known as the
Turing test The Turing test, originally called the imitation game by Alan Turing in 1950, is a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human. Turing proposed that a human evaluato ...
and it remains an influential idea in the area of artificial intelligence. One of the earliest and best-known examples of a computer program designed to converse naturally with humans is the
ELIZA ELIZA is an early natural language processing computer program created from 1964 to 1966 at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum. Created to demonstrate the superficiality of communication between humans and machines, ...
program developed by
Joseph Weizenbaum Joseph Weizenbaum (8 January 1923 – 5 March 2008) was a German American computer scientist and a professor at MIT. The Weizenbaum Award is named after him. He is considered one of the fathers of modern artificial intelligence. Life and caree ...
at
MIT The Massachusetts Institute of Technology (MIT) is a private land-grant research university in Cambridge, Massachusetts. Established in 1861, MIT has played a key role in the development of modern technology and science, and is one of the m ...
in 1966. The program emulated a Rogerian psychotherapist when responding to written statements and questions posed by a user. It appeared capable of understanding what was said to it and responding intelligently, but in truth it simply followed a pattern-matching routine that relied on understanding only a few keywords in each sentence. Its responses were generated by recombining the unknown parts of the sentence around properly translated versions of the known words. For example, in the phrase "It seems that you hate me", ELIZA understands "you" and "me", which match the general pattern "you [some words] me". This allows ELIZA to replace "you" and "me" with "I" and "you" and reply "What makes you think I hate you?". In this example ELIZA has no understanding of the word "hate", but such understanding is not required for a logical response in the context of this type of psychotherapy.

Some projects are still trying to solve the problem that first launched computational linguistics as a field. However, methods have become more refined, and consequently the results generated by computational linguists have become more enlightening. To improve computer translation, several models have been compared, including hidden Markov models, smoothing techniques, and specific refinements of these models for translating verbs. The model found to produce the most natural translations of German and French words was a refined alignment model with a first-order dependence and a fertility model. The authors of that comparison also provide efficient training algorithms for the models presented, which other researchers can use to improve further on their results. This type of work is specific to computational linguistics and has applications that could substantially improve our understanding of how computers can produce and comprehend language.

Work has also been done on making computers produce language in a more naturalistic manner. Algorithms have been constructed that can modify a system's style of production based on linguistic input from a human, or on more abstract factors such as politeness or any of the five main dimensions of personality. This work uses parameter estimation models to categorize the vast array of linguistic styles found across individuals and to simplify it so that a computer can work in the same way, making human–computer interaction much more natural.
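The ELIZA exchange described above can be sketched as a small set of decomposition patterns, each paired with a reassembly template. The following Python is a minimal illustration under stated assumptions, not a reconstruction of Weizenbaum's original script: the two rules, the `REFLECTIONS` pronoun table, the fallback reply and the function names `reflect` and `respond` are all hypothetical choices made for this example.

```python
import re

# Hypothetical pronoun table used when echoing a captured fragment back
# (first person <-> second person).
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my", "are": "am"}

# Each hypothetical rule pairs a decomposition pattern with a reassembly
# template; {0} is filled with the (reflected) words the pattern captured.
RULES = [
    (re.compile(r"\byou\b\s+(.*?)\s+\bme\b", re.IGNORECASE),
     "What makes you think I {0} you?"),
    (re.compile(r"\bi am\b\s+(.*)", re.IGNORECASE),
     "How long have you been {0}?"),
]

DEFAULT_REPLY = "Please go on."


def reflect(fragment):
    """Swap first- and second-person words in an echoed fragment."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(sentence):
    """Return the reply from the first matching rule, or a fallback."""
    text = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return DEFAULT_REPLY
```

With these rules, `respond("It seems that you hate me")` returns "What makes you think I hate you?", and `respond("I am worried about my exams.")` returns "How long have you been worried about your exams?", where "my" has been reflected to "your". As in the description, the program never interprets "hate"; it only copies the captured words into the template.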


Text-based interactive approach

Many of the earliest and simplest models of human–computer interaction, such as ELIZA, rely on text typed by the user to generate a response from the computer. In this method, words typed by the user trigger the computer to recognize specific patterns and reply accordingly, through a process known as keyword spotting.
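Keyword spotting can be illustrated with a ranked keyword table: the input is tokenized, the known keywords in it are located, and the reply attached to the highest-ranked keyword is returned. This is a hedged sketch; the keywords, their ranks, the canned replies and the fallback "I see." are invented for illustration, loosely following the idea of keyword precedence used in ELIZA-style scripts.

```python
import re

# Hypothetical keyword table: keyword -> (rank, canned reply).
# When several keywords occur, the highest rank decides the reply.
KEYWORDS = {
    "mother": (3, "Tell me more about your family."),
    "dream": (2, "What does that dream suggest to you?"),
    "sorry": (1, "Please don't apologize."),
}
FALLBACK = "I see."


def spot_keywords(text):
    """Return the known keywords in the input, in order of appearance."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t in KEYWORDS]


def reply(text):
    """Reply using the highest-ranked keyword found, or the fallback."""
    found = spot_keywords(text)
    if not found:
        return FALLBACK
    best = max(found, key=lambda k: KEYWORDS[k][0])
    return KEYWORDS[best][1]
```

For the input "Sorry, I had a dream about my mother", three keywords are spotted, and "mother" outranks the others, so the reply is "Tell me more about your family."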


Speech-based interactive approach

Recent technologies have placed more emphasis on speech-based interactive systems. These systems, such as Siri of the iOS operating system, operate on a pattern-recognizing technique similar to that of text-based systems, but the user input is conducted through speech recognition. This branch of linguistics involves processing the user's speech as sound waves and interpreting the acoustics and language patterns so that the computer can recognize the input.
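A common first step in processing speech as a sound wave is to cut the sampled signal into short, overlapping frames and measure each frame. The sketch below assumes a mono signal given as a list of floats in [-1, 1] and computes the short-time energy of each frame to flag frames that probably contain sound. The frame length, hop size and energy threshold are illustrative assumptions, and real recognizers go on to extract much richer acoustic features from each frame.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sampled waveform into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def active_frames(samples, frame_len=160, hop=80, threshold=0.01):
    """Indices of frames whose energy exceeds a hypothetical threshold."""
    frames = frame_signal(samples, frame_len, hop)
    return [i for i, f in enumerate(frames) if short_time_energy(f) > threshold]
```

At an assumed sampling rate of 8 kHz, the defaults correspond to 20 ms frames advanced in 10 ms steps, so each part of the signal is measured twice.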


Comprehension approaches

Much of the focus of modern computational linguistics is on comprehension. With the proliferation of the internet and the abundance of easily accessible written human language, a program capable of understanding human language would open many broad and exciting possibilities, including improved search engines, automated customer service, and online education. Early work in comprehension included applying Bayesian statistics to the task of optical character recognition, as illustrated by Bledsoe and Browning in 1959: a large dictionary of possible letters was generated by "learning" from example letters, and the probabilities that each of those learned examples matched the new input were then combined to make a final decision. Other attempts at applying Bayesian statistics to language analysis included the work of Mosteller and Wallace (1963), in which an analysis of the words used in ''The Federalist Papers'' was used to attempt to determine their authorship (concluding that Madison most likely authored the disputed papers).

In 1971 Terry Winograd developed an early natural language processing engine capable of interpreting naturally written commands within a simple rule-governed environment. The primary language-parsing program in this project was called SHRDLU, which could carry out a somewhat natural conversation with the user giving it commands, but only within the scope of the toy environment designed for the task. This environment consisted of blocks of different shapes and colors, and SHRDLU was capable of interpreting commands such as "Find a block which is taller than the one you are holding and put it into the box." and asking questions such as "I don't understand which pyramid you mean." in response to the user's input. While impressive, this kind of natural language processing has proven much more difficult outside the limited scope of the toy environment. Similarly, a project developed for NASA called LUNAR was designed to answer naturally written questions about the geological analysis of lunar rocks returned by the Apollo missions. These kinds of problems are referred to as question answering.

Initial attempts at understanding spoken language were based on work done in the 1960s and 1970s in signal modeling, where an unknown signal is analyzed to look for patterns and to make predictions based on its history. An initial and somewhat successful approach to applying this kind of signal modeling to language was achieved with hidden Markov models, as detailed by Rabiner in 1989. This approach attempts to determine the probabilities of the arbitrary number of models that could be generating the speech, as well as the probabilities of the various words generated by each of these possible models. Similar approaches were employed in early speech recognition attempts at IBM, starting in the late 1970s, using word/part-of-speech pair probabilities. More recently, these kinds of statistical approaches have been applied to more difficult tasks such as topic identification, using Bayesian parameter estimation to infer topic probabilities in text documents.
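The Bayesian reasoning behind authorship studies such as Mosteller and Wallace's can be sketched with a multinomial naive Bayes classifier over a few marker words. This is a simplified stand-in rather than their actual model: the marker list, the add-one (Laplace) smoothing, the equal priors and the tiny training snippets below are all invented for illustration and are not data from ''The Federalist Papers''.

```python
import math
import re
from collections import Counter

# Hypothetical marker words (function words whose rates may differ by author).
MARKERS = ["upon", "whilst", "while", "by", "to"]


def marker_counts(text):
    """Count occurrences of each marker word in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in MARKERS)


def train(docs_by_author, alpha=1.0):
    """Estimate add-alpha smoothed P(marker | author) from known texts."""
    model = {}
    for author, docs in docs_by_author.items():
        counts = Counter()
        for doc in docs:
            counts.update(marker_counts(doc))
        total = sum(counts.values()) + alpha * len(MARKERS)
        model[author] = {w: (counts[w] + alpha) / total for w in MARKERS}
    return model


def log_posteriors(model, text):
    """Unnormalized log P(author | text), assuming equal priors."""
    counts = marker_counts(text)
    prior = math.log(1 / len(model))
    return {author: prior + sum(n * math.log(probs[w]) for w, n in counts.items())
            for author, probs in model.items()}


def attribute(model, text):
    """Pick the author with the highest posterior score."""
    scores = log_posteriors(model, text)
    return max(scores, key=scores.get)


# Invented toy snippets standing in for texts of known authorship.
TRAINING = {
    "Hamilton": ["upon the whole upon reflection it falls to the people by right"],
    "Madison": ["whilst the states deliberate while power is divided by design"],
}
```

Here `attribute(train(TRAINING), "whilst the legislature acts by consent")` returns "Madison": "whilst" is more frequent in the Madison snippet, so its log probability pulls the decision that way. Working in log space keeps the product of many small probabilities from underflowing on longer texts.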

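The hidden Markov model computation mentioned in connection with Rabiner's work can be illustrated with the forward algorithm, which solves the evaluation problem from his tutorial: it computes the probability of an observation sequence by summing over all hidden-state paths without enumerating them one by one. In this sketch the two states, the two observation symbols and every probability are hypothetical placeholders; in speech recognition the observations would be acoustic measurements rather than symbols like "x" and "y".

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) under an HMM via the forward algorithm.

    alpha[s] holds the probability of the observations seen so far
    together with being in state s at the current step.
    """
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())


# Hypothetical two-state model with two observation symbols.
STATES = ("A", "B")
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
```

For the sequence `["x", "y"]` the result is approximately 0.209. Each step costs work proportional to the square of the number of states, rather than growing exponentially with sequence length as explicit path enumeration would.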

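Word/part-of-speech pair probabilities of the kind mentioned above are typically estimated by counting. The sketch below computes relative-frequency (maximum-likelihood) estimates of P(word | tag) and P(tag | previous tag) from a tiny hand-tagged corpus and uses them to score one candidate tagging. The corpus, the tag names and the start symbol "<s>" are invented for illustration; this is not a description of IBM's actual system.

```python
from collections import Counter, defaultdict


def _normalize(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {key: n / total for key, n in counter.items()}


def estimate(tagged_sentences):
    """Relative-frequency estimates of P(word | tag) and P(tag | previous tag)."""
    emit, trans = defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start symbol
        for word, tag in sentence:
            emit[tag][word.lower()] += 1
            trans[prev][tag] += 1
            prev = tag
    return ({t: _normalize(c) for t, c in emit.items()},
            {t: _normalize(c) for t, c in trans.items()})


def tagging_prob(words, tags, emit_p, trans_p):
    """Joint probability of a word sequence and one candidate tag sequence."""
    p, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        p *= trans_p.get(prev, {}).get(tag, 0.0) * emit_p.get(tag, {}).get(word.lower(), 0.0)
        prev = tag
    return p


# Invented toy corpus of (word, tag) pairs.
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]
```

With these counts, tagging "the cat barks" as DET NOUN VERB scores 2/3 · 1/3 · 1/3 = 2/27, while any tagging that uses an unseen word/tag pair (such as "the" as NOUN) scores exactly zero, which is one reason statistical models add smoothing.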
Applications

Applied computational linguistics is largely equivalent to natural language processing. Example applications for end users include speech recognition software such as Apple's Siri feature, spellcheck tools, speech synthesis programs (which are often used to demonstrate pronunciation or to help disabled people), and machine translation programs and websites such as Google Translate. Computational linguistics is also helpful in situations involving social media and the Internet, e.g., for providing content filters in chatrooms or on website searches, and for grouping and organizing content through social media mining, document retrieval and clustering. For instance, if a person searches for "red, large, four-wheeled vehicle" to find pictures of a red truck, the search engine can still find the desired information by matching words such as "four-wheeled" with "car". Computational approaches are also important to support linguistic research, e.g., in corpus linguistics or historical linguistics. For the study of change over time, computational methods can contribute to the modeling and identification of language families (Bowern, Claire. "Computational phylogenetics." Annual Review of Linguistics 4 (2018): 281–296; see further quantitative comparative linguistics or phylogenetics), as well as to the modeling of changes in sound and meaning.
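The "four-wheeled vehicle" example amounts to query expansion: the query's terms are augmented with related terms before documents are matched. A minimal sketch, assuming a hand-written table of related terms (`RELATED`, hypothetical) and scoring each document by the number of expanded terms it shares with the query; real search engines use far richer ranking.

```python
import re

# Hypothetical table linking query terms to related terms.
RELATED = {
    "vehicle": {"car", "truck"},
    "four-wheeled": {"car", "truck"},
    "large": {"big"},
}


def tokenize(text):
    """Lowercase word tokens, keeping hyphenated compounds together."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def expand(query):
    """Query terms plus any related terms from the table."""
    terms = set(tokenize(query))
    for term in list(terms):
        terms |= RELATED.get(term, set())
    return terms


def rank(query, docs):
    """Documents sharing at least one expanded term, best match first."""
    terms = expand(query)
    scored = [(len(terms & set(tokenize(d))), d) for d in docs]
    return [d for score, d in sorted(scored, key=lambda pair: -pair[0]) if score > 0]
```

Given the documents "photo of a red truck", "a blue bicycle" and "big red car for sale", the query "red, large, four-wheeled vehicle" ranks "big red car for sale" first and "photo of a red truck" second, even though neither contains "vehicle" or "four-wheeled".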

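The simplest computational treatment of language relationships is lexicostatistical: languages are compared by the share of a basic meaning list on which their words are judged cognate. The sketch below uses invented languages and invented cognate-class codes purely to show the arithmetic; modern computational phylogenetics typically relies on considerably richer probabilistic models.

```python
from itertools import combinations

# Invented data: for each meaning, words sharing a letter code are judged cognate.
COGNATES = {
    "LangA": {"hand": "a", "water": "b", "two": "c", "eye": "d"},
    "LangB": {"hand": "a", "water": "b", "two": "c", "eye": "e"},
    "LangC": {"hand": "f", "water": "b", "two": "g", "eye": "h"},
    "LangD": {"hand": "f", "water": "i", "two": "g", "eye": "j"},
}


def shared_fraction(table, x, y):
    """Fraction of meanings (attested in both) on which x and y have cognates."""
    meanings = table[x].keys() & table[y].keys()
    same = sum(table[x][m] == table[y][m] for m in meanings)
    return same / len(meanings)


def closest_pair(table):
    """The pair of languages sharing the highest fraction of cognates."""
    return max(combinations(sorted(table), 2),
               key=lambda pair: shared_fraction(table, *pair))
```

Here LangA and LangB share cognates for 3 of 4 meanings and form the closest pair. Repeatedly merging the closest pair and recomputing similarities would give a simple distance-based tree. The sketch assumes every pair of languages has at least one meaning in common.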

Legacy

The subject of computational linguistics has had a recurring impact on popular culture:
* The Star Trek franchise heavily features classical NLP applications, most notably machine translation (the universal translator), natural language user interfaces and question answering.
* The 1983 film ''WarGames'' features a young computer hacker who interacts with an artificially intelligent supercomputer.
* The 1997 film ''Conceiving Ada'' focuses on Ada Lovelace, considered one of the first computer programmers, as well as on themes of computational linguistics.
* ''Her'', a 2013 film, depicts a man's interactions with the "world's first artificially intelligent operating system."
* The 2014 film ''The Imitation Game'' follows the life of computer scientist Alan Turing, who devised the Turing test.
* The 2015 film ''Ex Machina'' centers on human interaction with artificial intelligence.
* The 2016 film ''Arrival'', based on Ted Chiang's "Story of Your Life", depicts a novel linguistic approach to communicating with an advanced alien race called heptapods.


See also

* Artificial intelligence in fiction
* Collostructional analysis
* Computational lexicology
* ''Computational Linguistics'' (journal)
* Computational models of language acquisition
* Computational semantics
* Computational semiotics
* Computer-assisted reviewing
* Dialog systems
* Glottochronology
* Grammar induction
* Human speechome project
* Internet linguistics
* Lexicostatistics
* Natural language processing
* Natural language user interface
* Quantitative linguistics
* Semantic relatedness
* Semantometrics
* Systemic functional linguistics
* Translation memory
* Universal Networking Language


References


Further reading

* Steven Bird, Ewan Klein, and Edward Loper (2009). ''Natural Language Processing with Python''. O'Reilly Media.
* Daniel Jurafsky and James H. Martin (2008). ''Speech and Language Processing'', 2nd edition. Pearson Prentice Hall.
* Mohamed Zakaria Kurdi (2016). ''Natural Language Processing and Computational Linguistics: speech, morphology, and syntax'', Volume 1. ISTE-Wiley.
* Mohamed Zakaria Kurdi (2017). ''Natural Language Processing and Computational Linguistics: semantics, discourse, and applications'', Volume 2. ISTE-Wiley.


External links


* Association for Computational Linguistics (ACL)
* ACL Anthology of research papers
* ACL Wiki for Computational Linguistics
* CICLing annual conferences on Computational Linguistics
* Computational Linguistics – Applications workshop
* Language Technology World
* The Research Group in Computational Linguistics