Computational linguistics is an

interdisciplinary Interdisciplinarity or interdisciplinary studies involves the combination of multiple academic disciplines into one activity (e.g., a research project). It draws knowledge from several fields such as sociology, anthropology, psychology, economi ...

field concerned with the computational modelling of

natural language A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...

, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon

linguistics Linguistics is the scientific study of language. The areas of linguistic analysis are syntax (rules governing the structure of sentences), semantics (meaning), Morphology (linguistics), morphology (structure of words), phonetics (speech sounds ...

computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...

artificial intelligence Artificial intelligence (AI) is the capability of computer, computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of re ...

mathematics Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...

logic Logic is the study of correct reasoning. It includes both formal and informal logic. Formal logic is the study of deductively valid inferences or logical truths. It examines how conclusions follow from premises based on the structure o ...

philosophy Philosophy ('love of wisdom' in Ancient Greek) is a systematic study of general and fundamental questions concerning topics like existence, reason, knowledge, Value (ethics and social sciences), value, mind, and language. It is a rational an ...

cognitive science Cognitive science is the interdisciplinary, scientific study of the mind and its processes. It examines the nature, the tasks, and the functions of cognition (in a broad sense). Mental faculties of concern to cognitive scientists include percep ...

cognitive psychology Cognitive psychology is the scientific study of human mental processes such as attention, language use, memory, perception, problem solving, creativity, and reasoning. Cognitive psychology originated in the 1960s in a break from behaviorism, whi ...

psycholinguistics Psycholinguistics or psychology of language is the study of the interrelation between linguistic factors and psychological aspects. The discipline is mainly concerned with the mechanisms by which language is processed and represented in the mind ...

anthropology Anthropology is the scientific study of humanity, concerned with human behavior, human biology, cultures, society, societies, and linguistics, in both the present and past, including archaic humans. Social anthropology studies patterns of behav ...

and

neuroscience Neuroscience is the scientific study of the nervous system (the brain, spinal cord, and peripheral nervous system), its functions, and its disorders. It is a multidisciplinary science that combines physiology, anatomy, molecular biology, ...

, among others. Computational linguistics is closely related to mathematical linguistics.

Origins

The field overlapped with

since the efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since rule-based approaches were able to make

arithmetic Arithmetic is an elementary branch of mathematics that deals with numerical operations like addition, subtraction, multiplication, and division. In a wider sense, it also includes exponentiation, extraction of roots, and taking logarithms. ...

(systematic) calculations much faster and more accurately than humans, it was expected that

lexicon A lexicon (plural: lexicons, rarely lexica) is the vocabulary of a language or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word ''lexicon'' derives from Greek word () ...

, morphology,

syntax In linguistics, syntax ( ) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituenc ...

and

semantics Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...

can be learned using explicit rules, as well. After the failure of rule-based approaches, David Hays coined the term in order to distinguish the field from AI and co-founded both the Association for Computational Linguistics (ACL) and the International Committee on Computational Linguistics (ICCL) in the 1970s and 1980s. What started as an effort to translate between languages evolved into a much wider field of

natural language processing Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...

Annotated corpora

In order to be able to meticulously study the

English language English is a West Germanic language that developed in early medieval England and has since become a English as a lingua franca, global lingua franca. The namesake of the language is the Angles (tribe), Angles, one of the Germanic peoples th ...

, an annotated text corpus was much needed. The Penn Treebank was one of the most used corpora. It consisted of IBM computer manuals, transcribed telephone conversations, and other texts, together containing over 4.5 million words of American English, annotated using both part-of-speech tagging and syntactic bracketing. Japanese sentence corpora were analyzed and a pattern of log-normality was found in relation to sentence length.

Modeling language acquisition

The fact that during

language acquisition Language acquisition is the process by which humans acquire the capacity to perceive and comprehend language. In other words, it is how human beings gain the ability to be aware of language, to understand it, and to produce and use words and s ...

, children are largely only exposed to positive evidence, meaning that the only evidence for what is a correct form is provided, and no evidence for what is not correct,Braine, M.D.S. (1971). On two types of models of the internalization of grammars. In D.I. Slobin (Ed.), The ontogenesis of grammar: A theoretical perspective. New York: Academic Press. was a limitation for the models at the time because the now available deep learning models were not available in late 1980s.Powers, D.M.W. & Turk, C.C.R. (1989). ''Machine Learning of Natural Language''. Springer-Verlag. . It has been shown that languages can be learned with a combination of simple input presented incrementally as the child develops better memory and longer attention span, which explained the long period of

in human infants and children. Robots have been used to test linguistic theories. Enabled to learn as children might, models were created based on an affordance model in which mappings between actions, perceptions, and effects were created and linked to spoken words. Crucially, these robots were able to acquire functioning word-to-meaning mappings without needing grammatical structure. Using the Price equation and Pólya urn dynamics, researchers have created a system which not only predicts future linguistic evolution but also gives insight into the evolutionary history of modern-day languages.

Chomsky's theories

Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American professor and public intellectual known for his work in linguistics, political activism, and social criticism. Sometimes called "the father of modern linguistics", Chomsky is also a ...

's theories have influenced computational linguistics, particularly in understanding how infants learn complex grammatical structures, such as those described in Chomsky normal form. Attempts have been made to determine how an infant learns a "non-normal grammar" as theorized by Chomsky normal form. Research in this area combines structural approaches with computational models to analyze large linguistic corpora like the Penn Treebank, helping to uncover patterns in language acquisition.

References

External links

Association for Computational Linguistics (ACL)
*
ACL Anthology of research papers
*
ACL Wiki for Computational Linguistics

CICLing annual conferences on Computational Linguistics

Computational Linguistics – Applications workshop
*
Language Technology World

The Research Group in Computational Linguistics
{{DEFAULTSORT:Computational Linguistics Formal sciences Cognitive science Computational fields of study *