Data-oriented Parsing





Data-oriented Parsing
Data-oriented parsing (DOP, also data-oriented processing) is a probabilistic model in computational linguistics. DOP was conceived by Remko Scha in 1990 with the aim of developing a performance-oriented grammar framework. Unlike other probabilistic models, DOP takes into account all subtrees contained in a treebank, rather than being restricted to, for example, 2-level subtrees (as PCFGs are), thus allowing for more context-sensitive information. Several variants of DOP have been developed. The initial version, developed by Rens Bod in 1992, was based on tree-substitution grammar (R. Bod, "A computational model of language performance: Data oriented parsing", COLING 1992, https://www.aclweb.org/anthology/C92-3126.pdf), while more recently DOP has been combined with lexical-functional grammar (LFG). The resulting DOP-LFG finds an application in machine translation. Other work on learning and parameter estimation ...
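The "all subtrees" idea can be made concrete with a small sketch. The following is a minimal illustration, not code from the DOP literature: it enumerates every fragment of one parse tree, where a fragment keeps, for each node, either a child's full structure or cuts it off at an open substitution site. The tuple-based tree encoding is an assumption chosen for brevity.

    from itertools import product

    def fragments_rooted_at(node):
        """All DOP fragments rooted at this node: each non-lexical child is
        either cut off as an open substitution site or expanded with one of
        its own fragments."""
        label, children = node
        options = []
        for child in children:
            if isinstance(child, str):                    # lexical leaf: always kept
                options.append([child])
            else:
                site = (child[0], [])                     # open substitution site
                options.append([site] + fragments_rooted_at(child))
        return [(label, list(combo)) for combo in product(*options)]

    def all_fragments(tree):
        """Fragments rooted at every internal node of the tree."""
        frags, stack = [], [tree]
        while stack:
            node = stack.pop()
            if isinstance(node, str):
                continue
            frags += fragments_rooted_at(node)
            stack += node[1]
        return frags

    # "Mary likes John": a PCFG sees only the 5 depth-1 rules of this tree,
    # while DOP counts all 17 of its fragments.
    tree = ("S", [("NP", ["Mary"]),
                  ("VP", [("V", ["likes"]), ("NP", ["John"])])])
    print(len(all_fragments(tree)))   # 17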


Probabilistic Parsing
In theoretical linguistics and computational linguistics, probabilistic context-free grammars (PCFGs) extend context-free grammars, similar to how hidden Markov models extend regular grammars. Each production is assigned a probability. The probability of a derivation (parse) is the product of the probabilities of the productions used in that derivation. These probabilities can be viewed as parameters of the model, and for large problems it is convenient to learn these parameters via machine learning. A probabilistic grammar's validity is constrained by the context of its training dataset. PCFGs originated from grammar theory and have applications in areas as diverse as natural language processing, the study of the structure of RNA molecules, and the design of programming languages. Designing an efficient PCFG involves weighing scalability against generality. Issues such as grammar ambiguity must be resolved, and the design of the grammar affects the accuracy of results. ...
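As a worked illustration of "the probability of a derivation is the product of the probabilities of the productions used", here is a minimal sketch with a toy grammar; the rules and probabilities are hypothetical, chosen only so that each left-hand side's probabilities sum to 1.

    # Toy PCFG: each left-hand side maps to (right-hand side, probability) pairs.
    PCFG = {
        "S":  [(("NP", "VP"), 1.0)],
        "NP": [(("Det", "N"), 0.6), (("N",), 0.4)],
        "VP": [(("V", "NP"), 1.0)],
    }

    def derivation_probability(productions, grammar):
        """Multiply the probabilities of the productions used in a derivation."""
        p = 1.0
        for lhs, rhs in productions:
            p *= dict(grammar[lhs])[rhs]   # probability attached to lhs -> rhs
        return p

    # Derivation using S -> NP VP, NP -> N, VP -> V NP, NP -> Det N:
    used = [("S", ("NP", "VP")), ("NP", ("N",)),
            ("VP", ("V", "NP")), ("NP", ("Det", "N"))]
    print(f"{derivation_probability(used, PCFG):.2f}")   # 1.0 * 0.4 * 1.0 * 0.6 = 0.24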


Conceptual Model
The term conceptual model refers to any model that is formed after a conceptualization or generalization process. Conceptual models are often abstractions of things in the real world, whether physical or social. Semantic studies are relevant to various stages of the process of concept formation. Semantics is fundamentally a study of concepts, the meaning that thinking beings give to various elements of their experience.
Overview
Concept models and conceptual models
The value of a conceptual model is usually directly proportional to how well it corresponds to a past, present, future, actual or potential state of affairs. A concept model (a model of a concept) is quite different, because in order to be a good model it need not have this real-world correspondence. In artificial intelligence, conceptual models and conceptual graphs are used for building expert systems and knowledge-based systems; here the analysts are concerned to ...


Computational Linguistics
Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics draws upon linguistics, computer science, artificial intelligence, mathematics, logic, philosophy, cognitive science, cognitive psychology, psycholinguistics, anthropology and neuroscience, among others. Computational linguistics is closely related to mathematical linguistics.
Origins
The field has overlapped with artificial intelligence since the efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since rule-based approaches were able to make arithmetic (systematic) calculations much faster and more accurately than humans, it was expected that the lexicon, morphology, syntax and semantics could be learned using explicit rules ...


Remko Scha
Remko Jan Hendrik Scha (15 September 1945 – 9 November 2015) was a professor of computational linguistics at the faculty of humanities and Institute for Logic, Language and Computation at the University of Amsterdam. He made important contributions to semantics, in particular the treatment of plurals, and to discourse analysis, and laid the foundations for what became an important research paradigm in computational linguistics, Data-Oriented Parsing. He was also a composer and performer of algorithmic art, and made recordings of music generated by motor-driven machines. One notable example of this type of music is his 1982 album of electric guitar music, "Machine Guitars", on which all guitars are played without human intervention: most by saber saws, and one by a rotating wire brush. Recorded in Eindhoven and New York, it was described by Byron Coley in ''The Wire'' 231 as one of "the definitive modern NYC ...


Linguistic Performance
The term linguistic performance was used by Noam Chomsky in 1960 to describe "the actual use of language in concrete situations". It is used to describe both the production of language, sometimes called ''parole'', and the comprehension of language. Performance is defined in opposition to "competence", the latter describing the mental knowledge that a speaker or listener has of language. Part of the motivation for the distinction between performance and competence comes from speech errors: despite having a perfect understanding of the correct forms, a speaker of a language may unintentionally produce incorrect forms. This is because performance occurs in real situations, and so is subject to many non-linguistic influences. For example, distractions or memory limitations can affect lexical retrieval (Chomsky 1965:3) and give rise to errors in both production and perception. Such non-linguistic factors are completely independent of the ...


Treebank
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data.
Etymology
The term ''treebank'' was coined by linguist Geoffrey Leech in the 1980s, by analogy to other repositories such as a seedbank or bloodbank. The name reflects the fact that both syntactic and semantic structure are commonly represented compositionally as a tree structure. The term ''parsed corpus'' is often used interchangeably with the term treebank, with the emphasis on the primacy of sentences rather than trees.
Construction
Treebanks are often created on top of a corpus that has already been annotated with part-of-speech tags. In turn, treebanks are sometimes enhanced with semantic or other linguistic information. Treebanks can be created completely manually, where linguists annotate each sentence with syntactic structure ...
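As a small sketch of what a treebank annotation looks like in practice, a single sentence in Penn Treebank-style bracketed notation can be read and traversed with NLTK, assuming the nltk package is installed; the sentence and tags below are just an illustration, not drawn from any actual corpus.

    from nltk import Tree

    # One sentence in Penn Treebank-style bracketed notation.
    t = Tree.fromstring("(S (NP (NNP Mary)) (VP (VBZ likes) (NP (NNP John))))")

    print(t.leaves())   # ['Mary', 'likes', 'John']
    print(t.pos())      # [('Mary', 'NNP'), ('likes', 'VBZ'), ('John', 'NNP')]

    # Every constituent (subtree) in the annotation:
    for sub in t.subtrees():
        print(sub.label(), " ".join(sub.leaves()))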






Lexical-functional Grammar
Lexical functional grammar (LFG) is a constraint-based grammar framework in theoretical linguistics. It posits several parallel levels of syntactic structure, including a phrase structure grammar representation of word order and constituency, and a representation of grammatical functions such as subject and object, similar to dependency grammar. The development of the theory was initiated by Joan Bresnan and Ronald Kaplan in the 1970s, in reaction to the theory of transformational grammar which was current in the late 1970s. It mainly focuses on syntax, including its relation with morphology and semantics. There has been little LFG work on phonology (although ideas from optimality theory have recently been popular in LFG research). Some recent work combines LFG with Distributed Morphology in Lexical-Realizational Functional Grammar (Ash Asudeh, Paul B. Melchin & Daniel Siddiqi (2021), ''Constraints all the way down: DM in a representational model of grammar'', in ''WCCFL 39 Proceedings'' ...


Machine Translation
Machine translation is the use of computational techniques to translate text or speech from one language to another, including the contextual, idiomatic and pragmatic nuances of both languages. Early approaches were mostly rule-based or statistical. These methods have since been superseded by neural machine translation and large language models.
History
Origins
The origins of machine translation can be traced back to the work of Al-Kindi, a ninth-century Arabic cryptographer who developed techniques for systemic language translation, including cryptanalysis, frequency analysis, and probability and statistics, which are used in modern machine translation. The idea of machine translation later appeared in the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol. The idea of using digital computers for translation of natural languages was proposed as early as 1947 by England's A. D. Booth and Warren Weaver ...


Parameter Estimation
Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An ''estimator'' attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered:
* The probabilistic approach (described in this article) assumes that the measured data is random with a probability distribution dependent on the parameters of interest.
* The set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector.
Examples
For example, it is desired to estimate the proportion of a population of voters who will vote for a particular candidate. That proportion is the parameter sought; the estimate is based on a small random sample of voters. Alternatively, it is ...
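The voter-proportion example can be worked through in a few lines: the sample mean is the maximum-likelihood estimator of the proportion, and a normal approximation gives a rough confidence interval. The poll data below is made up for illustration.

    import math

    def estimate_proportion(sample):
        """MLE of a Bernoulli proportion from 0/1 responses,
        with its estimated standard error."""
        n = len(sample)
        p_hat = sum(sample) / n                      # sample mean = MLE of p
        se = math.sqrt(p_hat * (1 - p_hat) / n)      # estimated standard error
        return p_hat, se

    # Hypothetical poll: 1 = will vote for the candidate, 0 = will not.
    sample = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    p_hat, se = estimate_proportion(sample)
    print(f"{p_hat:.2f} +/- {1.96 * se:.2f}")        # 0.60 +/- 0.30 (95% normal approx.)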




Journal Of Experimental And Theoretical Artificial Intelligence
The ''Journal of Experimental and Theoretical Artificial Intelligence'' is a quarterly peer-reviewed scientific journal published by Taylor and Francis. It covers all aspects of artificial intelligence and was established in 1989. The editor-in-chief is Eric Dietrich (Binghamton University); the deputy editors-in-chief are Li Pheng Khoo (School of Mechanical & Aerospace Engineering, Nanyang Technological University) and Antonio Lieto (Department of Computer Science, University of Turin).
Abstracting and indexing
The journal is abstracted and indexed in: ... According to the ''Journal Citation Reports'', the journal has a 2020/2021 impact factor of 2.340.
External links
Official website: http://www.tandfonl ...