Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering (Berant, Jonathan, et al. "Semantic Parsing on Freebase from Question-Answer Pairs." EMNLP. Vol. 2. No. 5. 2013), ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. In computer vision, semantic parsing is a process of segmentation for 3D objects.


Types


Shallow

Shallow semantic parsing is concerned with identifying entities in an utterance and labelling them with the roles they play. Shallow semantic parsing is sometimes known as slot-filling or frame semantic parsing, since its theoretical basis comes from frame semantics, wherein a word evokes a frame of related concepts and roles. Slot-filling systems are widely used in virtual assistants in conjunction with intent classifiers, which can be seen as mechanisms for identifying the frame evoked by an utterance (Kumar, Anjishnu, et al. "Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding." arXiv preprint arXiv:1711.00549, 2017). Popular architectures for slot-filling are largely variants of an encoder-decoder model, wherein two recurrent neural networks (RNNs) are trained jointly to encode an utterance into a vector and to decode that vector into a sequence of slot labels. This type of model is used in the Amazon Alexa spoken language understanding system.
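The final step of slot-filling, turning a sequence of slot labels into a filled frame, can be sketched in a few lines. This is a minimal illustration and not the Alexa implementation: the BIO labelling scheme, the `collect_slots` helper, the label names, and the intent name are all assumptions made for the example, and the labels are written by hand rather than produced by a trained decoder.

```python
from typing import Dict, List, Optional


def collect_slots(tokens: List[str], labels: List[str]) -> Dict[str, str]:
    """Group BIO-style slot labels into a slot -> text mapping.

    Assumed scheme: "B-x" begins slot x, "I-x" continues it, "O" is outside any slot.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    slots: Dict[str, List[str]] = {}
    current: Optional[str] = None  # slot currently being extended, if any
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            current = label[2:]
            slots[current] = [token]
        elif label.startswith("I-") and current == label[2:]:
            slots[current].append(token)
        else:
            current = None  # "O" or an inconsistent "I-" label closes the slot
    return {name: " ".join(words) for name, words in slots.items()}


# Hand-written labels standing in for the output of a trained decoder.
tokens = "show me flights from Boston to New York".split()
labels = ["O", "O", "O", "O", "B-source", "O", "B-destination", "I-destination"]

# The intent would come from a separate (hypothetical) intent classifier.
frame = {"intent": "list_flights", "slots": collect_slots(tokens, labels)}
print(frame)
```

The resulting frame is flat: each slot holds a plain string, which is what makes this style of parsing "shallow".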


Deep

Deep semantic parsing, also known as compositional semantic parsing, is concerned with producing precise meaning representations of utterances that can contain significant compositionality. Shallow semantic parsers can parse utterances like "show me flights from Boston to Dallas" by classifying the intent as "list flights", and filling slots "source" and "destination" with "Boston" and "Dallas", respectively. However, shallow semantic parsing cannot parse arbitrary compositional utterances, like "show me flights from Boston to anywhere that has flights to Juneau". Deep semantic parsing attempts to parse such utterances, typically by converting them to a formal meaning representation language.
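The contrast between the two example utterances can be made concrete with a small sketch. The flight table, the `flights_from` and `cities_with_flight_to` helpers, and the use of nested Python calls as a meaning representation are all hypothetical and chosen only for illustration; they do not correspond to any particular system.

```python
from typing import List, Optional, Set, Tuple

# Toy flight table of (origin, destination) pairs, invented for this example.
FLIGHTS: List[Tuple[str, str]] = [
    ("Boston", "Dallas"),
    ("Boston", "Seattle"),
    ("Boston", "Anchorage"),
    ("Seattle", "Juneau"),
    ("Anchorage", "Juneau"),
    ("Dallas", "Boston"),
]


def cities_with_flight_to(destination: str) -> Set[str]:
    """Cities that have at least one flight to `destination`."""
    return {origin for origin, dest in FLIGHTS if dest == destination}


def flights_from(origin: str, destinations: Optional[Set[str]] = None) -> List[Tuple[str, str]]:
    """Flights leaving `origin`, optionally restricted to a set of destinations."""
    return [
        (o, d)
        for o, d in FLIGHTS
        if o == origin and (destinations is None or d in destinations)
    ]


# Shallow parse of "show me flights from Boston to Dallas":
# a flat frame whose slot values are plain strings.
shallow = {"intent": "list_flights", "source": "Boston", "destination": "Dallas"}
shallow_result = flights_from(shallow["source"], {shallow["destination"]})
print(shallow_result)

# Deep parse of "show me flights from Boston to anywhere that has flights to Juneau":
# the destination is itself the result of a nested query, which a flat frame cannot hold.
deep_result = flights_from("Boston", cities_with_flight_to("Juneau"))
print(deep_result)
```

In the second query the argument of one function is the result of another, which is the kind of composition a formal meaning representation language is meant to capture.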


Representation languages

Early semantic parsers used highly domain-specific meaning representation languages, with later systems using more extensible languages like Prolog (Zelle, John M., and Raymond J. Mooney. "Learning to parse database queries using inductive logic programming." Proceedings of the national conference on artificial intelligence. 1996), lambda calculus, lambda dependency-based compositional semantics (λ-DCS), SQL (Hemphill, Charles T., John J. Godfrey, and George R. Doddington. "The ATIS spoken language systems pilot corpus." Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24–27, 1990. 1990), Python, Java (Ling, Wang, et al. "Latent predictor networks for code generation." arXiv preprint arXiv:1603.06744, 2016), the Alexa Meaning Representation Language, and the Abstract Meaning Representation (AMR). Some work has used more exotic meaning representations, like query graphs, semantic graphs, or vector representations.


Models

Most modern deep semantic parsing models are either based on defining a formal grammar for a chart parser or utilizing RNNs to directly translate from a natural language to a meaning representation language. Examples of systems built on formal grammars are the Cornell Semantic Parsing Framework, Stanford University's Semantic Parsing with Execution (SEMPRE), and the Word Alignment-based Semantic Parser (WASP).
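The grammar-based approach can be illustrated with a toy chart parser in which each grammar rule also says how to combine the meanings of its parts. This is a sketch under stated assumptions and not how the systems named above are implemented: the lexicon, the category names (`N`, `PP`, `CITY`, `FROM`, `TO`), the rules, and the dictionary-based meaning representation are all invented for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Entry = Tuple[str, Any]  # (category, meaning)

# Toy lexicon: each known word maps to a category and a meaning.
LEXICON: Dict[str, Entry] = {
    "flights": ("N", {}),
    "from": ("FROM", None),
    "to": ("TO", None),
    "boston": ("CITY", "Boston"),
    "dallas": ("CITY", "Dallas"),
}

# Binary rules: (left category, right category) -> (parent category, meaning combiner).
RULES: Dict[Tuple[str, str], Tuple[str, Callable[[Any, Any], Any]]] = {
    ("FROM", "CITY"): ("PP", lambda _, city: {"source": city}),
    ("TO", "CITY"): ("PP", lambda _, city: {"destination": city}),
    ("N", "PP"): ("N", lambda noun, pp: {**noun, **pp}),
}


def parse(utterance: str) -> List[Entry]:
    """CKY-style chart parse; returns every analysis spanning the whole utterance."""
    words = utterance.lower().split()
    n = len(words)
    # chart[start, end] holds the analyses found for words[start:end].
    chart: Dict[Tuple[int, int], List[Entry]] = defaultdict(list)

    # Fill single-word spans from the lexicon; unknown words get no analysis.
    for i, word in enumerate(words):
        if word in LEXICON:
            chart[i, i + 1].append(LEXICON[word])

    # Build longer spans bottom-up from pairs of adjacent shorter spans.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for left_cat, left_sem in chart[start, split]:
                    for right_cat, right_sem in chart[split, end]:
                        rule = RULES.get((left_cat, right_cat))
                        if rule is not None:
                            parent_cat, combine = rule
                            chart[start, end].append(
                                (parent_cat, combine(left_sem, right_sem))
                            )
    return chart[0, n]


analyses = parse("flights from Boston to Dallas")
print(analyses)
```

An utterance with a word outside the lexicon, or with no applicable rule sequence, yields an empty list, which shows why hand-written grammars need broad lexical coverage.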


Datasets

Datasets used for training statistical semantic parsing models are divided into two main classes based on application: those used for question answering via knowledge base queries, and those used for code generation.


Question answering

A standard dataset for question answering via semantic parsing is the Air Travel Information System (ATIS) dataset, which contains questions and commands about upcoming flights as well as corresponding SQL. Another benchmark dataset is the GeoQuery dataset, which contains questions about the geography of the U.S. paired with corresponding Prolog. The Overnight dataset is used to test how well semantic parsers adapt across multiple domains; it contains natural language queries about eight different domains paired with corresponding λ-DCS expressions.


Code generation

Popular datasets for code generation include two trading card datasets that link the text that appears on cards to code that precisely represents those cards. One was constructed by linking Magic: The Gathering card texts to Java snippets; the other by linking Hearthstone card texts to Python snippets. The IFTTT dataset uses a specialized domain-specific language with short conditional commands. The Django dataset pairs Python snippets with English and Japanese pseudocode describing them. The RoboCup dataset (Kuhlmann, Gregory, et al. "Guiding a reinforcement learner with natural language advice: Initial results in RoboCup soccer." The AAAI-2004 workshop on supervisory control of learning and adaptive systems. 2004) pairs English rules with their representations in a domain-specific language that can be understood by virtual soccer-playing robots.


See also

* Automatic programming
* Class (philosophy)
* Formal semantics (linguistics)
* Information extraction
* Information retrieval
* Question answering
* Semantic analysis (linguistics)
* Semantic role labeling
* Statistical semantics
* Syntax
* Type–token distinction

