Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning.
Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering [Berant, Jonathan, et al. "Semantic Parsing on Freebase from Question-Answer Pairs." EMNLP. Vol. 2. No. 5. 2013.], ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations.
In computer vision, semantic parsing is a process of segmentation for 3D objects.
Types
Shallow
Shallow semantic parsing is concerned with identifying entities in an utterance and labelling them with the roles they play. It is sometimes known as slot-filling or frame semantic parsing, since its theoretical basis comes from frame semantics, wherein a word evokes a frame of related concepts and roles. Slot-filling systems are widely used in virtual assistants in conjunction with intent classifiers, which can be seen as mechanisms for identifying the frame evoked by an utterance [Kumar, Anjishnu, et al. "Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding." arXiv preprint arXiv:1711.00549 (2017).]. Popular architectures for slot-filling are largely variants of an encoder-decoder model, wherein two recurrent neural networks (RNNs) are trained jointly to encode an utterance into a vector and to decode that vector into a sequence of slot labels. This type of model is used in the Amazon Alexa spoken language understanding system.
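The input and output of a slot-filling system can be sketched in a few lines of Python. The example below is only an illustration of the data shapes involved: the hand-written rules stand in for a trained intent classifier and a trained encoder-decoder tagger, and the label names ("O", "source", "destination"), the intent name, and the helper functions are all assumptions made for this sketch.

```python
def classify_intent(tokens):
    """Hypothetical intent classifier: picks the frame evoked by the utterance."""
    return "list_flights" if "flights" in tokens else "unknown"


def tag_slots(tokens):
    """Assign one slot label per token.

    A rule-based stand-in for a learned decoder: the token after "from"
    is labelled "source", the token after "to" is labelled "destination",
    and every other token gets the empty label "O".
    """
    labels = []
    previous = None
    for token in tokens:
        if previous == "from":
            labels.append("source")
        elif previous == "to":
            labels.append("destination")
        else:
            labels.append("O")
        previous = token.lower()
    return labels


def collect_slots(tokens, labels):
    """Turn the per-token label sequence into a slot dictionary."""
    return {label: token for token, label in zip(tokens, labels) if label != "O"}


tokens = "show me flights from Boston to Dallas".split()
labels = tag_slots(tokens)
frame = {"intent": classify_intent(tokens), "slots": collect_slots(tokens, labels)}
print(labels)
print(frame)
```

In a real system both functions would be replaced by trained models; the point of the sketch is that the output is a flat frame (one intent plus a set of slots) rather than a nested structure.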
Deep
Deep semantic parsing, also known as compositional semantic parsing, is concerned with producing precise meaning representations of utterances that may be highly compositional. Shallow semantic parsers can parse utterances like "show me flights from Boston to Dallas" by classifying the intent as "list flights" and filling the slots "source" and "destination" with "Boston" and "Dallas", respectively. However, shallow semantic parsing cannot parse arbitrary compositional utterances, like "show me flights from Boston to anywhere that has flights to Juneau". Deep semantic parsing attempts to parse such utterances, typically by converting them to a formal meaning representation language.
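To make the contrast concrete, the compositional utterance can be written as a nested expression in which one sub-expression feeds another. The following is a minimal sketch under stated assumptions: the tuple-based meaning representation, the operator names, and the toy flight table are all invented for this illustration and do not correspond to any particular system or language mentioned in this article.

```python
# Hypothetical toy database of (origin, destination) flights.
FLIGHTS = {
    ("Boston", "Seattle"),
    ("Boston", "Dallas"),
    ("Seattle", "Juneau"),
    ("Anchorage", "Juneau"),
    ("Dallas", "Houston"),
}


def flights_from(origin):
    """All flights departing from the given city."""
    return {flight for flight in FLIGHTS if flight[0] == origin}


def cities_with_flights_to(destination):
    """All cities that have at least one flight to the given city."""
    return {origin for origin, dest in FLIGHTS if dest == destination}


def restrict_destination(flights, cities):
    """Keep only the flights whose destination is in the given set of cities."""
    return {flight for flight in flights if flight[1] in cities}


OPERATORS = {
    "flights_from": flights_from,
    "cities_with_flights_to": cities_with_flights_to,
    "restrict_destination": restrict_destination,
}

# "show me flights from Boston to anywhere that has flights to Juneau"
LOGICAL_FORM = (
    "restrict_destination",
    ("flights_from", "Boston"),
    ("cities_with_flights_to", "Juneau"),
)


def execute(form):
    """Evaluate a nested (operator, arg, ...) tuple bottom-up."""
    if isinstance(form, str):
        return form  # a constant such as a city name
    operator_name, *arguments = form
    return OPERATORS[operator_name](*(execute(arg) for arg in arguments))


print(execute(LOGICAL_FORM))
```

The destination here is not a single slot value but the result of evaluating another expression, which is why a flat intent-and-slots frame cannot represent this utterance.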
Representation languages
Early semantic parsers used highly domain-specific meaning representation languages, with later systems using more extensible languages like Prolog [Zelle, John M., and Raymond J. Mooney. "Learning to parse database queries using inductive logic programming." Proceedings of the national conference on artificial intelligence. 1996.], lambda calculus, lambda dependency-based compositional semantics (λ-DCS), SQL [Hemphill, Charles T., John J. Godfrey, and George R. Doddington. "The ATIS spoken language systems pilot corpus." Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24–27, 1990. 1990.], Python, Java [Ling, Wang, et al. "Latent predictor networks for code generation." arXiv preprint arXiv:1603.06744 (2016).], the Alexa Meaning Representation Language, and the Abstract Meaning Representation (AMR). Some work has used more exotic meaning representations, like query graphs, semantic graphs, or vector representations.
Models
Most modern deep semantic parsing models either define a formal grammar for a chart parser or use RNNs to translate directly from natural language to a meaning representation language. Examples of systems built on formal grammars are the Cornell Semantic Parsing Framework, Stanford University's Semantic Parsing with Execution (SEMPRE), and the Word Alignment-based Semantic Parser (WASP).
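The grammar-based approach can be illustrated with a deliberately small chart parser. The sketch below assumes a toy arithmetic domain with a hypothetical five-word lexicon and a single composition rule (Num Op Num → Num); it is not the implementation of any of the frameworks named above, only an illustration of how a chart is filled bottom-up and how each derivation carries a meaning.

```python
import operator
from itertools import product

# Hypothetical lexicon: each word maps to a (category, meaning) pair.
LEXICON = {
    "two": ("Num", 2),
    "three": ("Num", 3),
    "four": ("Num", 4),
    "plus": ("Op", operator.add),
    "times": ("Op", operator.mul),
}


def parse(utterance):
    """Return the meanings of all derivations that span the whole utterance."""
    tokens = utterance.lower().split()
    n = len(tokens)
    chart = {}  # (start, end) -> list of (category, meaning) derivations

    # Width-1 cells come straight from the lexicon.
    for i, token in enumerate(tokens):
        chart[(i, i + 1)] = [LEXICON[token]] if token in LEXICON else []

    # Wider cells are built from smaller ones using the rule Num Op Num -> Num.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = []
            # Left operand covers [i, k), operator [k, k+1), right operand [k+1, j).
            for k in range(i + 1, j - 1):
                for left, op, right in product(
                    chart[(i, k)], chart[(k, k + 1)], chart[(k + 1, j)]
                ):
                    if (left[0], op[0], right[0]) == ("Num", "Op", "Num"):
                        cell.append(("Num", op[1](left[1], right[1])))
            chart[(i, j)] = cell

    return [meaning for category, meaning in chart[(0, n)] if category == "Num"]


print(parse("three plus four"))
print(sorted(parse("three plus four times two")))
```

The second utterance yields two derivations because the grammar has no notion of operator precedence. Grammar-based semantic parsers generally face this kind of ambiguity and need some way to rank the candidate derivations, which this sketch leaves out.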
Datasets
Datasets used for training statistical semantic parsing models are divided into two main classes based on application: those used for question answering via knowledge base queries, and those used for code generation.
Question answering
A standard dataset for question answering via semantic parsing is the Air Travel Information System (ATIS) dataset, which contains questions and commands about upcoming flights as well as corresponding SQL.
Another benchmark dataset is the GeoQuery dataset, which contains questions about the geography of the U.S. paired with corresponding Prolog.
The Overnight dataset is used to test how well semantic parsers adapt across multiple domains; it contains natural language queries about 8 different domains paired with corresponding λ-DCS expressions.
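A training example in a dataset that pairs utterances with SQL, as ATIS does, can be pictured as an utterance together with a query that is executed against a database. The sketch below is an assumption-laden illustration: the single-table schema, the column names, and the rows are invented here and are not the actual ATIS schema or data.

```python
import sqlite3

# Hypothetical, heavily simplified flight table (not the real ATIS schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight ("
    "flight_id INTEGER PRIMARY KEY, from_city TEXT, to_city TEXT, departure TEXT)"
)
conn.executemany(
    "INSERT INTO flight (from_city, to_city, departure) VALUES (?, ?, ?)",
    [
        ("Boston", "Dallas", "08:00"),
        ("Boston", "Seattle", "09:30"),
        ("Boston", "Dallas", "17:45"),
        ("Dallas", "Boston", "12:00"),
    ],
)

# One utterance paired with the query a semantic parser would be trained to produce.
EXAMPLE = {
    "utterance": "show me flights from Boston to Dallas",
    "sql": (
        "SELECT flight_id FROM flight "
        "WHERE from_city = 'Boston' AND to_city = 'Dallas' "
        "ORDER BY flight_id"
    ),
}

rows = conn.execute(EXAMPLE["sql"]).fetchall()
print(rows)
```

Because the target query can be executed, a predicted query can be compared with the reference either as text or by the rows it returns.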
Code generation
Popular datasets for code generation include two trading card datasets that link the text that appears on cards to code that precisely represents those cards. One was constructed by linking Magic: The Gathering card texts to Java snippets; the other by linking Hearthstone card texts to Python snippets. The IFTTT dataset uses a specialized domain-specific language with short conditional commands. The Django dataset pairs Python snippets with English and Japanese pseudocode describing them. The RoboCup dataset [Kuhlmann, Gregory, et al. "Guiding a reinforcement learner with natural language advice: Initial results in RoboCup soccer." The AAAI-2004 workshop on supervisory control of learning and adaptive systems. 2004.] pairs English rules with their representations in a domain-specific language that can be understood by virtual soccer-playing robots.
See also
* Automatic programming
* Class (philosophy)
* Formal semantics (linguistics)
* Information extraction
* Information retrieval
* Question answering
* Semantic analysis (linguistics)
* Semantic role labeling
* Statistical semantics
* Syntax
* Type–token distinction