Semantic Parsers

Semantic parsing is the task of converting a natural language utterance to a logical form: a machine-understandable representation of its meaning. Semantic parsing can thus be understood as extracting the precise meaning of an utterance. Applications of semantic parsing include machine translation, question answering (Berant, Jonathan, et al. "Semantic Parsing on Freebase from Question-Answer Pairs." EMNLP, Vol. 2, No. 5, 2013), ontology induction, automated reasoning, and code generation. The phrase was first used in the 1970s by Yorick Wilks as the basis for machine translation programs working with only semantic representations. Semantic parsing is one of the important tasks in computational linguistics and natural language processing. Because it maps text to formal meaning representations, it contrasts with semantic role labeling and other forms of shallow semantic processing, which do not aim to produce complete formal meanings. In computer vision, semantic parsing instead refers to a process of segmentation for 3D objects.
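As a minimal sketch of the idea (not the method of any particular system), the following Python maps questions onto a logical form with hand-written templates and then executes that form against a toy knowledge base. The `KB` contents, the templates, and the `parse` and `execute` functions are hypothetical and exist only for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical toy knowledge base: predicate -> {entity: value}.
KB = {
    "capital": {"france": "Paris", "japan": "Tokyo"},
    "continent": {"france": "Europe", "japan": "Asia"},
}

# Hand-written templates (illustrative only) pairing an utterance pattern
# with the predicate of the logical form it should produce.
TEMPLATES = [
    (re.compile(r"what is the capital of (\w+)\??$", re.IGNORECASE), "capital"),
    (re.compile(r"which continent is (\w+) in\??$", re.IGNORECASE), "continent"),
]

LogicalForm = Tuple[str, str]  # (predicate, argument), e.g. ("capital", "france")


def parse(utterance: str) -> Optional[LogicalForm]:
    """Map an utterance to a logical form, or return None if no template matches."""
    text = utterance.strip()
    for pattern, predicate in TEMPLATES:
        match = pattern.match(text)
        if match:
            return (predicate, match.group(1).lower())
    return None


def execute(form: LogicalForm) -> Optional[str]:
    """Evaluate a logical form against the knowledge base."""
    predicate, argument = form
    return KB.get(predicate, {}).get(argument)


form = parse("What is the capital of France?")
print(form)           # ('capital', 'france')
print(execute(form))  # Paris
```

Separating `parse` from `execute` mirrors the distinction between the meaning representation and its use: the logical form can be inspected, stored, or run against a different knowledge base. Practical semantic parsers learn the utterance-to-form mapping rather than relying on fixed patterns like these.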


History & Background

Early research on semantic parsing relied on manually written grammars as well as applied programming logic. In the 2000s, most work in this area involved creating or learning grammars and lexicons and using them on controlled tasks, particularly general grammars such as synchronous context-free grammars (SCFGs). These improved on manual grammars primarily because they leveraged the syntactic structure of the sentence, but they still could not cover enough variation and were not robust enough for real-world use. Following the development of advanced neural network techniques, especially the Seq2Seq model, and the availability of powerful computational resources, neural semantic parsing began to emerge. It not only gave competitive results on existing datasets but was also robust to noise and did not require much supervision or manual intervention.

The transition from traditional to neural semantic parsing has not been seamless, however. Despite its advantages, neural semantic parsing still fails to solve the problem at a deeper level. Neural models like Seq2Seq treat parsing as a sequential translation problem and learn patterns in a black-box manner, so it is hard to tell whether a model is truly solving the problem. Modifications to Seq2Seq that incorporate syntax and semantic meaning have been attempted, with marked improvements in results, but much ambiguity remains to be resolved.


Types


Shallow Semantic Parsing

Shallow semantic parsing is concerned with identifying entities in an utterance and labelling them with the roles they play. Shallow semantic parsing is sometimes known as slot-filling or frame semantic parsing, since its theoretical basis comes from frame semantics, wherein a word evokes a frame of related concepts and roles. Slot-filling systems are widely used in virtual assistants in conjunction with intent classifiers, which can be seen as mechanisms for identifying the frame evoked by an utterance (Kumar, Anjishnu, et al. "Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding." arXiv preprint arXiv:1711.00549, 2017). Popular architectures for slot-filling are largely variants of an encoder-decoder model, wherein two recurrent neural networks (RNNs) are trained jointly to encode an utterance into a vector and to decode that vector into a sequence of slot labels. This type of model is used in the Amazon Alexa spoken language understanding system. Because such models learn from utterances annotated with slot labels, they are trained with supervised learning.
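The decoder's output is one slot label per token, which still has to be grouped into a frame. The sketch below shows only that post-processing step, not the RNN encoder-decoder itself. It assumes the common BIO labelling scheme (`B-` begins a slot, `I-` continues it, `O` is outside any slot), which the paragraph above does not specify; the `labels_to_slots` function and the `list_flights` intent are hypothetical.

```python
from typing import Dict, List, Optional


def labels_to_slots(tokens: List[str], labels: List[str]) -> Dict[str, str]:
    """Group per-token BIO labels (B-slot, I-slot, O) into slot values."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    spans: Dict[str, List[str]] = {}
    current: Optional[str] = None  # slot whose span is currently open
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            current = label[2:]
            spans[current] = [token]  # start a new span (a later span replaces an earlier one)
        elif label.startswith("I-") and label[2:] == current:
            spans[current].append(token)  # continue the open span
        else:
            current = None  # "O" or an inconsistent "I-" label closes any open span
    return {slot: " ".join(words) for slot, words in spans.items()}


tokens = "show me flights from new york to dallas".split()
labels = ["O", "O", "O", "O", "B-source", "I-source", "O", "B-destination"]
# The intent would come from a separate intent classifier; "list_flights" is hypothetical.
frame = {"intent": "list_flights", "slots": labels_to_slots(tokens, labels)}
print(frame)
# {'intent': 'list_flights', 'slots': {'source': 'new york', 'destination': 'dallas'}}
```

The BIO prefixes are what let a flat label sequence represent multi-word slot values such as "new york".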


Deep Semantic Parsing

Deep semantic parsing, also known as compositional semantic parsing, is concerned with producing precise meaning representations of utterances that can contain significant compositionality. Shallow semantic parsers can parse utterances like "show me flights from Boston to Dallas" by classifying the intent as "list flights" and filling the slots "source" and "destination" with "Boston" and "Dallas", respectively. However, shallow semantic parsing cannot parse arbitrary compositional utterances, like "show me flights from Boston to anywhere that has flights to Juneau". Deep semantic parsing attempts to parse such utterances, typically by converting them to a formal meaning representation language. More recently, compositional semantic parsing has used large language models to solve artificial compositional generalization tasks such as SCAN.
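To see why the Juneau example needs more than slots, here is a minimal sketch of a compositional meaning representation for it, written as nested Python tuples and evaluated recursively against a toy flight table. The operator names (`city`, `from`, `to`, `origin`, `and`) and the flight data are hypothetical and do not correspond to any standard meaning representation language.

```python
from typing import Set, Tuple

Flight = Tuple[str, str, str]  # (flight id, origin city, destination city)

# Hypothetical toy flight table, for illustration only.
FLIGHTS: Set[Flight] = {
    ("F1", "Boston", "Seattle"),
    ("F2", "Boston", "Denver"),
    ("F3", "Seattle", "Juneau"),
    ("F4", "Denver", "Chicago"),
}


def evaluate(expr: tuple):
    """Recursively evaluate a nested-tuple logical form to a set of cities or flights."""
    op, *args = expr
    if op == "city":    # ("city", name) -> {name}
        return {args[0]}
    if op == "from":    # flights departing from any city in the argument set
        cities = evaluate(args[0])
        return {f for f in FLIGHTS if f[1] in cities}
    if op == "to":      # flights arriving at any city in the argument set
        cities = evaluate(args[0])
        return {f for f in FLIGHTS if f[2] in cities}
    if op == "origin":  # departure cities of a set of flights
        return {f[1] for f in evaluate(args[0])}
    if op == "and":     # intersection of two sets of flights
        return evaluate(args[0]) & evaluate(args[1])
    raise ValueError(f"unknown operator: {op}")


# "show me flights from Boston to anywhere that has flights to Juneau";
# the inner ("origin", ("to", ...)) encodes "anywhere that has flights to Juneau".
QUERY = ("and",
         ("from", ("city", "Boston")),
         ("to", ("origin", ("to", ("city", "Juneau")))))

print(sorted(f[0] for f in evaluate(QUERY)))  # ['F1']
```

The destination is not a fixed string that a slot could hold. It is itself a query nested inside the outer one, and that nesting is the compositionality deep semantic parsers must produce.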


Neural Semantic Parsing

Semantic parsers play a crucial role in natural language understanding systems because they transform natural language utterances into machine-executable logical structures or programs. A well-established field of study, semantic parsing finds use in voice assistants, question answering, instruction following, and code generation. The arrival of neural approaches has led researchers to rethink many of the assumptions that underpinned semantic parsing, substantially changing the models employed for it. Although semantic neural networks and neural semantic parsing both deal with natural language processing (NLP) and semantics, they are not the same. The models and executable formalisms used in semantic parsing research have traditionally depended heavily on concepts from formal semantics in linguistics, like the λ-calculus produced by a combinatory categorial grammar (CCG) parser. However, recent work with neural encoder-decoder semantic parsers has made possible more approachable formalisms, like conventional programming languages, and models in the style of neural machine translation (NMT) that are considerably more accessible to a wider NLP audience. Contemporary neural approaches have thus changed the field's understanding of semantic parsing.
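Encoder-decoder parsers in the NMT style generate a logical form one token at a time, so a tree-shaped meaning representation has to be flattened into a token sequence for training and rebuilt afterwards. The sketch below assumes a bracketed prefix format, which is one common choice rather than a fixed standard; the `linearize` and `delinearize` names are hypothetical.

```python
from typing import List, Tuple, Union

# A logical form is either an atom (a string) or a tuple of sub-forms.
Tree = Union[str, Tuple["Tree", ...]]


def linearize(tree: Tree) -> List[str]:
    """Flatten a nested logical form into a bracketed token sequence (decoder target)."""
    if isinstance(tree, str):
        return [tree]
    tokens = ["("]
    for child in tree:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens


def delinearize(tokens: List[str]) -> Tree:
    """Rebuild a logical form from a token sequence; raise ValueError if ill-formed."""
    position = 0

    def read() -> Tree:
        nonlocal position
        if position >= len(tokens):
            raise ValueError("unexpected end of sequence")
        token = tokens[position]
        position += 1
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token  # an atom
        children = []
        while True:
            if position >= len(tokens):
                raise ValueError("missing ')'")
            if tokens[position] == ")":
                position += 1
                return tuple(children)
            children.append(read())

    tree = read()
    if position != len(tokens):
        raise ValueError("trailing tokens after a complete logical form")
    return tree


form = ("and", ("from", "boston"), ("to", "dallas"))
target = linearize(form)
print(target)
# ['(', 'and', '(', 'from', 'boston', ')', '(', 'to', 'dallas', ')', ')']
print(delinearize(target) == form)  # True
```

Because a sequence decoder is not forced to emit balanced brackets, `delinearize` rejects ill-formed outputs instead of guessing a tree. Approaches that build grammatical constraints into decoding aim to rule out such outputs in the first place.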


Representation languages

Early semantic parsers used highly domain-specific meaning representation languages, with later systems using more extensible languages like Prolog (Zelle, John M., and Raymond J. Mooney. "Learning to parse database queries using inductive logic programming." Proceedings of the National Conference on Artificial Intelligence, 1996), lambda calculus, lambda dependency-based compositional semantics (λ-DCS), SQL (Hemphill, Charles T., John J. Godfrey, and George R. Doddington. "The ATIS spoken language systems pilot corpus." Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24–27, 1990, 1990), Python, Java (Ling, Wang, et al. "Latent predictor networks for code generation." arXiv preprint arXiv:1603.06744, 2016), the Alexa Meaning Representation Language, and Abstract Meaning Representation (AMR). Some work has used more exotic meaning representations, like query graphs, semantic graphs, or vector representations.
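When SQL is the meaning representation, as in the ATIS corpus, a parse is an executable query. The sketch below runs two hand-written queries, standing in for parser output, with Python's standard `sqlite3` module. The `flight` table and its rows are a hypothetical simplification and do not follow the actual ATIS schema.

```python
import sqlite3

# Hypothetical, simplified flight table (not the actual ATIS schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flight (flight_id TEXT, from_city TEXT, to_city TEXT)")
conn.executemany(
    "INSERT INTO flight VALUES (?, ?, ?)",
    [
        ("F1", "Boston", "Dallas"),
        ("F2", "Boston", "Seattle"),
        ("F3", "Seattle", "Juneau"),
        ("F4", "Denver", "Dallas"),
    ],
)

# SQL meaning representation for "show me flights from Boston to Dallas".
simple_sql = (
    "SELECT flight_id FROM flight "
    "WHERE from_city = 'Boston' AND to_city = 'Dallas' ORDER BY flight_id"
)

# A nested subquery expresses "... from Boston to anywhere that has flights to Juneau".
nested_sql = (
    "SELECT flight_id FROM flight "
    "WHERE from_city = 'Boston' AND to_city IN "
    "(SELECT from_city FROM flight WHERE to_city = 'Juneau') ORDER BY flight_id"
)

rows = conn.execute(simple_sql).fetchall()
nested_rows = conn.execute(nested_sql).fetchall()
print(rows)         # [('F1',)]
print(nested_rows)  # [('F2',)]
```

A general-purpose language like SQL already supports composition through subqueries, which is one reason such languages are more extensible than domain-specific representations.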


Models

Most modern deep semantic parsing models are based either on defining a formal grammar for a chart parser or on using RNNs to translate directly from a natural language to a meaning representation language. Examples of systems built on formal grammars are the Cornell Semantic Parsing Framework, Stanford University's Semantic Parsing with Execution (SEMPRE), and the Word Alignment-based Semantic Parser (WASP).
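The grammar-based approach can be illustrated with a toy CKY-style chart parser in which each grammar rule pairs syntactic categories with a function that combines meanings. This is a minimal sketch of the general technique under stated assumptions (a hypothetical lexicon, binary rules only). It does not reproduce the design or API of the Cornell framework, SEMPRE, or WASP.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, List, Tuple

Item = Tuple[str, object]  # (syntactic category, semantic value)

# Hypothetical toy lexicon: word -> possible (category, meaning) pairs.
LEXICON: Dict[str, List[Item]] = {
    "flights": [("FLIGHTS", "flight")],
    "from": [("FROM", "from")],
    "to": [("TO", "to")],
    "boston": [("CITY", "boston")],
    "dallas": [("CITY", "dallas")],
}

# Binary grammar rules: (left, right) -> (parent category, how meanings combine).
RULES: Dict[Tuple[str, str], Tuple[str, Callable[[object, object], object]]] = {
    ("FROM", "CITY"): ("FROM_PP", lambda rel, city: (rel, city)),
    ("TO", "CITY"): ("TO_PP", lambda rel, city: (rel, city)),
    ("FLIGHTS", "FROM_PP"): ("FLIGHTS", lambda f, pp: ("and", f, pp)),
    ("FLIGHTS", "TO_PP"): ("FLIGHTS", lambda f, pp: ("and", f, pp)),
}


def chart_parse(tokens: List[str], goal: str = "FLIGHTS") -> List[object]:
    """CKY-style chart parsing; returns every meaning of category `goal` spanning all tokens."""
    n = len(tokens)
    # chart[(i, j)] holds every (category, meaning) item covering tokens[i:j].
    chart: Dict[Tuple[int, int], List[Item]] = defaultdict(list)
    for i, word in enumerate(tokens):
        chart[(i, i + 1)].extend(LEXICON.get(word, []))
    for length in range(2, n + 1):       # build longer spans from shorter ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):    # every split point of span (i, j)
                for (lcat, lsem), (rcat, rsem) in product(chart[(i, k)], chart[(k, j)]):
                    rule = RULES.get((lcat, rcat))
                    if rule is not None:
                        parent, combine = rule
                        chart[(i, j)].append((parent, combine(lsem, rsem)))
    return [sem for cat, sem in chart[(0, n)] if cat == goal]


print(chart_parse("flights from boston to dallas".split()))
# [('and', ('and', 'flight', ('from', 'boston')), ('to', 'dallas'))]
```

The chart keeps every item found for every span. An ambiguous grammar would therefore return several logical forms for the whole sentence, and a grammar-based system would typically need a statistical model to rank them.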


Datasets

Datasets used for training statistical semantic parsing models are divided into two main classes based on application: those used for question answering via knowledge base queries, and those used for code generation.


Question answering

A standard dataset for question answering via semantic parsing is the Air Travel Information System (ATIS) dataset, which contains questions and commands about upcoming flights as well as corresponding SQL. Another benchmark dataset is the GeoQuery dataset, which contains questions about the geography of the U.S. paired with corresponding Prolog. The Overnight dataset is used to test how well semantic parsers adapt across multiple domains; it contains natural language queries about 8 different domains paired with corresponding λ-DCS expressions. Semantic parsing has recently gained significant popularity as a result of new research, and many large companies, including Google, Microsoft, and Amazon, are working in this area. One recent work on semantic parsing for question answering is SPICE.

Figure: A representation of an example conversation from SPICE. The left column shows dialogue turns (T1–T3) with user (U) and system (S) utterances. The middle column shows the annotations provided in CSQA. Blue boxes on the right show the sequence of actions (AS) and corresponding SPARQL semantic parses (SP).


Code generation

Popular datasets for code generation include two trading card datasets that link the text that appears on cards to code that precisely represents those cards. One was constructed by linking Magic: The Gathering card texts to Java snippets; the other by linking Hearthstone card texts to Python snippets. The IFTTT dataset uses a specialized domain-specific language with short conditional commands. The Django dataset pairs Python snippets with English and Japanese pseudocode describing them. The RoboCup dataset pairs English rules with their representations in a domain-specific language that can be understood by virtual soccer-playing robots.


Application Areas

Within the field of natural language processing (NLP), semantic parsing deals with transforming human language into a format that machines can process more easily. This method is useful in a number of contexts:
* Voice assistants and chatbots: Semantic parsing improves user interaction with devices such as smart speakers and with customer-service chatbots by helping them understand and answer user inquiries in natural language.
* Information retrieval: It improves how search engines and databases interpret and process user queries, resulting in more precise and relevant search results.
* Machine translation: Accurate translation requires understanding the semantics of the source language, so semantic parsing can improve the quality and contextual accuracy of translations.
* Text analytics: Semantic parsing helps extract meaningful insights from text data, such as sentiment analysis, topic modelling, and trend analysis, which benefit business intelligence and social media monitoring.
* Question answering systems: In systems such as IBM Watson, semantic parsing helps interpret and analyze natural language queries in order to deliver precise responses. Such systems are particularly helpful in areas such as customer service and educational resources.
* Command and control systems: Semantic parsing aids in the accurate interpretation of voice or text commands used to control systems such as software interfaces or smart homes.
* Content categorization: By analyzing semantic content, it helps classify and organize large amounts of text, which makes it useful for online publishing and digital content management.
* Accessibility technologies: It helps create tools for people with disabilities, such as sign language interpretation and text-to-speech conversion.
* Legal and healthcare informatics: Semantic parsing can extract and structure important information from legal documents and medical records to support research and decision-making.

In each of these domains, semantic parsing aims to improve the efficiency and efficacy of applications by bridging the gap between human language and machine processing.


Evaluation

The performance of semantic parsers is measured with standard evaluation metrics similar to those used for syntactic parsing. These include the exact-match ratio (the percentage of sentences that were parsed perfectly) and precision, recall, and F1 score, calculated from the correct constituency or dependency assignments in the parse. Precision is the number of correct assignments relative to the number in the hypothesis parse, and recall is the number of correct assignments relative to the number in the reference parse. These latter metrics are also known as the PARSEVAL metrics.
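As a minimal sketch of these metrics, the following treats each parse as a set of scored items, here hypothetical labeled spans `(label, start, end)`, though dependency arcs would work the same way. It computes per-sentence precision, recall, and F1, plus exact match over a small corpus. The function names are illustrative, and the sketch omits the conventions of standard scoring tools, such as aggregating counts over the whole corpus.

```python
from typing import List, Set, Tuple

Item = Tuple[str, int, int]  # hypothetical labeled span: (label, start, end)


def prf1(predicted: Set[Item], gold: Set[Item]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of one predicted parse against its reference parse."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0  # relative to the hypothesis
    recall = correct / len(gold) if gold else 0.0               # relative to the reference
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def exact_match(predictions: List[Set[Item]], golds: List[Set[Item]]) -> float:
    """Fraction of sentences whose predicted parse equals the reference parse exactly."""
    if len(predictions) != len(golds):
        raise ValueError("need one prediction per reference parse")
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)


gold = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("PP", 2, 5)}
pred = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 5)}  # one span mislabeled
print(prf1(pred, gold))                         # (0.75, 0.75, 0.75)
print(exact_match([pred, gold], [gold, gold]))  # 0.5
```

The example shows why both kinds of metric are reported. A parse with a single mislabeled span still scores 0.75 F1, but it counts as a complete miss under exact match.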


See also

* Automatic programming
* Class (philosophy)
* Formal semantics (linguistics)
* Information extraction
* Information retrieval
* Minimal recursion semantics
* Process philosophy
* Question answering
* Semantic analysis (linguistics)
* Semantic role labeling
* Statistical semantics
* Syntax
* Type–token distinction

