Shallow parsing (also chunking or light
parsing
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
) is an analysis of a
sentence which first identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and then links them to higher order units that have discrete grammatical meanings (
noun
A noun () is a word that generally functions as the name of a specific object or set of objects, such as living creatures, places, actions, qualities, states of existence, or ideas.Example nouns for:
* Living creatures (including people, alive, d ...
groups or
phrases
In syntax and grammar, a phrase is a group of words or singular word acting as a grammatical unit. For instance, the English expression "the very happy squirrel" is a noun phrase which contains the adjective phrase "very happy". Phrases can consi ...
, verb groups, etc.). While the most elementary chunking algorithms simply link constituent parts on the basis of elementary search patterns (e.g., as specified by
regular expression
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
s), approaches that use
machine learning techniques (classifiers,
topic modeling
In statistics and natural language processing, a topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for discovery of hidden ...
, etc.) can take contextual information into account and thus compose chunks in such a way that they better reflect the semantic relations between the basic constituents. That is, these more advanced methods get around the problem that combinations of elementary constituents can have different higher level meanings depending on the context of the sentence.
It is a technique widely used in
natural language processing
Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
. It is similar to the concept of
lexical analysis
In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' (strings with an assigned and thus identified m ...
for computer languages. Under the name "shallow structure hypothesis", it is also used as an explanation for why
second language
A person's second language, or L2, is a language that is not the native language (first language or L1) of the speaker, but is learned later. A second language may be a neighbouring language, another language of the speaker's home country, or a fo ...
learners often fail to parse complex sentences correctly.
References
Citations
Sources
*
*
External links
Apache OpenNLPOpenNLP
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named en ...
includes a chunker.
GATE General Architecture for Text EngineeringGATE
A gate or gateway is a point of entry to or from a space enclosed by walls. The word derived from old Norse "gat" meaning road or path; But other terms include ''yett and port''. The concept originally referred to the gap or hole in the wall ...
includes a chunker.
*
NLTK
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English written in the Python programming language. It was developed by Steven Bird and E ...
br>
chunkingIllinois Shallow ParserShallow Parse
Demo
See also
*
Parser
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
*
Semantic role labeling In natural language processing, semantic role labeling (also called shallow semantic parsing or slot-filling) is the process that assigns labels to words or phrases in a sentence that indicates their semantic role in the sentence, such as that of a ...
*
Named entity recognition
Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre ...
Natural language parsing
Tasks of natural language processing
{{comp-ling-stub