Shallow parsing (also chunking or light
parsing
Parsing, syntax analysis, or syntactic analysis is a process of analyzing a String (computer science), string of Symbol (formal), symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal gramm ...
) is an analysis of a
sentence which first identifies constituent parts of sentences (nouns, verbs, adjectives, etc.) and then links them to higher order units that have discrete grammatical meanings (
noun
In grammar, a noun is a word that represents a concrete or abstract thing, like living creatures, places, actions, qualities, states of existence, and ideas. A noun may serve as an Object (grammar), object or Subject (grammar), subject within a p ...
groups or
phrases, verb groups, etc.). While the most elementary chunking algorithms simply link constituent parts on the basis of elementary search patterns (e.g., as specified by
regular expression
A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s), approaches that use
machine learning techniques (classifiers,
topic modeling, etc.) can take contextual information into account and thus compose chunks in such a way that they better reflect the semantic relations between the basic constituents. That is, these more advanced methods get around the problem that combinations of elementary constituents can have different higher level meanings depending on the context of the sentence.
It is a technique widely used in
natural language processing
Natural language processing (NLP) is a subfield of computer science and especially artificial intelligence. It is primarily concerned with providing computers with the ability to process data encoded in natural language and is thus closely related ...
. It is similar to the concept of
lexical analysis
Lexical tokenization is conversion of a text into (semantically or syntactically) meaningful ''lexical tokens'' belonging to categories defined by a "lexer" program. In case of a natural language, those categories include nouns, verbs, adjectives ...
for computer languages. Under the name "shallow structure hypothesis", it is also used as an explanation for why
second language
A second language (L2) is a language spoken in addition to one's first language (L1). A second language may be a neighbouring language, another language of the speaker's home country, or a foreign language.
A speaker's dominant language, which ...
learners often fail to parse complex sentences correctly.
References
Citations
Sources
*
*
External links
Apache OpenNLP OpenNLP includes a chunker.
GATE General Architecture for Text EngineeringGATE
A gate or gateway is a point of entry to or from a space enclosed by walls. The word is derived from Proto-Germanic language, Proto-Germanic ''*gatan'', meaning an opening or passageway. Synonyms include yett (which comes from the same root w ...
includes a chunker.
*
NLTKbr>
chunkingIllinois Shallow ParserShallow Parse
Demo
See also
*
Parser
Parsing, syntax analysis, or syntactic analysis is a process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar by breaking it into parts. The term '' ...
*
Semantic role labeling
*
Named-entity recognition
Natural language parsing
Tasks of natural language processing
{{comp-ling-stub