Lexical choice is the subtask of natural language generation (NLG) that involves choosing the content words (nouns, verbs, adjectives, and adverbs) in a generated text. Function words (determiners, for example) are usually chosen during realisation.

Examples

The simplest type of lexical choice involves mapping a domain concept (perhaps represented in an ontology) to a word. For example, the concept Finger might be mapped to the word ''finger''.

A more complex situation is when a domain concept is expressed using different words in different situations. For example, the domain concept Value-Change can be expressed in many ways:

* ''The temperature rose'': the verb ''rose'' is used for a Value-Change in temperature which increases the value
* ''The temperature fell'': the verb ''fell'' is used for a Value-Change in temperature which decreases the value
* ''The rain got heavier'': the phrase ''got heavier'' is used for a Value-Change in precipitation amount when the precipitation is rain

Sometimes words can communicate additional contextual information, for example:

* ''The temperature plummeted'': the verb ''plummeted'' is used for a Value-Change in temperature which decreases the value, when the change is rapid and large

Contextual information is especially significant for vague terms such as ''tall''. For example, a 2 m tall man is ''tall'', but a 2 m tall horse is ''small''.
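The Value-Change examples above can be sketched as a small set of hand-written rules. This is only an illustration: the `ValueChange` class, its fields, the numeric thresholds for ''plummeted'', and the phrase ''got lighter'' are all assumptions made for the sketch and do not come from any particular NLG system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueChange:
    """Hypothetical domain concept: a change in some measured quantity."""
    quantity: str   # e.g. "temperature" or "rain"
    delta: float    # signed size of the change (positive = increase)
    hours: float    # duration of the change, assumed to be > 0


# Assumed thresholds for a "rapid and large" decrease (illustrative values only).
LARGE_DROP = 10.0   # minimum absolute change
RAPID_RATE = 5.0    # minimum absolute change per hour


def choose_verb(change: ValueChange) -> str:
    """Pick a verb or verb phrase for a Value-Change concept."""
    if change.delta == 0:
        raise ValueError("no change to describe")
    if change.quantity == "rain":
        # "got lighter" is an assumed counterpart to "got heavier".
        return "got heavier" if change.delta > 0 else "got lighter"
    if change.delta > 0:
        return "rose"
    size = abs(change.delta)
    if size >= LARGE_DROP and size / change.hours >= RAPID_RATE:
        return "plummeted"
    return "fell"


def describe(change: ValueChange) -> str:
    """Build a minimal sentence around the chosen verb."""
    return f"The {change.quantity} {choose_verb(change)}"
```

For instance, `describe(ValueChange("temperature", -15, 2))` yields ''The temperature plummeted'', while the same drop spread over 24 hours yields ''The temperature fell''.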

Linguistic perspective

Lexical choice modules must be informed by linguistic knowledge of how the system's input data maps onto words. This is a question of semantics, but it is also influenced by syntactic factors (such as collocation effects) and pragmatic factors (such as context).

Hence NLG systems need linguistic models of how meaning is mapped to words in the target domain (genre) of the NLG system. Genre tends to be very important; for example, the verb ''veer'' has a very specific meaning in weather forecasts (wind direction is changing in a clockwise direction) which it does not have in general English, and a weather-forecast generator must be aware of this genre-specific meaning.

In some cases there are major differences in how different people use the same word; for example, some people use ''by evening'' to mean 6 PM and others use it to mean midnight. Psycholinguists have shown that when people speak to each other, they agree on a common interpretation via lexical alignment; this is not something which NLG systems can yet do.

Ultimately, lexical choice must deal with the fundamental issue of how language relates to the non-linguistic world. For example, a system which chose colour terms such as ''red'' to describe objects in a digital image would need to know which RGB pixel values could generally be described as ''red''; how this was influenced by visual (lighting, other objects in the scene) and linguistic (other objects being discussed) context; what pragmatic connotations were associated with ''red'' (for example, when an apple is called ''red'', it is assumed to be ripe as well as have the colour red); and so forth.
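The first part of the colour-term problem, deciding which RGB values could be described as ''red'', might be approximated as follows. This is a deliberately naive sketch: the function name and all three thresholds are assumptions chosen for illustration, and it ignores the lighting, scene, linguistic, and pragmatic context discussed above.

```python
import colorsys


def describable_as_red(rgb, hue_tolerance_deg=20.0,
                       min_saturation=0.5, min_value=0.3):
    """Return True if an 8-bit RGB triple is plausibly 'red'.

    All thresholds are assumed, illustrative values; a real system
    would need to fit them to how people actually use the word.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = hue * 360.0
    # Red sits at 0 degrees on the hue circle, so measure distance with wrap-around.
    distance_from_red = min(hue_deg, 360.0 - hue_deg)
    return (distance_from_red <= hue_tolerance_deg
            and saturation >= min_saturation
            and value >= min_value)
```

Under these assumed thresholds, pure red `(255, 0, 0)` and a darker crimson such as `(200, 30, 40)` are accepted, while pure blue and mid-grey are rejected. Working in hue, saturation, and value rather than raw RGB is a design choice made here for simplicity; it keeps the notion of "close to red" to a single angular distance.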

Algorithms and models

A number of algorithms and models have been developed for lexical choice in the research community. For example, Edmonds developed a model for choosing between near-synonyms (words with similar core meanings but different connotations) (Edmonds and Hirst, 2002). However, such algorithms and models have not been widely used in applied NLG systems; such systems have instead often used quite simple computational models, and invested development effort in linguistic analysis instead of algorithm development.

References

P. Edmonds and G. Hirst (2002). Near-Synonymy and Lexical Choice. ''Computational Linguistics'' 28:105-144.