Content determination is the subtask of natural language generation (NLG) that involves deciding on the information to be communicated in a generated text. It is closely related to the task of document structuring.

Example

Consider an NLG system which summarises information about sick babies. Suppose this system has four pieces of information it can communicate:

# The baby is being given morphine via an IV drip
# The baby's heart rate shows bradycardias (temporary drops)
# The baby's temperature is normal
# The baby is crying

Which of these pieces of information should be included in the generated texts?

Issues

There are three general issues which almost always affect the content determination task, and which can be illustrated with the above example.

Perhaps the most fundamental issue is the ''communicative goal'' of the text, i.e. its ''purpose'' and ''reader''. In the above example, a doctor who wants to make a decision about medical treatment would probably be most interested in the heart rate bradycardias, while a parent who wants to know how their child is doing would probably be more interested in the fact that the baby is being given morphine and is crying.

The second issue is the ''size'' and ''level of detail'' of the generated text. For instance, a short summary sent to a doctor as a 160-character SMS text message might only mention the heart rate bradycardias, while a longer summary printed out as a multipage document might also mention the fact that the baby is on a morphine IV.

The final issue is how ''unusual and unexpected'' the information is. For example, neither doctors nor parents would place a high priority on being told that the baby's temperature was normal, if they expected this to be the case.

Content determination is very important to users; indeed, in many cases the quality of content determination is the most important factor (from the user's perspective) in determining the overall quality of the generated text.
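The interaction of the three issues can be made concrete with a small Python sketch. It is only an illustration, not a description of any particular NLG system: the `Fact` class, the relevance scores, the 0.25 discount for expected information, the 0.15 cut-off, and the greedy selection strategy are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    relevance: dict   # reader -> hypothetical relevance score in [0, 1]
    expected: bool    # True if the reader would assume this anyway


# The four facts from the example; all scores are made up for illustration.
FACTS = [
    Fact("The baby is being given morphine via an IV drip.",
         {"doctor": 0.6, "parent": 0.9}, expected=False),
    Fact("The baby's heart rate shows bradycardias.",
         {"doctor": 1.0, "parent": 0.5}, expected=False),
    Fact("The baby's temperature is normal.",
         {"doctor": 0.3, "parent": 0.4}, expected=True),
    Fact("The baby is crying.",
         {"doctor": 0.2, "parent": 0.8}, expected=False),
]


def select_content(facts, reader, max_chars, min_score=0.15):
    """Greedily pick the highest-scoring facts that fit the size budget."""

    def score(fact):
        # Communicative goal: relevance depends on who is reading.
        base = fact.relevance.get(reader, 0.0)
        # Unexpectedness: information the reader already assumes is discounted.
        return base * (0.25 if fact.expected else 1.0)

    chosen, used = [], 0
    for fact in sorted(facts, key=score, reverse=True):
        if score(fact) < min_score:
            continue  # too unimportant to mention at any length
        # Size: count one separating space between consecutive sentences.
        cost = len(fact.text) + (1 if chosen else 0)
        if used + cost <= max_chars:
            chosen.append(fact)
            used += cost
    return [fact.text for fact in chosen]
```

Under these made-up scores, `select_content(FACTS, "doctor", 50)` keeps only the bradycardia fact, while `select_content(FACTS, "parent", 70)` keeps the morphine and crying facts. The normal temperature is dropped for both readers because it is expected.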

Techniques

There are three basic approaches to content determination: schemas (content templates), statistical approaches, and explicit reasoning.

''Schemas'' are templates which explicitly specify the content of a generated text (as well as document structuring information). Typically, they are constructed by manually analysing a corpus of human-written texts in the target genre, and extracting a content template from these texts. Schemas work well in practice in domains where content is somewhat standardised, but work less well in domains where content is more fluid (such as the medical example above).

''Statistical techniques'' use statistical corpus analysis to automatically determine the content of the generated texts. Such work is in its infancy, and has mostly been applied to contexts where the communicative goal, reader, size, and level of detail are fixed, for example the generation of newswire summaries of sporting events.

''Explicit reasoning'' approaches have probably attracted the most attention from researchers. The basic idea is to use AI reasoning techniques (such as knowledge-based rules, planning, pattern detection, and case-based reasoning [P Gervás, B Díaz-Agudo, F Peinado, R Hervás (2005). Story plot generation based on CBR. Knowledge-Based Systems 18:235-242]) to examine the information available to be communicated (including how unusual or unexpected it is), the communicative goal and reader, and the characteristics of the generated text (including target size), and to decide on the optimal content for the generated text. A very wide range of techniques has been explored, but there is no consensus as to which is most effective.
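For contrast with the reasoning-based approaches, the schema approach can be sketched very simply in Python. This is a minimal illustration under assumed names: the `SCHEMA` structure, the slot names, and `apply_schema` are hypothetical, and the medical data is reused only to show the shape of a schema, even though schemas are better suited to more standardised domains.

```python
# A hypothetical schema: an ordered list of (slot name, required?) pairs.
# The order doubles as simple document structuring information.
SCHEMA = [
    ("heart_rate", True),
    ("medication", True),
    ("temperature", False),
]


def apply_schema(schema, data):
    """Return (slot, value) messages in schema order.

    Slots absent from the schema are never mentioned; a missing
    required slot is treated as an error.
    """
    messages = []
    for slot, required in schema:
        if slot in data:
            messages.append((slot, data[slot]))
        elif required:
            raise KeyError(f"required slot missing: {slot}")
    return messages


# Example input; "crying" is ignored because the schema has no slot for it.
observations = {
    "temperature": "normal",
    "heart_rate": "bradycardias",
    "medication": "morphine via IV",
    "crying": True,
}
messages = apply_schema(SCHEMA, observations)
```

The fixed slot list is what makes a schema predictable, and also what makes it rigid: nothing in it reacts to the reader, the size budget, or how unexpected a value is.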
