In linguistics, realization is the process by which some kind of surface representation is derived from its underlying representation; that is, the way in which some abstract object of linguistic analysis comes to be produced in actual language.
Phonemes are often said to be ''realized'' by speech sounds. The different sounds that can realize a particular phoneme are called its allophones.
Realization is also a subtask of natural language generation (NLG), which involves creating an actual text in a human language (English, French, etc.) from a syntactic representation. There are a number of software packages available for realization, most of which have been developed by academic research groups in NLG. The remainder of this article concerns realization of this kind.
Example
For example, the following Java code causes the SimpleNLG system (Gatt and Reiter, 2009) to print out the text ''The women do not smoke.'':
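// Set-up, assuming the SimpleNLG 4 API and its default English lexicon.
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);
NPPhraseSpec subject = nlgFactory.createNounPhrase("the", "woman");
subject.setPlural(true);
SPhraseSpec sentence = nlgFactory.createClause(subject, "smoke");
sentence.setFeature(Feature.NEGATED, true);
System.out.println(realiser.realiseSentence(sentence));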
In this example, the computer program has specified the linguistic constituents of the sentence (verb, subject), and also linguistic features (plural subject, negated), and from this information the realiser has constructed the actual sentence.
Processing
Realisation involves three kinds of processing:
* Syntactic realisation: Using grammatical knowledge to choose inflections, add function words and also to decide the order of components. For example, in English the subject usually precedes the verb, and the negated form of ''smoke'' is ''do not smoke''.
* Morphological realisation: Computing inflected forms, for example the plural form of ''woman'' is ''women'' (not ''womans'').
* Orthographic realisation: Dealing with casing, punctuation, and formatting. For example, capitalising ''The'' because it is the first word of the sentence.
The above examples are very basic; most realisers are capable of considerably more complex processing.
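The three kinds of processing listed above can be sketched in a few lines of plain Java. This is a toy illustration only, not SimpleNLG or any other real realiser: the class `ToyRealiser`, its method names, the three-entry plural table and the "add -s" fallback rules are all assumptions made for the example.

```java
import java.util.Map;

class ToyRealiser {
    // Tiny irregular-plural table; every other noun simply takes "-s" (a simplification).
    private static final Map<String, String> IRREGULAR_PLURALS =
            Map.of("woman", "women", "man", "men", "child", "children");

    // Morphological realisation: compute the inflected form of a noun.
    static String inflectNoun(String noun, boolean plural) {
        if (!plural) {
            return noun;
        }
        return IRREGULAR_PLURALS.getOrDefault(noun, noun + "s");
    }

    // Syntactic realisation: put the subject before the verb and add
    // function words (the auxiliary "do"/"does" plus "not" for negation).
    static String buildClause(String determiner, String noun, boolean plural,
                              String verb, boolean negated) {
        String subject = determiner + " " + inflectNoun(noun, plural);
        String verbGroup;
        if (negated) {
            verbGroup = (plural ? "do" : "does") + " not " + verb;
        } else {
            // Naive present-tense agreement: third person singular takes "-s".
            verbGroup = plural ? verb : verb + "s";
        }
        return subject + " " + verbGroup;
    }

    // Orthographic realisation: capitalise the first word and add a full stop.
    static String finishSentence(String clause) {
        return Character.toUpperCase(clause.charAt(0)) + clause.substring(1) + ".";
    }

    // Run the three stages in order.
    static String realise(String determiner, String noun, boolean plural,
                          String verb, boolean negated) {
        return finishSentence(buildClause(determiner, noun, plural, verb, negated));
    }

    public static void main(String[] args) {
        // prints "The women do not smoke."
        System.out.println(realise("the", "woman", true, "smoke", true));
    }
}
```

Keeping the stages as separate methods mirrors the division described above; a real realiser would replace each one with far richer grammatical, lexical and formatting knowledge.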
Systems
A number of realisers have been developed over the past few decades. These systems differ in the complexity and sophistication of their processing, their robustness in dealing with unusual cases, and whether they are accessed programmatically via an API or take a textual representation of a syntactic structure as their input.
There are also major differences in pragmatic factors such as documentation, support, licensing terms, speed and memory usage, etc.
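The difference between programmatic and textual input can be illustrated with a small sketch. The `key=value; key=value` notation below is a hypothetical format invented for this example and is not the input language of any realiser named in this article; the class `SpecInput` and its methods are likewise assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SpecInput {
    // API style: the calling program builds the specification with method calls.
    static Map<String, String> viaApi() {
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("subject", "woman");
        spec.put("number", "plural");
        spec.put("verb", "smoke");
        spec.put("negated", "true");
        return spec;
    }

    // Textual style: the same specification arrives as a string and is parsed
    // (hypothetical "key=value; key=value" format; malformed pairs are skipped).
    static Map<String, String> parse(String text) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String pair : text.split(";")) {
            String[] keyValue = pair.trim().split("=", 2);
            if (keyValue.length == 2) {
                spec.put(keyValue[0].trim(), keyValue[1].trim());
            }
        }
        return spec;
    }

    public static void main(String[] args) {
        Map<String, String> parsed =
                parse("subject=woman; number=plural; verb=smoke; negated=true");
        // Both routes yield the same specification for the realiser to work on.
        System.out.println(parsed.equals(viaApi()));
    }
}
```

Either route ends in the same internal structure; the choice mainly affects how easily the realiser can be embedded in another program or driven from files.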
It is not possible to describe all realisers here, but a few examples are:
* SimpleNLG: a realisation engine with an API that is intended to be simple to learn and use. Its scope is deliberately limited to producing the surface form of a text.
* KPML: the oldest realiser, which has been under development in different guises since the 1980s. It comes with grammars for ten different languages.
* FUF/SURGE: a realiser which was widely used in the 1990s and is still used in some projects today.
* OpenCCG: an open-source realiser with features such as the ability to use statistical language models to make realisation decisions.
References
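* A. Gatt and E. Reiter (2009). SimpleNLG: A realisation engine for practical applications. ''Proceedings of ENLG09''.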
External links
* ACL NLG Portal (contains links to the above and many other realisers)