Rhetorical structure theory (RST) is a theory of text organization that describes the relations that hold between parts of a text. It was originally developed by William Mann, Sandra Thompson, Christian M.I.M. Matthiessen and others at the University of Southern California's Information Sciences Institute (ISI) and defined in a 1988 paper.
The theory was developed as part of studies of computer-based text generation. Natural language researchers later began using RST in text summarization and other applications. It explains coherence by postulating a hierarchical, connected structure of texts.
In 2000, Daniel Marcu, also of ISI, demonstrated that practical discourse parsing and text summarization could also be achieved using RST.
Rhetorical relations
Rhetorical relations, also called coherence relations or discourse relations, are paratactic (coordinate) or hypotactic (subordinate) relations that hold across two or more text spans. It is widely accepted that the notion of coherence is explained through text relations like these. RST's rhetorical relations provide a systematic way for an analyst to analyse a text. An analysis is usually built by reading the text and constructing a tree using the relations. The following example is a title and summary, appearing at the top of an article in ''Scientific American'' magazine (Ramachandran and Anstis, 1986). The original text, broken into numbered units, is:
# Title: The Perception of Apparent Motion
# Abstract: When the motion of an intermittently seen object is ambiguous
# the visual system resolves confusion
# by applying some tricks that reflect a built-in knowledge of properties of the physical world
In the figure, the numbers 1, 2, 3 and 4 show the corresponding units as listed above.
The third and fourth units form a "Means" relation. The third unit is the essential part of this relation, so it is called the nucleus of the relation, and the fourth unit is called the satellite. Similarly, the second unit stands in a "Condition" relation to the span formed by the third and fourth units. All units are also spans, and spans may be composed of more than one unit.
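The analysis of units 2 to 4 can be sketched as a small data structure. This is a minimal illustration, not a standard RST tool: the class names `Unit` and `Relation` and the helper `unit_numbers` are hypothetical. Treating unit 2 as the satellite of "Condition" is an assumption, since the text above does not say which side is the nucleus. Unit 1 (the title) is left out because its relation is not described.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Unit:
    """An elementary unit of text, numbered as in the example."""
    number: int
    text: str


@dataclass(frozen=True)
class Relation:
    """A nucleus-satellite relation holding between two spans."""
    name: str
    nucleus: "Span"
    satellite: "Span"


# A span is either a single unit or a relation that groups smaller spans.
Span = Union[Unit, Relation]


def unit_numbers(span: Span) -> List[int]:
    """Return the sorted unit numbers that a span covers."""
    if isinstance(span, Unit):
        return [span.number]
    return sorted(unit_numbers(span.nucleus) + unit_numbers(span.satellite))


u2 = Unit(2, "When the motion of an intermittently seen object is ambiguous")
u3 = Unit(3, "the visual system resolves confusion")
u4 = Unit(4, "by applying some tricks that reflect a built-in knowledge "
             "of properties of the physical world")

# Unit 3 is the nucleus and unit 4 the satellite, as stated in the text.
means = Relation("Means", nucleus=u3, satellite=u4)
# Assumption: the condition clause (unit 2) is the satellite here.
condition = Relation("Condition", nucleus=means, satellite=u2)

print(unit_numbers(means))      # [3, 4]
print(unit_numbers(condition))  # [2, 3, 4]
```

The "Means" relation is itself a span covering units 3 and 4, which is why it can serve as one side of the "Condition" relation.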
Nuclearity in discourse
RST establishes two different types of units. Nuclei are considered the most important parts of a text, whereas satellites contribute to the nuclei and are secondary.
The nucleus contains basic information, and the satellite contains additional information about the nucleus. The satellite is often incomprehensible without the nucleus, whereas a text from which the satellites have been deleted can still be understood to a certain extent.
Hierarchy in the analysis
RST relations are applied recursively in a text until all units in that text are constituents in an RST relation. As a result, RST structures are typically represented as trees, with one top-level relation that encompasses other relations at lower levels.
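The recursive structure and the effect of deleting satellites can be illustrated with a short sketch. It is a simplification that assumes every relation has exactly one nucleus and one satellite, so coordinate relations are not modelled. The nested-tuple encoding and the function names `all_units`, `nuclei_only` and `outline` are hypothetical choices made for this illustration, and the placement of unit 2 as the satellite of "Condition" is likewise an assumption.

```python
from typing import List, Tuple, Union

# A span is either the text of a single unit or a
# (relation name, nucleus, satellite) triple whose parts are spans themselves.
Span = Union[str, Tuple[str, "Span", "Span"]]


def all_units(span: Span) -> List[str]:
    """Collect every unit in the tree, nucleus before satellite."""
    if isinstance(span, str):
        return [span]
    _name, nucleus, satellite = span
    return all_units(nucleus) + all_units(satellite)


def nuclei_only(span: Span) -> List[str]:
    """Follow only nucleus branches, dropping every satellite."""
    if isinstance(span, str):
        return [span]
    _name, nucleus, _satellite = span
    return nuclei_only(nucleus)


def outline(span: Span, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, one relation or unit per line."""
    pad = "  " * depth
    if isinstance(span, str):
        return [pad + span]
    name, nucleus, satellite = span
    return ([pad + name]
            + outline(nucleus, depth + 1)
            + outline(satellite, depth + 1))


# Units 2 to 4 of the example; the top-level relation encompasses the lower one.
tree: Span = (
    "Condition",
    ("Means",
     "the visual system resolves confusion",
     "by applying some tricks that reflect a built-in knowledge "
     "of properties of the physical world"),
    "When the motion of an intermittently seen object is ambiguous",
)

print("\n".join(outline(tree)))
print(nuclei_only(tree))  # ['the visual system resolves confusion']
```

Following only the nucleus branches leaves the single clause that remains understandable on its own, which matches the observation that a text can survive the deletion of its satellites.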
Why RST?
# From a linguistic point of view, RST proposes a different view of text organization than most linguistic theories.
# RST points to a tight relation between relations and coherence in text.
# From a computational point of view, it provides a characterization of text relations that has been implemented in different systems and for applications such as text generation and summarization.
In design rationale
Computer scientists Ana Cristina Bicharra Garcia and Clarisse Sieckenius de Souza have used RST as the basis of a design rationale system called ADD+.
In ADD+, RST is used as the basis for the rhetorical organization of a knowledge base, in a way comparable to other knowledge representation systems such as the issue-based information system (IBIS).
Similarly, RST has been used in representation schemes for argumentation.
See also
* Argument mining
* Parse tree