An explanatory combinatorial dictionary (ECD) is a type of monolingual dictionary designed to be part of a Meaning–Text linguistic model of a natural language. It is intended to be a complete record of the lexicon of a given language. As such, it identifies and describes, in separate entries, each of the language's lexemes (roughly speaking, each word or set of inflected forms based on a single stem) and phrasemes (roughly speaking, idioms and other multi-word fixed expressions). Among other things, each entry contains (1) a definition that incorporates the lexeme's semantic actants (for example, the definiendum of ''give'' takes the form ''X gives Y to Z'', in which all three actants are expressed: the giver ''X'', the thing given ''Y'', and the recipient ''Z''); (2) complete information on lexical co-occurrence (e.g. the entry for ''attack'' tells the user that one of its collocations is ''launch an attack'', the entry for ''party'' provides ''throw a party'', and the entry for ''lecture'' provides ''deliver a lecture'', enabling the user to avoid an error like *''deliver a party''); and (3) an extensive set of examples. The ECD is a production dictionary: it aims to provide all the information needed for a foreign learner or an automaton to produce perfectly formed utterances of the language. Since the lexemes and phrasemes of a natural language number in the hundreds of thousands, a complete ECD in paper form would occupy the space of a large encyclopaedia. No such work has yet been achieved; while ECDs of Russian and French have been published, each describes less than one percent of the vocabulary of its language.
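The co-occurrence idea can be illustrated with a toy lookup. The sketch below is hypothetical and not taken from any published ECD: it stores only the three collocations mentioned above, keyed by the noun whose entry records them, and checks whether a verb and noun may combine. The table name, the helper functions and the crude English article rule are all assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical miniature of the lexical co-occurrence data an ECD records.
# Collocations are stored under the entry of the noun (the keyword), since
# that is the entry a user consults to find the right verb.
COLLOCATIONS: dict[str, set[str]] = {
    "attack": {"launch"},
    "party": {"throw"},
    "lecture": {"deliver"},
}


def is_licensed(verb: str, noun: str) -> bool:
    """True if `verb` is listed as a collocate in the entry for `noun`."""
    return verb in COLLOCATIONS.get(noun, set())


def produce(noun: str) -> list[str]:
    """Build every verb phrase that the entry for `noun` licenses."""
    # Crude article rule (assumption): good enough for these three nouns only.
    article = "an" if noun[0] in "aeiou" else "a"
    return sorted(f"{verb} {article} {noun}" for verb in COLLOCATIONS.get(noun, set()))


if __name__ == "__main__":
    print(produce("attack"))                  # ['launch an attack']
    print(is_licensed("deliver", "lecture"))  # True
    print(is_licensed("deliver", "party"))    # False, i.e. *deliver a party
```

Keying the table on the noun rather than the verb mirrors the production use described above: the speaker already knows which noun they want and needs the dictionary to supply the verb.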
The ECD was proposed in the late 1960s by Aleksandr Žolkovskij and Igor Mel'čuk and was later developed further by Jurij Apresjan. Three ECDs are currently available in print: one for Russian and two for French. A dictionary of Spanish collocations, DICE (''Diccionario de colocaciones del español''), is under development.
Characteristics of an ECD
A complete ECD of a language would provide an entry for every lexeme, construction, or idiom in use in the language; these are referred to collectively as "Lexical Units" (LUs). Entries in the ECD are based on the semantic definition of an LU, and each entry also contains a complete list of the LU's collocations and lexical functions.
Entries for historically related Lexical Units that are homophones and share a significant semantic component (i.e., part of their meaning) are grouped into larger units called "vocables", thereby acknowledging polysemy while maintaining the distinct status of the independent items in question. The English vocable ''improve'', for example, includes six Lexical Units, each of which is given a separate lexical entry:
IMPROVE, verb
:IMPROVE '''I.1a''' X improves ≡ ‘The value or the quality of X becomes higher’
::''The weather suddenly improved; The system will improve over time''
:IMPROVE '''I.1b''' X improves Y ≡ ‘X causes1 that Y improves I.1a’
::''The most recent changes drastically improved the system''
:IMPROVE '''I.2''' X improves ≡ ‘The health of a sick person X improves I.1a’
::''Jim is steadily improving''
:IMPROVE '''I.3''' X improves at Y ≡ ‘X’s execution of Y improves I.1a, which is caused1 by X’s having practiced or practicing Y’
::''Jim is steadily improving at algebra''
:IMPROVE '''II''' X improves Y by Z-ing ≡ ‘X voluntarily causes2 that the market value of a piece of real estate Y becomes higher by doing Z-ing to Y’
::''Jim improved his house by installing indoor plumbing''
:IMPROVE '''III''' X improves upon Y ≡ ‘X creates a new Y´ by improving I.1b Y’
::''Jim has drastically improved upon Patrick’s translation''
The lexicographic numbers (given in bold after the entry word) reflect degrees of semantic distance between Lexical Units within a vocable: Roman numerals mark the most general semantic groupings, Arabic numerals the next level down, and letters the smallest distinctions. The four lexemes grouped under IMPROVE I, for example, are considered to be closer to each other than to IMPROVE II or IMPROVE III, because the meanings of IMPROVE I.1b, I.2, and I.3 each include the meaning of IMPROVE I.1a. IMPROVE I.1a and I.1b are even more closely related, because English has many labile (ambitransitive) verbs whose intransitive and transitive uses are related by the semantic alternation ‘P’ ~ ‘cause1 to P’ (as above, ‘improve’ ~ ‘cause to improve’).
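As a rough illustration of the numbering scheme, the sketch below (a hypothetical helper, not part of any ECD) splits numbers such as I.1a into their Roman, Arabic and letter components and scores how far apart two Lexical Units of the same vocable are by the first level at which their numbers differ. The 0 to 3 scale is an assumption made for the example; the text only orders the levels qualitatively.

```python
from __future__ import annotations

import re

# A lexicographic number: a Roman numeral, optionally followed by
# ".<Arabic numeral>" and an optional lowercase letter (e.g. I, II, I.2, I.1a).
_NUMBER = re.compile(r"([IVXLC]+)(?:\.(\d+)([a-z])?)?")


def parse(number: str) -> tuple[str, ...]:
    """Split a lexicographic number into its Roman, Arabic and letter parts."""
    match = _NUMBER.fullmatch(number)
    if match is None:
        raise ValueError(f"not a lexicographic number: {number!r}")
    return tuple(part for part in match.groups() if part is not None)


def distance(a: str, b: str) -> int:
    """Hypothetical 0-3 score of semantic distance between two LUs of one vocable.

    3: numbers differ at the Roman-numeral level (most distant),
    2: same Roman numeral, different Arabic numeral,
    1: same Roman and Arabic numerals, different letter (closest),
    0: identical numbers (the same LU).
    """
    pa, pb = parse(a), parse(b)
    for level, (x, y) in enumerate(zip(pa, pb)):
        if x != y:
            return 3 - level
    if len(pa) != len(pb):
        # One number is a prefix of the other, e.g. "I" vs "I.2".
        return 3 - min(len(pa), len(pb))
    return 0


# The six Lexical Units of the vocable IMPROVE, by lexicographic number.
IMPROVE = ["I.1a", "I.1b", "I.2", "I.3", "II", "III"]

if __name__ == "__main__":
    for other in IMPROVE[1:]:
        print(f"I.1a vs {other}: {distance('I.1a', other)}")
```

Run on the IMPROVE vocable, the scores reproduce the ordering described in the text: I.1b is closest to I.1a, I.2 and I.3 come next, and II and III are the most distant.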
The subscript and superscript numbers attached to words in the definitions refer to subsenses (subscripts) and homophonous entries (superscripts) of a word as given in the ''Longman Dictionary of Contemporary English''; thus, “device<sup>1</sup><sub>1</sub>” refers to the first entry for ''device'' in that dictionary, first subsense.
Structure of the ECD entry
An ECD entry for a given Lexical Unit, call it "L", is divided into three major sections, or "zones":
The semantic zone
The semantic zone describes the semantic properties of L and consists of two sub-zones:
:1) the definition of L, which fully specifies L’s meaning; and
:2) L’s connotations (meanings that the language associates with L, but that are not part of its definition).
The phonological/graphematic zone
The phonological/graphematic zone gives all of the data on L’s phonological properties. Here again we find two sub-zones:
:1) L’s pronunciation, including its syllabification, and any non-standard prosodic properties; and
:2) orthographic information about L’s spelling variants, etc.
The co-occurrence zone
The co-occurrence zone presents all of the data on L’s combinatorial properties. It is organized into five sub-zones—morphological, syntactic, lexical, stylistic, and pragmatic.
:The morphological sub-zone contains inflectional data including conjugation/declension class, irregular forms, missing forms, permitted alternations, etc.
:The syntactic sub-zone has two parts:
:: a) Government pattern, which describes the elements that L can syntactically govern (arguments, complements, etc.);
:: b) Part of speech and syntactic features, which describes the constructions in which L can appear as a syntactic dependent.
:The lexical sub-zone specifies the lexical functions in which L participates, covering both semantic derivations and collocations of L with other individual LUs or with very small and irregular groups of LUs.
:The stylistic sub-zone specifies L’s speech register (informal, colloquial, vulgar, poetic, etc.), temporal (obsolescent, archaic) and geographical (British, Indian, Australian) variability, and the like.
:The pragmatic sub-zone describes the real-life situations in which a particular expression is appropriate or inappropriate.
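The three-zone layout described above can be pictured as a nested record. The sketch below is a minimal Python model under our own assumptions: every class and field name is hypothetical and simply mirrors the zones and sub-zones listed in the text; it is not the format of any published ECD. The sample definition for IMPROVE I.1b comes from the entry shown earlier, while the government-pattern labels and the morphological note are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticZone:
    definition: str                                   # fully specifies L's meaning
    connotations: list[str] = field(default_factory=list)


@dataclass
class PhonologicalGraphematicZone:
    pronunciation: str = ""                           # incl. syllabification, prosody
    spelling_variants: list[str] = field(default_factory=list)


@dataclass
class SyntacticSubZone:
    government_pattern: dict[str, str] = field(default_factory=dict)  # actant -> role
    part_of_speech: str = ""
    syntactic_features: list[str] = field(default_factory=list)


@dataclass
class CooccurrenceZone:
    morphological: dict[str, str] = field(default_factory=dict)
    syntactic: SyntacticSubZone = field(default_factory=SyntacticSubZone)
    lexical: dict[str, list[str]] = field(default_factory=dict)      # lexical function -> values
    stylistic: list[str] = field(default_factory=list)               # register, era, region
    pragmatic: list[str] = field(default_factory=list)               # situations of use


@dataclass
class ECDEntry:
    lexical_unit: str
    semantic: SemanticZone
    phonological_graphematic: PhonologicalGraphematicZone = field(
        default_factory=PhonologicalGraphematicZone
    )
    cooccurrence: CooccurrenceZone = field(default_factory=CooccurrenceZone)


# Illustrative (partial) entry for IMPROVE I.1b; empty sub-zones are left at defaults.
entry = ECDEntry(
    lexical_unit="IMPROVE I.1b",
    semantic=SemanticZone(definition="X causes1 that Y improves I.1a"),
    cooccurrence=CooccurrenceZone(
        morphological={"conjugation": "regular"},
        syntactic=SyntacticSubZone(
            government_pattern={"X": "subject", "Y": "direct object"},
            part_of_speech="verb",
        ),
    ),
)

if __name__ == "__main__":
    print(entry.semantic.definition)
    print(entry.cooccurrence.syntactic.government_pattern)
```

Using default factories keeps each zone optional, so a partial entry like this one can be built first and filled in sub-zone by sub-zone.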