OntoLex-Lemon

OntoLex is the short name of a vocabulary for lexical resources in the web of data (OntoLex-Lemon) and the short name of the W3C community group that created it (W3C Ontology-Lexica Community Group).


OntoLex-Lemon vocabulary

The OntoLex-Lemon vocabulary is a vocabulary for publishing lexical data as a knowledge graph, in an RDF format and/or as Linguistic Linked Open Data. Since its publication as a W3C Community report in 2016, it has served as "a de facto standard to represent ontology-lexica on the Web". OntoLex-Lemon is a revision of the Lemon vocabulary originally proposed by McCrae et al. (2011).

The core elements of OntoLex-Lemon, shown in Fig. 1, are:
* lexical entry: the unit of analysis of the lexicon, which groups together one or more forms and one or more senses or concepts. It can provide additional morphosyntactic information, e.g., a part of speech. Every lexical entry can have at most one part of speech; for representing groups of lexical entries with identical forms but different parts of speech, see the lexicography module.
* lexical form: a surface form of a particular lexical entry, e.g., its written representation.
* lexical sense: a word sense of a particular lexical entry. OntoLex-Lemon senses are lexicalized, i.e., they belong to exactly one lexical entry. For elements of meaning that can be expressed by different lexemes, use lexical concept.
* lexical concept: an element of meaning with different lexicalizations. Typical examples are WordNet synsets, where multiple synonymous words are grouped together in a single set.

Aside from the core module (namespace http://www.w3.org/ns/lemon/ontolex#), other modules specify designated vocabulary for representing lexicon metadata (namespace http://www.w3.org/ns/lemon/lime#), lexical-semantic relations (e.g., translation and variation, namespace http://www.w3.org/ns/lemon/vartrans#), multi-word expressions (decomposition, namespace http://www.w3.org/ns/lemon/decomp#) and syntactic frames (namespace http://www.w3.org/ns/lemon/synsem#). The data structures of OntoLex-Lemon are comparable with those of other dictionary formats (see related vocabularies below).
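
As a rough illustration of how the core elements relate to each other, the following Python sketch writes one lexical entry with a form, a sense and a concept as N-Triples-style lines. The base IRI `http://example.org/lexicon/`, the identifiers and all helper names are hypothetical. The class and property names (`LexicalEntry`, `Form`, `LexicalSense`, `LexicalConcept`, `canonicalForm`, `writtenRep`, `sense`, `evokes`) are assumed to be those of the core module and should be checked against the specification before use.

```python
from dataclasses import dataclass, field

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"   # core module namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/lexicon/"              # hypothetical base IRI


@dataclass
class LexicalEntry:
    ident: str                                    # hypothetical local name
    written_rep: str                              # written form of the canonical form
    lang: str                                     # language tag of the written form
    sense_ids: list = field(default_factory=list)
    concept_ids: list = field(default_factory=list)


def iri(value):
    """Wrap an IRI in angle brackets."""
    return "<" + value + ">"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@' + lang


def triple(s, p, o):
    return s + " " + p + " " + o + " ."


def entry_to_ntriples(entry):
    """Return the triples for one entry, its form, senses and concepts."""
    e = iri(BASE + entry.ident)
    form = iri(BASE + entry.ident + "_form")
    a = iri(RDF_TYPE)
    lines = [
        triple(e, a, iri(ONTOLEX + "LexicalEntry")),
        triple(e, iri(ONTOLEX + "canonicalForm"), form),
        triple(form, a, iri(ONTOLEX + "Form")),
        triple(form, iri(ONTOLEX + "writtenRep"),
               literal(entry.written_rep, entry.lang)),
    ]
    for sid in entry.sense_ids:       # each sense belongs to this entry only
        sense = iri(BASE + sid)
        lines.append(triple(e, iri(ONTOLEX + "sense"), sense))
        lines.append(triple(sense, a, iri(ONTOLEX + "LexicalSense")))
    for cid in entry.concept_ids:     # a concept may be shared across entries
        concept = iri(BASE + cid)
        lines.append(triple(e, iri(ONTOLEX + "evokes"), concept))
        lines.append(triple(concept, a, iri(ONTOLEX + "LexicalConcept")))
    return lines


bank = LexicalEntry("bank_n", "bank", "en", ["bank_n_sense1"], ["concept_bank"])
for line in entry_to_ntriples(bank):
    print(line)
```

In this sketch a sense identifier is attached to a single entry, while the same concept identifier could be passed to several entries, which mirrors the distinction between lexical sense and lexical concept drawn above. A real application would more likely use an RDF library than string concatenation.
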
The innovative element of OntoLex-Lemon is that it provides such a data model as an RDF vocabulary, which enables novel use cases that are based on web technologies rather than stand-alone dictionaries (e.g., translation inference, see applications below). For the foreseeable future, OntoLex-Lemon will also remain unique in this role, as the (Linguistic) Linked Open Data community strongly encourages the reuse of existing vocabularies and, as of Dec 2019, OntoLex-Lemon is the only established (i.e., published by W3C or another standardization initiative) vocabulary for its purpose. This is also reflected in recent extensions to the original OntoLex-Lemon specification, where new modules have been developed to extend the use of OntoLex-Lemon to new areas of application:
* OntoLex-Lemon Lexicography Module, published as a W3C Community Group Report, extends OntoLex-Lemon with respect to requirements from digital lexicography.
* OntoLex-Lemon Morphology Module, as of Dec 2019 under development, aims to facilitate multilinguality through the formalization of morphological dictionaries in OntoLex-Lemon, especially for morphologically rich languages.
* OntoLex-Lemon Module for Frequency, Attestation and Corpus Information, as of Dec 2019 under development, aims to facilitate uses of OntoLex-Lemon in computational lexicography and natural language processing.
* Updates to LexInfo: LexInfo provides data categories for OntoLex-Lemon data. The current version is LexInfo 3.0; older versions (prior to 2019) still depended on the older Monnet-Lemon vocabulary.


Applications

OntoLex-Lemon is widely used for lexical resources in the context of Linguistic Linked Open Data. Selected applications include:
* OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA), a framework for internationally interoperable lexicographic work
* European public multilingual knowledge infrastructure
* LexO, a collaborative web editor used for the creation and management of (multilingual) lexical and terminological resources as linked data resources
* VocBench, a web-based, multilingual, collaborative development platform for managing ontologies, thesauri, lexicons and RDF data
* The Lexicala API by K Dictionaries, which provides access to cross-lingual lexical data of 50 languages and 150 language pairs
* DiTMAO, a lexicographic editor developed for creating the Dictionary of Old Occitan medico-botanical terminology
* a series of Shared Tasks on Translation Inference Across Dictionaries (TIAD-2017, TIAD-2019, TIAD-2020)
* DBnary, an RDF edition of 16 language editions of Wiktionary
* PanLex, a large-scale lexical network of about 2,500 dictionaries and more than 500 languages
* Princeton WordNet 3.1, a large-scale, hierarchically and relationally structured lexical resource for English
* Global WordNet Association, a community effort to produce, maintain and interlink multilingual WordNets
* BabelNet, a large-scale multilingual lexical network
* LiLa, a knowledge base of linguistic resources for Latin based on a large lexicon consisting of a collection of citation forms

OntoLex development is regularly addressed in scientific events dedicated to ontologies, linked data or lexicography. Since 2017, a dedicated workshop series on the OntoLex module has been held biannually.
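
The idea behind translation inference across dictionaries, named among the applications above, can be illustrated with a naive pivot-based sketch in Python. This is only a toy baseline under stated assumptions, not the method of the TIAD shared tasks or of any participating system: the function name `infer_translations`, the representation of dictionaries as plain word pairs, and the sample data are all hypothetical.

```python
from collections import defaultdict


def infer_translations(a_to_b, b_to_c):
    """Compose two bilingual dictionaries through a shared pivot language.

    a_to_b and b_to_c are iterables of (source, target) word pairs.
    Returns a dict mapping each inferred (a, c) pair to the set of
    pivot words that support it.
    """
    pivot_index = defaultdict(set)
    for b, c in b_to_c:
        pivot_index[b].add(c)

    support = defaultdict(set)
    for a, b in a_to_b:
        for c in pivot_index.get(b, ()):
            support[(a, c)].add(b)
    return dict(support)


# Toy data: English -> Spanish and Spanish -> French (hypothetical sample).
en_es = [("bank", "banco"), ("bench", "banco"), ("bank", "orilla")]
es_fr = [("banco", "banque"), ("banco", "banc"), ("orilla", "rive")]

candidates = infer_translations(en_es, es_fr)
for (en, fr), pivots in sorted(candidates.items()):
    print(en, "->", fr, "via", sorted(pivots))
```

Because the Spanish pivot "banco" is ambiguous, the sketch also proposes questionable pairs such as "bench" and "banque". Avoiding such errors is where sense-level information, as modelled by lexical senses and lexical concepts, becomes useful.
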


Related vocabularies

Related vocabularies that focus on standardizing and publishing lexical resources include DICT (text-based format), the XML Dictionary eXchange Format, TEI-Dict (XML) and the Lexical Markup Framework (abstract model usually serialized in XML; the Lemon vocabulary originally evolved from an RDF serialization of LMF). OntoLex-Lemon differs from these earlier models in being a native Linked Open Data vocabulary that does not (just) formalize the structure and semantics of machine-readable dictionaries, but is designed to facilitate information integration between them.


External links


OntoLex-Lemon specification

OntoLex-Lemon lexicography module

OntoLex GitHub repository