HOME

TheInfoList



OR:

In digital
lexicography Lexicography is the study of lexicons, and is divided into two separate academic disciplines. It is the art of compiling dictionaries. * Practical lexicography is the art or craft of compiling, writing and editing dictionaries. * Theoretica ...
,
natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
, and
digital humanities Digital humanities (DH) is an area of scholarly activity at the intersection of computing or Information technology, digital technologies and the disciplines of the humanities. It includes the systematic use of digital resources in the humanitie ...
, a
lexical Lexical may refer to: Linguistics * Lexical corpus or lexis, a complete set of all words in a language * Lexical item, a basic unit of lexicographical classification * Lexicon, the vocabulary of a person, language, or branch of knowledge * Lexical ...
resource is a
language resource In linguistics and language technology, a language resource is a " ompositionof linguistic material used in the construction, improvement and/or evaluation of language processing applications, (...) in language and language-mediated research stu ...
consisting of
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted ...
regarding the
lexeme A lexeme () is a unit of lexical meaning that underlies a set of words that are related through inflection. It is a basic abstract unit of meaning, a unit of morphological analysis in linguistics that roughly corresponds to a set of forms taken ...
s of the
lexicon A lexicon is the vocabulary of a language or branch of knowledge (such as nautical or medical). In linguistics, a lexicon is a language's inventory of lexemes. The word ''lexicon'' derives from Koine Greek language, Greek word (), neuter of () ...
of one or more
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
s e.g., in the form of a
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
.


Characteristics

Different standards for the machine-readable edition of lexical resources exist, e.g., Lexical Markup Framework (LMF) an
ISO standard The International Organization for Standardization (ISO ) is an international standard development organization composed of representatives from the national standards organizations of member countries. Membership requirements are given in Art ...
for encoding lexical resources, comprising an abstract data model and an
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
serialization, and OntoLex-Lemon, an RDF vocabulary for publishing lexical resources as
knowledge graph The Google Knowledge Graph is a knowledge base from which Google serves relevant information in an infobox beside its search results. This allows the user to see the answer in a glance. The data is generated automatically from a variety of so ...
s on the web, e.g., as
Linguistic Linked Open Data In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with L ...
. Depending on the type of languages that are addressed, a lexical resource may be qualified as
monolingual Monoglottism (Greek μόνος ''monos'', "alone, solitary", + γλῶττα , "tongue, language") or, more commonly, monolingualism or unilingualism, is the condition of being able to speak only a single language, as opposed to multilingualism. ...
,
bilingual Multilingualism is the use of more than one language, either by an individual speaker or by a group of speakers. It is believed that multilingual speakers outnumber monolingual speakers in the world's population. More than half of all E ...
or
multilingual Multilingualism is the use of more than one language, either by an individual speaker or by a group of speakers. It is believed that multilingual speakers outnumber monolingual speakers in the world's population. More than half of all E ...
. For bilingual and multilingual lexical resources, the words may be connected or not connected from one language to another. When connected, the equivalence from a language to another is performed through a bilingual link (for bilingual lexical resources, e.g., using the relation ''vartrans:translatableAs'' in OntoLex-Lemon) or through multilingual notations (for multilingual lexical resources, e.g., by reference to the same ''ontolex:Concept'' in OntoLex-Lemon). It is possible also to build and manage a lexical resource consisting of different lexicons of the same language, for instance, one dictionary for general words and one or several dictionaries for different specialized domains.


Machine-readable dictionary vs. NLP dictionary

Lexical resources in digital
lexicography Lexicography is the study of lexicons, and is divided into two separate academic disciplines. It is the art of compiling dictionaries. * Practical lexicography is the art or craft of compiling, writing and editing dictionaries. * Theoretica ...
are often referred to as machine-readable dictionary (''MRD''), a
dictionary A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged alphabetically (or by radical and stroke for ideographic languages), which may include information on definitions, usage, etymologies ...
stored as machine (computer) data instead of being printed on paper. It is an
electronic dictionary An electronic dictionary is a dictionary whose data exists in digital form and can be accessed through a number of different media. Electronic dictionaries can be found in several forms, including software installed on tablet or desktop computers ...
and lexical database. The term MRD is often contrasted with NLP dictionary, in the sense that an MRD is the electronic form of a dictionary which was printed before on paper. Although being both used by programs, in contrast, the term NLP dictionary is preferred when the dictionary was built from scratch with NLP in mind.Gil Francopoulo (edited by) LMF Lexical Markup Framework, ISTE / Wiley 2013 ()


Lexical database

A lexical database is a lexical resource which has an associated software environment
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
which permits access to its contents. The database may be custom-designed for the lexical information or a general-purpose database into which lexical information has been entered. Information typically stored in a lexical database includes
spelling Spelling is a set of conventions that regulate the way of using graphemes (writing system) to represent a language in its written form. In other words, spelling is the rendering of speech sound (phoneme) into writing (grapheme). Spelling is one ...
,
lexical category In grammar, a part of speech or part-of-speech (abbreviated as POS or PoS, also known as word class or grammatical category) is a category of words (or, more generally, of lexical items) that have similar grammatical properties. Words that are ass ...
and
synonyms A synonym is a word, morpheme, or phrase that means exactly or nearly the same as another word, morpheme, or phrase in a given language. For example, in the English language, the words ''begin'', ''start'', ''commence'', and ''initiate'' are all ...
of words, as well as
semantic Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and comput ...
and
phonological Phonology is the branch of linguistics that studies how languages or dialects systematically organize their sounds or, for sign languages, their constituent parts of signs. The term can also refer specifically to the sound or sign system of a ...
relations between different words or sets of words.


See also

* Lexical Markup Framework (LMF),
ISO standard The International Organization for Standardization (ISO ) is an international standard development organization composed of representatives from the national standards organizations of member countries. Membership requirements are given in Art ...
for encoding lexical resources, comprising an abstract data model and an
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
serialization * OntoLex-Lemon, RDF vocabulary for publishing lexical resources on the web, e.g., as
Linguistic Linked Open Data In natural language processing, linguistics, and neighboring fields, Linguistic Linked Open Data (LLOD) describes a method and an interdisciplinary community concerned with creating, sharing, and (re-)using language resources in accordance with L ...
*
LREC The International Conference on Language Resources and Evaluation is an international conference organised by the European Language Resources Association every other year (on even years) with the support of institutions and organisations involved ...
conference series *
Machine-readable dictionary Machine-readable dictionary (''MRD'') is a dictionary stored as machine (computer) data instead of being printed on paper. It is an electronic dictionary and lexical database. A machine-readable dictionary is a dictionary in an electronic form th ...
*
WordNet WordNet is a lexical database of semantic relations between words in more than 200 languages. WordNet links words into semantic relations including synonyms, hyponyms, and meronyms. The synonyms are grouped into '' synsets'' with short definition ...
*
Arabic Ontology Arabic Ontology is a linguistic ontology for the Arabic language, which can be used as an Arabic Wordnet with ontologically-clean content. People use it also as a tree (i.e. classification) of the concepts/meanings of the Arabic terms. It is a fo ...


References


External links


The WordNet Home Page

Lexicographic Search Engine
{{Natural language processing Lexis (linguistics) Translation databases Computational linguistics Dictionaries by type