LRE Map

The LRE Map (Language Resources and Evaluation) is a freely accessible large database of resources dedicated to natural language processing (NLP). The distinctive feature of the LRE Map is that its records are collected during the submission process of several major natural language processing conferences. The records are then cleaned and gathered into a global database called "LRE Map".

The LRE Map is intended to be an instrument for collecting information about language resources and to become, at the same time, a community for users: a place to share and discover resources, discuss opinions, provide feedback, discover new trends, etc. It is an instrument for discovering, searching and documenting language resources, understood here in a broad sense as both data and tools.

The large amount of information contained in the Map can be analyzed in many different ways. For instance, the LRE Map can provide information about the most frequent type of resource, the most represented language, the applications for which resources are used or are being developed, the proportion of new resources vs. already existing ones, or the way in which resources are distributed to the community.


Context

Several institutions worldwide maintain catalogues of language resources (ELRA, LDC, NICT Universal Catalogue, ACL Data and Code Repository, OLAC, LT World, etc.). However, it has been estimated that only 10% of existing resources are known, either through distribution catalogues or via direct publicity by providers (web sites and the like). The rest remain hidden and surface only briefly, when a resource is presented in a research paper or report at a conference. Even then, a resource may stay in the background simply because the focus of the research is not on the resource per se.


History

The LRE Map originated under the name "LREC Map" during the preparation of the LREC 2010 conference. More specifically, the idea was discussed within the FLaReNet project and, in collaboration with ELRA and the Institute of Computational Linguistics of CNR in Pisa, the Map was put in place at LREC 2010. The LREC organizers asked the authors to provide some basic information about all the resources (in a broad sense, i.e. including tools, standards and evaluation packages), either used or created, described in their papers. All these descriptors were then gathered in a global matrix called the LREC Map. The same methodology and requirements for authors have since been applied and extended to other conferences, namely COLING-2010, EMNLP-2010, RANLP-2011, LREC 2012, LREC 2014 and LREC 2016. After this generalization to other conferences, the LREC Map was renamed the LRE Map.


Size and content

The size of the database increases over time. The data collected amount to 4776 entries. Each resource is described according to the following attributes:
* Resource type, e.g. lexicon, annotation tool, tagger/parser.
* Resource production status, e.g. newly created finished, existing-updated.
* Resource availability, e.g. freely available, from data center.
* Resource modality, e.g. speech, written, sign language.
* Resource use, e.g. named entity recognition, language identification, machine translation.
* Resource language, e.g. English, 23 European Union languages, official languages of India.
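
The attributes listed above map naturally onto a simple record type. The following is a minimal sketch in Python, not the actual LRE Map schema: the class name `ResourceRecord`, its field names, the helper `most_common_value` and the sample records are all assumptions invented for illustration. It also shows how one of the analyses mentioned earlier (the most frequent resource type) could be computed over such records.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    """One LRE Map-style entry; field names are illustrative assumptions."""
    name: str
    resource_type: str       # e.g. "lexicon", "annotation tool"
    production_status: str   # e.g. "existing-updated"
    availability: str        # e.g. "freely available"
    modality: str            # e.g. "speech", "written"
    use: str                 # e.g. "machine translation"
    languages: tuple         # one or more language names


def most_common_value(records, attribute):
    """Return the (value, count) pair occurring most often for one attribute."""
    counts = Counter(getattr(record, attribute) for record in records)
    return counts.most_common(1)[0]


# Invented sample records, for illustration only (not real LRE Map data).
SAMPLE = [
    ResourceRecord("lexicon-a", "lexicon", "existing-updated",
                   "freely available", "written",
                   "machine translation", ("English",)),
    ResourceRecord("lexicon-b", "lexicon", "existing-updated",
                   "from data center", "written",
                   "named entity recognition", ("French",)),
    ResourceRecord("tool-c", "annotation tool", "existing-updated",
                   "freely available", "speech",
                   "language identification", ("English", "German")),
]

print(most_common_value(SAMPLE, "resource_type"))
```

The same helper works for any single-valued attribute, for example `most_common_value(SAMPLE, "modality")`; the multi-valued `languages` field would need to be flattened first.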


Uses

The LRE Map is an important tool for charting the NLP field. Compared to other studies based on subjective scorings, the LRE Map is built on factual data. In addition to being an information gathering tool, the Map has great potential for many uses:
* It is an instrument for monitoring the evolution of the field (useful for funders), if applied in different contexts and at different times.
* It can be seen as a huge joint effort, the beginning of an even larger cooperative action not just among a few leaders but among all researchers.
* It is also an "educational" means towards broad acknowledgment of the need for meta-research activities with the active involvement of many.
* It is also instrumental in introducing the new notion of "citation of resources", which could provide an award and a means of scholarly recognition for researchers engaged in resource creation.
* It is used to help organize conferences in the field, such as LREC.


Derived matrices

The data were cleaned and sorted by Joseph Mariani (CNRS-LIMSI IMMI) and Gil Francopoulo (CNRS-LIMSI IMMI + Tagmatica) in order to compute the various matrices of the final FLaReNet reports. One of them is the matrix for written data at LREC 2010. In it, English is the most studied language, followed by French and German, and then Italian and Spanish.
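
A derived matrix of this kind can be thought of as a cross-tabulation of two attributes. The sketch below assumes that rows are languages and columns are resource types, which the text does not state explicitly; the function names `cross_tabulate` and `rank_languages` are hypothetical, and the sample pairs are invented and do not reproduce any FLaReNet figures.

```python
from collections import defaultdict


def cross_tabulate(pairs):
    """Count occurrences of each (language, resource_type) pair.

    Returns a nested dict: matrix[language][resource_type] -> count.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for language, resource_type in pairs:
        matrix[language][resource_type] += 1
    # Convert to plain dicts so the result is easy to print and compare.
    return {language: dict(row) for language, row in matrix.items()}


def rank_languages(matrix):
    """Order languages by total count, highest first (ties broken by name)."""
    totals = {language: sum(row.values()) for language, row in matrix.items()}
    return sorted(totals, key=lambda language: (-totals[language], language))


# Invented (language, resource type) pairs, for illustration only.
SAMPLE_PAIRS = [
    ("English", "corpus"),
    ("English", "lexicon"),
    ("English", "corpus"),
    ("French", "corpus"),
    ("French", "lexicon"),
    ("Italian", "corpus"),
]

matrix = cross_tabulate(SAMPLE_PAIRS)
for language in rank_languages(matrix):
    print(language, matrix[language])
```

Ranking the row totals is what yields a statement such as "the most studied language"; a real computation would first need the cleaning step described above so that variant spellings of a language or resource type are merged.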


Future

The LRE Map has been extended to the Language Resources and Evaluation journal (Springer) and to other conferences.


External links


LREC Map research page