Croatian Language Corpus





Croatian Language Corpus
The Croatian Language Corpus (CLC) (Croatian: Hrvatski jezični korpus, HJK) is a corpus of Croatian compiled at the Institute of Croatian Language and Linguistics (IHJJ).

Background
The CLC was initially funded from May 2005 as a sub-project of the research program Riznica (Croatian Language Repository) by the Ministry of Science, Education, and Sports of the Republic of Croatia (MZOŠ) (project no. 0212010). In a second development phase, since 2007, the further extension and development of the CLC has been embedded within the research program The Croatian Language Repository (CLR), which was granted by the MZOŠ (cf. Ćavar and Brozović Rončević, 2012). Because the CLR is a research program (PI Dunja Brozović Rončević) comprising numerous independent research projects that make use of the CLC, the corpus is mainly developed as a by-product of those projects. Currently Dunja Brozović Rončević and Damir Ćavar are in charge of the corpus development. ...


Text Corpus
In linguistics, a corpus (plural corpora) or text corpus is a language resource consisting of a large and structured set of texts (nowadays usually electronically stored and processed). In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, checking occurrences, or validating linguistic rules within a specific language territory. In search technology, a corpus is the collection of documents being searched.

Overview
A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). To make corpora more useful for linguistic research, they are often subjected to a process known as annotation. An example of annotating a corpus is part-of-speech tagging, or POS-tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form o ...
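As a rough illustration of what POS annotation adds to a text, the following minimal Python sketch attaches a tag to each token of a sentence. Everything in it is an assumption made for illustration: the toy lexicon, the tag labels, the `annotate` and `to_slash_format` helpers, and the `word/TAG` output layout are hypothetical and are not taken from the CLC or from any particular tagger. Real corpora are annotated with trained taggers rather than a lookup table.

```python
# Toy lexicon mapping word forms to part-of-speech tags.
# Hypothetical data for illustration only; a real tagger uses context, not a fixed table.
TOY_LEXICON = {
    "the": "DET",
    "corpus": "NOUN",
    "contains": "VERB",
    "many": "ADJ",
    "texts": "NOUN",
}


def annotate(sentence, lexicon=TOY_LEXICON, unknown="X"):
    """Return (token, tag) pairs for a whitespace-tokenized sentence.

    Tokens missing from the lexicon receive the `unknown` tag.
    """
    return [(tok, lexicon.get(tok.lower(), unknown)) for tok in sentence.split()]


def to_slash_format(pairs):
    """Render annotated tokens as 'word/TAG' strings joined by spaces."""
    return " ".join(f"{tok}/{tag}" for tok, tag in pairs)


pairs = annotate("The corpus contains many texts")
print(to_slash_format(pairs))
# prints: The/DET corpus/NOUN contains/VERB many/ADJ texts/NOUN
```

Storing the annotation as (token, tag) pairs keeps the original text recoverable, so the same data can be rendered in whatever layout a given corpus format expects.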


