The Slovenian National Corpus FidaPLUS is a 621-million-word (token) corpus of the Slovenian language, gathered from selected texts written in Slovenian of different genres and styles, mainly from books and newspapers. The FidaPLUS database is an upgrade of the older FIDA corpus, which was developed between 1997 and 2000, with added texts published up to 2006. FidaPLUS was the result of an applied research project of the Faculty of Arts and the Faculty of Social Sciences, both at the University of Ljubljana, and the Department of Knowledge Technologies at the Jožef Stefan Institute. The corpus is available via the corpus manager Sketch Engine. This version of the FidaPLUS corpus contains word sketches, automatic corpus-derived overviews of a word's grammatical and collocational behaviour.

References

FidaPLUS corpus in Sketch Engine

External links

Slovenian National Corpus website FidaPLUS