
The Bijankhan corpus (Persian: پیکرهٔ بی‌جن‌خان) is a tagged corpus suitable for natural language processing (NLP) research on the Persian language. The collection is gathered from daily news and common texts, and its documents are categorized into about 4,300 subject categories, such as political and cultural. The corpus contains about 2.6 million manually tagged words, annotated with a tag set of 550 Persian part-of-speech tags. The Bijankhan corpus was created by the Database Research Group at the University of Tehran. The corpus is non-free in that it may not be used commercially, although these restrictions vary by country. It is named after Mahmood Bijankhan, professor of linguistics at the University of Tehran, in recognition of his contributions in this area.


See also

* Hamshahri Corpus
* Persian Today Corpus

