Russian National Corpus





Russian National Corpus
The Russian National Corpus (Russian: Национальный корпус русского языка, "National Corpus of the Russian Language") is a corpus of the Russian language that has been partially accessible online through a query interface since April 29, 2004. It is being created by the Institute of the Russian Language of the Russian Academy of Sciences. It currently contains more than 1 billion word forms that are automatically lemmatized and tagged for part of speech (POS) and grammemes; that is, each orthographic form is assigned all of its possible morphological analyses. Lemmata, POS, grammatical features, and their combinations are searchable. In addition, 6 million word forms belong to a subcorpus in which homonymy has been resolved manually; this subcorpus is also automatically accentuated. The whole corpus has searchable lexical-semantic (LS) tagging, including morphosemantic POS subclasses (proper noun, reflexive pronoun, etc.), LS characteristics p ...
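As an illustration only, the following Python sketch shows what querying a corpus whose word forms keep all of their possible morphological analyses might look like. The tag names (S for noun, V for verb, gen, praet, and so on) and the `search` function are simplified, hypothetical stand-ins, not the corpus's actual tagset or query interface. Because homonymy is left unresolved, the form стали matches queries for both сталь "steel" and стать "to become".

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str
    pos: str                   # hypothetical POS label, e.g. "S" (noun), "V" (verb)
    grammemes: frozenset[str]  # hypothetical grammeme labels, e.g. {"sg", "gen"}


# Each token keeps *every* possible analysis: homonymy is not resolved.
CORPUS: list[tuple[str, list[Analysis]]] = [
    ("сталь", [
        Analysis("сталь", "S", frozenset({"f", "sg", "nom"})),
        Analysis("сталь", "S", frozenset({"f", "sg", "acc"})),
    ]),
    ("стали", [
        Analysis("стать", "V", frozenset({"pl", "praet", "indic"})),  # '(they) became'
        Analysis("сталь", "S", frozenset({"f", "sg", "gen"})),         # 'of steel'
        Analysis("сталь", "S", frozenset({"f", "pl", "nom"})),         # 'steels'
    ]),
]


def search(corpus, lemma=None, pos=None, grammemes=frozenset()):
    """Return (index, form) for tokens with at least one analysis matching all criteria."""
    hits = []
    for i, (form, analyses) in enumerate(corpus):
        for a in analyses:
            if lemma is not None and a.lemma != lemma:
                continue
            if pos is not None and a.pos != pos:
                continue
            if not set(grammemes) <= a.grammemes:
                continue
            hits.append((i, form))
            break  # one matching analysis is enough for the token to count as a hit
    return hits
```

For example, `search(CORPUS, lemma="стать")` finds only стали, while `search(CORPUS, lemma="сталь")` finds both tokens. A manually disambiguated subcorpus would instead keep a single analysis per token, so such overlapping hits would disappear.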


Text Corpus
In linguistics, a corpus (plural corpora) or text corpus is a language resource consisting of a large and structured set of texts (nowadays usually stored and processed electronically). In corpus linguistics, corpora are used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific language territory. In search technology, a corpus is the collection of documents being searched. Overview A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). To make corpora more useful for linguistic research, they are often subjected to a process known as annotation. An example of corpus annotation is part-of-speech tagging, or POS-tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form o ...
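A minimal sketch of what POS annotation can look like in practice, assuming the common inline "word/TAG" convention with Penn Treebank-style tags (DT, NN, VBZ, RB); the convention and the helper `parse_tagged` are chosen for illustration, not taken from any particular corpus. Counting tag frequencies is the simplest kind of statistic such annotation makes possible.

```python
from __future__ import annotations
from collections import Counter


def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST slash, so slashes inside a word survive
        word, sep, tag = token.rpartition("/")
        if not sep or not word:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


TAGGED_CORPUS = [
    "The/DT cat/NN sleeps/VBZ ./.",
    "A/DT dog/NN barks/VBZ loudly/RB ./.",
]

# Frequency of each part-of-speech tag across the whole (toy) corpus.
TAG_COUNTS = Counter(tag for line in TAGGED_CORPUS for _, tag in parse_tagged(line))
```

Here `TAG_COUNTS["DT"]` is 2 and `TAG_COUNTS["RB"]` is 1. Splitting on the last slash is a deliberate choice so that tokens such as `1/2/CD` keep their internal slash.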


Syntax
In linguistics, syntax is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituency), agreement, the nature of crosslinguistic variation, and the relationship between form and meaning (semantics). There are numerous approaches to syntax that differ in their central assumptions and goals. Etymology The word syntax comes from Ancient Greek σύνταξις "coordination", which consists of σύν syn, "together", and τάξις táxis, "ordering". Topics The field of syntax covers a number of topics that a syntactic theory is often designed to handle. The relations between these topics are treated differently in different theories, and some topics may not be considered distinct but instead derived from one another (e.g. word order can be seen as the result of movement rules derived from grammatical relations). Se ...


Corpora
Corpus is Latin for "body". It may refer to:
Linguistics
* Text corpus, in linguistics, a large and structured set of texts
* Speech corpus, in linguistics, a large set of speech audio files
* Corpus linguistics, a branch of linguistics
Music
* Corpus (album), by Sebastian Santa Maria
* Corpus Delicti (band), also known simply as Corpus
Medicine
* Corpus callosum, a structure in the brain
* Corpus cavernosum, a pair of structures in human genitals
* Corpus luteum, a temporary endocrine structure in mammals
* Corpus gastricum, the Latin term referring to the body of the stomach
* Corpus alienum, a foreign object originating outside the body
* Corpus albicans
* Corpora amylacea
* Corpora arenacea
Other uses
* Corpus (Bernini), a 1650 sculpture of Christ by Gian Lorenzo Bernini
* Corpus (museum), a human-body-themed museum in the Netherlands
* Corpus Clock, a large sculptural clock
* Corpus (dance troupe), a Canadian dance troupe
* Corpus (typography) ...




General Internet Corpus Of Russian
General Internet Corpus of Russian (GICR) is a corpus of Russian internet texts that has been accessible on request through an online query interface since 2013. The corpus includes a wide range of texts from the blogosphere, social networks, major news sources, and literary magazines. Goals of the project The project has educational and scientific status, and independent researchers and research groups use materials obtained from GICR to address many tasks in computational linguistics. While other Russian corpus projects focus on fiction and edited texts, the General Internet Corpus gives linguists a timely opportunity to study the language as it is actually used, with all its slang and regional peculiarities. The corpus supports research such as: * a wide range of linguistic research: dialectological studies, studies of word distribution, studies of the language of social networks, studies of the influence of gender, age, and other factors on the lang ...



Dialect
The term dialect (from Latin dialectus, dialectos, from the Ancient Greek word διάλεκτος diálektos, 'discourse', from διά diá, 'through' and λέγω légō, 'I speak') can refer to either of two distinctly different types of linguistic phenomena: One usage refers to a variety of a language that is characteristic of a particular group of the language's speakers. Under this definition, the dialects or varieties of a particular language are closely related and, despite their differences, are usually largely mutually intelligible, especially if they are close to one another on the dialect continuum. The term is applied most often to regional speech patterns, but a dialect may also be defined by other factors, such as social class or ethnicity. A dialect associated with a particular social class can be termed a sociolect, a dialect associated with a particular ethnic group can be termed an ethnolect, and a geographical/regional dialect may be termed a regiolect (Wolfram, ...)



Prosody (poetry)
In poetry, metre (Commonwealth spelling) or meter (American spelling) is the basic rhythmic structure of a verse or lines in verse. Many traditional verse forms prescribe a specific verse metre, or a certain set of metres alternating in a particular order. The study and the actual use of metres and forms of versification are both known as prosody. (Within linguistics, "prosody" is used in a more general sense that includes not only poetic metre but also the rhythmic aspects of prose, whether formal or informal, that vary from language to language, and sometimes between poetic traditions.) Characteristics An assortment of features can be identified when classifying poetry and its metre. Qualitative versus quantitative metre The metre of most poetry of the Western world and elsewhere is based on patterns of syllables of particular types. The familiar type of metre in English-language poetry is called qualitative metre, with stressed syllables comin ...



Poetry
Poetry (derived from the Greek poiesis, "making"), also called verse, is a form of literature that uses aesthetic and often rhythmic qualities of language, such as phonaesthetics, sound symbolism, and metre, to evoke meanings in addition to, or in place of, a prosaic ostensible meaning. A poem is a literary composition, written by a poet, using this principle. Poetry has a long and varied history, evolving differently across the globe. It dates back at least to prehistoric times with hunting poetry in Africa and to panegyric and elegiac court poetry of the empires of the Nile, Niger, and Volta River valleys. Some of the earliest written poetry in Africa occurs among the Pyramid Texts, written during the 25th century BCE. The earliest surviving Western Asian epic poetry, the Epic of Gilgamesh, was written in Sumerian. Early poems in the Eurasian continent evolved from folk songs such as the Chinese Shijing, as well as religious hymns (the S ...



Parallel Corpus
A parallel text is a text placed alongside its translation or translations. Parallel text alignment is the identification of the corresponding sentences in both halves of the parallel text. The Loeb Classical Library and the Clay Sanskrit Library are two examples of dual-language series of texts. Reference Bibles may contain the original languages and a translation, or several translations by themselves, for ease of comparison and study; Origen's Hexapla (Greek for "sixfold") placed six versions of the Old Testament side by side. A famous example is the Rosetta Stone, whose discovery made it possible to begin deciphering the Ancient Egyptian language. Large collections of parallel texts are called parallel corpora (see text corpus). Sentence-level alignment of parallel corpora is a prerequisite for many areas of linguistic research. During translation, sentences can be split, merged, deleted, inserted, or reordered by the translator, which makes alignment a non-trivial task. ...
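To make the alignment problem concrete, here is a minimal Python sketch of length-based sentence alignment by dynamic programming. It is loosely inspired by the well-known length-based approach of Gale and Church but uses a simplified, hypothetical cost (relative difference in character length, a flat penalty for merges and splits, and a flat `skip_penalty` for deleted or inserted sentences) rather than their probabilistic model. It handles splits, merges, deletions, and insertions, but not reordering.

```python
from __future__ import annotations

# Allowed "beads": (source sentences consumed, target sentences consumed).
MOVES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def bead_cost(src: list[str], tgt: list[str], skip_penalty: float) -> float:
    """Hypothetical cost: unmatched sentences cost skip_penalty; otherwise length mismatch."""
    if not src or not tgt:
        return skip_penalty
    ls = sum(len(s) for s in src)
    lt = sum(len(t) for t in tgt)
    merge_penalty = 0.0 if len(src) == len(tgt) else 2.0  # discourage 2-1 / 1-2 beads
    return abs(ls - lt) / max(ls, lt, 1) * 10 + merge_penalty


def align(src: list[str], tgt: list[str], skip_penalty: float = 10.0):
    """Return a list of (source indices, target indices) beads with minimal total cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: list[list[tuple[int, int] | None]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order visits every cell after all of its possible predecessors.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(src[i:ni], tgt[j:nj], skip_penalty)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

For example, aligning `["The cat sleeps.", "It is warm today."]` with `["Кошка спит.", "Сегодня тепло."]` yields two one-to-one beads, while a single English sentence "The cat sleeps and it is warm today." against the same two Russian sentences yields one 1-2 bead, modelling a translator's split.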


Igor Mel'čuk
Igor Aleksandrovič Mel'čuk, sometimes Melchuk (Russian: Игорь Александрович Мельчук; Ukrainian: Ігор Олександрович Мельчук; born 1932), is a Soviet and Canadian linguist and a retired professor at the Department of Linguistics and Translation, Université de Montréal. Biography He graduated from the Philological Department of Moscow State University and worked at the Institute of Linguistics in Moscow from 1956 to 1976. He is known as one of the developers of Meaning–text theory, with the seminal book published in 1974. He is also the author of the five-volume Cours de morphologie générale. After making statements in support of the Soviet dissidents Andrey Sinyavsky and Yuli Daniel, he was fired from the Institute, and he subsequently emigrated from the Soviet Union in 1976. He has lived and worked in Canada since 1977. Melchuk is Jewish ...



Treebank
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The construction of parsed corpora in the early 1990s revolutionized computational linguistics, which benefitted from large-scale empirical data. Etymology The term treebank was coined by the linguist Geoffrey Leech in the 1980s, by analogy to other repositories such as a seedbank or bloodbank. The analogy arises because both syntactic and semantic structure are commonly represented compositionally as a tree. The term parsed corpus is often used interchangeably with treebank, with the emphasis on the primacy of sentences rather than trees. Construction Treebanks are often created on top of a corpus that has already been annotated with part-of-speech tags. In turn, treebanks are sometimes enhanced with semantic or other linguistic information. Treebanks can be created completely manually, where linguists annotate each sentence with syntactic structure, ...
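One common way to store the tree for each sentence is a bracketed notation in the style of the Penn Treebank. The following Python sketch, offered as an illustration rather than as any treebank's official tooling, parses such a string into nested `Tree` objects (a hypothetical helper class) and reads off the words with their part-of-speech labels. It assumes one tree per string and tolerates the unlabeled outer wrapper "( (S ...) )" that some treebank files use.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str  # constituent or POS label; "" for an unlabeled outer wrapper
    children: list[Tree | str] = field(default_factory=list)  # subtrees or words


def parse_bracketed(text: str) -> Tree:
    """Parse one bracketed tree such as "(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def expect_more() -> None:
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")

    def parse_node() -> Tree:
        nonlocal pos
        expect_more()
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        expect_more()
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = Tree(label)
        expect_more()
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:
                node.children.append(tokens[pos])  # a word (leaf)
                pos += 1
            expect_more()
        pos += 1  # consume the closing ")"
        return node

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves_with_tags(tree: Tree) -> list[tuple[str, str]]:
    """Return (word, POS label) pairs in left-to-right order."""
    pairs: list[tuple[str, str]] = []
    for child in tree.children:
        if isinstance(child, Tree):
            pairs.extend(leaves_with_tags(child))
        else:
            pairs.append((child, tree.label))
    return pairs
```

Applied to "(S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))", `leaves_with_tags` returns the POS-tagged sentence that the treebank was built on top of, which mirrors the layering described above: POS tags at the leaves, phrase structure above them.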



Russian Language
Russian (Russian: русский язык, russkij jazyk) is an East Slavic language mainly spoken in Russia. It is the native language of the Russians and belongs to the Indo-European language family. It is one of four living East Slavic languages and is also part of the larger Balto-Slavic group. Besides Russia itself, Russian is an official language in Belarus, Kazakhstan, and Kyrgyzstan, and is used widely as a lingua franca throughout Ukraine, the Caucasus, Central Asia, and to some extent in the Baltic states. It was the de facto language of the former Soviet Union (Constitution and Fundamental Law of the Union of Soviet Socialist Republics, 1977: Section II, Chapter 6, Article 36) and continues to be used in public life with varying proficiency in all of the post-Soviet states. Russian has over 258 million total speakers worldwide. ...