HOME
*





Czech National Corpus
The Czech National Corpus (CNC) (Czech : Český národní korpus) is a large electronic corpus of written and spoken Czech language, developed by the Institute of the Czech National Corpus (ICNC) in the Faculty of Arts at Charles University in Prague. The collection is used for teaching and research in corpus linguistics. The ICNC collaborates with over 200 researchers and students (mainly for spoken and parallel data acquisition), 270 publishers (as text providers), and other similar research projects. Areas of focus The Czech National Corpus focuses systematically on the following areas: * Synchronic written corpora: the SYN-series corpora maps the Czech language of the 20th and 21st century (esp. the last twenty years) and forms the core of the project. Texts are enriched with metadata, lemmatization, and morphological tagging. * Contemporary spontaneous spoken Czech: The ORAL-series corpora contain contemporary, spontaneous spoken language used in informal situations through ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Text Corpus
In linguistics, a corpus (plural ''corpora'') or text corpus is a language resource consisting of a large and structured set of texts (nowadays usually electronically stored and processed). In corpus linguistics, they are used to do statistical analysis and statistical hypothesis testing, hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. In Search engine (computing), search technology, a corpus is the collection of documents which is being searched. Overview A corpus may contain texts in a single language (''monolingual corpus'') or text data in multiple languages (''multilingual corpus''). In order to make the corpora more useful for doing linguistic research, they are often subjected to a process known as annotation. An example of annotating a corpus is part-of-speech tagging, or ''POS-tagging'', in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form o ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Czech Language
Czech (; Czech ), historically also Bohemian (; ''lingua Bohemica'' in Latin), is a West Slavic language of the Czech–Slovak group, written in Latin script. Spoken by over 10 million people, it serves as the official language of the Czech Republic. Czech is closely related to Slovak, to the point of high mutual intelligibility, as well as to Polish to a lesser degree. Czech is a fusional language with a rich system of morphology and relatively flexible word order. Its vocabulary has been extensively influenced by Latin and German. The Czech–Slovak group developed within West Slavic in the high medieval period, and the standardization of Czech and Slovak within the Czech–Slovak dialect continuum emerged in the early modern period. In the later 18th to mid-19th century, the modern written standard became codified in the context of the Czech National Revival. The main non-standard variety, known as Common Czech, is based on the vernacular of Prague, but is now spoken as an ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Charles University
) , image_name = Carolinum_Logo.svg , image_size = 200px , established = , type = Public, Ancient , budget = 8.9 billion CZK , rector = Milena Králíčková , faculty = 4,057 , administrative_staff = 4,026 , students = 51,438 , undergrad = 32,520 , postgrad = 9,288 , doctoral = 7,428 , city = Prague , country = Czech Republic , campus = Urban , colors = , affiliations = Coimbra Group EUA Europaeum , website = Charles University ( cs, Univerzita Karlova, UK; la, Universitas Carolina; german: Karls-Universität), also known as Charles University in Prague or historically as the University of Prague ( la, Universitas Pragensis, links=no), is the oldest and largest university in the Czech Republic. It is one of the oldest universities in Europe in continuous operation. Today, the university consists of 17 faculties located in Prague, Hradec Králové, and Plzeň. Charles University belongs among the top three universities in Central and Eastern Europe. It is ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Prague
Prague ( ; cs, Praha ; german: Prag, ; la, Praga) is the capital and largest city in the Czech Republic, and the historical capital of Bohemia. On the Vltava river, Prague is home to about 1.3 million people. The city has a temperate oceanic climate, with relatively warm summers and chilly winters. Prague is a political, cultural, and economic hub of central Europe, with a rich history and Romanesque, Gothic, Renaissance and Baroque architectures. It was the capital of the Kingdom of Bohemia and residence of several Holy Roman Emperors, most notably Charles IV (r. 1346–1378). It was an important city to the Habsburg monarchy and Austro-Hungarian Empire. The city played major roles in the Bohemian and the Protestant Reformations, the Thirty Years' War and in 20th-century history as the capital of Czechoslovakia between the World Wars and the post-war Communist era. Prague is home to a number of well-known cultural attractions, many of which survived the ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Corpus Linguistics
Corpus linguistics is the study of language, study of a language as that language is expressed in its text corpus (plural ''corpora''), its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference. The text-corpus method uses the body of texts written in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated. Corpora have not only been used for linguistics research, they have also been used to compile dictionaries (starting with ''The American Heritage Dictionary of the English Language'' in 1969) and grammar guides, such as ''A Compreh ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Metadata
Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive information about a resource. It is used for discovery and identification. It includes elements such as title, abstract, author, and keywords. * Structural metadata – metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials. * Administrative metadata – the information to help manage a resource, like resource type, permissions, and when and how it was created. * Reference metadata – the information about the contents and quality of statistical data. * Statistical metadata – also called process data, may describe processes that collect, process, or produce st ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Lemmatization
Lemmatisation ( or lemmatization) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. In computational linguistics, lemmatisation is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatisation depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an entire document. As a result, developing efficient lemmatisation algorithms is an open area of research. Description In many languages, words appear in several ''inflected'' forms. For example, in English, the verb 'to walk' may appear as 'walk', 'walked', 'walks' or 'walking'. The base form, 'walk', that one might look up in a dictionary, is called the ''lemma'' for the word. The association of the base form ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Czech Republic
The Czech Republic, or simply Czechia, is a landlocked country in Central Europe. Historically known as Bohemia, it is bordered by Austria to the south, Germany to the west, Poland to the northeast, and Slovakia to the southeast. The Czech Republic has a hilly landscape that covers an area of with a mostly temperate continental and oceanic climate. The capital and largest city is Prague; other major cities and urban areas include Brno, Ostrava, Plzeň and Liberec. The Duchy of Bohemia was founded in the late 9th century under Great Moravia. It was formally recognized as an Imperial State of the Holy Roman Empire in 1002 and became a kingdom in 1198. Following the Battle of Mohács in 1526, the whole Crown of Bohemia was gradually integrated into the Habsburg monarchy. The Protestant Bohemian Revolt led to the Thirty Years' War. After the Battle of White Mountain, the Habsburgs consolidated their rule. With the dissolution of the Holy Empire in 1806, the Cro ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Corpora
Corpus is Latin for "body". It may refer to: Linguistics * Text corpus, in linguistics, a large and structured set of texts * Speech corpus, in linguistics, a large set of speech audio files * Corpus linguistics, a branch of linguistics Music * ''Corpus'' (album), by Sebastian Santa Maria * Corpus Delicti (band), also known simply as Corpus Medicine * Corpus callosum, a structure in the brain * Corpus cavernosum (other), a pair of structures in human genitals * Corpus luteum, a temporary endocrine structure in mammals * Corpus gastricum, the Latin term referring to the body of the stomach * Corpus alienum, a foreign object originating outside the body * Corpus albicans * Corpora amylacea * Corpora arenacea Other uses * ''Corpus'' (Bernini), a 1650 sculpture of Christ by Gian Lorenzo Bernini * Corpus (museum), a human body themed museum in the Netherlands * Corpus Clock, a large sculptural clock * Corpus (dance troupe), a Canadian dance troupe * Corpus (typography) ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Linguistic Research
Linguistics is the scientific study of human language. It is called a scientific study because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language, particularly its nature and structure. Linguistics is concerned with both the cognitive and social aspects of language. It is considered a scientific field as well as an academic discipline; it has been classified as a social science, natural science, cognitive science,Thagard, PaulCognitive Science, The Stanford Encyclopedia of Philosophy (Fall 2008 Edition), Edward N. Zalta (ed.). or part of the humanities. Traditional areas of linguistic analysis correspond to phenomena found in human linguistic systems, such as syntax (rules governing the structure of sentences); semantics (meaning); morphology (structure of words); phonetics (speech sounds and equivalent gestures in sign languages); phonology (the abstract sound system of a particular language); and pragmatics (how social contex ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]