




Corpus Manager
A corpus manager (also called a corpus browser or corpus query system) is a tool for multilingual corpus analysis that allows effective searching in corpora. It is usually a complex tool that lets the user search for language forms or sequences. It may show each result in its context, and it may allow searching by positional attributes such as lemma or tag. Search results displayed together with their context are called concordances. Other features include collocation search, frequency statistics, and metadata about the processed text. In the narrower sense, corpus manager refers only to the server side, that is, the corpus query engine, whereas the client side is simply called the user interface. A corpus manager can be software installed on a personal computer, or it may be provided as a web service.
List of corpus managers
* BNCweb – a web-based interface for the British National Corpus
* CQPweb – a web-based interface for the study of a large variety of co ...
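Searching by positional attributes can be pictured with a minimal sketch. It assumes each token carries three attributes (word, lemma, tag); the `Token` type, the `find_positions` helper, the sample sentence, and the tag labels are all hypothetical and do not reflect the query engine of any particular corpus manager.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One corpus position with its positional attributes (illustrative)."""
    word: str
    lemma: str
    tag: str


def find_positions(tokens, **criteria):
    """Return the indices of tokens whose attributes equal all given criteria."""
    return [
        i
        for i, tok in enumerate(tokens)
        if all(getattr(tok, attr) == value for attr, value in criteria.items())
    ]


# Hypothetical toy corpus; the tag labels are illustrative only.
corpus = [
    Token("She", "she", "PRON"),
    Token("broke", "break", "VERB"),
    Token("the", "the", "DET"),
    Token("record", "record", "NOUN"),
    Token("and", "and", "CONJ"),
    Token("breaks", "break", "VERB"),
    Token("records", "record", "NOUN"),
]

# All positions whose lemma is "break", regardless of the surface form.
break_hits = find_positions(corpus, lemma="break")

# Several attributes can be combined in one query.
noun_hits = find_positions(corpus, lemma="record", tag="NOUN")
```

A real query engine would use indexes rather than a linear scan; the scan is only meant to show what a query over positional attributes selects.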


Corpus Linguistics
Corpus linguistics is the study of a language as that language is expressed in its text corpus (plural ''corpora''), its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field, the natural context ("realia") of that language, with minimal experimental interference. The text-corpus method uses the body of texts written in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated. Corpora have been used not only for linguistics research but also to compile dictionaries (starting with ''The American Heritage Dictionary of the English Language'' in 1969) and grammar guides, such as ''A Compreh ...


Lemma (morphology)
In morphology and lexicography, a lemma (plural ''lemmas'' or ''lemmata'') is the canonical form, dictionary form, or citation form of a set of word forms. In English, for example, ''break'', ''breaks'', ''broke'', ''broken'' and ''breaking'' are forms of the same lexeme, with ''break'' as the lemma by which they are indexed. ''Lexeme'', in this context, refers to the set of all the inflected or alternating forms in the paradigm of a single word, and ''lemma'' refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish and Russian. The process of determining the ''lemma'' for a given lexeme is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.
Morphology
The form of a word that is chosen to serve as the lemma is usually the least marked form, but there are several exceptions such as ...
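The relationship between word forms, lexeme and lemma can be sketched as a simple lookup table. The example below uses the forms of ''break'' given above; the names `LEXEMES`, `build_lemma_index` and `lemmatise` are hypothetical, and returning an unknown word unchanged (lowercased) is an assumption of this sketch, not a rule of lemmatisation.

```python
# Each lemma indexes the set of forms that make up its lexeme.
LEXEMES = {
    "break": ["break", "breaks", "broke", "broken", "breaking"],
}


def build_lemma_index(lexemes):
    """Invert the lemma -> forms table into a form -> lemma lookup."""
    index = {}
    for lemma, forms in lexemes.items():
        for form in forms:
            index[form] = lemma
    return index


def lemmatise(word, index):
    """Look up the lemma of a word form, ignoring case.

    Assumption of this sketch: a form missing from the index is
    returned as-is (lowercased) instead of raising an error.
    """
    key = word.lower()
    return index.get(key, key)


lemma_index = build_lemma_index(LEXEMES)
lemma_of_broke = lemmatise("Broke", lemma_index)
```

Practical lemmatisers rely on morphological rules or large dictionaries rather than a hand-written table, particularly for highly inflected languages.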


Part-of-speech Tagging
In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also called grammatical tagging, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags. POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. E. Brill's tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms.
Principle
Part-of-speech tagging is harder than just having a list of words and their parts of speech, because some words can represent more than one part of speech at different times, ...
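Why a plain word list is not enough can be shown with a toy rule-based tagger. This is a minimal sketch and not Brill's tagger: the `LEXICON`, the tag labels, the two context rules, and the choice to tag unknown words as nouns are all assumptions made for illustration.

```python
# Hypothetical lexicon: each word lists its possible tags, most likely first.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "they": ["PRON"],
    "to": ["PART"],
    "song": ["NOUN"],
    "record": ["NOUN", "VERB"],  # ambiguous between noun and verb
}


def tag(words):
    """Tag each word, using the previous tag to resolve ambiguous words."""
    tags = []
    for word in words:
        # Assumption: words missing from the lexicon are treated as nouns.
        candidates = LEXICON.get(word.lower(), ["NOUN"])
        choice = candidates[0]
        if len(candidates) > 1 and tags:
            previous = tags[-1]
            # Rule 1: after a pronoun or the particle "to", prefer a verb.
            if previous in ("PRON", "PART") and "VERB" in candidates:
                choice = "VERB"
            # Rule 2: after a determiner, prefer a noun.
            elif previous == "DET" and "NOUN" in candidates:
                choice = "NOUN"
        tags.append(choice)
    return list(zip(words, tags))


verb_reading = tag(["they", "record", "a", "song"])
noun_reading = tag(["the", "record"])
```

The same surface form ''record'' receives a different tag in each sentence, which is the ambiguity that context-sensitive rules or stochastic models are meant to resolve.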



Concordance (publishing)
A concordance is an alphabetical list of the principal words used in a book or body of work, listing every instance of each word with its immediate context. Concordances have been compiled only for works of special importance, such as the Vedas, Bible, Qur'an or the works of Shakespeare, James Joyce or classical Latin and Greek authors, because of the time, difficulty, and expense involved in creating a concordance in the pre-computer era. A concordance is more than an index, with additional material such as commentary, definitions and topical cross-indexing which makes producing one a labor-intensive process even when assisted by computers. In the precomputing era, search technology was unavailable, and a concordance offered readers of long works such as the Bible something comparable to search results for every word that they would have been likely to search for. Today, the ability to combine the result of queries concerning multiple terms (such as searching for words near ...
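With a computer, the core of a concordance (every instance of each word with its immediate context) takes only a few lines. The sketch below is a simplification: it indexes every word rather than only the principal ones, splits on whitespace, and omits commentary and cross-indexing. The function name `build_concordance`, the `window` parameter, and the sample sentence are hypothetical.

```python
from collections import defaultdict


def build_concordance(tokens, window=2):
    """Map each word (lowercased) to its occurrences with context.

    Each occurrence is a (left context, word, right context) triple,
    with up to `window` tokens on either side. Entries are returned
    in alphabetical order of the headword.
    """
    entries = defaultdict(list)
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - window):i])
        right = " ".join(tokens[i + 1:i + 1 + window])
        entries[token.lower()].append((left, token, right))
    return dict(sorted(entries.items()))


tokens = "the cat sat on the mat".split()
concordance = build_concordance(tokens, window=2)

# Print the entries in keyword-in-context (KWIC) layout.
for headword, occurrences in concordance.items():
    for left, word, right in occurrences:
        print(f"{left:>10} [{word}] {right}")
```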




Collocation
In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts, and may be completely unrelated. An example of a phraseological collocation is the expression ''strong tea''. While the same meaning could be conveyed by the roughly equivalent ''powerful tea'', this adjective does not modify ''tea'' frequently enough for English speakers to become accustomed to its co-occurrence and regard it as idiomatic or unmarked. (By way of counterexample, ''powerful'' is idiomatically preferred to ''strong'' when modifying a ''computer'' or a ''car''.) There are about six main types of collocations: adjective + noun, noun + noun (such as collective nouns), verb + noun, adverb ...
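The idea of words co-occurring "more often than would be expected by chance" can be made concrete with an association score. The sketch below uses pointwise mutual information (PMI) over adjacent word pairs; PMI is one common choice and is an assumption here, since the text above does not name a specific measure. The function name `bigram_pmi` and the toy token list are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent word pair with pointwise mutual information.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ). A positive score means
    the pair occurs more often than it would if the two words were
    independent; a negative score means it occurs less often.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total_unigrams = len(tokens)
    total_bigrams = len(tokens) - 1
    scores = {}
    for (first, second), count in bigrams.items():
        p_pair = count / total_bigrams
        p_first = unigrams[first] / total_unigrams
        p_second = unigrams[second] / total_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Toy data; counts this small are only illustrative.
tokens = "strong tea and strong tea and powerful car".split()
scores = bigram_pmi(tokens)
strong_tea_score = scores[("strong", "tea")]
```

On real corpora PMI tends to overrate rare pairs, so it is usually combined with a minimum frequency threshold or replaced by another measure.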



British National Corpus
The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time. It is used in corpus linguistics for the analysis of corpora.
History
The project to create the BNC involved the collaboration of three publishers (Oxford University Press as the lead collaborator, Longman, and W. & R. Chambers), two universities (the University of Oxford and Lancaster University), and the British Library. The creation of the BNC started in 1991 under the management of the BNC consortium, and the project was finished by 1994. No new samples have been added since 1994, but the BNC underwent slight revisions before the release of the second edition, BNC World (2001), and the third edition, BNC XML Edition (2007).



Brigham Young University
Brigham Young University (BYU, sometimes referred to colloquially as The Y) is a private research university in Provo, Utah. It was founded in 1875 by religious leader Brigham Young and is sponsored by the Church of Jesus Christ of Latter-day Saints (LDS Church). BYU offers a variety of academic programs including those in the liberal arts, engineering, agriculture, management, physical and mathematical sciences, nursing, and law. It has 186 undergraduate majors, 64 master's programs, and 26 doctoral programs. It is broadly organized into 11 colleges or schools at its main Provo campus, with some colleges and divisions defining their own admission standards. The university also administers two satellite campuses, one in Jerusalem and one in Salt Lake City, while its parent organization the Church Educational System (CES) sponsors sister schools in Hawaii and Idaho. The university is accredited by the Northwest Commission on Colleges and Universities. Almost all BYU students ...


EXMARaLDA
EXMARaLDA (Extensible Markup Language for Discourse Annotation) is a set of free software tools for creating, managing and analyzing spoken language corpora. It consists of a transcription tool (comparable to tools like Praat or Transcriber), a tool for administering corpus metadata and a tool for running queries (KWIC searches) on spoken language corpora. EXMARaLDA is used for conversation and discourse analysis, dialectology, phonology and research into first and second language acquisition in children and adults. EXMARaLDA is based on the open standards XML and Unicode and is programmed in Java.


Sketch Engine
Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing CZ s.r.o. since 2003. Its purpose is to enable people studying language behaviour (lexicographers, researchers in corpus linguistics, translators or language learners) to search large text collections according to complex and linguistically motivated queries. Sketch Engine gained its name after one of the key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Currently, it supports and provides corpora in 90+ languages.
History of development
Sketch Engine is a product of Lexical Computing Limited, a company founded in 2003 by the lexicographer and research scientist Adam Kilgarriff. He started a collaboration with Pavel Rychlý, a computer scientist working at the Natural Language Processing Centre, Masaryk University, and the developer of Manatee and Bonito (two major parts of the software suite), and introduced ...



University Of Wollongong
The University of Wollongong (abbreviated as UOW) is an Australian public research university located in the coastal city of Wollongong, New South Wales, approximately 80 kilometres south of Sydney. As of 2017, the university had an enrolment of more than 32,000 students (including over 12,800 international students from 134 countries), an alumni base of more than 131,859 and over 2,400 staff members. In 1951, a division of the New South Wales University of Technology (known as the University of New South Wales from 1958) was established in Wollongong for the conduct of diploma courses. In 1961, the Wollongong University College of the University of New South Wales was constituted and the college was officially opened in 1962. In 1975 the University of Wollongong was established as an independent institution. Since its establishment, the university has conferred more than 120,000 degrees, diplomas and certificates. Its students, originally predominantly from the local Illawarra r ...


WordSmith (software)
WordSmith Tools is a software package primarily for linguists, in particular for work in the field of corpus linguistics. It is a collection of modules for searching for patterns in a language. The software handles many languages.
Development and acquisition
The program suite was developed by the British linguist Mike Scott at the University of Liverpool and released as version 1.0 in 1996. It was based on MicroConcord, co-developed by Mike Scott and Tim Johns and published by Oxford University Press in 1993. Versions 1.0 through 4.0 were sold exclusively by Oxford University Press; the current version 8.0 and previous versions are now also distributed by Lexical Analysis Software Limited. The software runs under Windows. WordSmith is a download-only product which is registered by entering a code costing 50 pounds sterling for a single-user license. However, WordSmith 4.0 can now be downloaded and used for free.
Functionality and applications
The core areas of the software package incl ...




Linguists
Linguistics is the scientific study of human language. It is called a scientific study because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language, particularly its nature and structure. Linguistics is concerned with both the cognitive and social aspects of language. It is considered a scientific field as well as an academic discipline; it has been classified as a social science, natural science, cognitive science (Thagard, Paul, "Cognitive Science", The Stanford Encyclopedia of Philosophy (Fall 2008 Edition), Edward N. Zalta (ed.)), or part of the humanities. Traditional areas of linguistic analysis correspond to phenomena found in human linguistic systems, such as syntax (rules governing the structure of sentences); semantics (meaning); morphology (structure of words); phonetics (speech sounds and equivalent gestures in sign languages); phonology (the abstract sound system of a particular language); and pragmatics (how social contex ...