Word Sketch
   HOME
*



picture info

Word Sketch
A word sketch is a one-page, automatic, corpus-derived summary of a word’s grammatical and collocational behaviour. Word sketches were first introduced by the British corpus linguist Adam KilgarriffKilgarriff, Adam; Rychlý, Pavel; Smrž, Pavel; Tugwell, David (2004) The Sketch Engine. Information Technology, 2004 and exploited within the Sketch Engine corpus management system. They are an extension of the general collocation concept used in corpus linguistics in that they group collocations according to particular grammatical relations (e.g. subject, object, modifier etc.). The collocation candidates in a word sketch are sorted either by their frequency or using a lexicographic association score like Dice, T-score or MI-score. Since the introduction, word sketches have been used by lexicographers to develop modern corpus-based dictionaries by major publishing houses including Oxford English Dictionary, Macmillan English Dictionary and comprising dozens of languages including ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Word Sketch Of Verb "read"
A word is a basic element of language that carries an semantics, objective or pragmatics, practical semantics, meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no consensus among linguistics, linguists on its definition and numerous attempts to find specific criteria of the concept remain controversial. Different standards have been proposed, depending on the theoretical background and descriptive context; these do not converge on a single definition. Some specific definitions of the term "word" are employed to convey its different meanings at different levels of description, for example based on phonology, phonological, grammar, grammatical or orthography, orthographic basis. Others suggest that the concept is simply a convention used in everyday situations. The concept of "word" is distinguished from that of a morpheme, which is the smallest unit of language that has a ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Adam Kilgarriff
Adam Kilgarriff (12 February 1960 – 16 May 2015) was a corpus linguist, lexicographer, and co-author of Sketch Engine. Life His parents were booksellers. He spent one year as a volunteer in Kenya 1978–1979 then began studying at Cambridge University, graduating with a first class BA degree in philosophy and engineering in 1982. His first job was as a Housing Officer for the London and Quadrant Housing Trust. At the same time he studied at the South West London College. In 1987, he left his job and started an MSc in intelligent knowledge-based systems at the University of Sussex, from where he graduated the following year, continuing a DPhil in computational linguistics with thesis Polysemy (1992). In 2008 he made a return trip to Kenya with his old friend Raphael. He was also a participant in the Hastings Half Marathon for many years. In November 2014, he was diagnosed with stage 4 bowel cancer which he succumbed to in May 2015. After the diagnosis he started his own bl ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Sketch Engine
Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing CZ s.r.o. since 2003. Its purpose is to enable people studying language behaviour ( lexicographers, researchers in corpus linguistics, translators or language learners) to search large text collections according to complex and linguistically motivated queries. Sketch Engine gained its name after one of the key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Currently, it supports and provides corpora in 90+ languages. History of development Sketch Engine is a product of Lexical Computing Limited, a company founded in 2003 by the lexicographer and research scientist Adam Kilgarriff. He started a collaboration with Pavel Rychlý, a computer scientist working at the Natural Language Processing Centre, Masaryk University, and the developer of Manatee and Bonito (two major parts of the software suite), and introduced ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Collocation
In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts, and may be completely unrelated. An example of a phraseological collocation is the expression ''strong tea''. While the same meaning could be conveyed by the roughly equivalent ''powerful tea'', this adjective does not modify ''tea'' frequently enough for English speakers to become accustomed to its co-occurrence and regard it as idiomatic or unmarked. (By way of counterexample, ''powerful'' is idiomatically preferred to ''strong'' when modifying a ''computer'' or a ''car''.) There are about six main types of collocations: adjective + noun, noun + noun (such as collective nouns), verb + noun, adverb ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Sørensen–Dice Coefficient
The Sørensen–Dice coefficient (see below for other names) is a statistic used to gauge the similarity of two samples. It was independently developed by the botanists Thorvald Sørensen and Lee Raymond Dice, who published in 1948 and 1945 respectively. Name The index is known by several other names, especially Sørensen–Dice index, Sørensen index and Dice's coefficient. Other variations include the "similarity coefficient" or "index", such as Dice similarity coefficient (DSC). Common alternate spellings for Sørensen are ''Sorenson'', ''Soerenson'' and ''Sörenson'', and all three can also be seen with the ''–sen'' ending. Other names include: * F1 score * Czekanowski's binary (non-quantitative) index * Measure of genetic similarity * Zijdenbos similarity index, referring to a 1994 paper of Zijdenbos et al. Formula Sørensen's original formula was intended to be applied to discrete data. Given two sets, X and Y, it is defined as : DSC = \frac where , ''X'', and , ''Y' ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

T-score
In statistics, the standard score is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores. It is calculated by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This process of converting a raw score into a standard score is called standardizing or normalizing (however, "normalizing" can refer to many types of ratios; see normalization for more). Standard scores are most commonly called ''z''-scores; the two terms may be used interchangeably, as they are in this article. Other equivalent terms in use include z-values, normal scores, standardized variables and pull in high energy physics. Computing a z-score requires knowledge of the mean and standard devia ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Text Corpus
In linguistics, a corpus (plural ''corpora'') or text corpus is a language resource consisting of a large and structured set of texts (nowadays usually electronically stored and processed). In corpus linguistics, they are used to do statistical analysis and statistical hypothesis testing, hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. In Search engine (computing), search technology, a corpus is the collection of documents which is being searched. Overview A corpus may contain texts in a single language (''monolingual corpus'') or text data in multiple languages (''multilingual corpus''). In order to make the corpora more useful for doing linguistic research, they are often subjected to a process known as annotation. An example of annotating a corpus is part-of-speech tagging, or ''POS-tagging'', in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form o ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]