Stemming
In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. Algorithms for stemming have been studied in computer science since the 1960s. Many search engines treat words with the same stem as synonyms as a kind of query expansion, a process called conflation. A computer program or subroutine that stems word may be called a ''stemming program'', ''stemming algorithm'', or ''stemmer''. Examples A stemmer for English operating on the stem ''cat'' should identify such strings as ''cats'', ''catlike'', and ''catty''. A stemming algorithm might also reduce the words ''fishing'', ''fished'', and ''fisher'' to the stem ''fish''. The stem need not be a word, for example ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Julie Beth Lovins
Julie Beth Lovins (October 19, 1945, in Washington, D.C. – January 26, 2018, in Mountain View, California) was a computational linguist who published The Lovins Stemming Algorithm - a type of stemming algorithm for word matching - in 1968. The Lovins Stemmer is a single pass, context sensitive stemmer, which removes endings based on the longest-match principle. The stemmer was the first to be published and was extremely well developed considering the date of its release, having been the main influence on a large amount of the future work in the area. -Adam G., et al Background Born on October 19, 1945, in Washington, D.C., Lovins grew up in Amherst, Massachusetts. Her father Gerald H. Lovins was an engineer and her mother, Miriam Lovins, a social services administrator. Lovins' brother Amory Lovins is the co-founder and chief environmental scientist of Rocky Mountain Institute. For her undergraduate degree, Lovins attended Pembroke College, the women's college of Brown Univ ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Lemmatisation
Lemmatization (or less commonly lemmatisation) in linguistics is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word's lemma, or dictionary form. In computational linguistics, lemmatization is the algorithmic process of determining the lemma of a word based on its intended meaning. Unlike stemming, lemmatization depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighbouring sentences or even an entire document. As a result, developing efficient lemmatization algorithms is an open area of research. Description In many languages, words appear in several ''inflected'' forms. For example, in English, the verb 'to walk' may appear as 'walk', 'walked', 'walks' or 'walking'. The base form, 'walk', that one might look up in a dictionary, is called the ''lemma'' for the word. The association o ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Martin Porter
Martin F. Porter is the inventor of the Porter Stemmer, one of the most common algorithms for stemming English, and the Snowball programming framework. His 1980 paper "An algorithm for suffix stripping", proposing the stemming algorithm, has been cited over 8000 times (Google Scholar). The Muscat search engine comes from research performed by Porter at the University of Cambridge and was commercialized in 1984 by Cambridge CD Publishing; it was subsequently sold to MAID which became the Dialog Corporation. Part of Dialog was then spun off to become BrightStation in 2000, which transitioned Open Muscat to a closed-source development model in 2001. Subsequently, a group of developers led by Porter initiated a project based on Open Muscat called Xapian and released the first official version on September 30, 2002. In 2000 he was awarded the Tony Kent Strix award. Porter read mathematics at St John's College, Cambridge (1963–66) and went to get a Diploma in Computer Science (196 ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Query Expansion
Query expansion (QE) is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In the context of search engines, query expansion involves evaluating a user's input (what words were typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as: * Finding synonyms of words, and searching for the synonyms as well * Finding semantically related words (e.g. antonyms, meronyms, hyponyms, hypernyms) * Finding all the various morphological forms of words by stemming each word in the search query * Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results * Re-weighting the terms in the original query Query expansion is a methodology studied in the field of computer science, particularly within the realm of natural lan ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Snowball Programming Language
Snowball is a small string processing programming language designed for creating stemming algorithms for use in information retrieval."Snowball" Martin Porter, web page. Retrieved 2 September 2014. The name Snowball was chosen as a tribute to the programming language, "with which it shares the concept of string patterns delivering signals that are used to control the flow of the program." The creator of Snowball, Dr. Martin Porter, "toyed with the idea of calling it 'strippergram,'" because it "effectively provides a 'suffix STRIPPER GRAMmar.'" The Snowball compiler translates a Snowball sc ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Stochastic
Stochastic (; ) is the property of being well-described by a random probability distribution. ''Stochasticity'' and ''randomness'' are technically distinct concepts: the former refers to a modeling approach, while the latter describes phenomena; in everyday conversation, however, these terms are often used interchangeably. In probability theory, the formal concept of a '' stochastic process'' is also referred to as a ''random process''. Stochasticity is used in many different fields, including image processing, signal processing, computer science, information theory, telecommunications, chemistry, ecology, neuroscience, physics, and cryptography. It is also used in finance (e.g., stochastic oscillator), due to seemingly random changes in the different markets within the financial sector and in medicine, linguistics, music, media, colour theory, botany, manufacturing and geomorphology. Etymology The word ''stochastic'' in English was originally used as an adjective with the ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Free Software
Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribute it and any adapted versions. Free software is a matter of liberty, not price; all users are legally free to do what they want with their copies of a free software (including profiting from them) regardless of how much is paid to obtain the program.Selling Free Software (GNU) Computer programs are deemed "free" if they give end-users (not just the developer) ultimate control over the software and, subsequently, over their devices. The right to study and modify a computer program entails that the source code—the preferred ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
BSD Licenses
BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD license was used for its namesake, the Berkeley Software Distribution (BSD), a Unix-like operating system. The original version has since been revised, and its descendants are referred to as modified BSD licenses. BSD is both a license and a class of license (generally referred to as BSD-like). The modified BSD license (in wide use today) is very similar to the license originally used for the BSD version of Unix. The BSD license is a simple license that merely requires that all code retain the BSD license notice if redistributed in source code format, or reproduce the notice if redistributed in binary format. The BSD license (unlike some other licenses e.g. GPL) does not require that source code be distributed at all. Terms In addition to ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Christopher D
Christopher is the English version of a Europe-wide name derived from the Greek name Χριστόφορος (''Christophoros'' or '' Christoforos''). The constituent parts are Χριστός (''Christós''), "Christ" or "Anointed", and φέρειν (''phérein''), "to bear"; hence the "Christ-bearer". As a given name, 'Christopher' has been in use since the 10th century. In English, Christopher may be abbreviated as "Chris", "Topher", and sometimes " Kit". It was frequently the most popular male first name in the United Kingdom, having been in the top twenty in England and Wales from the 1940s until 1995, although it has since dropped out of the top 100. Within the United Kingdom, the name is most common in England and not so common in Wales, Scotland, or Northern Ireland. Cognates in other languages *Afrikaans: Christoffel, Christoforus * Albanian: Kristofer, Kristofor, Kristoforid, Kristo *Arabic: كريستوفر (''Krīstafor, Kristūfar, Krístufer''), اصطفر (''ʔi� ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Linguistic Morphology
In linguistics, morphology is the study of words, including the principles by which they are formed, and how they relate to one another within a language. Most approaches to morphology investigate the structure of words in terms of morphemes, which are the smallest units in a language with some independent meaning. Morphemes include roots that can exist as words by themselves, but also categories such as affixes that can only appear as part of a larger word. For example, in English the root ''catch'' and the suffix ''-ing'' are both morphemes; ''catch'' may appear as its own word, or it may be combined with ''-ing'' to form the new word ''catching''. Morphology also analyzes how words behave as parts of speech, and how they may be inflected to express grammatical categories including number, tense, and aspect. Concepts such as productivity are concerned with how speakers create words in specific contexts, which evolves over the history of a language. The basic fields of lingu ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Lookup Table
In computer science, a lookup table (LUT) is an array data structure, array that replaces runtime (program lifecycle phase), runtime computation of a mathematical function (mathematics), function with a simpler array indexing operation, in a process termed as ''direct addressing''. The savings in processing time can be significant, because retrieving a value from memory is often faster than carrying out an "expensive" computation or input/output operation. The tables may be precomputation, precalculated and stored in static memory allocation, static program storage, calculated (or prefetcher, "pre-fetched") as part of a program's initialization phase (''memoization''), or even stored in hardware in application-specific platforms. Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input. Fiel ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |