ROUGE (metric)

	ROUGE (metric) ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against a reference or a set of references (human-produced) summary or translation. Metrics The following five evaluation metrics are available. ROUGE-N: Overlap of n-grams between the system and reference summaries. ROUGE-1 refers to the overlap of ''unigram'' ''(each word)'' between the system and reference summaries. ROUGE-2 refers to the overlap of ''bigrams'' between the system and reference summaries. ROUGE-L: Longest Common Subsequence (LCS) based statistics. Longest common subsequence problem takes into account sentence level structure similarity naturally and identifies longest co-occurring in sequence n-grams automatically. ROUGE-W: Weighted LCS-based statistics that favors consecutive L ... [...More Info...] [...Related Items...] OR:* [Wikipedia] [Google] [Baidu]
	Automatic Summarization Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content. Artificial intelligence algorithms are commonly developed and employed to achieve this, specialized for different types of data. Text summarization is usually implemented by natural language processing methods, designed to locate the most informative sentences in a given document. On the other hand, visual content can be summarized using computer vision algorithms. Image summarization is the subject of ongoing research; existing approaches typically attempt to display the most representative images from a given image collection, or generate a video that only includes the most important content from the entire collection. Video summarization algorithms identify and extract from the original video content the most important frames (''key-frames''), and/or the most important video segm ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Machine Translation Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation or interactive translation), is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another. On a basic level, MT performs mechanical substitution of words in one language for words in another, but that alone rarely produces a good translation because recognition of whole phrases and their closest counterparts in the target language is needed. Not all words in one language have equivalent words in another language, and many words have more than one meaning. Solving this problem with corpus statistical and neural techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies. Current machine translation software often allows for customizat ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Natural Language Processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves. Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation. History Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence, t ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	N-gram In the fields of computational linguistics and probability, an ''n''-gram (sometimes also called Q-gram) is a contiguous sequence of ''n'' items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The ''n''-grams typically are collected from a text or speech corpus. When the items are words, -grams may also be called ''shingles''. Using Latin numerical prefixes, an ''n''-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram". English cardinal numbers are sometimes used, e.g., "four-gram", "five-gram", and so on. In computational biology, a polymer or oligomer of a known size is called a ''k''-mer instead of an ''n''-gram, with specific names using Greek numerical prefixes such as "monomer", "dimer", "trimer", "tetramer", "pentamer", etc., or English cardinal numbers, "one-mer", "two-mer", "three-mer", etc. Applications ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Longest Common Subsequence Problem The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring problem: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files. For example, consider the sequences (ABCD) and (ACBAD). They have 5 length-2 common subsequences: (AB), (AC), (AD), (BD), and (CD); 2 length-3 common subsequences: (ABD) and (ACD); and no longer common subsequences. So (ABD) and (ACD) are their longest common subsequences. Complexity For the ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Bigram A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an ''n''-gram for ''n''=2. The frequency distribution of every bigram in a string is commonly used for simple statistical analysis of text in many applications, including in computational linguistics, cryptography, speech recognition, and so on. ''Gappy bigrams'' or ''skipping bigrams'' are word pairs which allow gaps (perhaps avoiding connecting words, or allowing some simulation of dependencies, as in a dependency grammar). ''Head word bigrams'' are gappy bigrams with an explicit dependency relationship. Details Bigrams help provide the conditional probability of a token given the preceding token, when the relation of the conditional probability is applied: P(W_n, W_) = That is, the probability P() of a token W_n given the preceding token W_ is equal to the probability of their bigram, or the co-occurrence of the two tokens P ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	BLEU Bleu or BLEU may refer to: * the French word for blue * '' Three Colors: Blue'', a 1993 movie * BLEU (Bilingual Evaluation Understudy), a machine translation evaluation metric * Belgium–Luxembourg Economic Union * Blue cheese, a type of cheese * Parti bleu, 19th century political group in Quebec, Canada * ''Bleu'' (blue-rare), synonymous with "extra rare", indicating a barely-cooked meat preparation; very red and cold * ''Le Bleu'' (2001 album) album by Justin King People * Bleu (musician), a member of pop-group L.E.O. * Corbin Bleu, an American actor, model, dancer and vocalist * Deis, a character from the ''Breath of Fire'' role-playing videogame series who is known as "Bleu" in the English versions See also * Blue (other) * Lebleu (other) * Les Bleus (other) Les Bleus may refer to: National team of France ''Les Bleus'' (French for "The Blues") is often used in a French sporting context, and in particular may refer to: * France's national team: ... [...More Info...] [...Related Items...] OR:** [Wikipedia] [Google] [Baidu]
picture info	F1 Score In statistical analysis of binary classification, the F-score or F-measure is a measure of a test's accuracy. It is calculated from the precision and recall of the test, where the precision is the number of true positive results divided by the number of all positive results, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value, and recall is also known as sensitivity in diagnostic binary classification. The F1 score is the harmonic mean of the precision and recall. The more generic F_\beta score applies additional weights, valuing one of precision or recall more than the other. The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, if either precision or recall are zero. Etymology The name F-measure is believed to be named after ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	METEOR A meteoroid () is a small rocky or metallic body in outer space. Meteoroids are defined as objects significantly smaller than asteroids, ranging in size from grains to objects up to a meter wide. Objects smaller than this are classified as micrometeoroids or space dust. Most are fragments from comets or asteroids, whereas others are collision impact debris ejected from bodies such as the Moon or Mars. When a meteoroid, comet, or asteroid enters Earth's atmosphere at a speed typically in excess of , aerodynamic heating of that object produces a streak of light, both from the glowing object and the trail of glowing particles that it leaves in its wake. This phenomenon is called a meteor or "shooting star". Meteors typically become visible when they are about 100 km above sea level. A series of many meteors appearing seconds or minutes apart and appearing to originate from the same fixed point in the sky is called a meteor shower. A meteorite is the remains of a meteoroid th ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	NIST (metric) NIST is a method for evaluation of machine translation, evaluating the quality of text which has been translated using machine translation. Its name comes from the US National Institute of Standards and Technology. It is based on the Bilingual evaluation understudy, BLEU metric, but with some alterations. Where Bilingual evaluation understudy, BLEU simply calculates n-gram precision adding equal weight to each one, NIST also calculates how informative a particular n-gram is. That is to say when a correct n-gram is found, the rarer that n-gram is, the more weight it will be given. For example, if the bigram "on the" is correctly matched, it will receive lower weight than the correct matching of bigram "interesting calculations", as this is less likely to occur. NIST also differs from Bilingual evaluation understudy, BLEU in its calculation of the brevity penalty insofar as small variations in translation length do not impact the overall score as much. See also * BLEU * F1 score, F ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Phrase Chunking Phrase chunking is a phase of natural language processing that separates and segments a sentence into its subconstituents, such as noun, verb, and prepositional phrases, abbreviated as NP, VP, and PP, respectively. Typically, each subconstituent or chunk is denoted by brackets.Erik F. Tjong Kim Sang and Sabine Buchholz. 2000. Introduction to the conll-2000 shared task: chunking. In ''Proceedings of the 2nd workshop on Learning language in logic and the 4th CONLL'', pages 127–132, Morristown, NJ, USA. Association for Computational Linguistics. See also Terminology extraction Part-of-speech tagging Constituent (linguistics) In syntactic analysis, a constituent is a word or a group of words that function as a single unit within a hierarchical structure. The constituent structure of sentences is identified using ''tests for constituents''. These tests apply to a portio ... External linksTermExtractor [...More Info...] [...Related Items...] OR:* [Wikipedia] [Google] [Baidu]
	Word Error Rate Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system. The general difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level. The WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of translation errors and further work is therefore required to identify the main source(s) of error and to focus any research effort. This problem is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. Examination of this issue is seen through a theory called the power law that states the correlation between perplexity an ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]