ROUGE (metric)

ROUGE, or Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics and a software package used for evaluating automatic summarization and machine translation software in natural language processing. The metrics compare an automatically produced summary or translation against one or more human-produced reference summaries or translations.


Metrics

The following five evaluation metrics are available.
* ROUGE-N: Overlap of n-grams<ref>Lin, Chin-Yew and E.H. Hovy. 2003. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. In Proceedings of 2003 Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, May 27 - June 1, 2003.</ref> between the system and reference summaries.
** ROUGE-1 refers to the overlap of ''unigrams'' (single words) between the system and reference summaries.
** ROUGE-2 refers to the overlap of ''bigrams'' between the system and reference summaries.
* ROUGE-L: Longest Common Subsequence (LCS)<ref>Lin, Chin-Yew and Franz Josef Och. 2004. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain, July 21 - 26, 2004.</ref> based statistics. The longest common subsequence naturally takes sentence-level structure similarity into account and automatically identifies the longest in-sequence co-occurring n-grams.
* ROUGE-W: Weighted LCS-based statistics that favor consecutive LCS matches.
* ROUGE-S: Skip-bigram based co-occurrence statistics. A skip-bigram is any pair of words in their sentence order, with arbitrary gaps allowed between them.
* ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics.
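
The overlap counts behind ROUGE-N, ROUGE-L and ROUGE-S can be illustrated with a minimal Python sketch. It is not the ROUGE package itself: it assumes whitespace-tokenized input, a single reference, and recall only, whereas the real software also reports precision and F-measure and supports multiple references. All function names below are hypothetical and chosen for this illustration.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_recall(cand_counts, ref_counts):
    """Clipped overlap divided by the number of units in the reference."""
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Counter & Counter keeps the minimum count per key (clipping).
    return sum((cand_counts & ref_counts).values()) / total


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N style recall: shared n-grams over reference n-grams."""
    return overlap_recall(ngrams(candidate, n), ngrams(reference, n))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(candidate, reference):
    """ROUGE-L style recall: LCS length over reference length."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)


def skip_bigrams(tokens):
    """All word pairs in sentence order, with any gap between them."""
    return Counter(combinations(tokens, 2))


def rouge_s_recall(candidate, reference):
    """ROUGE-S style recall: shared skip-bigrams over reference skip-bigrams."""
    return overlap_recall(skip_bigrams(candidate), skip_bigrams(reference))


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    candidate = "the cat was on the mat".split()
    print("ROUGE-1 recall:", rouge_n_recall(candidate, reference, 1))
    print("ROUGE-2 recall:", rouge_n_recall(candidate, reference, 2))
    print("ROUGE-L recall:", rouge_l_recall(candidate, reference))
    print("ROUGE-S recall:", rouge_s_recall(candidate, reference))
```

For this pair of sentences, 5 of the 6 reference unigrams and 3 of the 5 reference bigrams are matched, the LCS ("the cat on the mat") has length 5, and 10 of the 15 reference skip-bigrams are shared. ROUGE-W and ROUGE-SU are omitted here; they build on the same LCS and skip-bigram counts.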


See also

* BLEU
* F-Measure
* METEOR
* NIST (metric)
* Noun-phrase chunking
* Word error rate (WER)


References

{{Reflist}}


External links


* ROUGE Usage Tutorial
* Java Implementation of ROUGE