F-measure

In statistical analysis of binary classification, the F-score or F-measure is a measure of a test's accuracy. It is calculated from the precision and recall of the test, where the precision is the number of true positive results divided by the number of all results predicted positive, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value, and recall is also known as sensitivity in diagnostic binary classification. The F1 score is the harmonic mean of the precision and recall. The more generic F_\beta score applies additional weights, valuing one of precision or recall more than the other. The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, reached if either precision or recall is zero.
Etymology
The name F-measure is believed to be named after ...
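As a rough illustration of the weighting described above, the sketch below uses the standard definition F_\beta = (1 + \beta^2) P R / (\beta^2 P + R), which the excerpt does not spell out. The function name `f_beta` and its argument names are illustrative choices, not taken from any particular library.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision
    more heavily, and beta == 1 gives the ordinary F1 score.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if beta <= 0:
        raise ValueError("beta must be positive")
    # The score is defined as 0 when either component is zero,
    # which also avoids a division by zero below.
    if precision == 0.0 or recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    print(f_beta(1.0, 1.0))            # perfect precision and recall
    print(f_beta(0.5, 1.0))            # F1, roughly 0.667
    print(f_beta(0.5, 1.0, beta=2.0))  # F2 rewards the high recall, roughly 0.833
```

With precision 0.5 and recall 1.0, raising beta from 1 to 2 raises the score, because the stronger of the two components (recall) receives more weight.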

Precision (Information Retrieval)
In pattern recognition, information retrieval, object detection and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space. Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Both precision and recall are therefore based on relevance. Consider a computer program for recognizing dogs (the relevant element) in a digital photograph. Upon processing a picture which contains ten cats and twelve dogs, the program identifies eight dogs. Of the eight elements identified as dogs, only five actually are dogs (true positives), while the other three are cats (false positives). Seven dogs were missed (false negatives), and seven cats were correctly excluded (true negatives). The program's precision is then 5/8 (true positives / sel ...
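The counts in the dog-recognition example can be turned into precision and recall directly. The sketch below is only an illustration of that arithmetic; the helper name `precision_recall` is hypothetical, and exact fractions are used so the results match the 5/8 quoted in the text.

```python
from fractions import Fraction


def precision_recall(tp: int, fp: int, fn: int) -> tuple[Fraction, Fraction]:
    """Return (precision, recall) as exact fractions.

    precision = TP / (TP + FP): share of retrieved items that are relevant
    recall    = TP / (TP + FN): share of relevant items that were retrieved
    """
    if tp + fp == 0:
        raise ValueError("precision is undefined when nothing is retrieved")
    if tp + fn == 0:
        raise ValueError("recall is undefined when nothing is relevant")
    return Fraction(tp, tp + fp), Fraction(tp, tp + fn)


if __name__ == "__main__":
    # Counts from the photograph: 5 dogs found, 3 cats mistaken for dogs,
    # 7 dogs missed. True negatives (7 cats) play no role in either metric.
    precision, recall = precision_recall(tp=5, fp=3, fn=7)
    print(precision, recall)  # 5/8 5/12
```

Neither quantity depends on the seven correctly excluded cats, which is why precision and recall are described as being based on relevance.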

Named Entity Recognition
Named-entity recognition (NER) (also known as (named) entity identification, entity chunking, and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one:
Jim bought 300 shares of Acme Corp. in 2006.
and producing an annotated block of text that highlights the names of entities:
[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified. State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.
Named-entity recognition platforms
Notable NER platforms include: ...

Accuracy and Precision
Accuracy and precision are two measures of observational error. Accuracy is how close a given set of measurements (observations or readings) is to its true value, while precision is how close the measurements are to each other. In other words, precision is a description of random errors, a measure of statistical variability. Accuracy has two definitions:
1. More commonly, it is a description of only systematic errors, a measure of statistical bias of a given measure of central tendency; low accuracy causes a difference between a result and a true value; ISO calls this trueness.
2. Alternatively, ISO defines accuracy as describing a combination of both types of observational error (random and systematic), so high accuracy requires both high precision and high trueness.
In the first, more common definition of "accuracy" above, the concept is independent of "precision", so a particular set of data can be said to be accurate, precise, both ...

Harmonic Mean
In mathematics, the harmonic mean is one of several kinds of average, and in particular, one of the Pythagorean means. It is sometimes appropriate for situations when the average rate is desired. The harmonic mean can be expressed as the reciprocal of the arithmetic mean of the reciprocals of the given set of observations. As a simple example, the harmonic mean of 1, 4, and 4 is
: \left(\frac{1^{-1} + 4^{-1} + 4^{-1}}{3}\right)^{-1} = \frac{3}{\frac{1}{1} + \frac{1}{4} + \frac{1}{4}} = \frac{3}{1.5} = 2\,.
Definition
The harmonic mean H of the positive real numbers x_1, x_2, \ldots, x_n is defined to be
: H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}} = \left(\frac{\sum_{i=1}^n x_i^{-1}}{n}\right)^{-1}.
The third formula in the above equation expresses the harmonic mean as the reciprocal of the arithmetic mean of the reciprocals. From the following formula:
: H = \frac{n \cdot \prod_{j=1}^n x_j}{\sum_{i=1}^n \left\{\frac{1}{x_i} \prod_{j=1}^n x_j\right\}}
it is more apparent that the harmonic mean is related to the arithmetic and geometric means. It is the reciprocal dual of the arithmetic mean for positive inputs:
: 1/H(1/x_1 \ldots 1/x_n) = A(x_1 \ldots x_n)
The harmonic mean is a Schur-conca ...
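The "reciprocal of the arithmetic mean of the reciprocals" description can be checked numerically. The following is a minimal sketch, assuming positive inputs as the definition requires; the function name `harmonic_mean` is chosen here for illustration, and exact fractions avoid floating-point rounding.

```python
from fractions import Fraction


def harmonic_mean(values):
    """Harmonic mean of positive numbers: n divided by the sum of reciprocals."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    reciprocal_sum = sum(Fraction(1) / Fraction(v) for v in values)
    return len(values) / reciprocal_sum


if __name__ == "__main__":
    # 3 / (1/1 + 1/4 + 1/4) = 3 / 1.5 = 2
    print(harmonic_mean([1, 4, 4]))  # 2
```

Applied to exactly two values, a precision and a recall, the same function yields the F1 score.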

Matthews Correlation Coefficient
In statistics, the phi coefficient (or mean square contingency coefficient, denoted by φ or r_φ) is a measure of association for two binary variables. In machine learning, it is known as the Matthews correlation coefficient (MCC) and is used as a measure of the quality of binary (two-class) classifications, introduced by biochemist Brian W. Matthews in 1975. Introduced by Karl Pearson, and also known as the Yule phi coefficient from its introduction by Udny Yule in 1912, this measure is similar to the Pearson correlation coefficient in its interpretation. In fact, a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient. Two binary variables are considered positively associated if most of the data falls along the diagonal cells. In contrast, two binary variables are considered negatively associated if most of the data falls off the diagonal. If we have a 2×2 table for two random variables x and y where n11, n ...
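The excerpt breaks off before giving a formula, so the sketch below falls back on the standard confusion-matrix form of the coefficient, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name `phi_coefficient` is hypothetical, and returning 0 when a row or column total is zero is a common convention that is assumed here rather than taken from the text.

```python
import math


def phi_coefficient(tp: int, tn: int, fp: int, fn: int) -> float:
    """Phi / Matthews correlation coefficient from a 2x2 confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: treat an empty row or column as no association.
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Most of the data on the diagonal: positive association.
    print(phi_coefficient(tp=5, tn=5, fp=0, fn=0))
    # Most of the data off the diagonal: negative association.
    print(phi_coefficient(tp=0, tn=0, fp=5, fn=5))
```

Unlike precision, recall and the F-score, this coefficient uses all four cells of the table, including the true negatives.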

David Hand (statistician)
David John Hand (born 30 June 1950 in Peterborough) is a British statistician. His research interests include multivariate statistics, statistical classification, classification methods, pattern recognition, computational statistics and the foundations of statistics. He has written technical books on statistics, data mining, finance, classification methods, and measuring wellbeing, as well as science popularisation books including The Improbability Principle: Why Coincidences, Miracles, and Rare Events Happen Every Day; Dark Data: Why What You Don’t Know Matters; and Statistics: A Very Short Introductio ...

Word Segmentation
Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial, because while some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, such signals are sometimes ambiguous and not present in all written languages. Compare speech segmentation, the process of dividing speech into linguistically meaningful portions.
Segmentation problems
Word segmentation
Word segmentation is the problem of dividing a string of written language into its component words. In English and many other languages using some form of the Latin alphabet, the space is a good approximation of a word divider (word delimiter), al ...

Dice Coefficient
Dice (singular die or dice) are small, throwable objects with marked sides that can rest in multiple positions. They are used for generating random values, commonly as part of tabletop games, including dice games, board games, role-playing games, and games of chance. A traditional die is a cube with each of its six faces marked with a different number of dots (pips) from one to six. When thrown or rolled, the die comes to rest showing a random integer from one to six on its upper surface, with each value being equally likely. Dice may also have polyhedral or irregular shapes, may have faces marked with numerals or symbols instead of pips and may have their numbers carved out from the material of the dice instead of marked on it. Loaded dice are designed to favor some results over others for cheating or entertainment.
History
Dice have been used since before recorded history, and it is uncertain where they originated. It is theorized that dice developed from the practice of ...

Fowlkes–Mallows Index
The Fowlkes–Mallows index is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm), and also a metric to measure confusion matrices. This measure of similarity could be either between two hierarchical clusterings or a clustering and a benchmark classification. A higher value for the Fowlkes–Mallows index indicates a greater similarity between the clusters and the benchmark classifications. It was invented by Bell Labs statisticians Edward Fowlkes and Colin Mallows in 1983.
Preliminaries
The Fowlkes–Mallows index, when the results of two clustering algorithms are compared, is defined as
: FM = \sqrt{PPV \cdot TPR} = \sqrt{\frac{TP}{TP+FP} \cdot \frac{TP}{TP+FN}}
where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. TPR is the true positive rate, also called sensitivity or recall, and PPV is the positive predictive value, also known as ...
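Since the index is the geometric mean of PPV and TPR, it can be computed from the same three counts used for precision and recall. The sketch below assumes the counts are already available; the function name `fowlkes_mallows` is illustrative, and deriving TP, FP and FN from pairs of points in two clusterings is not shown.

```python
import math


def fowlkes_mallows(tp: int, fp: int, fn: int) -> float:
    """Geometric mean of PPV = TP/(TP+FP) and TPR = TP/(TP+FN)."""
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("PPV and TPR must both be defined")
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    return math.sqrt(ppv * tpr)


if __name__ == "__main__":
    # Perfect agreement gives the maximum value.
    print(fowlkes_mallows(tp=4, fp=0, fn=0))  # 1.0
    # With TP=5, FP=3, FN=7 the index is sqrt(5/8 * 5/12), roughly 0.51.
    print(fowlkes_mallows(tp=5, fp=3, fn=7))
```

For the same counts the F1 score is the harmonic mean of PPV and TPR, which is never larger than this geometric mean.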