Sørensen–Dice Coefficient
The Sørensen–Dice coefficient (see below for other names) is a statistic used to gauge the similarity of two samples. It was independently developed by the botanists Thorvald Sørensen and Lee Raymond Dice, who published in 1948 and 1945 respectively.
Name
The index is known by several other names, especially Sørensen–Dice index, Sørensen index and Dice's coefficient. Other variations include the "similarity coefficient" or "index", such as Dice similarity coefficient (DSC). Common alternate spellings for Sørensen are "Sorenson", "Soerenson" and "Sörenson", and all three can also be seen with the "–sen" ending. Other names include:
* F1 score
* Czekanowski's binary (non-quantitative) index
* Measure of genetic similarity
* Zijdenbos similarity index, referring to a 1994 paper of Zijdenbos et al.
Formula
Sørensen's original formula was intended to be applied to discrete data. Given two sets, X and Y, it is defined as
: DSC = \frac{2 |X \cap Y|}{|X| + |Y|}
where |X| and |Y| ...
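As a rough illustration of the set formula, DSC = 2|X ∩ Y| / (|X| + |Y|), the following Python sketch computes the coefficient for two finite collections. The function names `dice_coefficient` and `bigrams`, the use of character bigrams as set elements, and the convention that two empty sets score 1.0 are assumptions made for this example, not part of the definition above.

```python
def dice_coefficient(x, y):
    """Sørensen–Dice coefficient of two finite collections, treated as sets."""
    x, y = set(x), set(y)
    if not x and not y:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def bigrams(text):
    """Set of adjacent character pairs (one common way to turn a string into a set)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


# "night" -> {ni, ig, gh, ht}, "nacht" -> {na, ac, ch, ht}; only "ht" is shared.
score = dice_coefficient(bigrams("night"), bigrams("nacht"))
print(score)  # 0.25
```

The coefficient ranges from 0 (no shared elements) to 1 (identical sets).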


Statistic
A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypothesis. The average (or mean) of sample values is a statistic. The term statistic is used both for the function and for the value of the function on a given sample. When a statistic is being used for a specific purpose, it may be referred to by a name indicating its purpose. When a statistic is used for estimating a population parameter, the statistic is called an "estimator". A population parameter is any characteristic of a population under study, but when it is not feasible to directly measure the value of a population parameter, statistical methods are used to infer the likely value of the parameter on the basis of a statistic computed from a sample taken from the population. For example, the sample mean is an unbiased estimator of ...


Fuzzy Set
In mathematics, fuzzy sets (a.k.a. uncertain sets) are sets whose elements have degrees of membership. Fuzzy sets were introduced independently by Lotfi A. Zadeh in 1965 as an extension of the classical notion of set. At the same time, a more general kind of structure called an L-relation was defined and studied in an abstract algebraic context. Fuzzy relations, which are now used throughout fuzzy mathematics and have applications in areas such as linguistics, decision-making, and clustering, are special cases of L-relations when L is the unit interval [0, 1]. In classical set theory, the membership of elements in a set is assessed in binary terms according to a bivalent condition: an element either belongs or does not belong to the set. By contrast, fuzzy set theory permits the gradual assessment of the membership of elements in a set; this is described with the aid of a membership function valued in the real unit interval [0, 1]. Fuzzy sets generali ...
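To make the idea of a membership function valued in [0, 1] concrete, here is a minimal Python sketch. The triangular shape, the function name `triangular`, and the temperature breakpoints are purely illustrative assumptions; the text above does not prescribe any particular membership function.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), rising to 1 at b.

    Assumes a < b < c. The returned value always lies in [0, 1].
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy set "warm", peaking at 20 degrees and vanishing at 15 and 25.
for temperature in (12.0, 17.5, 20.0, 22.5, 30.0):
    print(temperature, triangular(temperature, 15.0, 20.0, 25.0))
```

A classical (crisp) set corresponds to the special case in which the membership function only ever returns 0 or 1.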


Overlap Coefficient
The overlap coefficient, or Szymkiewicz–Simpson coefficient, is a similarity measure that measures the overlap between two finite sets. It is related to the Jaccard index and is defined as the size of the intersection divided by the size of the smaller of the two sets:
: \operatorname{overlap}(X,Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)}
If set X is a subset of Y or the converse, then the overlap coefficient is equal to 1.
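The definition (intersection size divided by the size of the smaller set) can be sketched in Python as follows. The function name `overlap_coefficient` and the choice to raise an error for an empty set, where the ratio is undefined, are assumptions made for this example.

```python
def overlap_coefficient(x, y):
    """Overlap (Szymkiewicz–Simpson) coefficient of two finite, non-empty sets."""
    x, y = set(x), set(y)
    if not x or not y:
        # Assumed behaviour: the ratio is undefined when either set is empty.
        raise ValueError("overlap coefficient is undefined for an empty set")
    return len(x & y) / min(len(x), len(y))


print(overlap_coefficient({1, 2}, {1, 2, 3, 4}))  # 1.0, because {1, 2} is a subset
print(overlap_coefficient({1, 2, 3}, {3, 4}))     # 0.5
```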


Most Frequent K Characters
Most, Möst, or variations thereof may refer to:
Places
* Most, Kardzhali Province, a village in Bulgaria
* Most (city), a city in the Czech Republic
** Most District, a district surrounding the city
** Most Basin, a lowland named after the city
** Autodrom Most, motorsport race track near Most
* Möst, Khovd, a district in Khovd, Mongolia
* Most, Mokronog-Trebelno, a settlement in Slovenia
Other uses
* Most (surname), including a list of people with the surname
* Franz Welser-Möst (born 1960), Austrian conductor
* Most (1969 film), a film about WWII Yugoslavian partisans
* Most (2003 film), a Czech film
* Most!, 2018 Czech TV series
* Most (grape) or Chasselas
* most (Unix), a terminal pager on Unix and Unix-like systems
* Most (wine) or Apfelwein
* most, an English degree determiner
* Monolithic System Technology (MoST), a defunct American fabless semiconductor company
See also
* MOST (disambiguation)
* The Most (disambiguation)
* Must (disambiguation)
* Moest ...




Morisita's Overlap Index
Morisita's overlap index, named after Masaaki Morisita, is a statistical measure of dispersion of individuals in a population. It is used to compare overlap among samples (Morisita 1959). This formula is based on the assumption that increasing the size of the samples will increase the diversity because it will include different habitats (i.e. different faunas).
Formula:
: C_D = \frac{2 \sum_{i=1}^{S} x_i y_i}{(D_x + D_y) X Y}
: x_i is the number of times species i is represented in the total X from one sample.
: y_i is the number of times species i is represented in the total Y from another sample.
: D_x and D_y are the Simpson's index values for the x and y samples respectively.
: S is the number of unique species.
C_D = 0 if the two samples do not overlap in terms of species, and C_D = 1 if the species occur in the same proportions in both samples.
Horn's modification of the index is (Horn 1966):
: C_H = \frac{2 \sum_{i=1}^{S} x_i y_i}{\left( \frac{\sum_{i=1}^{S} x_i^2}{X^2} + \frac{\sum_{i=1}^{S} y_i^2}{Y^2} \right) X Y} \,.
Note, not to be confused with Mo ...
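The following Python sketch illustrates Horn's modification, assuming it takes the commonly stated form C_H = 2 Σ x_i y_i / ((Σ x_i² / X² + Σ y_i² / Y²) · X · Y), where X and Y are the sample totals. The function name `morisita_horn`, the list-of-counts input format, and the error handling are assumptions made for this example.

```python
def morisita_horn(x, y):
    """Horn's modification of Morisita's overlap index for two count vectors.

    x[i] and y[i] are the counts of species i in the two samples.
    """
    if len(x) != len(y):
        raise ValueError("both samples must list the same species")
    total_x, total_y = sum(x), sum(y)
    if total_x == 0 or total_y == 0:
        raise ValueError("each sample must contain at least one individual")
    cross = sum(a * b for a, b in zip(x, y))
    dominance_x = sum(a * a for a in x) / total_x ** 2
    dominance_y = sum(b * b for b in y) / total_y ** 2
    return 2 * cross / ((dominance_x + dominance_y) * total_x * total_y)


# Same proportions in both samples, so the index should be 1 (up to rounding).
print(morisita_horn([10, 20, 30], [1, 2, 3]))
# No species in common, so the index is 0.
print(morisita_horn([5, 0, 0], [0, 3, 4]))
```

This variant is used in the sketch because it needs only the raw counts; the original index additionally requires the Simpson's index values D_x and D_y.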


Mantel Test
The Mantel test, named after Nathan Mantel, is a statistical test of the correlation between two matrices. The matrices must be of the same dimension; in most applications, they are matrices of interrelations between the same vectors of objects. The test was first published by Nathan Mantel, a biostatistician at the National Institutes of Health, in 1967. Accounts of it can be found in advanced statistics books (e.g., Sokal & Rohlf 1995).
Usage
The test is commonly used in ecology, where the data are usually estimates of the "distance" between objects such as species of organisms. For example, one matrix might contain estimates of the genetic distances (i.e., the amount of difference between two different genomes) between all possible pairs of species in the study, obtained by the methods of molecular systematics; while the other might contain estimates of the geographical distance between the ranges of each species to every other species. In this case, the hypothesis being test ...


Hamming Distance
In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming. A major application is in coding theory, more specifically to block codes, in which the equal-length strings are vectors over a finite field.
Definition
The Hamming distance between two equal-length strings of symbols is the number of positions at which the corresponding symbols are different.
Examples
The symbols may be letters, bits, or decimal digits, among other possibilities. For example, the Hamming distance between: ...
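The definition translates almost directly into code. In this minimal Python sketch, the function name `hamming_distance`, the example strings, and the choice to raise an error for unequal lengths are assumptions made for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(symbol_a != symbol_b for symbol_a, symbol_b in zip(a, b))


print(hamming_distance("karolin", "kathrin"))  # 3
print(hamming_distance("1011101", "1001001"))  # 2
```

Because the function only compares symbols position by position, it works equally well on strings, lists of bits, or tuples of digits.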


Correlation
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are ''linearly'' related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity the consumers are willing to purchase, as it is depicted in the so-called demand curve. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example, there is a causal relationship, because extreme weather causes people to use more electricity for heating or cooling. However ...


Hellinger Distance
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. It is sometimes called the Jeffreys distance.
Definition
Measure theory
To define the Hellinger distance in terms of measure theory, let P and Q denote two probability measures on a measure space \mathcal{X} that are absolutely continuous with respect to an auxiliary measure \lambda. Such a measure always exists, e.g. \lambda = (P + Q). The square of the Hellinger distance between P and Q is defined as the quantity
: H^2(P,Q) = \frac{1}{2} \int_{\mathcal{X}} \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 \lambda(dx).
Here, P(dx) = p(x) \lambda(dx) and Q(dx) = q(x) \lambda(dx), i.e. p and q are the Radon–Nikodym derivatives of P and Q respe ...
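For intuition, the definition can be specialized to discrete distributions by taking \lambda to be the counting measure, so that the integral becomes a sum: H^2(P,Q) = (1/2) Σ (√p_i − √q_i)². The Python sketch below implements this discrete case only; the function name `hellinger_distance` and the representation of distributions as lists of probabilities are assumptions made for this example.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions on the same outcomes.

    p[i] and q[i] are the probabilities of outcome i; each list should sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must be defined on the same outcomes")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2)


print(hellinger_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0 for identical distributions
print(hellinger_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 for disjoint supports
```

With the factor 1/2 in the definition, the distance always lies between 0 and 1.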


Hugo Steinhaus
Hugo Dyonizy Steinhaus (January 14, 1887 – February 25, 1972) was a Polish mathematician and educator. Steinhaus obtained his PhD under David Hilbert at Göttingen University in 1911 and later became a professor at the Jan Kazimierz University in Lwów (now Lviv, Ukraine), where he helped establish what later became known as the Lwów School of Mathematics. He is credited with "discovering" mathematician Stefan Banach, with whom he made a notable contribution to functional analysis through the Banach–Steinhaus theorem. After World War II Steinhaus played an important part in the establishment of the mathematics department at Wrocław University and in the revival of Polish mathematics from the destruction of the war. Author of around 170 scientific articles and books, Steinhaus left his legacy and contribution in many branches of mathematics, such as functional analysis, geometry, mathematical logic, and trigonometry. Notably he is regarded as one of the early found ...




Bray–Curtis Dissimilarity
In ecology and biology, the Bray–Curtis dissimilarity is a statistic used to quantify the dissimilarity in species composition between two different sites, based on counts at each site. It is named after J. Roger Bray and John T. Curtis, who first presented it in a paper in 1957. The Bray–Curtis dissimilarity BC_{jk} between two sites j and k is
: BC_{jk} = 1 - \frac{2 C_{jk}}{S_j + S_k} = 1 - \frac{2 \sum_{i=1}^{p} \min(N_{ij}, N_{ik})}{\sum_{i=1}^{p} (N_{ij} + N_{ik})}
where N_{ij} is the number of specimens of species i at site j, N_{ik} is the number of specimens of species i at site k, and p is the total number of species in the samples. In the alternative shorthand notation, C_{jk} is the sum of the lesser counts of each species, and S_j and S_k are the total numbers of specimens counted at sites j and k respectively. The index can be simplified to 1-2C/2 = 1-C when the abundances at each site are expressed as proportions, though the two forms of the equation only produce matching results when the total number of specimens counted at both sites are the same. Further treatment can be found in Legend ...
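As a sketch of the calculation, the Python function below sums the lesser count for each species (the shorthand C_{jk}) and divides twice that sum by the total number of specimens at both sites (S_j + S_k), then subtracts the result from 1. The function name `bray_curtis`, the example counts, and the error handling are assumptions made for this example.

```python
def bray_curtis(counts_j, counts_k):
    """Bray–Curtis dissimilarity between two sites.

    counts_j[i] and counts_k[i] are the specimen counts of species i at each site.
    """
    if len(counts_j) != len(counts_k):
        raise ValueError("both sites must list the same species")
    total = sum(counts_j) + sum(counts_k)  # S_j + S_k
    if total == 0:
        raise ValueError("at least one specimen must be counted")
    lesser = sum(min(a, b) for a, b in zip(counts_j, counts_k))  # C_jk
    return 1 - 2 * lesser / total


# Hypothetical counts for three species at two sites: C = 6 + 0 + 4 = 10, S_j + S_k = 33.
print(bray_curtis([6, 7, 4], [10, 0, 6]))  # 1 - 20/33, roughly 0.394
```

When the counts are reduced to presence or absence (1 or 0), one minus this dissimilarity coincides with the Sørensen–Dice coefficient of the two species sets.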


Abundance (ecology)
In ecology, local abundance is the relative representation of a species in a particular ecosystem. It is usually measured as the number of individuals found per sample. The ratio of abundance of one species to one or multiple other species living in an ecosystem is referred to as relative species abundances. Both indicators are relevant for computing biodiversity. A variety of sampling methods are used to measure abundance. For larger animals, these may include spotlight counts, track counts and roadkill counts, as well as presence at monitoring stations. In many plant communities the abundances of plant species are measured by plant cover, i.e. the relative area covered by different plant species in a small plot. Abundance is in simplest terms usually measured by identifying and counting every individual of every species in a given sector. It is common for the distribution of species to be skewed so that a few species take up the bulk of individuals collected. Relative species ...