Simple Matching Coefficient

	Simple Matching Coefficient The simple matching coefficient (SMC) or Rand similarity coefficient is a statistic used for comparing the similarity and diversity of sample sets. Given two objects, A and B, each with ''n'' binary attributes, SMC is defined as: : \begin \text & = \frac \\ pt& = \frac \end where: :M_ is the total number of attributes where ''A'' and ''B'' both have a value of 0. :M_ is the total number of attributes where ''A'' and ''B'' both have a value of 1. :M_ is the total number of attributes where the attribute of ''A'' is 0 and the attribute of ''B'' is 1. :M_ is the total number of attributes where the attribute of ''A'' is 1 and the attribute of ''B'' is 0. The simple matching distance (SMD), which measures dissimilarity between sample sets, is given by 1 - \text. SMC is linearly related to Hamann similarity: SMC = (Hamann+1)/2. Also, SMC = 1-D^2/n, where D^2 is the squared Euclidean distance between the two objects (binary vectors) and n is the number of attributes. The SMC ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Statistic A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypothesis. The average (or mean) of sample values is a statistic. The term statistic is used both for the function and for the value of the function on a given sample. When a statistic is being used for a specific purpose, it may be referred to by a name indicating its purpose. When a statistic is used for estimating a population parameter, the statistic is called an ''estimator''. A population parameter is any characteristic of a population under study, but when it is not feasible to directly measure the value of a population parameter, statistical methods are used to infer the likely value of the parameter on the basis of a statistic computed from a sample taken from the population. For example, the sample mean is an unbiased estimator of ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Similarity Measure In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity exists, usually such measures are in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects. Though, in more broad terms, a similarity function may also satisfy metric axioms. Cosine similarity is a commonly used similarity measure for real-valued vectors, used in (among other fields) information retrieval to score the similarity of documents in the vector space model. In machine learning, common kernel functions such as the RBF kernel can be viewed as similarity functions. Use in clustering In spectral clustering, a similarity, or affinity, measure is used to transform data to overcome difficulties related to lack of convexity in the shape of the data distribut ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Diversity Index A diversity index is a quantitative measure that reflects how many different types (such as species) there are in a dataset (a community), and that can simultaneously take into account the phylogenetic relations among the individuals distributed among those types, such as ''richness'', ''divergence'' or ''evenness''. These indices are statistical representations of biodiversity in different aspects ( richness, evenness, and dominance). Effective number of species or Hill numbers When diversity indices are used in ecology, the types of interest are usually species, but they can also be other categories, such as genera, families, functional types, or haplotypes. The entities of interest are usually individual plants or animals, and the measure of abundance can be, for example, number of individuals, biomass or coverage. In demography, the entities of interest can be people, and the types of interest various demographic groups. In information science, the entities can be chara ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Sample (statistics) In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population. Statisticians attempt to collect samples that are representative of the population in question. Sampling has lower costs and faster data collection than measuring the entire population and can provide insights in cases where it is infeasible to measure an entire population. Each observation measures one or more properties (such as weight, location, colour or mass) of independent objects or individuals. In survey sampling, weights can be applied to the data to adjust for the sample design, particularly in stratified sampling. Results from probability theory and statistical theory are employed to guide the practice. In business and medical research, sampling is widely used for gathering information about a population. Acceptance sampling is used to determine ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Jaccard Index The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It was developed by Grove Karl Gilbert in 1884 as his ratio of verification (v) and now is frequently referred to as the Critical Success Index in meteorology. It was later developed independently by Paul Jaccard, originally giving the French name ''coefficient de communauté'', and independently formulated again by T. Tanimoto. Thus, the Tanimoto index or Tanimoto coefficient are also used in some fields. However, they are identical in generally taking the ratio of Intersection over Union. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets: : J(A,B) = = . Note that by design, 0\le J(A,B)\le 1. If ''A'' intersection ''B'' is empty, then ''J''(''A'',''B'') = 0. The Jaccard coefficient is widely used in c ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Dummy Variable (statistics) In regression analysis, a dummy variable (also known as indicator variable or just dummy) is one that takes the values 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. For example, if we were studying the relationship between gender and income, we could use a dummy variable to represent the gender of each individual in the study. The variable would take on a value of 1 for males and 0 for females. Dummy variables are commonly used in regression analysis to represent categorical variables that have more than two levels, such as education level or occupation. In this case, multiple dummy variables would be created to represent each level of the variable, and only one dummy variable would take on a value of 1 for each observation. Dummy variables are useful because they allow us to include categorical variables in our analysis, which would otherwise be difficult to include due to their non-numeric nature. They can also h ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Probability Measure In mathematics, a probability measure is a real-valued function defined on a set of events in a probability space that satisfies measure properties such as ''countable additivity''. The difference between a probability measure and the more general notion of measure (which includes concepts like area or volume) is that a probability measure must assign value 1 to the entire probability space. Intuitively, the additivity property says that the probability assigned to the union of two disjoint events by the measure should be the sum of the probabilities of the events; for example, the value assigned to "1 or 2" in a throw of a dice should be the sum of the values assigned to "1" and "2". Probability measures have applications in diverse fields, from physics to finance and biology. Definition The requirements for a function \mu to be a probability measure on a probability space are that: * \mu must return results in the unit interval , 1 returning 0 for the empty set and 1 for t ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Rand Index The RAND Corporation (from the phrase "research and development") is an American nonprofit global policy think tank created in 1948 by Douglas Aircraft Company to offer research and analysis to the United States Armed Forces. It is financed by the U.S. government and private endowment, corporations, universities and private individuals. The company assists other governments, international organizations, private companies and foundations with a host of defense and non-defense issues, including healthcare. RAND aims for interdisciplinary and quantitative problem solving by translating theoretical concepts from formal economics and the physical sciences into novel applications in other areas, using applied science and operations research. Overview RAND has approximately 1,850 employees. Its American locations include: Santa Monica, California (headquarters); Arlington, Virginia; Pittsburgh, Pennsylvania; and Boston, Massachusetts. The RAND Gulf States Policy Institute has an ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Index Numbers In Statistics, Economics and Finance, an index is a statistical measure of change in a representative group of individual data points. These data may be derived from any number of sources, including company performance, prices, productivity, and employment. Economic indices track economic health from different perspectives. Influential global financial indices such as the Global Dow, and the NASDAQ Composite track the performance of selected large and powerful companies in order to evaluate and predict economic trends. The Dow Jones Industrial Average and the S&P 500 primarily track U.S. markets, though some legacy international companies are included. The consumer price index tracks the variation in prices for different consumer goods and services over time in a constant geographical location and is integral to calculations used to adjust salaries, bond interest rates, and tax thresholds for inflation. The GDP Deflator Index, or real GDP, measures the level of prices of all-n ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Measure Theory In mathematics, the concept of a measure is a generalization and formalization of geometrical measures ( length, area, volume) and other common notions, such as mass and probability of events. These seemingly distinct concepts have many similarities and can often be treated together in a single mathematical context. Measures are foundational in probability theory, integration theory, and can be generalized to assume negative values, as with electrical charge. Far-reaching generalizations (such as spectral measures and projection-valued measures) of measure are widely used in quantum physics and physics in general. The intuition behind this concept dates back to ancient Greece, when Archimedes tried to calculate the area of a circle. But it was not until the late 19th and early 20th centuries that measure theory became a branch of mathematics. The foundations of modern measure theory were laid in the works of Émile Borel, Henri Lebesgue, Nikolai Luzin, Johann Radon, Const ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Clustering Criteria Clustering can refer to the following: In computing: Computer cluster, the technique of linking many computers together to act like a single computer Data cluster, an allocation of contiguous storage in databases and file systems Cluster analysis, the statistical task of grouping a set of objects in such a way that objects in the same group are placed closer together (such as the k-means clustering) In hash tables, the mapping of keys to nearby slots In economics: Business cluster, a geographic concentration of interconnected businesses, suppliers, and associated institutions in a particular field In graph theory: The formation of clusters of linked nodes in a network, measured by the clustering coefficient See also Cluster (other) may refer to: Science and technology Astronomy Cluster (spacecraft), constellation of four European Space Agency spacecraft * Asteroid cluster, a small asteroid family * Cluster II (spacecraft), a European Space Agency missio ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	String Metrics In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric that measures distance ("inverse similarity") between two text strings for approximate string matching or comparison and in fuzzy string searching. A requirement for a string ''metric'' (e.g. in contrast to string matching) is fulfillment of the triangle inequality. For example, the strings "Sam" and "Samuel" can be considered to be close. A string metric provides a number indicating an algorithm-specific indication of distance. The most widely known string metric is a rudimentary one called the Levenshtein distance (also known as edit distance). It operates between two input strings, returning a number equivalent to the number of substitutions and deletions needed in order to transform one input string into another. Simplistic string metrics such as Levenshtein distance have expanded to include phonetic, token, grammatical and character-based m ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]