Flajolet–Martin Algorithm
The Flajolet–Martin algorithm approximates the number of distinct elements in a stream (the count-distinct problem) in a single pass, using space logarithmic in the maximum possible number of distinct elements in the stream. It was introduced by Philippe Flajolet and G. Nigel Martin in their 1984 article "Probabilistic Counting Algorithms for Data Base Applications". It was later refined in "LogLog counting of large cardinalities" by Marianne Durand and Philippe Flajolet, and in "HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm" by Philippe Flajolet et al. In their 2010 article "An optimal algorithm for the distinct elements problem", Daniel M. Kane, Jelani Nelson and David P. Woodruff give an improved algorithm, which uses nearly optimal space and has optimal ''O''(1) update and reporting times. The algorithm: Assume that we are given a hash function \mathrm{hash}(x) that maps input x to integers in the range [0; 2^L - 1] ...
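The idea can be sketched in a few lines of Python. This is a minimal illustration and not necessarily the exact procedure of the 1984 paper: it assumes a 32-bit hash taken from SHA-256 (the helper `hash32`, the width `L = 32` and the function names are all hypothetical choices), records the position of the lowest set bit of each hash value in a bitmap, and scales 2^R by the correction factor of about 0.77351 from Flajolet and Martin's analysis, where R is the index of the lowest bit still unset.

```python
import hashlib

PHI = 0.77351  # correction factor from Flajolet and Martin's analysis
L = 32         # number of hash bits used (an assumption of this sketch)


def hash32(item: str) -> int:
    """Hypothetical helper: a deterministic 32-bit hash built from SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def rho(y: int, width: int = L) -> int:
    """0-indexed position of the least-significant 1-bit of y; width if y == 0."""
    if y == 0:
        return width
    return (y & -y).bit_length() - 1


def fm_estimate(stream) -> float:
    """Single-pass estimate of the number of distinct items in the stream."""
    bitmap = 0
    for item in stream:
        # Duplicates hash to the same value, so they set the same bit again.
        bitmap |= 1 << rho(hash32(item))
    # R is the index of the lowest bit that is still zero.
    r = 0
    while bitmap & (1 << r):
        r += 1
    return 2 ** r / PHI


print(fm_estimate(["a", "b", "a", "c", "d", "b", "d"]))
```

The state is a single bitmap of about L bits, which is where the logarithmic space bound comes from. A single bitmap gives a high-variance estimate, so practical variants combine many of them.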
Algorithm
In mathematics and computer science, an algorithm is a finite sequence of rigorous instructions, typically used to solve a class of specific computational problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can perform automated deductions (referred to as automated reasoning) and use mathematical and logical tests to divert the code execution through various routes (referred to as automated decision-making). Using human characteristics as descriptors of machines in metaphorical ways was already practiced by Alan Turing with terms such as "memory", "search" and "stimulus". In contrast, a heuristic is an approach to problem solving that may not be fully specified or may not guarantee correct or optimal results, especially in problem domains where there is no well-defined correct or optimal result. As an effective method, an algorithm ca ...
Count-distinct Problem
In computer science, the count-distinct problem (also known in applied mathematics as the cardinality estimation problem) is the problem of finding the number of distinct elements in a data stream with repeated elements. This is a well-known problem with numerous applications. The elements might represent IP addresses of packets passing through a router, unique visitors to a web site, elements in a large database, motifs in a DNA sequence, or elements of RFID/sensor networks. Formal definition: Instance: a stream of elements x_1, x_2, \ldots, x_s with repetitions, and an integer m. Let n be the number of distinct elements, namely n = |\{x_1, x_2, \ldots, x_s\}|, and let these elements be \{e_1, e_2, \ldots, e_n\}. Objective: find an estimate \widehat{n} of n using only m storage units, where m \ll n. An example of an instance for the cardinality estimation problem is the stream a, b, a, c, d, b, d. For this instance, n = |\{a, b, c, d\}| = 4. Naive solution: The naive solution to the problem is as follows: ...
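For the example stream, an exact count can be sketched by remembering every element seen so far. This is only an illustration, assuming hashable elements; the helper name `count_distinct_exact` is hypothetical. Its memory use grows with n, which is exactly what the constraint m \ll n rules out.

```python
def count_distinct_exact(stream):
    """Exact distinct count; stores every distinct element, so memory is O(n)."""
    seen = set()
    for x in stream:
        seen.add(x)
    return len(seen)


stream = ["a", "b", "a", "c", "d", "b", "d"]
print(count_distinct_exact(stream))  # 4
```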
Philippe Flajolet
Philippe Flajolet (1 December 1948 – 22 March 2011) was a French computer scientist. Biography: A former student of École Polytechnique, Philippe Flajolet received his PhD in computer science from University Paris Diderot in 1973 and state doctorate from Paris-Sud 11 University in 1979. Most of Philippe Flajolet's research work was dedicated to general methods for analyzing the computational complexity of algorithms, including the theory of average-case complexity. He introduced the theory of analytic combinatorics. With Robert Sedgewick of Princeton University, he wrote the first book-length treatment of the topic, the 2009 book entitled ''Analytic Combinatorics''. In 1993, together with Rainer Kemp, Helmut Prodinger and Robert Sedgewick, Flajolet initiated the successful series of workshops and conferences which was key to the development of a research community around the analysis of algorithms, and which evolved into the AofA: International Meeting on Combinatori ...
Marianne Durand
Marianne has been the national personification of the French Republic since the French Revolution, as a personification of liberty, equality, fraternity and reason, as well as a portrayal of the Goddess of Liberty. Marianne is displayed in many places in France and holds a place of honour in town halls and law courts. She is depicted in the ''Triumph of the Republic'', a bronze sculpture overlooking the Place de la Nation in Paris, as well as represented with another Parisian statue on the Place de la République. Her profile stands out on the official government logo of the country, appears on French euro coins and on French postage stamps. She was also featured on the former franc currency and is officially used on most government documents. Marianne is a significant republican symbol; her French monarchist equivalent is often Joan of Arc. As a national icon Marianne represents opposition to monarchy and the championship of freedom and democracy against all forms of o ...
HyperLogLog
HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset. Calculating the ''exact'' cardinality of the distinct elements of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly less memory than this, at the cost of obtaining only an approximation of the cardinality. The HyperLogLog algorithm is able to estimate cardinalities of > 10^9 with a typical accuracy (standard error) of 2%, using 1.5 kB of memory. HyperLogLog is an extension of the earlier LogLog algorithm, itself deriving from the 1984 Flajolet–Martin algorithm. Terminology: In the original paper by Flajolet ''et al.'' and in related literature on the count-distinct problem, the term "cardinality" is used to mean the number of distinct elements in a data stream with repeated elements. Howev ...
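The quoted accuracy and memory figures can be roughly reproduced from the relative standard error of about 1.04/\sqrt{m} reported in the analysis by Flajolet et al., where m is the number of registers. The sketch below is a back-of-the-envelope check only: the choice of m = 2048 registers and 6 bits per register are assumptions picked to land near the quoted numbers, and the function names are hypothetical.

```python
import math


def hll_standard_error(m: int) -> float:
    """Approximate relative standard error of HyperLogLog with m registers."""
    return 1.04 / math.sqrt(m)


def hll_memory_bytes(m: int, bits_per_register: int = 6) -> int:
    """Memory for m registers; the 6-bit register width is an assumption."""
    return m * bits_per_register // 8


m = 2048  # assumed register count
print(f"standard error: {hll_standard_error(m):.4f}")
print(f"memory: {hll_memory_bytes(m)} bytes")
```

Under these assumptions the error comes out at roughly 2.3% and the memory at 1536 bytes, in line with the figures of about 2% and 1.5 kB.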
Nancy, France
Nancy (Lorraine Franconian: ''Nanzisch'') is the prefecture of the northeastern French department of Meurthe-et-Moselle. It was the capital of the Duchy of Lorraine, which was annexed by France under King Louis XV in 1766 and replaced by a province, with Nancy maintained as capital. Following its rise to prominence in the Age of Enlightenment, it was nicknamed the "capital of Eastern France" in the late 19th century. The metropolitan area of Nancy had a population of 511,257 inhabitants at the 2018 census, making it the 16th-largest functional urban area in France and Lorraine's largest. The population of the city of Nancy proper is 104,885. The city's motto is a reference to the thistle, which is a symbol of Lorraine. Place Stanislas, a large square built between 1752 and 1756 by architect Emmanuel Héré under the direction of Stanislaus I of Poland to lin ...
Jelani Nelson
Jelani Osei Nelson (Amharic: ጄላኒ ኔልሰን; born June 28, 1984) is an Ethiopian-American Professor of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He won the 2014 Presidential Early Career Award for Scientists and Engineers. Nelson is the creator of ''AddisCoder'', a computer science summer program for Ethiopian high school students in Addis Ababa. Early life and education: Nelson was born to an Ethiopian mother and an African-American father in Los Angeles, then grew up in St. Thomas, U.S. Virgin Islands. He studied mathematics and computer science at the Massachusetts Institute of Technology and remained there to complete his doctoral studies in computer science. His Master's dissertation, ''External-Memory Search Trees with Fast Insertions'', was supervised by Bradley C. Kuszmaul and Charles E. Leiserson. He was a member of the theory of computation group, working on efficient algorithms for massive datasets. His doctoral dissert ...
Hash Function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually used to index a fixed-size table called a ''hash table''. Use of a hash function to index a hash table is called ''hashing'' or ''scatter storage addressing''. Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a computationally and storage space-efficient form of data access that avoids the non-constant access time of ordered and unordered lists and structured trees, and the often exponential storage requirements of direct access of state spaces of large or variable-length keys. Use of ...
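As a small illustration of mapping data of arbitrary size to a fixed-size value and using that value to index a fixed-size table, consider the sketch below. It is one possible construction and not a recommended hash-table design: the helper `bucket_index`, the use of SHA-256 and the table size of 16 are all assumptions made for the example.

```python
import hashlib


def bucket_index(key: bytes, table_size: int) -> int:
    """Map a key of any length to a slot in a table with table_size slots."""
    digest = hashlib.sha256(key).digest()          # fixed-size hash value
    return int.from_bytes(digest[:8], "big") % table_size


table_size = 16  # assumed size of the fixed-size table
table = [[] for _ in range(table_size)]

# Keys of very different lengths all land in one of the 16 slots.
for key in [b"a", b"a much longer key of arbitrary size", b""]:
    table[bucket_index(key, table_size)].append(key)

print([len(bucket) for bucket in table])
```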
Discrete Uniform Distribution
In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution wherein a finite number of values are equally likely to be observed; every one of ''n'' values has equal probability 1/''n''. Another way of saying "discrete uniform distribution" would be "a known, finite number of outcomes equally likely to happen". A simple example of the discrete uniform distribution is throwing a fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is thrown the probability of a given score is 1/6. If two dice are thrown and their values added, the resulting distribution is no longer uniform because not all sums have equal probability. Although it is convenient to describe discrete uniform distributions over integers, such as this, one can also consider discrete uniform distributions over any finite set. For instance, a random permutation is a permutation generated uniformly from the permutations of a given length, and a unif ...
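The dice example can be checked by enumeration. The following sketch, with illustrative variable names, builds the exact distribution of one fair die and of the sum of two fair dice using exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# One fair die: every face has probability 1/6.
one_die = {face: Fraction(1, 6) for face in faces}

# Sum of two fair dice: count how many of the 36 outcomes give each sum.
sum_counts = Counter(a + b for a, b in product(faces, repeat=2))
two_dice = {total: Fraction(count, 36) for total, count in sum_counts.items()}

print(one_die[3])                  # 1/6
print(two_dice[2], two_dice[7])    # 1/36 1/6
```

A sum of 7 is six times as likely as a sum of 2, so the distribution of the sum is not uniform.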
Multiset
In mathematics, a multiset (or bag, or mset) is a modification of the concept of a set that, unlike a set, allows for multiple instances for each of its elements. The number of instances given for each element is called the multiplicity of that element in the multiset. As a consequence, an infinite number of multisets exist which contain only elements a and b, but vary in the multiplicities of their elements:
* The set \{a, b\} contains only elements a and b, each having multiplicity 1 when \{a, b\} is seen as a multiset.
* In the multiset \{a, a, b\}, the element a has multiplicity 2, and b has multiplicity 1.
* In the multiset \{a, a, a, b, b, b\}, a and b both have multiplicity 3.
These objects are all different when viewed as multisets, although they are the same set, since they all consist of the same elements. As with sets, and in contrast to tuples, order does not matter in discriminating multisets, so \{a, a, b\} and \{a, b, a\} denote the same multiset. To distinguish between sets and multisets, a notation that incorporates square brackets is s ...
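Python's `collections.Counter` behaves like a multiset and can be used to illustrate multiplicity and order-independence. The sketch below uses two placeholder elements, the strings "a" and "b", chosen for the example.

```python
from collections import Counter

# Three multisets over the same two elements, with different multiplicities.
m1 = Counter(["a", "b"])                       # each multiplicity 1
m2 = Counter(["a", "a", "b"])                  # a: 2, b: 1
m3 = Counter(["a", "a", "a", "b", "b", "b"])   # a: 3, b: 3

print(m2["a"], m2["b"])                 # 2 1
print(set(m1) == set(m2) == set(m3))    # True: same underlying set
print(m1 == m2)                         # False: different multisets

# Order does not matter when comparing multisets.
print(Counter(["a", "a", "b"]) == Counter(["a", "b", "a"]))  # True
```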
Median
In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as "the middle" value. The basic feature of the median in describing data compared to the mean (often simply described as the "average") is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of a "typical" value. Median income, for example, may be a better way to suggest what a "typical" income is, because income distribution can be very skewed. The median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median is not an arbitrarily large or small result. Finite data set of numbers: The median of a finite list of numbers is the "middle" number, when those numbers are list ...
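The resistance of the median to extreme values can be seen on a small made-up sample. The income figures below are hypothetical and chosen only to show the effect of one very large value.

```python
import statistics

# Hypothetical incomes: four similar values and one extreme outlier.
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]

print(statistics.median(incomes))  # 40000
print(statistics.mean(incomes))    # 230000
```

The single outlier pulls the mean far above every typical value, while the median stays at the middle observation.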
Harmonic Mean
In mathematics, the harmonic mean is one of several kinds of average, and in particular, one of the Pythagorean means. It is sometimes appropriate for situations when the average rate is desired. The harmonic mean can be expressed as the reciprocal of the arithmetic mean of the reciprocals of the given set of observations. As a simple example, the harmonic mean of 1, 4, and 4 is
: \left(\frac{1^{-1} + 4^{-1} + 4^{-1}}{3}\right)^{-1} = \frac{3}{\frac{1}{1} + \frac{1}{4} + \frac{1}{4}} = \frac{3}{1.5} = 2\,.
Definition: The harmonic mean ''H'' of the positive real numbers x_1, x_2, \ldots, x_n is defined to be
: H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}} = \left(\frac{\sum_{i=1}^n x_i^{-1}}{n}\right)^{-1}.
The third formula in the above equation expresses the harmonic mean as the reciprocal of the arithmetic mean of the reciprocals. From the following formula:
: H = \frac{n \cdot \prod_{j=1}^n x_j}{\sum_{i=1}^n \left\{\frac{1}{x_i} \prod_{j=1}^n x_j\right\}}.
it is more apparent that the harmonic mean is related to the arithmetic and geometric means. It is the reciprocal dual of the arithmetic mean for positive inputs:
: 1/H(1/x_1 \ldots 1/x_n) = A(x_1 \ldots x_n)
The harmonic mean is a Schur-con ...
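The worked example and the reciprocal duality with the arithmetic mean can be checked numerically. The sketch below computes the harmonic mean as the reciprocal of the arithmetic mean of the reciprocals; the function name `harmonic_mean` and the second sample list are illustrative choices, and the standard library's `statistics.harmonic_mean` is used as a cross-check.

```python
import statistics


def harmonic_mean(xs):
    """Reciprocal of the arithmetic mean of the reciprocals (positive inputs)."""
    return len(xs) / sum(1 / x for x in xs)


print(harmonic_mean([1, 4, 4]))             # 2.0
print(statistics.harmonic_mean([1, 4, 4]))  # cross-check with the standard library

# Reciprocal duality: 1 / H(1/x_1, ..., 1/x_n) equals the arithmetic mean A.
xs = [2, 4, 8]
dual = 1 / harmonic_mean([1 / x for x in xs])
print(dual, statistics.mean(xs))
```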