Counting Bloom Filter
   HOME
*





Counting Bloom Filter
A counting Bloom filter is a generalized data structure of Bloom filter, that is used to test whether a count number of a given element is smaller than a given threshold when a sequence of elements is given. As a generalized form of Bloom filter, false positive matches are possible, but false negatives are not – in other words, a query returns either "possibly bigger or equal than the threshold" or "definitely smaller than the threshold". Algorithm description Most of the parameters are defined same with Bloom filter, such as ''m, k. m'' is the number of counters in counting Bloom filter, which is expansion of ''m'' bits in Bloom filter. An ''empty counting Bloom filter'' is a ''m'' counters, all set to 0. Similar to Bloom filter, there must also be ''k'' different hash functions defined, each of which maps or hashes some set element to one of the ''m'' counter array positions, generating a uniform random distribution. It is also similar that ''k'' is a constant, much smaller th ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Data Structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data, i.e., it is an algebraic structure about data. Usage Data structures serve as the basis for abstract data types (ADT). The ADT defines the logical form of the data type. The data structure implements the physical form of the data type. Different types of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, relational databases commonly use B-tree indexes for data retrieval, while compiler implementations usually use hash tables to look up identifiers. Data structures provide a means to manage large amounts of data efficiently for uses such as large databases and internet indexing services. Usually, ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Bloom Filter
A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with the counting Bloom filter variant); the more items added, the larger the probability of false positives. Bloom proposed the technique for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied. He gave the example of a hyphenation algorithm for a dictionary of 500,000 words, out of which 90% follow simple hyphenation rules, but the remaining 10% require expensive disk accesses to retrieve specific hyphenation patterns. With sufficient core memory, an error-free hash coul ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Element (mathematics)
In mathematics, an element (or member) of a set is any one of the distinct objects that belong to that set. Sets Writing A = \ means that the elements of the set are the numbers 1, 2, 3 and 4. Sets of elements of , for example \, are subsets of . Sets can themselves be elements. For example, consider the set B = \. The elements of are ''not'' 1, 2, 3, and 4. Rather, there are only three elements of , namely the numbers 1 and 2, and the set \. The elements of a set can be anything. For example, C = \ is the set whose elements are the colors , and . Notation and terminology The relation "is an element of", also called set membership, is denoted by the symbol "∈". Writing :x \in A means that "''x'' is an element of ''A''". Equivalent expressions are "''x'' is a member of ''A''", "''x'' belongs to ''A''", "''x'' is in ''A''" and "''x'' lies in ''A''". The expressions "''A'' includes ''x''" and "''A'' contains ''x''" are also used to mea ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Type I And Type II Errors
In statistical hypothesis testing, a type I error is the mistaken rejection of an actually true null hypothesis (also known as a "false positive" finding or conclusion; example: "an innocent person is convicted"), while a type II error is the failure to reject a null hypothesis that is actually false (also known as a "false negative" finding or conclusion; example: "a guilty person is not convicted"). Much of statistical theory revolves around the minimization of one or both of these errors, though the complete elimination of either is a statistical impossibility if the outcome is not determined by a known, observable causal process. By selecting a low threshold (cut-off) value and modifying the alpha (α) level, the quality of the hypothesis test can be increased. The knowledge of type I errors and type II errors is widely used in medical science, biometrics and computer science. Intuitively, type I errors can be thought of as errors of ''commission'', i.e. the researcher unluck ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Hash Function
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called ''hash values'', ''hash codes'', ''digests'', or simply ''hashes''. The values are usually used to index a fixed-size table called a ''hash table''. Use of a hash function to index a hash table is called ''hashing'' or ''scatter storage addressing''. Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a computationally and storage space-efficient form of data access that avoids the non-constant access time of ordered and unordered lists and structured trees, and the often exponential storage requirements of direct access of state spaces of large or variable-length keys. Use of ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Map (mathematics)
In mathematics, a map or mapping is a function in its general sense. These terms may have originated as from the process of making a geographical map: ''mapping'' the Earth surface to a sheet of paper. The term ''map'' may be used to distinguish some special types of functions, such as homomorphisms. For example, a linear map is a homomorphism of vector spaces, while the term linear function may have this meaning or it may mean a linear polynomial. In category theory, a map may refer to a morphism. The term ''transformation'' can be used interchangeably, but ''transformation'' often refers to a function from a set to itself. There are also a few less common uses in logic and graph theory. Maps as functions In many branches of mathematics, the term ''map'' is used to mean a function, sometimes with a specific property of particular importance to that branch. For instance, a "map" is a " continuous function" in topology, a "linear transformation" in linear algebra, etc. Some ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


False Positive
A false positive is an error in binary classification in which a test result incorrectly indicates the presence of a condition (such as a disease when the disease is not present), while a false negative is the opposite error, where the test result incorrectly indicates the absence of a condition when it is actually present. These are the two kinds of errors in a binary test, in contrast to the two kinds of correct result (a and a ). They are also known in medicine as a false positive (or false negative) diagnosis, and in statistical classification as a false positive (or false negative) error. In statistical hypothesis testing the analogous concepts are known as type I and type II errors, where a positive result corresponds to rejecting the null hypothesis, and a negative result corresponds to not rejecting the null hypothesis. The terms are often used interchangeably, but there are differences in detail and interpretation due to the differences between medical testing and statist ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Count–min Sketch
In computing, the count–min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions. The count–min sketch was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan and described by them in a 2005 paper. Count–min sketch is an alternative to count sketch and AMS sketch and can be considered an implementation of a counting Bloom filter (Fan et al., 1998) or multistage-filter. However, they are used differently and therefore sized differently: a count–min sketch typically has a sublinear number of cells, related to the desired approximation quality of the sketch, while a counting Bloom filter is more typically sized to match the number of elements in the set. Data structure The goal of the basic version of the count–min sketch is to con ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]