HOME

TheInfoList



OR:

The lossy count algorithm is an
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
to identify elements in a
data stream In connection-oriented communication, a data stream is the transmission of a sequence of digitally encoded coherent signals to convey information. Typically, the transmitted symbols are grouped into a series of packets. Data streaming has bec ...
whose
frequency Frequency is the number of occurrences of a repeating event per unit of time. It is also occasionally referred to as ''temporal frequency'' for clarity, and is distinct from ''angular frequency''. Frequency is measured in hertz (Hz) which is eq ...
exceeds a user-given threshold. The algorithm works by dividing the data stream into
buckets A bucket is typically a watertight, vertical cylinder or truncated cone or square, with an open top and a flat bottom, attached to a semicircular carrying handle called the ''bail''. A bucket is usually an open-top container. In contrast, a ...
for frequent items, but fill as many buckets as possible in main memory one time. The frequency computed by this algorithm is not always accurate, but has an error threshold that can be specified by the user. The
run time Run(s) or RUN may refer to: Places * Run (island), one of the Banda Islands in Indonesia * Run (stream), a stream in the Dutch province of North Brabant People * Run (rapper), Joseph Simmons, now known as "Reverend Run", from the hip-hop group ...
and
space Space is the boundless three-dimensional extent in which objects and events have relative position and direction. In classical physics, physical space is often conceived in three linear dimensions, although modern physicists usually consider ...
required by the algorithm is inversely proportional to the specified error threshold; hence the larger the error, the smaller the footprint. The algorithm was created by
computer scientist A computer scientist is a person who is trained in the academic study of computer science. Computer scientists typically work on the theoretical side of computation, as opposed to the hardware side on which computer engineers mainly focus (al ...
s
Rajeev Motwani Rajeev Motwani (Hindi: राजीव मोटवानी , March 24, 1962 – June 5, 2009) was an Indian American professor of Computer Science at Stanford University whose research focused on theoretical computer science. He was an early ad ...
and Gurmeet Singh Manku. It finds applications in computations where data takes the form of a continuous data stream instead of a finite
data set A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...
, such as
network traffic measurement In computer networks, network traffic measurement is the process of measuring the amount and type of traffic on a particular network. This is especially important with regard to effective bandwidth management. Techniques Network performance could ...
s,
web server A web server is computer software and underlying hardware that accepts requests via HTTP (the network protocol created to distribute web content) or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiate ...
logs, and
clickstream A click path or clickstream is the sequence of hyperlinks one or more website visitors follows on a given site, presented in the order viewed. A visitor's click path may start within the website or at a separate third party website, often a search e ...
s.


Algorithm

The general algorithm is as follows *Step 1: Divide the incoming data stream into buckets of width w = 1/\epsilon, where \epsilon is mentioned by user as the error bound (along with minimum support threshold = \sigma). *Step 2: Increment the frequency count of each item according to the new bucket values. After each bucket, decrement all counters by 1. *Step 3: Repeat – Update counters and after each bucket, decrement all counters by 1.


References

* {{cs-stub Streaming algorithms