# Lossy Count Algorithm

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Lossy_Count_Algorithm
> Markdown URL: https://mediated.wiki/source/Lossy_Count_Algorithm.md
> Source: https://en.wikipedia.org/wiki/Lossy_Count_Algorithm
> Source revision: 1301293900
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

The '''lossy count algorithm''' is an [algorithm](/source/algorithm) to identify elements in a [data stream](/source/data_stream) whose [frequency](/source/frequency) exceeds a user-given threshold. The algorithm works by dividing the data stream into buckets for frequent items, but fill as many buckets as possible in main memory one time. The frequency computed by this algorithm is not always accurate, but has an error threshold that can be specified by the user. The [run time](/source/Time_complexity) and [space](/source/space_complexity) required by the algorithm is inversely proportional to the specified error threshold; hence the larger the error, the smaller the footprint. 

The algorithm was created by [computer scientist](/source/computer_scientist)s [Rajeev Motwani](/source/Rajeev_Motwani) and Gurmeet Singh Manku. It finds applications in computations where data takes the form of a continuous data stream instead of a finite [data set](/source/data_set), such as [network traffic measurement](/source/network_traffic_measurement)s, [web server](/source/web_server) logs, and [clickstream](/source/clickstream)s.

== Algorithm ==

The general algorithm is as follows<ref>{{Cite book|last=Han, Jiawei.|title=Data mining : concepts and techniques|date=2006|publisher=Elsevier|others=Kamber, Micheline.|isbn=978-0-08-047558-5|edition=2nd|location=Amsterdam|oclc=143252170}}</ref>
*'''Step 1:''' Divide the incoming data stream into buckets of width <math>w = 1/\epsilon</math>, where <math>\epsilon</math> is mentioned by user as the error bound (along with minimum support threshold = <math>\sigma</math>).
*'''Step 2:''' Increment the frequency count of each item according to the new bucket values. After each bucket, decrement all counters by 1.
*'''Step 3:''' Repeat – Update counters and after each bucket, decrement all counters by 1.

==References==
{{Refbegin}}
<references />

*{{Cite journal
 |first1= R 
 |last1= Motwani
 |first2= G.S
 |last2= Manku
 |title= Approximate frequency counts over data streams
 |journal=  VLDB '02 Proceedings of the 28th International Conference on Very Large Data Bases 
 |year= 2002
 |pages= 346&ndash;357
 }}
{{Refend}}

{{cs-stub}}

Category:Streaming algorithms

---
Adapted from the Wikipedia article [Lossy Count Algorithm](https://en.wikipedia.org/wiki/Lossy_Count_Algorithm) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Lossy_Count_Algorithm?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
