countminsketch

package
v0.0.0-...-3fd0ae8 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 29, 2023 License: Apache-2.0 Imports: 2 Imported by: 0

README

Count-Min Sketch

The Count-Min Sketch is a probabilistic data structure used for approximate counting of events in a stream of data. It is particularly useful in situations where it is impractical or too expensive to store the entire stream of data, and an approximate count of events is sufficient. The Count-Min Sketch trades off a small probability of error for reduced memory requirements.

How it Works

  1. Initialization: The sketch is initialized with a fixed number of hash functions and an array (table) of counters.

  2. Incremental Counting: Each item in the stream is hashed using multiple hash functions, and the corresponding counters in the table are incremented. The hash functions spread the counts across the table, preventing collisions.

  3. Estimation: To estimate the count of a specific item, the algorithm hashes the item using the same hash functions and retrieves the minimum count among the corresponding counters. Since different items may hash to the same counter, there is a possibility of overcounting, but the algorithm minimizes this error.

Characteristics

  • Trade-off: The Count-Min Sketch allows for a trade-off between accuracy and memory. By adjusting the width and depth of the table, you can control the sketch's accuracy and memory consumption. A larger width or depth reduces the error probability but increases memory usage.

  • Approximation: The counts provided by the Count-Min Sketch are approximate and may have a small probability of error. However, the error is controlled and can be quantified based on the parameters chosen during initialization.

Use Cases

  1. Network Traffic Monitoring: Count-Min Sketches are used to estimate the frequency of different types of network packets in real-time, allowing for efficient network traffic monitoring.

  2. Web Analytics: In scenarios where counting exact user interactions or events is impractical due to the high volume of data, Count-Min Sketches can be used to estimate page views, clicks, or other user activities on a website.

  3. Frequency Analysis: When dealing with large datasets, Count-Min Sketches are employed for approximate frequency analysis of items, such as identifying popular search queries.

  4. Distributed Systems: In distributed systems, Count-Min Sketches can be used to estimate the cardinality of a set across multiple nodes without requiring them to share the entire set.

  5. Big Data Processing: Count-Min Sketches are useful in big data scenarios where maintaining an exact count of events is infeasible due to the sheer volume of data.

In summary, Count-Min Sketches provide a powerful and scalable solution for approximate counting in scenarios where maintaining precise counts is impractical or too resource-intensive.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CountMinSketch

type CountMinSketch struct {
	// contains filtered or unexported fields
}

func NewCountMinSketch

func NewCountMinSketch(width, depth int) (*CountMinSketch, error)

func (*CountMinSketch) Estimate

func (cms *CountMinSketch) Estimate(item []byte) int

func (*CountMinSketch) Increment

func (cms *CountMinSketch) Increment(item []byte)

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL