topk

package module
v0.1.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 15, 2022 License: MIT Imports: 5 Imported by: 0

README

HeavyKeeper

This package implements the HeavyKeeper algorithm for efficiently finding top-K flows in an unbounded set of flows.

Paper: https://www.usenix.org/system/files/conference/atc18/atc18-gong.pdf

The reference implementation is pretty difficult to follow. I found the RedisBloom implementation to be far eaier to read, if you're interested in seeing a more battle-tested implementation. The RedisBloom implementation also has some optimizations that might be worth looking at (a decay lookup table, for example) and it supports arbitrary increments greater than one, which I've implemented here.

This implementation uses a default width and depth to simplify usage:

  • width = k * log(k) (minimum of 256)
  • height = log(k) (minimum of 3)

Usage

hk := topk.New(100, 0.9)

hk.Add("foo", 1)
hk.Add("bar", 5)
hk.Add("baz", 1)
hk.Add("baz", 1)

for _, fc := range hk.Top() {
  fmt.Printf("%s = %d\n", fc.Flow, fc.Count)
}

// bar = 5
// baz = 2
// foo = 1

count, ok := hk.Count("bar")
fmt.Println(count, ok)
// 5 true

Benchmarks

The algorithm itself is rather efficient on its own; I haven't invested any time in further optimizing things (yet).

goos: darwin
goarch: amd64
pkg: github.com/segmentio/topk
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkAdd/K=10-12         	17065675	        79.38 ns/op	       0 B/op	       0 allocs/op
BenchmarkAdd/K=50-12         	11193319	       106.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkAdd/K=100-12        	 9880362	       131.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkAdd/K=500-12        	 7442464	       159.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkAdd/K=1000-12       	 7125268	       167.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkAdd/K=5000-12       	 5797017	       206.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkAdd/K=10000-12      	 5218218	       233.2 ns/op	       0 B/op	       0 allocs/op

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type FlowCount

type FlowCount struct {
	Flow  string
	Count uint32
}

FlowCount is a tuple of flow and estimated count.

type HeavyKeeper

type HeavyKeeper struct {
	// contains filtered or unexported fields
}

HeavyKeeper implements the Top-K algorithm described in "HeavyKeeper: An Accurate Algorithm for Finding Top-k Elephant Flows" at https://www.usenix.org/system/files/conference/atc18/atc18-gong.pdf

HeavyKeeper is not safe for concurrent use.

func New

func New(k int, decay float64) *HeavyKeeper

New returns a HeavyKeeper that tracks the k largest flows. Decay determines the chance that a collision will cause the existing flow count to decay. A decay of 0.9 is a good starting point.

Width is `k * log(k)` (minimum of 256) and depth is `log(k)` (minimum of 3).

func (*HeavyKeeper) Count

func (hk *HeavyKeeper) Count(flow string) (count uint32, ok bool)

Count returns the estimated count of the given flow if it is in the top K flows.

func (*HeavyKeeper) DecayAll

func (hk *HeavyKeeper) DecayAll(pct float64)

DecayAll decays all flows by the given percentage.

func (*HeavyKeeper) Reset

func (hk *HeavyKeeper) Reset()

Reset returns the HeavyKeeper to a like-new state with no flows and no counts.

func (*HeavyKeeper) Sample

func (hk *HeavyKeeper) Sample(flow string, incr uint32) bool

Sample increments the given flow's count by the given amount. It returns true if the flow is in the top K elements.

func (*HeavyKeeper) Top

func (hk *HeavyKeeper) Top() []FlowCount

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL