hyperloglog

package module
v0.0.0-...-1806d9b Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 4, 2022 License: MIT Imports: 5 Imported by: 0

README

hyperloglog

Package hyperloglog implements the HyperLogLog algorithm for cardinality estimation. In English: it counts things. It counts things using very small amounts of memory compared to the number of objects it is counting.

For a full description of the algorithm, see the paper HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm by Flajolet, et. al. at http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf

For documentation see http://godoc.org/github.com/DataDog/hyperloglog

Included are a set of fast implementations for murmurhash suitable for use on 32 and 64 bit integers on little endian machines.

Quick start

$ go get github.com/DataDog/hyperloglog
$ cd $GOPATH/src/github.com/DataDog/hyperloglog
$ go test -test.v
$ go test -bench=.

License

hyperloglog is licensed under the MIT license.

Documentation

Overview

Package hyperloglog implements the HyperLogLog algorithm for cardinality estimation. In English: it counts things. It counts things using very small amounts of memory compared to the number of objects it is counting.

For a full description of the algorithm, see the paper HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm by Flajolet, et. al.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Murmur128

func Murmur128(i, j uint64) uint32

Murmur128 implements a fast version of the murmur hash function for two uint64s for little endian machines. Suitable for adding a 128bit value to an HLL counter.

func Murmur32

func Murmur32(i uint32) uint32

Murmur32 implements a fast version of the murmur hash function for uint32 for little endian machines. Suitable for adding 32bit integers to a HLL counter.

func Murmur64

func Murmur64(i uint64) uint32

Murmur64 implements a fast version of the murmur hash function for uint64 for little endian machines. Suitable for adding 64bit integers to a HLL counter.

func MurmurBytes

func MurmurBytes(bkey []byte) uint32

MurmurBytes implements a fast version of the murmur hash function for bytes for little endian machines. Suitable for adding strings to HLL counter.

func MurmurString

func MurmurString(key string) uint32

MurmurString implements a fast version of the murmur hash function for strings for little endian machines. Suitable for adding strings to HLL counter.

Types

type HyperLogLog

type HyperLogLog struct {
	M         uint    // Number of registers
	B         uint32  // Number of bits used to determine register index
	Alpha     float64 // Bias correction constant
	Registers []uint8
}

A HyperLogLog is a deterministic cardinality estimator. This version exports its fields so that it is suitable for saving eg. to a database.

func New

func New(registers uint) (*HyperLogLog, error)

New creates a HyperLogLog with the given number of registers. More registers leads to lower error in your estimated count, at the expense of memory.

Choose a power of two number of registers, depending on the amount of memory you're willing to use and the error you're willing to tolerate. Each register uses one byte of memory.

Standard error will be: σ ≈ 1.04 / sqrt(registers) The estimates provided by hyperloglog are expected to be within σ, 2σ, 3σ of the exact count in respectively 65%, 95%, 99% of all the cases.

func (*HyperLogLog) Add

func (h *HyperLogLog) Add(val uint32)

Add to the count. val should be a 32 bit unsigned integer from a good hash function.

func (*HyperLogLog) Count

func (h *HyperLogLog) Count() uint64

Count returns the estimated cardinality.

func (*HyperLogLog) CountWithoutLargeRangeCorrection

func (h *HyperLogLog) CountWithoutLargeRangeCorrection() uint64

CountWithoutLargeRangeCorrection returns the estimated cardinality, without applying the large range correction proposed by Flajolet et al. as it can lead to significant overcounting.

See https://github.com/DataDog/hyperloglog/pull/15

func (*HyperLogLog) Merge

func (h *HyperLogLog) Merge(other *HyperLogLog) error

Merge another HyperLogLog into this one. The number of registers in each must be the same.

func (*HyperLogLog) Reset

func (h *HyperLogLog) Reset()

Reset all internal variables and set the count to zero.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL