hyper

package module
v1.0.6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 6, 2024 License: MIT Imports: 3 Imported by: 3

README

Hashing N-dimensional float vectors

Search nearest neighbour vectors in n-dimensional space with hashes. There are no dependencies in this package.

The algorithm is based on the assumption that two real numbers can be considered equal within certain equality distance. Then quantization is used for comparison. To make sure points near or at quantization borders are also comparable, a vector can be discretized into more than one hash, as described here (also as PDF). The method indirectly clusters given vectors by hypercubes.

How to use

  1. Provided a float vector []float64, use CubeSet and CentralCube functions to generate hypercube coordinates []int. The difference between the two functions is that one corresponds to hash-table record and the other to a query or vice versa, depending on performance/memory preference.
  2. HashSet and DecimalHash/FNV1aHash are used to get corresponding hash set and central hash from the hypercube coordinates above. There are 2 alternative hash functions: DecimalHash and FNV1aHash. DecimalHash does not have collisions, but is not suitable for cases with large number of buckets or dimensions. FNV1aHash is applicable for all cases.

Example for similar image search and clustering.

Go doc for full code documentation.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Cube

type Cube []int

Hypercube is represented by a slice of its coordinates.

func CentralCube

func CentralCube(vector []float64, params Params) (central Cube)

CentralCube returns the hypercube containing the vector end. Arguments are the same as for the CubeSet function.

func (Cube) DecimalHash

func (cube Cube) DecimalHash() (h uint64)

DecimalHash hashes hypercubes without collisions. IMPORTANT: To work correctly, the number of buckets must be less than 11 and the number of dimensions less than 20. Else at certain unexpected moment you might get a hash value overflow.

func (Cube) FNV1aHash

func (cube Cube) FNV1aHash() uint64

FNV1aHash hashes hypercubes with rare collisions, and should be used when Decimal cannot be used because of very large number of buckets or dimensions.

type Cubes

type Cubes []Cube

func CubeSet

func CubeSet(vector []float64, params Params) (set Cubes)

CubeSet returns a set of hypercubes, which represent fuzzy discretization of one n-dimensional vector, as described in https://vitali-fedulov.github.io/similar.pictures/algorithm-for-hashing-high-dimensional-float-vectors.html One hupercube is defined by bucket numbers in each dimension. min and max are minimum and maximum possible values of the vector components. The assumption is that min and max are the same for all dimensions.

func (Cubes) HashSet

func (cubeSet Cubes) HashSet(hashFunc HashFunc) (
	hs []uint64)

Hash64Set returns a set of hashes for a hypercube set and a concrete hash function.

type HashFunc

type HashFunc func(cube Cube) uint64

HashFunc can be any function (also user-defined).

type Params

type Params struct {
	// Value limits per dimension. For example 0, 255 for pixel values.
	Min, Max float64
	// Uncertainty interval expressed as a fraction of bucketWidth
	// (for example 0.25 for eps = 1/4 of bucketWidth).
	EpsPercent float64
	// Number of buckets per dimension.
	NumBuckets int
}

Parameters of space discretization.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL