hyper

package module

v1.0.6 Latest Latest Go to latest Published: Apr 6, 2024 License: MIT Imports: 3 Imported by: 3

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/vitali-fedulov/hyper

Links

Open Source Insights

README ¶

Hashing N-dimensional float vectors

Search nearest neighbour vectors in n-dimensional space with hashes. There are no dependencies in this package.

The algorithm is based on the assumption that two real numbers can be considered equal within certain equality distance. Then quantization is used for comparison. To make sure points near or at quantization borders are also comparable, a vector can be discretized into more than one hash, as described here (also as PDF). The method indirectly clusters given vectors by hypercubes.

How to use

Provided a float vector []float64, use CubeSet and CentralCube functions to generate hypercube coordinates []int. The difference between the two functions is that one corresponds to hash-table record and the other to a query or vice versa, depending on performance/memory preference.
HashSet and DecimalHash/FNV1aHash are used to get corresponding hash set and central hash from the hypercube coordinates above. There are 2 alternative hash functions: DecimalHash and FNV1aHash. DecimalHash does not have collisions, but is not suitable for cases with large number of buckets or dimensions. FNV1aHash is applicable for all cases.

Example for similar image search and clustering.

Go doc for full code documentation.

Documentation ¶

Index ¶

type Cube
- func CentralCube(vector []float64, params Params) (central Cube)
- func (cube Cube) DecimalHash() (h uint64)
- func (cube Cube) FNV1aHash() uint64
type Cubes
- func CubeSet(vector []float64, params Params) (set Cubes)
- func (cubeSet Cubes) HashSet(hashFunc HashFunc) (hs []uint64)
type HashFunc
type Params

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Cube ¶

type Cube []int

Hypercube is represented by a slice of its coordinates.

func CentralCube ¶

func CentralCube(vector []float64, params Params) (central Cube)

CentralCube returns the hypercube containing the vector end. Arguments are the same as for the CubeSet function.

func (Cube) DecimalHash ¶

func (cube Cube) DecimalHash() (h uint64)

DecimalHash hashes hypercubes without collisions. IMPORTANT: To work correctly, the number of buckets must be less than 11 and the number of dimensions less than 20. Else at certain unexpected moment you might get a hash value overflow.

func (Cube) FNV1aHash ¶

func (cube Cube) FNV1aHash() uint64

FNV1aHash hashes hypercubes with rare collisions, and should be used when Decimal cannot be used because of very large number of buckets or dimensions.

type Cubes ¶

type Cubes []Cube

func CubeSet ¶

func CubeSet(vector []float64, params Params) (set Cubes)

CubeSet returns a set of hypercubes, which represent fuzzy discretization of one n-dimensional vector, as described in https://vitali-fedulov.github.io/similar.pictures/algorithm-for-hashing-high-dimensional-float-vectors.html One hupercube is defined by bucket numbers in each dimension. min and max are minimum and maximum possible values of the vector components. The assumption is that min and max are the same for all dimensions.

func (Cubes) HashSet ¶

func (cubeSet Cubes) HashSet(hashFunc HashFunc) (
	hs []uint64)

Hash64Set returns a set of hashes for a hypercube set and a concrete hash function.

type HashFunc ¶

type HashFunc func(cube Cube) uint64

HashFunc can be any function (also user-defined).

type Params ¶

type Params struct {
	// Value limits per dimension. For example 0, 255 for pixel values.
	Min, Max float64
	// Uncertainty interval expressed as a fraction of bucketWidth
	// (for example 0.25 for eps = 1/4 of bucketWidth).
	EpsPercent float64
	// Number of buckets per dimension.
	NumBuckets int
}

Parameters of space discretization.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL