gomaddness

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 22, 2022 License: BSD-2-Clause Imports: 5 Imported by: 1

README

GoMADDNESS

Go Reference License Go GitHub Workflow Codecov Go Report Card Maintainability

Go package implementing MADDNESS (Multiply-ADDitioN-lESS) algorithm.

References

  • Davis Blalock, John Guttag. Multiplying Matrices Without Multiplying arXiv:2106.10860 [cs.LG]

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ArgMaxHeap

type ArgMaxHeap[F Float] struct {
	// contains filtered or unexported fields
}

ArgMaxHeap is a max-heap working on the indices of a Vector, without modifying the Vector itself.

func NewArgMaxHeap

func NewArgMaxHeap[F Float](v Vector[F]) *ArgMaxHeap[F]

NewArgMaxHeap creates and initializes a new ArgMaxHeap object.

func (*ArgMaxHeap[_]) FirstArgsMax

func (h *ArgMaxHeap[_]) FirstArgsMax(n int) []int

FirstArgsMax returns the first n arguments of the maxima (arg max) of v.

If n is greater than Len, only Len indices are returned.

func (*ArgMaxHeap[_]) Len

func (h *ArgMaxHeap[_]) Len() int

Len is the number of indices in the collection.

func (*ArgMaxHeap[_]) Less

func (h *ArgMaxHeap[_]) Less(i, j int) bool

Less reports whether the index at i must sort before the index of j. It returns true only if the vector's value at the i-th index is grater than the value at the j-th index.

func (*ArgMaxHeap[_]) Pop

func (h *ArgMaxHeap[_]) Pop() any

Pop removes and returns the index of the maximum element (according to Less) from the heap.

func (*ArgMaxHeap[_]) Push

func (h *ArgMaxHeap[_]) Push(any)

Push always panics: in order to simplify the implementation, it is not allowed to add new elements.

func (*ArgMaxHeap[_]) Swap

func (h *ArgMaxHeap[_]) Swap(i, j int)

Swap swaps the i-th and j-th indices.

type Bucket

type Bucket[F Float] struct {
	Level     int
	NodeIndex int
	Vectors   Vectors[F]
}

A Bucket is a set of Vectors, supporting the learning process of MADDNESS hash function parameters.

The tree Level, and the NodeIndex within that level, are included as a convenient for debugging or logging purposes.

type Buckets

type Buckets[F Float] []*Bucket[F]

Buckets is a set of Bucket items. Usually, they all belong to the same level of the hashing tree, built during the learning process.

func (Buckets[_]) HeuristicSelectIndices

func (bs Buckets[_]) HeuristicSelectIndices() []int

HeuristicSelectIndices selects a set of indices to be evaluated during the construction of a hashing-tree level.

It returns a maximum of top-four indices that contribute the most loss, summed across all Buckets.

func (Buckets[F]) Prototypes

func (bs Buckets[F]) Prototypes() Vectors[F]

Prototypes creates the prototype Vectors.

For each Bucket, a prototype Vector is created by computing the mean of its sub-vectors.

type Float

type Float interface {
	~float32 | ~float64
}

Float is a constraint that permits any floating-point type.

type Hash

type Hash[F Float] struct {
	TreeLevels []*HashingTreeLevel[F]
	Prototypes Vectors[F]
}

Hash is the data structure for MADDNESS hash function. It holds the learned balanced binary regression tree and the prototype vectors.

func TrainHash

func TrainHash[F Float](examples Vectors[F]) *Hash[F]

TrainHash runs the learning process for MADDNESS hash function parameters, and return a new trained Hash.

func (*Hash[F]) Hash

func (h *Hash[F]) Hash(v Vector[F]) uint8

Hash maps the given vector to an index, applying MADDNESS hash function.

type HashingTreeLevel

type HashingTreeLevel[F Float] struct {
	SplitIndex      int
	SplitThresholds Vector[F]
}

HashingTreeLevel is one level of the binary tree from a Hash.

type LookupTable

type LookupTable[F Float] struct {
	Bias  F
	Scale F
	// Matrix, represented in row-major order, with row index = subspace,
	// and column index = prototype.
	Data []uint8
}

LookupTable holds a table of pre-computed dot products, quantized to 8 bits, and the parameters needed for de-quantization during the product quantization's aggregation step.

type Maddness

type Maddness[F Float] struct {
	NumSubspaces  int
	VectorSize    int
	SubVectorSize int
	Hashes        []*Hash[F]
	LookupTables  []*LookupTable[F]
}

Maddness is the primary structure that holds parameters and implements methods of the whole MADDNESS algorithm.

func TrainMaddness

func TrainMaddness[F Float](dataExamples, queryVectors Vectors[F], numSubspaces int) *Maddness[F]

TrainMaddness runs the learning process for MADDNESS product quantization and hash functions parameters, returning a new trained Maddness object.

func (*Maddness[F]) DotProduct

func (m *Maddness[F]) DotProduct(lutIndices []uint16, queryVectorIndex int) F

DotProduct computes the approximated dot product between a data vector, identified by the lookup-table indices obtained from the vector's quantization, and the query vector represented by queryVectorIndex.

func (*Maddness[F]) LookupTableIndices

func (m *Maddness[F]) LookupTableIndices(q []uint8) []uint16

LookupTableIndices transforms a list of hash indices, as returned from Quantize, into a corresponding list of lookup-table indices, for accessing LookupTable.Data.

func (*Maddness[F]) Quantize

func (m *Maddness[F]) Quantize(v Vector[F]) []uint8

Quantize splits the given vector into subspaces and returns a slice of hash indices, one for each subspace.

func (*Maddness[F]) Reconstruct

func (m *Maddness[F]) Reconstruct(q []uint8) Vector[F]

Reconstruct builds a vector from a list of hash indices, reconstructed using the learned prototypes for each subspace.

type Vector

type Vector[F Float] []F

A Vector is a slice of floating point values.

func (Vector[F]) Add

func (v Vector[F]) Add(other Vector[F]) Vector[F]

Add modifies v adding to it the values from other, element-wise. It returns v.

func (Vector[F]) AddSquares

func (v Vector[F]) AddSquares(other Vector[F]) Vector[F]

AddSquares modifies v adding to it the squared values from other, element-wise. It returns v.

func (Vector[_]) ArgMin

func (v Vector[_]) ArgMin() int

ArgMin returns the index of the minimum value from the vector.

If the identical minimum value is present more than once in the vector, the index of its first occurrence (i.e. the lowest index) is returned.

func (Vector[F]) Copy

func (v Vector[F]) Copy() Vector[F]

Copy returns a copy of the vector.

func (Vector[F]) DivScalar

func (v Vector[F]) DivScalar(x F) Vector[F]

DivScalar modifies v dividing each element by x. It returns v.

func (Vector[F]) DotProduct

func (v Vector[F]) DotProduct(other Vector[F]) (y F)

DotProduct computes the dot product between v and other.

func (Vector[F]) Reverse

func (v Vector[F]) Reverse() Vector[F]

Reverse reverses v in place. It returns v.

func (Vector[F]) Square

func (v Vector[F]) Square() Vector[F]

Square modifies v computing the square of each value. It returns v.

func (Vector[F]) Sub

func (v Vector[F]) Sub(other Vector[F]) Vector[F]

Sub modifies v subtracting from its values the values of other, element-wise. It returns v.

type Vectors

type Vectors[F Float] []Vector[F]

Vectors is a collection of Vector objects.

All Vector objects MUST be of the same size.

Given a collection of N vectors, each of size M, this can also be seen as a matrix with dimensions N×M (N rows, M columns).

func (Vectors[F]) ColumnWiseVariance

func (vs Vectors[F]) ColumnWiseVariance() Vector[F]

ColumnWiseVariance returns a new Vector containing the column-wise variance computed across all vectors.

func (Vectors[F]) Copy

func (vs Vectors[F]) Copy() Vectors[F]

Copy returns a shallow copy of the collection.

func (Vectors[F]) CumulativeSSE

func (vs Vectors[F]) CumulativeSSE() Vector[F]

CumulativeSSE returns a new Vector, computing the error sum of squares (SSE) over the set of Vectors.

func (Vectors[F]) Mean

func (vs Vectors[F]) Mean() Vector[F]

Mean calculates and returns the mean vector.

func (Vectors[F]) OptimalSplitThreshold

func (vs Vectors[F]) OptimalSplitThreshold(splitIndex int) (threshold, loss F)

OptimalSplitThreshold finds the optimal split threshold within the vectors, computed over values at the given split-index column.

func (Vectors[F]) Reverse

func (vs Vectors[F]) Reverse() Vectors[F]

Reverse reverses vs in place. It returns vs.

func (Vectors[F]) SortByColumn

func (vs Vectors[F]) SortByColumn(index int) Vectors[F]

SortByColumn sorts vs in place, according to the value at each vector's index (column), in ascending order. It returns vs.

func (Vectors[F]) SplitByThreshold

func (vs Vectors[F]) SplitByThreshold(index int, threshold F) (lt, gte Vectors[F])

SplitByThreshold splits the collection of vectors by a threshold value, compared with the elements at the given index.

Two new collections are returned:

  • lt: vectors whose value at "index" is lower than "threshold"
  • gte: vectors whose value at "index" is greater than or equal to "threshold"

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL