gomaddness

package module

v1.0.0 Latest Latest Go to latest Published: Mar 22, 2022 License: BSD-2-Clause Imports: 5 Imported by: 1

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/nlpodyssey/gomaddness

Links

Open Source Insights

README ¶

GoMADDNESS

Go package implementing MADDNESS (Multiply-ADDitioN-lESS) algorithm.

References

Davis Blalock, John Guttag. Multiplying Matrices Without Multiplying arXiv:2106.10860 [cs.LG]

Documentation ¶

Index ¶

type ArgMaxHeap
- func NewArgMaxHeap[F Float](v Vector[F]) *ArgMaxHeap[F]
type Bucket
type Buckets
- func (bs Buckets[_]) HeuristicSelectIndices() []int
- func (bs Buckets[F]) Prototypes() Vectors[F]
type Float
type Hash
- func TrainHash[F Float](examples Vectors[F]) *Hash[F]
- func (h *Hash[F]) Hash(v Vector[F]) uint8
type HashingTreeLevel
type LookupTable
type Maddness
- func TrainMaddness[F Float](dataExamples, queryVectors Vectors[F], numSubspaces int) *Maddness[F]
type Vector
type Vectors

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type ArgMaxHeap ¶

type ArgMaxHeap[F Float] struct {
	// contains filtered or unexported fields
}

ArgMaxHeap is a max-heap working on the indices of a Vector, without modifying the Vector itself.

func NewArgMaxHeap ¶

func NewArgMaxHeap[F Float](v Vector[F]) *ArgMaxHeap[F]

NewArgMaxHeap creates and initializes a new ArgMaxHeap object.

func (*ArgMaxHeap[_]) FirstArgsMax ¶

func (h *ArgMaxHeap[_]) FirstArgsMax(n int) []int

FirstArgsMax returns the first n arguments of the maxima (arg max) of v.

If n is greater than Len, only Len indices are returned.

func (*ArgMaxHeap[_]) Len ¶

func (h *ArgMaxHeap[_]) Len() int

Len is the number of indices in the collection.

func (*ArgMaxHeap[_]) Less ¶

func (h *ArgMaxHeap[_]) Less(i, j int) bool

Less reports whether the index at i must sort before the index of j. It returns true only if the vector's value at the i-th index is grater than the value at the j-th index.

func (*ArgMaxHeap[_]) Pop ¶

func (h *ArgMaxHeap[_]) Pop() any

Pop removes and returns the index of the maximum element (according to Less) from the heap.

func (*ArgMaxHeap[_]) Push ¶

func (h *ArgMaxHeap[_]) Push(any)

Push always panics: in order to simplify the implementation, it is not allowed to add new elements.

func (*ArgMaxHeap[_]) Swap ¶

func (h *ArgMaxHeap[_]) Swap(i, j int)

Swap swaps the i-th and j-th indices.

type Bucket ¶

type Bucket[F Float] struct {
	Level     int
	NodeIndex int
	Vectors   Vectors[F]
}

A Bucket is a set of Vectors, supporting the learning process of MADDNESS hash function parameters.

The tree Level, and the NodeIndex within that level, are included as a convenient for debugging or logging purposes.

type Buckets ¶

type Buckets[F Float] []*Bucket[F]

Buckets is a set of Bucket items. Usually, they all belong to the same level of the hashing tree, built during the learning process.

func (Buckets[_]) HeuristicSelectIndices ¶

func (bs Buckets[_]) HeuristicSelectIndices() []int

HeuristicSelectIndices selects a set of indices to be evaluated during the construction of a hashing-tree level.

It returns a maximum of top-four indices that contribute the most loss, summed across all Buckets.

func (Buckets[F]) Prototypes ¶

func (bs Buckets[F]) Prototypes() Vectors[F]

Prototypes creates the prototype Vectors.

For each Bucket, a prototype Vector is created by computing the mean of its sub-vectors.

type Float ¶

type Float interface {
	~float32 | ~float64
}

Float is a constraint that permits any floating-point type.

type Hash ¶

type Hash[F Float] struct {
	TreeLevels []*HashingTreeLevel[F]
	Prototypes Vectors[F]
}

Hash is the data structure for MADDNESS hash function. It holds the learned balanced binary regression tree and the prototype vectors.

func TrainHash ¶

func TrainHash[F Float](examples Vectors[F]) *Hash[F]

TrainHash runs the learning process for MADDNESS hash function parameters, and return a new trained Hash.

func (*Hash[F]) Hash ¶

func (h *Hash[F]) Hash(v Vector[F]) uint8

Hash maps the given vector to an index, applying MADDNESS hash function.

type HashingTreeLevel ¶

type HashingTreeLevel[F Float] struct {
	SplitIndex      int
	SplitThresholds Vector[F]
}

HashingTreeLevel is one level of the binary tree from a Hash.

type LookupTable ¶

type LookupTable[F Float] struct {
	Bias  F
	Scale F
	// Matrix, represented in row-major order, with row index = subspace,
	// and column index = prototype.
	Data []uint8
}

LookupTable holds a table of pre-computed dot products, quantized to 8 bits, and the parameters needed for de-quantization during the product quantization's aggregation step.

type Maddness ¶

type Maddness[F Float] struct {
	NumSubspaces  int
	VectorSize    int
	SubVectorSize int
	Hashes        []*Hash[F]
	LookupTables  []*LookupTable[F]
}

Maddness is the primary structure that holds parameters and implements methods of the whole MADDNESS algorithm.

func TrainMaddness ¶

func TrainMaddness[F Float](dataExamples, queryVectors Vectors[F], numSubspaces int) *Maddness[F]

TrainMaddness runs the learning process for MADDNESS product quantization and hash functions parameters, returning a new trained Maddness object.

func (*Maddness[F]) DotProduct ¶

func (m *Maddness[F]) DotProduct(lutIndices []uint16, queryVectorIndex int) F

DotProduct computes the approximated dot product between a data vector, identified by the lookup-table indices obtained from the vector's quantization, and the query vector represented by queryVectorIndex.

func (*Maddness[F]) LookupTableIndices ¶

func (m *Maddness[F]) LookupTableIndices(q []uint8) []uint16

LookupTableIndices transforms a list of hash indices, as returned from Quantize, into a corresponding list of lookup-table indices, for accessing LookupTable.Data.

func (*Maddness[F]) Quantize ¶

func (m *Maddness[F]) Quantize(v Vector[F]) []uint8

Quantize splits the given vector into subspaces and returns a slice of hash indices, one for each subspace.

func (*Maddness[F]) Reconstruct ¶

func (m *Maddness[F]) Reconstruct(q []uint8) Vector[F]

Reconstruct builds a vector from a list of hash indices, reconstructed using the learned prototypes for each subspace.

type Vector ¶

type Vector[F Float] []F

A Vector is a slice of floating point values.

func (Vector[F]) Add ¶

func (v Vector[F]) Add(other Vector[F]) Vector[F]

Add modifies v adding to it the values from other, element-wise. It returns v.

func (Vector[F]) AddSquares ¶

func (v Vector[F]) AddSquares(other Vector[F]) Vector[F]

AddSquares modifies v adding to it the squared values from other, element-wise. It returns v.

func (Vector[_]) ArgMin ¶

func (v Vector[_]) ArgMin() int

ArgMin returns the index of the minimum value from the vector.

If the identical minimum value is present more than once in the vector, the index of its first occurrence (i.e. the lowest index) is returned.

func (Vector[F]) Copy ¶

func (v Vector[F]) Copy() Vector[F]

Copy returns a copy of the vector.

func (Vector[F]) DivScalar ¶

func (v Vector[F]) DivScalar(x F) Vector[F]

DivScalar modifies v dividing each element by x. It returns v.

func (Vector[F]) DotProduct ¶

func (v Vector[F]) DotProduct(other Vector[F]) (y F)

DotProduct computes the dot product between v and other.

func (Vector[F]) Reverse ¶

func (v Vector[F]) Reverse() Vector[F]

Reverse reverses v in place. It returns v.

func (Vector[F]) Square ¶

func (v Vector[F]) Square() Vector[F]

Square modifies v computing the square of each value. It returns v.

func (Vector[F]) Sub ¶

func (v Vector[F]) Sub(other Vector[F]) Vector[F]

Sub modifies v subtracting from its values the values of other, element-wise. It returns v.

type Vectors ¶

type Vectors[F Float] []Vector[F]

Vectors is a collection of Vector objects.

All Vector objects MUST be of the same size.

Given a collection of N vectors, each of size M, this can also be seen as a matrix with dimensions N×M (N rows, M columns).

func (Vectors[F]) ColumnWiseVariance ¶

func (vs Vectors[F]) ColumnWiseVariance() Vector[F]

ColumnWiseVariance returns a new Vector containing the column-wise variance computed across all vectors.

func (Vectors[F]) Copy ¶

func (vs Vectors[F]) Copy() Vectors[F]

Copy returns a shallow copy of the collection.

func (Vectors[F]) CumulativeSSE ¶

func (vs Vectors[F]) CumulativeSSE() Vector[F]

CumulativeSSE returns a new Vector, computing the error sum of squares (SSE) over the set of Vectors.

func (Vectors[F]) Mean ¶

func (vs Vectors[F]) Mean() Vector[F]

Mean calculates and returns the mean vector.

func (Vectors[F]) OptimalSplitThreshold ¶

func (vs Vectors[F]) OptimalSplitThreshold(splitIndex int) (threshold, loss F)

OptimalSplitThreshold finds the optimal split threshold within the vectors, computed over values at the given split-index column.

func (Vectors[F]) Reverse ¶

func (vs Vectors[F]) Reverse() Vectors[F]

Reverse reverses vs in place. It returns vs.

func (Vectors[F]) SortByColumn ¶

func (vs Vectors[F]) SortByColumn(index int) Vectors[F]

SortByColumn sorts vs in place, according to the value at each vector's index (column), in ascending order. It returns vs.

func (Vectors[F]) SplitByThreshold ¶

func (vs Vectors[F]) SplitByThreshold(index int, threshold F) (lt, gte Vectors[F])

SplitByThreshold splits the collection of vectors by a threshold value, compared with the elements at the given index.

Two new collections are returned:

lt: vectors whose value at "index" is lower than "threshold"
gte: vectors whose value at "index" is greater than or equal to "threshold"

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL