Documentation ¶
Overview ¶
sparsevector contains a number of implementations of SparseVectors, with a focus on dot products and cosines.
The fastest implementation is SparseVectorUint32. This works by having parallel arrays for the vector element indices and values. The arrays are sorted in parallel when the vector is created to allow for a linear scan through the vectors when calculating Cos() and Dot(). The indices and values are separate to optimise scanning through the indices.
Next in performance is GenSparseVector. This is similar to SparseVectorUint32 but the index can be any type for which you can implement VectorIndex. It works similarly to SparseVectorUint32 but is slowed down by accessing index values via the VectorIndex interface.
The slowest implementation is MapSparseVector. This is implemented using a map[uint32]Value. I view this as the baseline implementation. Since performance depends very much on your data and what you're doing with it I'd advise testing with each of the implementations to find which is fastest for you.
Index ¶
- type GenSparseVector
- func (sv1 *GenSparseVector) Add(v2 Vector) Vector
- func (sv *GenSparseVector) AddConst(toAdd Value)
- func (sv1 *GenSparseVector) Cos(svi2 Vector) Value
- func (sv1 *GenSparseVector) Dot(svi2 Vector) Value
- func (sv *GenSparseVector) GetIndex() VectorIndex
- func (sv *GenSparseVector) Iter(f func(index interface{}, value Value))
- func (sv *GenSparseVector) IterUpdate(f func(index interface{}, value Value) Value)
- func (v *GenSparseVector) Mag() Value
- func (sv *GenSparseVector) Mean() Value
- func (sv *GenSparseVector) Mult(l Value)
- func (sv1 *GenSparseVector) Sub(v2 Vector) Vector
- func (sv *GenSparseVector) SubConst(toSub Value)
- type IndexSparseVector
- type IntIndex
- func (si IntIndex) Append(idx interface{}) VectorIndex
- func (si IntIndex) GetAtLocation(location int) interface{}
- func (si IntIndex) Len() int
- func (si IntIndex) Less(i, j int) bool
- func (si IntIndex) LessThanOther(i int, sii2 VectorIndex, j int) bool
- func (si IntIndex) New(l int) VectorIndex
- func (si IntIndex) Swap(i, j int)
- type MapSparseVector
- type SparseVector
- type SparseVectorUint32
- func (sv1 *SparseVectorUint32) Add(sv2 Vector) Vector
- func (sv *SparseVectorUint32) AddConst(toAdd Value)
- func (sv1 *SparseVectorUint32) Cos(sv2 Vector) Value
- func (sv1 *SparseVectorUint32) Dot(sv2in Vector) Value
- func (sv *SparseVectorUint32) GetIndices() []uint32
- func (sv *SparseVectorUint32) Iter(f func(index uint32, value Value))
- func (sv *SparseVectorUint32) IterUpdate(f func(index uint32, value Value) Value)
- func (v *SparseVectorUint32) Mag() Value
- func (sv *SparseVectorUint32) MapIndices(im map[uint32]uint32)
- func (sv *SparseVectorUint32) Mean() Value
- func (sv *SparseVectorUint32) Mult(l Value)
- func (sv1 *SparseVectorUint32) Sub(sv2 Vector) Vector
- func (sv *SparseVectorUint32) SubConst(toSub Value)
- type StringIndex
- func (si StringIndex) Append(idx interface{}) VectorIndex
- func (si StringIndex) GetAtLocation(location int) interface{}
- func (si StringIndex) Len() int
- func (si StringIndex) Less(i, j int) bool
- func (si StringIndex) LessThanOther(i int, sii2 VectorIndex, j int) bool
- func (si StringIndex) New(l int) VectorIndex
- func (si StringIndex) Swap(i, j int)
- type Uint32Index
- func (si Uint32Index) Append(idx interface{}) VectorIndex
- func (si Uint32Index) GetAtLocation(location int) interface{}
- func (si Uint32Index) Len() int
- func (si Uint32Index) Less(i, j int) bool
- func (si Uint32Index) LessThanOther(i int, sii2 VectorIndex, j int) bool
- func (su Uint32Index) New(l int) VectorIndex
- func (si Uint32Index) Swap(i, j int)
- type Value
- type ValueOp
- type Vector
- type VectorIndex
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type GenSparseVector ¶
type GenSparseVector struct {
// contains filtered or unexported fields
}
GenSparseVector is a sparse vector whose rows can be identified by any type that can be ordered. For example, the rows could be usernames, URLs, document names.
Note that using this with a simple uint32 index carries an approxiamate 5x performance penalty relative to using a SparseVector. However it is faster than MapSparseVector for the vector lengths we have benchmarked
func NewGenSparseVector ¶
func NewGenSparseVector(index VectorIndex, values []Value) *GenSparseVector
NewGenSparseVector creates a new GenSparseVector. You should provide parallel arrays of indicies and values. Note that NewGenSparseVector will sort these in place.
func (*GenSparseVector) Add ¶
func (sv1 *GenSparseVector) Add(v2 Vector) Vector
func (*GenSparseVector) AddConst ¶
func (sv *GenSparseVector) AddConst(toAdd Value)
AddConst adds a constant value to each of the present values in the sparse vector
func (*GenSparseVector) Cos ¶
func (sv1 *GenSparseVector) Cos(svi2 Vector) Value
Cos calculates the cosine between this vector and another. Both vectors must be GenSparseVectors
func (*GenSparseVector) Dot ¶
func (sv1 *GenSparseVector) Dot(svi2 Vector) Value
Dot calculates the dot-product of this vector and another. Both vectors should be GenSparseVectors
func (*GenSparseVector) GetIndex ¶
func (sv *GenSparseVector) GetIndex() VectorIndex
GetIndices returns the index values of the vector
func (*GenSparseVector) Iter ¶
func (sv *GenSparseVector) Iter(f func(index interface{}, value Value))
Iter lets you iterate over the members of the sparse vector
func (*GenSparseVector) IterUpdate ¶
func (sv *GenSparseVector) IterUpdate(f func(index interface{}, value Value) Value)
Iter lets you iterate over the members of the sparse vector
func (*GenSparseVector) Mag ¶
func (v *GenSparseVector) Mag() Value
Mag returns the magnitude of this vector
func (*GenSparseVector) Mean ¶
func (sv *GenSparseVector) Mean() Value
Mean() Calculates the mean element value (mean of values that are present)
func (*GenSparseVector) Mult ¶
func (sv *GenSparseVector) Mult(l Value)
Mult multiplies the vector by a constant.
func (*GenSparseVector) Sub ¶
func (sv1 *GenSparseVector) Sub(v2 Vector) Vector
func (*GenSparseVector) SubConst ¶
func (sv *GenSparseVector) SubConst(toSub Value)
SubConst subtracts a constant value to each of the present values in the sparse vector.
type IndexSparseVector ¶
type IndexSparseVector interface { // GetIndex returns the index values of the sparse vector GetIndex() VectorIndex // Iter calls a function for each value present in a sparse vector Iter(f func(index interface{}, value Value)) // IterUpdate calls a function for each value present in a sparse vector. The // value is replaced by the result of the function IterUpdate(f func(index interface{}, value Value) Value) }
IndexSparseVector is an additional interface for sparse vectors that use the VectorIndex interface to describe their indices
type IntIndex ¶
type IntIndex []int
IntIndex is an implementation of VectorIndex for int.
func (IntIndex) Append ¶
func (si IntIndex) Append(idx interface{}) VectorIndex
func (IntIndex) GetAtLocation ¶
func (IntIndex) LessThanOther ¶
func (si IntIndex) LessThanOther(i int, sii2 VectorIndex, j int) bool
func (IntIndex) New ¶
func (si IntIndex) New(l int) VectorIndex
type MapSparseVector ¶
type MapSparseVector struct {
// contains filtered or unexported fields
}
MapSparseVector is a sparse vector implemented using a map
func NewMapSparseVector ¶
func NewMapSparseVector(indices []uint32, values []Value) *MapSparseVector
NewMapSparseVector creates a new MapSparseVector. Pass in parallel arrays of the indices and values of non-zero entries in the vector
func (*MapSparseVector) Add ¶
func (m1 *MapSparseVector) Add(v2 Vector) Vector
func (*MapSparseVector) Cos ¶
func (m1 *MapSparseVector) Cos(v2 Vector) Value
Cos calculates the cosine of this vector and another. Both vectors must be MapSparseVectors
func (*MapSparseVector) Dot ¶
func (m1 *MapSparseVector) Dot(v2 Vector) Value
Dot calculates the dot product of this sparse vector and another. Both vectors must be MapSparseVectors
func (*MapSparseVector) Mag ¶
func (m *MapSparseVector) Mag() Value
Mag returns the magnitude of the vector
func (*MapSparseVector) Mult ¶
func (m *MapSparseVector) Mult(l Value)
func (*MapSparseVector) Sub ¶
func (m1 *MapSparseVector) Sub(v2 Vector) Vector
type SparseVector ¶
type SparseVector interface { Vector // Mean calculates the mean of non-zero values in a vector Mean() Value // AddConst adds a constant value to non-zero values in the vector AddConst(toAdd Value) // SubConst subtracts a constant value from non-zero values in the vector SubConst(toSub Value) }
SparseVector extends the Vector interface to include some functions that only apply to sparse vectors, in particular ones the operate only on present values. The intention of Mean, AddConst and SubConst is to allow you to mean-center your vectors.
type SparseVectorUint32 ¶
type SparseVectorUint32 struct {
// contains filtered or unexported fields
}
SparseVector is a sparse representation of a vector. It stores only non-zero values, and stores them in parallel arrays of indices and values. When the vector is created the arrays are sorted by index.
func NewSparseVectorUint32 ¶
func NewSparseVectorUint32(indices []uint32, values []Value) *SparseVectorUint32
NewSparseVector creates a new sparse vector. Pass in parallel arrays of the indices and values for non-zero entries. NewSparseVector will sort these entries by index to enable faster calculations later. It also calculates the vector magnitude.
func (*SparseVectorUint32) Add ¶
func (sv1 *SparseVectorUint32) Add(sv2 Vector) Vector
func (*SparseVectorUint32) AddConst ¶
func (sv *SparseVectorUint32) AddConst(toAdd Value)
AddConst adds a constant value to each of the present values in the sparse vector
func (*SparseVectorUint32) Cos ¶
func (sv1 *SparseVectorUint32) Cos(sv2 Vector) Value
Cos calculates the cosine of the angle between this sparse vector and another.
func (*SparseVectorUint32) Dot ¶
func (sv1 *SparseVectorUint32) Dot(sv2in Vector) Value
Dot calculates the dot product of this vector and another sparse vector.
func (*SparseVectorUint32) GetIndices ¶
func (sv *SparseVectorUint32) GetIndices() []uint32
GetIndices returns the array of indices of non-zero values in the vector
func (*SparseVectorUint32) Iter ¶
func (sv *SparseVectorUint32) Iter(f func(index uint32, value Value))
Iter lets you iterate over the members of the sparse vector
func (*SparseVectorUint32) IterUpdate ¶
func (sv *SparseVectorUint32) IterUpdate(f func(index uint32, value Value) Value)
Iter lets you iterate over the members of the sparse vector
func (*SparseVectorUint32) Mag ¶
func (v *SparseVectorUint32) Mag() Value
Mag returns the magnitude of the vector. It is calculated lazily and cached. Note (as with the other sparse vector implementations) this means these vectors are not thread safe.
func (*SparseVectorUint32) MapIndices ¶
func (sv *SparseVectorUint32) MapIndices(im map[uint32]uint32)
MapIndices applies a map to the indices of non-zero values in the vector. The idea is to allow you to shuffle non-zero values to the start of a set of vectors, potentially reducing the lengths of vectors in the set
func (*SparseVectorUint32) Mean ¶
func (sv *SparseVectorUint32) Mean() Value
Mean() Calculates the mean element value (mean of values that are present)
func (*SparseVectorUint32) Mult ¶
func (sv *SparseVectorUint32) Mult(l Value)
Mult multiplies the vector by a constant l. The vector is modified in place
func (*SparseVectorUint32) Sub ¶
func (sv1 *SparseVectorUint32) Sub(sv2 Vector) Vector
func (*SparseVectorUint32) SubConst ¶
func (sv *SparseVectorUint32) SubConst(toSub Value)
SubConst subtracts a constant value to each of the present values in the sparse vector.
type StringIndex ¶
type StringIndex []string
StringIndex is an implementation of VectorIndex for string.
Example ¶
// Vector for user's ratings of Star Wars starwars := NewGenSparseVector(StringIndex{ "Brian", "Liz", "Jane", "Jenny", "Mike", "David", }, []Value{ 1.0, 4.5, 3.5, 2.0, 4.0, 5.0, }) // Vector for user's ratings of Battlestar Galactica battlestargalactica := NewGenSparseVector(StringIndex{ "Brian", "Liz", "Jane", "Mike", "David", "Penny", }, []Value{ 3.0, 3.5, 4.5, 3.0, 5.0, }) // Naive similarity measure between these two films cos := starwars.Cos(battlestargalactica) fmt.Printf("Similarity is %f", cos)
Output: Similarity is 0.928748
func (StringIndex) Append ¶
func (si StringIndex) Append(idx interface{}) VectorIndex
func (StringIndex) GetAtLocation ¶
func (si StringIndex) GetAtLocation(location int) interface{}
func (StringIndex) Len ¶
func (si StringIndex) Len() int
func (StringIndex) Less ¶
func (si StringIndex) Less(i, j int) bool
func (StringIndex) LessThanOther ¶
func (si StringIndex) LessThanOther(i int, sii2 VectorIndex, j int) bool
func (StringIndex) New ¶
func (si StringIndex) New(l int) VectorIndex
func (StringIndex) Swap ¶
func (si StringIndex) Swap(i, j int)
type Uint32Index ¶
type Uint32Index []uint32
Uint32Index is an implementation of VectorIndex for uint32.
func (Uint32Index) Append ¶
func (si Uint32Index) Append(idx interface{}) VectorIndex
func (Uint32Index) GetAtLocation ¶
func (si Uint32Index) GetAtLocation(location int) interface{}
func (Uint32Index) Len ¶
func (si Uint32Index) Len() int
func (Uint32Index) Less ¶
func (si Uint32Index) Less(i, j int) bool
func (Uint32Index) LessThanOther ¶
func (si Uint32Index) LessThanOther(i int, sii2 VectorIndex, j int) bool
func (Uint32Index) New ¶
func (su Uint32Index) New(l int) VectorIndex
func (Uint32Index) Swap ¶
func (si Uint32Index) Swap(i, j int)
type Value ¶
type Value float32
Value is the type we use for vector values. We defined a type so we can easily change types, until such times as our Go overlords figure out generics.
type Vector ¶
type Vector interface { // Cos calculates the cosine of the angle between two vectors Cos(v Vector) Value // Mag gives the magnitude of a vector Mag() Value // Dot calculates the dot product of two vectors Dot(v Vector) Value // Add adds a vector to this one Add(v Vector) Vector // Sub subtracts a vector from this one Sub(v Vector) Vector // Multiply by a constant. Vector is modified in place Mult(l Value) }
Vector is a general interface to a vector. Our focus on similarity calculations means that so far we've only defined cosine, magnitude and dot-product methods.
type VectorIndex ¶
type VectorIndex interface { // VectorIndex must implement the standard sort interface sort.Interface // LessThanOther compares values at a location in this index with // a value at a second location in another index. // In most cases the two VectorIndexs must be the same underlying // type. LessThanOther(i int, v2 VectorIndex, j int) bool // GetAtLocation returns the index value currently at location i GetAtLocation(i int) interface{} // New creates a new index of this type, using l as a capacity hint New(l int) VectorIndex // Append adds an index value to this index Append(idx interface{}) VectorIndex }
VectorIndex is an interface for the index of a sparse vector. The idea here is to allow the index and values to be sorted in parallel. The mental model for this interface is that the index is implemented as an array or slice, and that index values at points in the array can be extracted or compared.