sparsevector

package module
v0.0.0-...-1eabb21 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jul 31, 2015 License: MIT Imports: 2 Imported by: 0

README

SparseVector

Efficient sparse vector implementations in Go.

Build Status GoDoc

What's included?

SparseVectorUint32 Sparse Vector with uint32 indices and Value values implemented by parallel ordered lists of indices and values
GenSparseVector Sparse Vector with generic indices and a parallel ordered list of values. The index must implement the VectorIndex interface, and hence be sortable.
MapSparseVector A Sparse Vector with uint32 indices and Value values implemented using a map
Uint32Index a GenSparseVector index for uint32
IntIndex A GenSparseVector index for int
StringIndex A GenSparseVector index for strings

Performance

Benchmarks are included for uint32 versions of all the Sparse Vector implementations. In these tests SparseVectorUint32 is by far the fastest, GenSparseVector takes about 5 times as long, and MapSparseVector takes about 1.6 times more again.

What can you do with it?

I've focused on what I need for similarity calculations, so the vectors do cosine and dot-product. I've also included adding and subtracting vectors and constant values, and multiplying by constant values. You can discover the mean of the present values, and also iterate and perform operations on the elements present in the vectors.

License

MIT license in LICENSE.txt

Contributing

Drop a pull request if you'd like to contribute.

Documentation

Overview

sparsevector contains a number of implementations of SparseVectors, with a focus on dot products and cosines.

The fastest implementation is SparseVectorUint32. This works by having parallel arrays for the vector element indices and values. The arrays are sorted in parallel when the vector is created to allow for a linear scan through the vectors when calculating Cos() and Dot(). The indices and values are separate to optimise scanning through the indices.

Next in performance is GenSparseVector. This is similar to SparseVectorUint32 but the index can be any type for which you can implement VectorIndex. It works similarly to SparseVectorUint32 but is slowed down by accessing index values via the VectorIndex interface.

The slowest implementation is MapSparseVector. This is implemented using a map[uint32]Value. I view this as the baseline implementation. Since performance depends very much on your data and what you're doing with it I'd advise testing with each of the implementations to find which is fastest for you.

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GenSparseVector

type GenSparseVector struct {
	// contains filtered or unexported fields
}

GenSparseVector is a sparse vector whose rows can be identified by any type that can be ordered. For example, the rows could be usernames, URLs, document names.

Note that using this with a simple uint32 index carries an approxiamate 5x performance penalty relative to using a SparseVector. However it is faster than MapSparseVector for the vector lengths we have benchmarked

func NewGenSparseVector

func NewGenSparseVector(index VectorIndex, values []Value) *GenSparseVector

NewGenSparseVector creates a new GenSparseVector. You should provide parallel arrays of indicies and values. Note that NewGenSparseVector will sort these in place.

func (*GenSparseVector) Add

func (sv1 *GenSparseVector) Add(v2 Vector) Vector

func (*GenSparseVector) AddConst

func (sv *GenSparseVector) AddConst(toAdd Value)

AddConst adds a constant value to each of the present values in the sparse vector

func (*GenSparseVector) Cos

func (sv1 *GenSparseVector) Cos(svi2 Vector) Value

Cos calculates the cosine between this vector and another. Both vectors must be GenSparseVectors

func (*GenSparseVector) Dot

func (sv1 *GenSparseVector) Dot(svi2 Vector) Value

Dot calculates the dot-product of this vector and another. Both vectors should be GenSparseVectors

func (*GenSparseVector) GetIndex

func (sv *GenSparseVector) GetIndex() VectorIndex

GetIndices returns the index values of the vector

func (*GenSparseVector) Iter

func (sv *GenSparseVector) Iter(f func(index interface{}, value Value))

Iter lets you iterate over the members of the sparse vector

func (*GenSparseVector) IterUpdate

func (sv *GenSparseVector) IterUpdate(f func(index interface{}, value Value) Value)

Iter lets you iterate over the members of the sparse vector

func (*GenSparseVector) Mag

func (v *GenSparseVector) Mag() Value

Mag returns the magnitude of this vector

func (*GenSparseVector) Mean

func (sv *GenSparseVector) Mean() Value

Mean() Calculates the mean element value (mean of values that are present)

func (*GenSparseVector) Mult

func (sv *GenSparseVector) Mult(l Value)

Mult multiplies the vector by a constant.

func (*GenSparseVector) Sub

func (sv1 *GenSparseVector) Sub(v2 Vector) Vector

func (*GenSparseVector) SubConst

func (sv *GenSparseVector) SubConst(toSub Value)

SubConst subtracts a constant value to each of the present values in the sparse vector.

type IndexSparseVector

type IndexSparseVector interface {
	// GetIndex returns the index values of the sparse vector
	GetIndex() VectorIndex

	// Iter calls a function for each value present in a sparse vector
	Iter(f func(index interface{}, value Value))

	// IterUpdate calls a function for each value present in a sparse vector. The
	// value is replaced by the result of the function
	IterUpdate(f func(index interface{}, value Value) Value)
}

IndexSparseVector is an additional interface for sparse vectors that use the VectorIndex interface to describe their indices

type IntIndex

type IntIndex []int

IntIndex is an implementation of VectorIndex for int.

func (IntIndex) Append

func (si IntIndex) Append(idx interface{}) VectorIndex

func (IntIndex) GetAtLocation

func (si IntIndex) GetAtLocation(location int) interface{}

func (IntIndex) Len

func (si IntIndex) Len() int

func (IntIndex) Less

func (si IntIndex) Less(i, j int) bool

func (IntIndex) LessThanOther

func (si IntIndex) LessThanOther(i int, sii2 VectorIndex, j int) bool

func (IntIndex) New

func (si IntIndex) New(l int) VectorIndex

func (IntIndex) Swap

func (si IntIndex) Swap(i, j int)

type MapSparseVector

type MapSparseVector struct {
	// contains filtered or unexported fields
}

MapSparseVector is a sparse vector implemented using a map

func NewMapSparseVector

func NewMapSparseVector(indices []uint32, values []Value) *MapSparseVector

NewMapSparseVector creates a new MapSparseVector. Pass in parallel arrays of the indices and values of non-zero entries in the vector

func (*MapSparseVector) Add

func (m1 *MapSparseVector) Add(v2 Vector) Vector

func (*MapSparseVector) Cos

func (m1 *MapSparseVector) Cos(v2 Vector) Value

Cos calculates the cosine of this vector and another. Both vectors must be MapSparseVectors

func (*MapSparseVector) Dot

func (m1 *MapSparseVector) Dot(v2 Vector) Value

Dot calculates the dot product of this sparse vector and another. Both vectors must be MapSparseVectors

func (*MapSparseVector) Mag

func (m *MapSparseVector) Mag() Value

Mag returns the magnitude of the vector

func (*MapSparseVector) Mult

func (m *MapSparseVector) Mult(l Value)

func (*MapSparseVector) Sub

func (m1 *MapSparseVector) Sub(v2 Vector) Vector

type SparseVector

type SparseVector interface {
	Vector

	// Mean calculates the mean of non-zero values in a vector
	Mean() Value

	// AddConst adds a constant value to non-zero values in the vector
	AddConst(toAdd Value)

	// SubConst subtracts a constant value from non-zero values in the vector
	SubConst(toSub Value)
}

SparseVector extends the Vector interface to include some functions that only apply to sparse vectors, in particular ones the operate only on present values. The intention of Mean, AddConst and SubConst is to allow you to mean-center your vectors.

type SparseVectorUint32

type SparseVectorUint32 struct {
	// contains filtered or unexported fields
}

SparseVector is a sparse representation of a vector. It stores only non-zero values, and stores them in parallel arrays of indices and values. When the vector is created the arrays are sorted by index.

func NewSparseVectorUint32

func NewSparseVectorUint32(indices []uint32, values []Value) *SparseVectorUint32

NewSparseVector creates a new sparse vector. Pass in parallel arrays of the indices and values for non-zero entries. NewSparseVector will sort these entries by index to enable faster calculations later. It also calculates the vector magnitude.

func (*SparseVectorUint32) Add

func (sv1 *SparseVectorUint32) Add(sv2 Vector) Vector

func (*SparseVectorUint32) AddConst

func (sv *SparseVectorUint32) AddConst(toAdd Value)

AddConst adds a constant value to each of the present values in the sparse vector

func (*SparseVectorUint32) Cos

func (sv1 *SparseVectorUint32) Cos(sv2 Vector) Value

Cos calculates the cosine of the angle between this sparse vector and another.

func (*SparseVectorUint32) Dot

func (sv1 *SparseVectorUint32) Dot(sv2in Vector) Value

Dot calculates the dot product of this vector and another sparse vector.

func (*SparseVectorUint32) GetIndices

func (sv *SparseVectorUint32) GetIndices() []uint32

GetIndices returns the array of indices of non-zero values in the vector

func (*SparseVectorUint32) Iter

func (sv *SparseVectorUint32) Iter(f func(index uint32, value Value))

Iter lets you iterate over the members of the sparse vector

func (*SparseVectorUint32) IterUpdate

func (sv *SparseVectorUint32) IterUpdate(f func(index uint32, value Value) Value)

Iter lets you iterate over the members of the sparse vector

func (*SparseVectorUint32) Mag

func (v *SparseVectorUint32) Mag() Value

Mag returns the magnitude of the vector. It is calculated lazily and cached. Note (as with the other sparse vector implementations) this means these vectors are not thread safe.

func (*SparseVectorUint32) MapIndices

func (sv *SparseVectorUint32) MapIndices(im map[uint32]uint32)

MapIndices applies a map to the indices of non-zero values in the vector. The idea is to allow you to shuffle non-zero values to the start of a set of vectors, potentially reducing the lengths of vectors in the set

func (*SparseVectorUint32) Mean

func (sv *SparseVectorUint32) Mean() Value

Mean() Calculates the mean element value (mean of values that are present)

func (*SparseVectorUint32) Mult

func (sv *SparseVectorUint32) Mult(l Value)

Mult multiplies the vector by a constant l. The vector is modified in place

func (*SparseVectorUint32) Sub

func (sv1 *SparseVectorUint32) Sub(sv2 Vector) Vector

func (*SparseVectorUint32) SubConst

func (sv *SparseVectorUint32) SubConst(toSub Value)

SubConst subtracts a constant value to each of the present values in the sparse vector.

type StringIndex

type StringIndex []string

StringIndex is an implementation of VectorIndex for string.

Example
// Vector for user's ratings of Star Wars
starwars := NewGenSparseVector(StringIndex{
	"Brian",
	"Liz",
	"Jane",
	"Jenny",
	"Mike",
	"David",
}, []Value{
	1.0,
	4.5,
	3.5,
	2.0,
	4.0,
	5.0,
})

// Vector for user's ratings of Battlestar Galactica
battlestargalactica := NewGenSparseVector(StringIndex{
	"Brian",
	"Liz",
	"Jane",
	"Mike",
	"David",
	"Penny",
}, []Value{
	3.0,
	3.5,
	4.5,
	3.0,
	5.0,
})

// Naive similarity measure between these two films
cos := starwars.Cos(battlestargalactica)
fmt.Printf("Similarity is %f", cos)
Output:

Similarity is 0.928748

func (StringIndex) Append

func (si StringIndex) Append(idx interface{}) VectorIndex

func (StringIndex) GetAtLocation

func (si StringIndex) GetAtLocation(location int) interface{}

func (StringIndex) Len

func (si StringIndex) Len() int

func (StringIndex) Less

func (si StringIndex) Less(i, j int) bool

func (StringIndex) LessThanOther

func (si StringIndex) LessThanOther(i int, sii2 VectorIndex, j int) bool

func (StringIndex) New

func (si StringIndex) New(l int) VectorIndex

func (StringIndex) Swap

func (si StringIndex) Swap(i, j int)

type Uint32Index

type Uint32Index []uint32

Uint32Index is an implementation of VectorIndex for uint32.

func (Uint32Index) Append

func (si Uint32Index) Append(idx interface{}) VectorIndex

func (Uint32Index) GetAtLocation

func (si Uint32Index) GetAtLocation(location int) interface{}

func (Uint32Index) Len

func (si Uint32Index) Len() int

func (Uint32Index) Less

func (si Uint32Index) Less(i, j int) bool

func (Uint32Index) LessThanOther

func (si Uint32Index) LessThanOther(i int, sii2 VectorIndex, j int) bool

func (Uint32Index) New

func (su Uint32Index) New(l int) VectorIndex

func (Uint32Index) Swap

func (si Uint32Index) Swap(i, j int)

type Value

type Value float32

Value is the type we use for vector values. We defined a type so we can easily change types, until such times as our Go overlords figure out generics.

func AddOp

func AddOp(v1, v2 Value) Value

func SubOp

func SubOp(v1, v2 Value) Value

type ValueOp

type ValueOp func(v1, v2 Value) Value

An operation on a pair of values.

type Vector

type Vector interface {
	// Cos calculates the cosine of the angle between two vectors
	Cos(v Vector) Value

	// Mag gives the magnitude of a vector
	Mag() Value

	// Dot calculates the dot product of two vectors
	Dot(v Vector) Value

	// Add adds a vector to this one
	Add(v Vector) Vector

	// Sub subtracts a vector from this one
	Sub(v Vector) Vector

	// Multiply by a constant.  Vector is modified in place
	Mult(l Value)
}

Vector is a general interface to a vector. Our focus on similarity calculations means that so far we've only defined cosine, magnitude and dot-product methods.

type VectorIndex

type VectorIndex interface {
	// VectorIndex must implement the standard sort interface
	sort.Interface

	// LessThanOther compares values at a location in this index with
	// a value at a second location in another index.
	// In most cases the two VectorIndexs must be the same underlying
	// type.
	LessThanOther(i int, v2 VectorIndex, j int) bool

	// GetAtLocation returns the index value currently at location i
	GetAtLocation(i int) interface{}

	// New creates a new index of this type, using l as a capacity hint
	New(l int) VectorIndex

	// Append adds an index value to this index
	Append(idx interface{}) VectorIndex
}

VectorIndex is an interface for the index of a sparse vector. The idea here is to allow the index and values to be sorted in parallel. The mental model for this interface is that the index is implemented as an array or slice, and that index values at points in the array can be extracted or compared.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL