annoy

package module
v0.0.0-...-76116ff Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 25, 2021 License: MIT Imports: 8 Imported by: 0

README

annoy gann

CircleCI MIT License

portfolio_view

gann (go-approximate-nearest-neighbor) is a library for approximate nearest neighbor search purely written in golang.

The implemented algorithm is truly inspired by Annoy (https://github.com/spotify/annoy).

feature

  1. purely written in Go: no dependencies out of Go world.
  2. easy to tune with a bit of parameters

installation

go get github.com/mathetake/gann

parameters

setup phase parameters
name type description run-time complexity space complexity accuracy
dim int dimension of target vectors the larger, the more expensive the larger, the more expensive N/A
nTree int # of trees the larger, the more expensive the larger, the more expensive the larger, the more accurate
k int maximum # of items in a single leaf the larger, the less expensive N/A the larger, the less accurate
runtime (search phase) parameters
name type description time complexity accuracy
searchNum int # of requested neighbors the larger, the more expensive N/A
bucketScale float64 affects the size of bucket the larger, the more expensive the larger, the more accurate

bucketScale affects the size of bucket which consists of items for exact distance calculation. The actual size of the bucket is calculated by int(searchNum * bucketScale).

In the search phase, we traverse index trees and continuously put items on reached leaves to the bucket until the bucket becomes full. Then we calculate the exact distances between a item in the bucket and the query vector to get approximate nearest neighbors.

Therefore, the larger bucketScale, the more computational complexity while the more accurate result to be produced.

example

package main

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/mathetake/gann"
	"github.com/mathetake/gann/metric"
)

var (
	dim    = 3
	nTrees = 2
	k      = 10
	nItem  = 1000
)

func main() {
	rawItems := make([][]float64, 0, nItem)
	rand.Seed(time.Now().UnixNano())

	for i := 0; i < nItem; i++ {
		item := make([]float64, 0, dim)
		for j := 0; j < dim; j++ {
			item = append(item, rand.Float64())
		}
		rawItems = append(rawItems, item)
	}

	m, err := metric.NewCosineMetric(dim)
	if err != nil {
		// err handling
		return
	}

	// create index
	idx, err := gann.CreateNewIndex(rawItems, dim, nTrees, k, m)
	if err != nil {
		// error handling
		return
	}

	// search
	var searchNum = 5
	var bucketScale float64 = 10
	q := []float64{0.1, 0.02, 0.001}
	res, err := idx.GetANNbyVector(q, searchNum, bucketScale)
	if err != nil {
		// error handling
		return
	}

	fmt.Printf("res: %v\n", res)
}

references

License

MIT

Documentation

Overview

Package gann can be used for approximate nearest neighbor search.

By calling gann.CreateNewIndex function, we can obtain a search index. Its interface is defined in gann.Index:

type Index interface {
	GetANNbyItemID(id int64, searchNum int, bucketScale float64) (ann []int64, err error)
	GetANNbyVector(v []float64, searchNum int, bucketScale float64) (ann []int64, err error)
}

GetANNbyItemID allows us to pass id of specific item for search execution and instead GetANNbyVector allows us to pass a vector.

See README.md for more details.

Index

Constants

View Source
const (
	AxisX = iota
	AxisY
	AxisZ
	AxisEnd
)
View Source
const (
	RadFactor = 0.017453292519943295
)

Variables

This section is empty.

Functions

This section is empty.

Types

type Index

type Index interface {

	// GetANNbyVector ... search approximate nearest neighbors by a given query vector
	GetANNbyVector(v PointAble, searchNum int, bucketScale float64) (ann []int, err error)
}

func CreateNewIndex

func CreateNewIndex(rawItems PointArray, dim, nTree, k int, m Metric) (Index, error)

CreateNewIndex build a new search index for given vectors. rawItems should consist of search target vectors and its slice index corresponds to the first argument id of GetANNbyItemID. For example, if we want to search approximate nearest neighbors of rawItems[3], it can simply achieved by calling index.GetANNbyItemID(3, ...).

dim is the dimension of target spaces. nTree and k are tunable parameters which affects performances of the index (see README.md for details.)

The last argument m is type of metric.Metric and represents the metric of the target search space. See https://godoc.org/github.com/mathetake/gann/metric for details.

type Metric

type Metric interface {
	// CalcDistance ... calculates the distance between given vectors
	CalcDistance(v1, v2 []float64) float64
	// GetSplittingVector ... calculates the splitting vector which becomes a node's vector in the index
	GetSplittingVector(vs [][]float64) []float64
	// CalcDirectionPriority ... calculates the priority of the children nodes which can be used for determining
	// which way (right or left child) should go next traversal. The return values must be contained in [-1, 1].
	CalcDirectionPriority(base, target []float64) float64
}

Metric is the interface of metrics which defines target search spaces.

func NewCosineMetric

func NewCosineMetric(dim int) (Metric, error)

NewCosineMetric returns cosineDistance. NOTE: We assume that the given vectors are already normalized, i.e. the norm equals 1

type Point

type Point []float64

n dimension

type Point3

type Point3 [AxisEnd]float64

func PointR3OfLocation

func PointR3OfLocation(latitude, longitude float64) (pt Point3)

func R3OfLocation

func R3OfLocation(latitude, longitude float64) (pt Point3)

type PointAble

type PointAble interface {
	Dimension() int
	Point() Point
}

type PointArray

type PointArray interface {
	Len() int
	At(int) PointAble
}

type PointN

type PointN []float64

func (PointN) Dimension

func (p PointN) Dimension() int

func (PointN) Point

func (p PointN) Point() Point

type PointNArray

type PointNArray []PointN

func (PointNArray) At

func (a PointNArray) At(i int) PointAble

func (PointNArray) Len

func (a PointNArray) Len() int

type PointR3

type PointR3 struct {
	// contains filtered or unexported fields
}

func NewPointR3

func NewPointR3(latitude, longitude float64) (r PointR3)

func (PointR3) Dimension

func (p PointR3) Dimension() int

func (PointR3) Point

func (p PointR3) Point() Point

type PointR3Array

type PointR3Array []PointR3

func (PointR3Array) At

func (a PointR3Array) At(i int) PointAble

func (PointR3Array) Len

func (a PointR3Array) Len() int

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL