gann

package module
v0.0.0-...-7932e50
Published: Sep 15, 2020 License: MIT Imports: 7 Imported by: 0

README

gann

gann (go-approximate-nearest-neighbor) is a library for approximate nearest neighbor search written purely in Go.

The implemented algorithm is inspired by Annoy (https://github.com/spotify/annoy).

feature

  1. written purely in Go: no dependencies outside the Go ecosystem.
  2. easy to tune with only a few parameters.

installation

go get github.com/mathetake/gann

parameters

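setup phase parameters

| name  | type | description                         | run-time complexity            | space complexity               | accuracy                      |
|-------|------|-------------------------------------|--------------------------------|--------------------------------|-------------------------------|
| dim   | int  | dimension of target vectors         | the larger, the more expensive | the larger, the more expensive | N/A                           |
| nTree | int  | # of trees                          | the larger, the more expensive | the larger, the more expensive | the larger, the more accurate |
| k     | int  | maximum # of items in a single leaf | the larger, the less expensive | N/A                            | the larger, the less accurate |

runtime (search phase) parameters

| name        | type    | description                | time complexity                | accuracy                      |
|-------------|---------|----------------------------|--------------------------------|-------------------------------|
| searchNum   | int     | # of requested neighbors   | the larger, the more expensive | N/A                           |
| bucketScale | float64 | affects the size of bucket | the larger, the more expensive | the larger, the more accurate |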

bucketScale controls the size of the bucket, that is, the set of candidate items for which exact distances are calculated. The actual size of the bucket is int(searchNum * bucketScale).
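As a small illustration of the formula above, the following sketch computes the bucket size for two inputs. The helper name bucketSize is hypothetical and not part of gann's API; the only point is that the int conversion truncates toward zero, so a fractional product is rounded down.

```go
package main

import "fmt"

// bucketSize is a hypothetical helper that mirrors the formula
// int(searchNum * bucketScale) quoted in the text.
func bucketSize(searchNum int, bucketScale float64) int {
	// Converting a float64 to int in Go discards the fractional part.
	return int(float64(searchNum) * bucketScale)
}

func main() {
	fmt.Println(bucketSize(5, 10))  // 5 * 10 = 50 candidates
	fmt.Println(bucketSize(5, 1.5)) // 7.5 is truncated to 7 candidates
}
```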

In the search phase, we traverse the index trees and keep adding the items on the reached leaves to the bucket until the bucket is full. Then we calculate the exact distance between each item in the bucket and the query vector to obtain the approximate nearest neighbors.

Therefore, the larger bucketScale is, the higher the computational cost of a search and the more accurate the result.
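The two steps described above (fill the bucket, then rank it by exact distance) can be sketched roughly as follows. This is not gann's implementation: the functions fillBucket and rerank, the pre-ordered leaves slice that stands in for tree traversal, and the squared Euclidean distance are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sqDist is the exact distance used in this sketch (squared Euclidean).
// It is an assumption; any metric could be substituted.
func sqDist(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// fillBucket collects distinct item IDs from leaves, in the order the leaves
// are reached, until the bucket holds bucketSize IDs or the leaves run out.
func fillBucket(leaves [][]int64, bucketSize int) []int64 {
	seen := make(map[int64]struct{}, bucketSize)
	bucket := make([]int64, 0, bucketSize)
	for _, leaf := range leaves {
		for _, id := range leaf {
			if len(bucket) >= bucketSize {
				return bucket
			}
			if _, dup := seen[id]; dup {
				continue
			}
			seen[id] = struct{}{}
			bucket = append(bucket, id)
		}
	}
	return bucket
}

// rerank sorts the bucket by exact distance to q and keeps the closest searchNum IDs.
func rerank(items [][]float64, bucket []int64, q []float64, searchNum int) []int64 {
	sort.SliceStable(bucket, func(i, j int) bool {
		return sqDist(items[bucket[i]], q) < sqDist(items[bucket[j]], q)
	})
	if len(bucket) > searchNum {
		bucket = bucket[:searchNum]
	}
	return bucket
}

func main() {
	items := [][]float64{{0, 0}, {1, 1}, {0.1, 0}, {5, 5}, {0.2, 0.1}}
	leaves := [][]int64{{3, 1}, {0, 2}, {4}} // hypothetical traversal order
	q := []float64{0, 0}
	searchNum := 2

	for _, bucketScale := range []float64{1.5, 2.5} {
		size := int(float64(searchNum) * bucketScale)
		fmt.Println(bucketScale, rerank(items, fillBucket(leaves, size), q, searchNum))
	}
}
```

In this toy data set the true two nearest neighbors of q are items 0 and 2. With bucketScale 1.5 the bucket holds only three candidates and item 2 is never examined, while with bucketScale 2.5 all five items are examined and the exact answer is returned, at the cost of more distance calculations.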

example

package main

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/mathetake/gann"
	"github.com/mathetake/gann/metric"
)

var (
	dim    = 3
	nTrees = 2
	k      = 10
	nItem  = 1000
)

func main() {
	rawItems := make([][]float64, 0, nItem)
	rand.Seed(time.Now().UnixNano())

	for i := 0; i < nItem; i++ {
		item := make([]float64, 0, dim)
		for j := 0; j < dim; j++ {
			item = append(item, rand.Float64())
		}
		rawItems = append(rawItems, item)
	}

	m, err := metric.NewCosineMetric(dim)
	if err != nil {
		// err handling
		return
	}

	// create index
	idx, err := gann.CreateNewIndex(rawItems, dim, nTrees, k, m)
	if err != nil {
		// error handling
		return
	}

	// search
	var searchNum = 5
	var bucketScale float64 = 10
	q := []float64{0.1, 0.02, 0.001}
	res, err := idx.GetANNbyVector(q, searchNum, bucketScale)
	if err != nil {
		// error handling
		return
	}

	fmt.Printf("res: %v\n", res)
}

License

MIT

Documentation

Overview

Package gann can be used for approximate nearest neighbor search.

Calling the gann.CreateNewIndex function returns a search index. Its interface is defined in gann.Index:

type Index interface {
	GetANNbyItemID(id int64, searchNum int, bucketScale float64) (ann []int64, err error)
	GetANNbyVector(v []float64, searchNum int, bucketScale float64) (ann []int64, err error)
}

GetANNbyItemID searches by the id of a specific indexed item, whereas GetANNbyVector searches by an arbitrary query vector.

See README.md for more details.

Types

type Index

type Index interface {
	// GetANNbyItemID ... search approximate nearest neighbors by a given itemID
	GetANNbyItemID(id int64, searchNum int, bucketScale float64) (ann []int64, err error)

	// GetANNbyVector ... search approximate nearest neighbors by a given query vector
	GetANNbyVector(v []float64, searchNum int, bucketScale float64) (ann []int64, err error)
}

Index is the interface of gann's search index. GetANNbyItemID and GetANNbyVector differ in the form of the query. GetANNbyItemID takes the id of an item contained in the list of items used in the index building phase. GetANNbyVector accepts any vector of the proper dimension.

searchNum is the number of requested approximate nearest neighbors, and bucketScale can be tuned to balance the accuracy of the search result against the computational cost of the search phase.

See README.md for more details.

func CreateNewIndex

func CreateNewIndex(rawItems [][]float64, dim, nTree, k int, m metric.Metric) (Index, error)

CreateNewIndex builds a new search index for the given vectors. rawItems should consist of the search target vectors, and the slice index of each vector corresponds to the first argument id of GetANNbyItemID. For example, to search for approximate nearest neighbors of rawItems[3], simply call index.GetANNbyItemID(3, ...).

dim is the dimension of the target space. nTree and k are tunable parameters that affect the performance of the index (see README.md for details).

The last argument m is of type metric.Metric and represents the metric of the target search space. See https://godoc.org/github.com/mathetake/gann/metric for details.
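To make the id convention concrete, here is a minimal, self-contained sketch in which the id passed to GetANNbyItemID is simply the position of a vector in rawItems. The exactIndex type is a hypothetical brute-force stand-in that satisfies the Index interface; it is not gann's tree-based index, it ignores bucketScale, and it uses squared Euclidean distance instead of a metric.Metric.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Index repeats the interface documented above so the sketch compiles on its own.
type Index interface {
	GetANNbyItemID(id int64, searchNum int, bucketScale float64) (ann []int64, err error)
	GetANNbyVector(v []float64, searchNum int, bucketScale float64) (ann []int64, err error)
}

// exactIndex is a hypothetical brute-force stand-in, not gann's index.
type exactIndex struct {
	items [][]float64
	dim   int
}

func sqDist(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// GetANNbyVector scans every item, so bucketScale is unused here.
func (e *exactIndex) GetANNbyVector(v []float64, searchNum int, _ float64) ([]int64, error) {
	if len(v) != e.dim || searchNum < 0 {
		return nil, errors.New("invalid query")
	}
	ids := make([]int64, len(e.items))
	for i := range ids {
		ids[i] = int64(i) // id == slice index in rawItems
	}
	sort.SliceStable(ids, func(a, b int) bool {
		return sqDist(e.items[ids[a]], v) < sqDist(e.items[ids[b]], v)
	})
	if len(ids) > searchNum {
		ids = ids[:searchNum]
	}
	return ids, nil
}

// GetANNbyItemID looks up the stored vector for id and delegates to GetANNbyVector.
func (e *exactIndex) GetANNbyItemID(id int64, searchNum int, bucketScale float64) ([]int64, error) {
	if id < 0 || id >= int64(len(e.items)) {
		return nil, errors.New("item id out of range")
	}
	return e.GetANNbyVector(e.items[id], searchNum, bucketScale)
}

func main() {
	rawItems := [][]float64{{0, 0}, {10, 10}, {0, 1}, {9, 10}}
	var idx Index = &exactIndex{items: rawItems, dim: 2}

	byID, _ := idx.GetANNbyItemID(3, 2, 1)
	byVec, _ := idx.GetANNbyVector(rawItems[3], 2, 1)
	fmt.Println(byID, byVec) // both queries refer to rawItems[3]
}
```

In this stand-in the queried item appears in its own result list because it is at distance zero from itself; whether gann's index behaves the same way is not stated in the documentation above.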
