covertree

package module
v1.12.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 23, 2020 License: MIT Imports: 8 Imported by: 0

README

go-covertree

GoDoc Go Report Card Build Status

go-covertree is a cover tree implementation in Go for nearest-neighbour search and clustering. It uses an extensible backing store interface (suitable to adapting to key-value stores, RDBMSes, etc) to support very large data sets.

For further horizontal scaling, a partitioned store implementation supports sharding across multiple underlying stores using a partitioning function.

See the API documentation for more details.

This software is made available under an MIT license.

Thread safety

Tree instances are thread-safe for readonly access.

Insertions into the tree (using Insert) are purely append-only operations, and safe to make concurrently, allowing tree construction to be parallelised.

Searching the tree (using FindNearest) is purely a read-only operation and safe to do concurrently, including with insertions.

Removals from the tree (using Remove) are not thread-safe and should be externally synchronised if concurrent read-write access is required.

Store implementations should observe their own thread-safety considerations.

Example usage

Define a type to be stored in the tree:

type Point struct {
    X float64
    Y float64
}

Define a DistanceFunc to compute the distance between two instances of the type:

func distanceBetween(a, b interface{}) float64 {
    p1 := a.(*Point)
    p2 := b.(*Point)
	
    distX := p1.X - p2.X
    distY := p1.Y - p2.Y
	
    return math.Sqrt(distX * distX + distY * distY)
}

Create a Tree. A tree using a provided in-memory store can be conveniently created using NewInMemoryTree:

tree := covertree.NewInMemoryTree(basis, rootDistance, distanceBetween)

The basis specifies the logarithmic base for determining the ratio of coverage of nodes at adjacent levels of the tree. If unsure, values around 2.0 may be good starting points.

The rootDistance specifies the maximum expected distance between nodes (actually the minimum distance between root nodes) and determines when new root nodes are created. This should generally be set to the largest distance between nodes expected for your data set.

Custom Store implementations can also use the basic Tree constructor to create trees:

// Creates a tree that is backed by a specific store
tree, err := covertree.NewTreeWithStore(pointStore, basis, rootDistance, distanceBetween)       

Insert some things into the tree:

err := tree.Insert(&Point{1.5, 3.14})

Find the 5 nearest things in the store that are within 10.0 of a query point:

results, err := tree.FindNearest(&Point{0.0, 0.0}, 5, 10.0)

Remove things from the store:

removed, err := tree.Remove(&Point{1.5, 3.14})

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewInMemoryStore added in v1.10.0

func NewInMemoryStore(distanceFunc DistanceFunc) *inMemoryStore

func NewPartitionedStore added in v1.10.0

func NewPartitionedStore(partitioningFunc PartitioningFunc, stores ...Store) *partitionedStore

NewPartitionedStore returns a store which distributes store operations across the underlying stores using the specified partitioning function.

Operations for a given partition key are always assigned to the same store.

Types

type CompositeTree added in v1.12.0

type CompositeTree struct {
	// contains filtered or unexported fields
}

CompositeTree spreads operations across multiple subtrees for scaling and parallelisation.

func NewCompositeTree added in v1.12.0

func NewCompositeTree(trees ...*Tree) *CompositeTree

func (*CompositeTree) FindNearest added in v1.12.0

func (ct *CompositeTree) FindNearest(query interface{}, maxResults int, maxDistance float64) (results []ItemWithDistance, err error)

FindNearest returns the nearest items in all the subtrees to the specified query item, up to the specified maximum number of results and maximum distance.

Subtrees are queried in parallel.

Results are returned with their distances from the query item, in order from closest to furthest.

If no items are found matching the given criteria, an empty result set is returned.

Multiple calls to FindNearest and Insert are safe to make concurrently.

func (*CompositeTree) Insert added in v1.12.0

func (ct *CompositeTree) Insert(item interface{}) (err error)

Insert inserts the specified item into one of the subtrees.

Multiple calls to FindNearest and Insert are safe to make concurrently.

type DistanceFunc

type DistanceFunc func(a, b interface{}) (distance float64)

DistanceFunc represents a function which defines the distance between two items.

type ItemWithDistance

type ItemWithDistance struct {
	Item     interface{}
	Distance float64
}

ItemWithDistance represents an item and its distance from some other predetermined item, as defined by a DistanceFunc.

type LevelsWithItems

type LevelsWithItems struct {
	// contains filtered or unexported fields
}

LevelsWithItems represents a set of child items of a parent item, separated into their levels.

func (*LevelsWithItems) Add

func (lwi *LevelsWithItems) Add(level int, item interface{})

Add adds an item to the specified level.

func (*LevelsWithItems) Set

func (lwi *LevelsWithItems) Set(level int, items []interface{})

Set specifies the items for an entire level.

type PartitioningFunc added in v1.10.0

type PartitioningFunc func(parentItem interface{}) (partitionKey string)

type Store

type Store interface {

	// AddItem saves an item to the store as a child of the specified parent
	// item, at the given level. The parent may be nil, indicating that an item
	// is being added at the root of the tree.
	//
	// Implementations are free to assume that this will only be called for new,
	// never-before-seen items.
	AddItem(item, parent interface{}, level int) error

	// LoadChildren returns the explicit child items of the specified parents
	// item along with their levels. If a parent is nil, this is expected to
	// return the root items (which are children of the nil parent). The
	// children are expected to be returned in the same order as their parents.
	LoadChildren(parents ...interface{}) (children []LevelsWithItems, err error)

	// RemoveItem disassociates an item in the store from the specified parent
	// at the given level. If no such item exists, this operation should have no
	// effect.
	//
	// Implementations are free to completely delete the item itself along with
	// any relationships to child items, but should bear in mind that children
	// should continue to exist as orphans and will be re-parented to other
	// items (via calls to UpdateItem).
	RemoveItem(item, parent interface{}, level int) error

	// UpdateItem updates the parent and level of a given item. It is valid for
	// child items to be re-parented; the item, which must already exist in the
	// store, is associated with the new parent and level.
	//
	// Implementations are free to assume that this will only be called for
	// items which have previously been added via AddItem.
	UpdateItem(item, parent interface{}, level int) error
}

Store implementations allow entire Trees to be made accessible in an extensible way. Implementations may provide capabilities such as persistence and serialisation to various formats and data stores.

Each Store instance is responsible for storing and retrieval of the data for a single Tree. This implies that any keys or identifiers required for the loading and saving of a tree should be known to the instance of the store.

Store implementations will typically need to know about the type of items being stored in the tree.

Stores should implement distance-identity semantics; two items whose distance is exactly zero should be considered the same item.

type Tracer added in v1.6.0

type Tracer struct {
	TotalCoveredSetSize   int
	MaxCoverSetSize       int
	MaxLevelsTraversed    int
	LoadChildrenCount     int
	TotalLoadChildrenTime time.Duration
	TotalTime             time.Duration
	// contains filtered or unexported fields
}

Tracer represents a record for performance metrics of Tree operations.

Tracers for a given tree can be created using the tree’s NewTracer method.

Tracers are not thread safe and should not be shared by multiple Goroutines.

func (*Tracer) FindNearest added in v1.6.0

func (t *Tracer) FindNearest(query interface{}, maxResults int, maxDistance float64) (results []ItemWithDistance, err error)

FindNearest returns the nearest items in the tree to the specified query item, up to the specified maximum number of results and maximum distance.

Results are returned with their distances from the query item, in order from closest to furthest.

If no items are found matching the given criteria, an empty result set is returned.

func (*Tracer) Insert added in v1.6.0

func (t *Tracer) Insert(item interface{}) (err error)

Insert inserts the specified item into the tree.

func (*Tracer) Remove added in v1.6.0

func (t *Tracer) Remove(item interface{}) (removed interface{}, err error)

Remove removes the given item from the tree. If no such item exists in the tree, this has no effect.

removed will be the item that was successfully removed, or nil if no matching item was found.

func (*Tracer) String added in v1.7.1

func (t *Tracer) String() string

type Tree

type Tree struct {
	// contains filtered or unexported fields
}

Tree represents a single cover tree.

Trees should generally not be created except via NewTreeFromStore, and then only by a Store.

func NewInMemoryTree

func NewInMemoryTree(basis float64, rootDistance float64, distanceFunc DistanceFunc) *Tree

NewInMemoryTree creates a new, empty tree which is backed by an in-memory store. The tree will use the specified function for determining the distance between items.

Note that for the sake of efficiency, and due to how an in-memory tree will tend to be used, the in-memory implementation uses pointer equality instead of distance-identity. In particular, this means that removal requires the exact item to be specified in order to be removed.

func NewTreeWithStore

func NewTreeWithStore(store Store, basis float64, rootDistance float64, distanceFunc DistanceFunc) (*Tree, error)

NewTreeWithStore creates and initialises a Tree using the specified store.

basis is the logarithmic base for determining the coverage of nodes at each level of the tree.

rootDistance is the minimum expected distance between root nodes. New nodes that exceed this distance will be created as additional roots.

distanceFunc is the function used by the tree to determine the distance between two items.

func (*Tree) FindNearest

func (t *Tree) FindNearest(query interface{}, maxResults int, maxDistance float64) (results []ItemWithDistance, err error)

FindNearest returns the nearest items in the tree to the specified query item, up to the specified maximum number of results and maximum distance.

Results are returned with their distances from the query item, in order from closest to furthest.

If no items are found matching the given criteria, an empty result set is returned.

Multiple calls to FindNearest and Insert are safe to make concurrently.

func (*Tree) Insert

func (t *Tree) Insert(item interface{}) (err error)

Insert inserts the specified item into the tree.

Multiple calls to FindNearest and Insert are safe to make concurrently.

func (*Tree) NewTracer added in v1.6.0

func (t *Tree) NewTracer() *Tracer

NewTracer returns a new Tracer for recording performance metrics for operations on this tree.

Tracers are not thread safe and should not be shared by multiple Goroutines.

func (*Tree) Remove

func (t *Tree) Remove(item interface{}) (removed interface{}, err error)

Remove removes the given item from the tree. If no such item exists in the tree, this has no effect.

removed will be the item that was successfully removed, or nil if no matching item was found.

This method is not safe for concurrent use. Calls to Remove should be externally synchronised so they do not execute concurrently with each other or with calls to FindNearest or Insert.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL