go2vec

v1.1.0
Published: Apr 26, 2024 | License: BSD-3-Clause | Imports: 9 | Imported by: 0

README

Introduction

This is a package for reading word2vec vectors in Go and finding similar words and analogies.

Installation

This package can be installed with the go command:

go get gopkg.in/danieldk/go2vec.v1

To install the command-line utilities, use:

go get gopkg.in/danieldk/go2vec.v1/cmd/...

The package documentation is available at: https://godoc.org/gopkg.in/danieldk/go2vec.v1

Documentation

Overview

Package go2vec loads word2vec embeddings.

This package can load binary word2vec files. It also supports distance and analogy queries on the embeddings.

go2vec uses gonum's C BLAS binding by default. Linking the binding against an optimized BLAS library can improve performance noticeably. The library is selected with CGO flags. For instance, to link against OpenBLAS on Linux:

CGO_LDFLAGS="-L/path/to/OpenBLAS -lopenblas" go install github.com/gonum/blas/cgo

or Accelerate on OS X:

CGO_LDFLAGS="-framework Accelerate" go install github.com/gonum/blas/cgo

Functions

func CosineSimilarity (added in v1.1.0)

func CosineSimilarity(vec1, vec2 []float32) float64
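
The page gives no description for CosineSimilarity. Judging by its name and signature, it presumably returns the cosine of the angle between two vectors. The sketch below shows that computation with a hypothetical helper named cosine. It is not the package's implementation, and its handling of zero-length or mismatched vectors is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// cosine is a hypothetical helper, not go2vec's implementation. It returns
// the cosine of the angle between a and b. As an assumption of this sketch,
// it returns 0 when the lengths differ or either vector has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	fmt.Println(cosine([]float32{1, 0}, []float32{0, 1}))  // orthogonal vectors
	fmt.Println(cosine([]float32{1, 2}, []float32{2, 4}))  // parallel vectors
	fmt.Println(cosine([]float32{1, 0}, []float32{-1, 0})) // opposite vectors
}
```

Accumulating in float64 keeps rounding error small even though the inputs are float32, which matches the float64 return type in the signature above.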

Types

type Embeddings

type Embeddings struct {
	// contains filtered or unexported fields
}

Embeddings is used to store a set of word embeddings, such that common operations can be performed on these embeddings (such as retrieving similar words).

func NewEmbeddings

func NewEmbeddings(embedSize int) *Embeddings

NewEmbeddings creates a set of word embeddings from scratch. This constructor should be used in conjunction with 'Put' to populate the embeddings.

func ReadWord2VecBinary

func ReadWord2VecBinary(r *bufio.Reader, normalize bool) (*Embeddings, error)

ReadWord2VecBinary reads word embeddings from a binary file that is produced by word2vec. If 'normalize' is true, each embedding is normalized by its L2 norm.
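
To make the file format concrete, here is a minimal sketch of a reader for the binary layout that the original word2vec tool commonly writes: a text header with the vocabulary size and dimension, then each word followed by a space and its float32 values. The functions readVectors, l2Normalize and readSample are hypothetical, little-endian byte order is assumed, and go2vec's own reader may differ in its details.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
	"strings"
)

// l2Normalize scales vec in place to unit length (zero vectors are left as-is).
func l2Normalize(vec []float32) {
	var sum float64
	for _, v := range vec {
		sum += float64(v) * float64(v)
	}
	norm := math.Sqrt(sum)
	if norm == 0 {
		return
	}
	for i := range vec {
		vec[i] = float32(float64(vec[i]) / norm)
	}
}

// readVectors is a hypothetical reader, not go2vec's implementation.
// Assumed layout: "<vocab size> <dimension>\n", then per word the token,
// a space, and <dimension> little-endian float32 values.
func readVectors(r *bufio.Reader, normalize bool) (map[string][]float32, error) {
	header, err := r.ReadString('\n')
	if err != nil {
		return nil, err
	}
	var nWords, dim int
	if _, err := fmt.Sscanf(header, "%d %d", &nWords, &dim); err != nil {
		return nil, err
	}
	if nWords < 0 || dim <= 0 {
		return nil, fmt.Errorf("invalid header: %q", header)
	}
	vecs := make(map[string][]float32, nWords)
	for i := 0; i < nWords; i++ {
		word, err := r.ReadString(' ')
		if err != nil {
			return nil, err
		}
		// Drop the trailing space and any newline left after the previous record.
		word = strings.TrimSpace(word)
		vec := make([]float32, dim)
		if err := binary.Read(r, binary.LittleEndian, vec); err != nil {
			return nil, err
		}
		if normalize {
			l2Normalize(vec)
		}
		vecs[word] = vec
	}
	return vecs, nil
}

// readSample builds a tiny in-memory file with two words and reads it back.
func readSample(normalize bool) (map[string][]float32, error) {
	var buf bytes.Buffer
	buf.WriteString("2 2\n")
	buf.WriteString("apple ")
	binary.Write(&buf, binary.LittleEndian, []float32{3, 4})
	buf.WriteString("\npear ")
	binary.Write(&buf, binary.LittleEndian, []float32{0, 2})
	buf.WriteString("\n")
	return readVectors(bufio.NewReader(&buf), normalize)
}

func main() {
	vecs, err := readSample(true)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(vecs["apple"], vecs["pear"])
}
```

Normalizing at load time means later similarity queries can use plain dot products, since the dot product of two unit vectors equals their cosine similarity.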

func (*Embeddings) Analogy

func (e *Embeddings) Analogy(word1, word2, word3 string, limit int) ([]WordSimilarity, error)

Analogy performs word analogy queries.

Consider an analogy of the form 'word1' is to 'word2' as 'word3' is to 'word4'. This method returns candidates for 'word4' based on 'word1..3'.

If 'e1' is the embedding of 'word1', etc., then the embedding 'e4 = (e2 - e1) + e3' is computed. Then the words with embeddings that are the most similar to e4 are returned.

The query words are never returned as a result.
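
The computation described above can be sketched on a toy vocabulary. The function analogy, the scored type and the toy vectors are all hypothetical. Ranking by cosine similarity is an assumption, and the package itself may compute similarities differently, for example through BLAS.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type scored struct {
	word string
	sim  float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// analogy is a hypothetical illustration of e4 = (e2 - e1) + e3 followed
// by a nearest-neighbour ranking that skips the three query words.
func analogy(vecs map[string][]float32, w1, w2, w3 string, limit int) ([]scored, error) {
	e1, ok1 := vecs[w1]
	e2, ok2 := vecs[w2]
	e3, ok3 := vecs[w3]
	if !ok1 || !ok2 || !ok3 {
		return nil, fmt.Errorf("unknown query word among %q, %q, %q", w1, w2, w3)
	}
	target := make([]float32, len(e1))
	for i := range target {
		target[i] = (e2[i] - e1[i]) + e3[i]
	}
	var results []scored
	for word, vec := range vecs {
		if word == w1 || word == w2 || word == w3 {
			continue // query words are never returned
		}
		results = append(results, scored{word, cosine(target, vec)})
	}
	sort.Slice(results, func(i, j int) bool { return results[i].sim > results[j].sim })
	if limit >= 0 && len(results) > limit {
		results = results[:limit]
	}
	return results, nil
}

// toyVectors returns made-up two-dimensional embeddings for the demo.
func toyVectors() map[string][]float32 {
	return map[string][]float32{
		"man":    {1, 0},
		"woman":  {1, 1},
		"king":   {3, 0},
		"queen":  {3, 1},
		"prince": {3, -1},
	}
}

func main() {
	res, err := analogy(toyVectors(), "man", "woman", "king", 2)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, r := range res {
		fmt.Printf("%s %.3f\n", r.word, r.sim)
	}
}
```

With these toy vectors the target is (3, 1), which coincides with the vector for "queen", so "queen" ranks first and "prince" second.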

func (*Embeddings) Embedding

func (e *Embeddings) Embedding(word string) ([]float32, bool)

Embedding returns the embedding for a particular word. If the word is unknown, the second return value will be false.

func (*Embeddings) EmbeddingSize

func (e *Embeddings) EmbeddingSize() int

EmbeddingSize returns the embedding size.

func (*Embeddings) Iterate

func (e *Embeddings) Iterate(f IterFunc)

Iterate applies the provided iteration function to all word embeddings.

func (*Embeddings) Matrix

func (e *Embeddings) Matrix() []float32

func (*Embeddings) Put

func (e *Embeddings) Put(word string, embedding []float32) error

Put adds a word embedding to the word embeddings. The new word can be queried after the call returns.
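
As an illustration of how a constructor, 'Put' and word lookups can fit together, the sketch below keeps all vectors in one flat, row-major matrix and maps each word to its row index. The store type and its methods are hypothetical. Rejecting vectors of the wrong size and overwriting an existing word are assumptions of this sketch, not documented behaviour of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical container, not go2vec's Embeddings type.
type store struct {
	dim     int
	indices map[string]int // word -> row index
	words   []string       // row index -> word
	matrix  []float32      // row-major, one row of dim values per word
}

func newStore(dim int) *store {
	return &store{dim: dim, indices: make(map[string]int)}
}

// put adds a row for word, or overwrites the row if the word is known.
func (s *store) put(word string, vec []float32) error {
	if len(vec) != s.dim {
		return errors.New("embedding has the wrong number of dimensions")
	}
	if idx, ok := s.indices[word]; ok {
		copy(s.matrix[idx*s.dim:(idx+1)*s.dim], vec)
		return nil
	}
	s.indices[word] = len(s.words)
	s.words = append(s.words, word)
	s.matrix = append(s.matrix, vec...)
	return nil
}

// lookup returns the row for word; the second value is false if unknown.
func (s *store) lookup(word string) ([]float32, bool) {
	idx, ok := s.indices[word]
	if !ok {
		return nil, false
	}
	return s.matrix[idx*s.dim : (idx+1)*s.dim], true
}

func main() {
	s := newStore(2)
	if err := s.put("cat", []float32{1, 0}); err != nil {
		fmt.Println(err)
	}
	if err := s.put("dog", []float32{1, 2, 3}); err != nil {
		fmt.Println("rejected:", err)
	}
	vec, ok := s.lookup("cat")
	fmt.Println(vec, ok, len(s.words))
}
```

A flat matrix keeps the rows contiguous in memory, which is the layout a matrix-vector routine expects when it scores one query against every stored word.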

func (*Embeddings) SetBLAS

func (e *Embeddings) SetBLAS(impl blas.Float32Level2)

SetBLAS sets the BLAS implementation to use (default: C BLAS).

func (*Embeddings) Similarity

func (e *Embeddings) Similarity(word string, limit int) ([]WordSimilarity, error)

Similarity finds words that have embeddings that are similar to that of the given word. The 'limit' argument specifies how many words should be returned. The returned slice is ordered by similarity.

The query word is never returned as a result.
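
A similarity query of this kind can be sketched as one dot product per stored row followed by a sort. The function mostSimilar and the neighbor type are hypothetical. The sketch assumes every row is already L2-normalized, so that the dot product equals the cosine similarity, and it returns results from most to least similar.

```go
package main

import (
	"fmt"
	"sort"
)

type neighbor struct {
	word string
	sim  float32
}

// mostSimilar is a hypothetical illustration, not go2vec's implementation.
// matrix holds one L2-normalized row of dim values per entry in words.
func mostSimilar(words []string, matrix []float32, dim int, query string, limit int) []neighbor {
	qIdx := -1
	for i, w := range words {
		if w == query {
			qIdx = i
			break
		}
	}
	if qIdx < 0 || limit <= 0 {
		return nil
	}
	q := matrix[qIdx*dim : (qIdx+1)*dim]

	var out []neighbor
	for i, w := range words {
		if i == qIdx {
			continue // the query word is never returned
		}
		row := matrix[i*dim : (i+1)*dim]
		var dot float32
		for j := range row {
			dot += row[j] * q[j]
		}
		out = append(out, neighbor{w, dot})
	}
	sort.Slice(out, func(a, b int) bool { return out[a].sim > out[b].sim })
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	words := []string{"cat", "dog", "car"}
	matrix := []float32{
		1, 0, // cat
		0.8, 0.6, // dog
		0, 1, // car
	}
	for _, n := range mostSimilar(words, matrix, 2, "cat", 2) {
		fmt.Println(n.word, n.sim)
	}
}
```

The inner loop is the same work a Level 2 BLAS matrix-vector product performs in a single call, which is why the choice of BLAS library matters for large vocabularies.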

func (*Embeddings) Size

func (e *Embeddings) Size() int

Size returns the number of words in the embeddings.

func (*Embeddings) WordIdx

func (e *Embeddings) WordIdx(word string) (int, bool)

WordIdx returns the index of the word within the embeddings.

func (*Embeddings) Write

func (e *Embeddings) Write(w *bufio.Writer) error

Write writes the embeddings to a binary file accepted by word2vec.
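
For reference, the binary layout commonly produced by the original word2vec tool can be written as in the sketch below. The functions writeVectors and encodeSample are hypothetical. Little-endian byte order and a newline after each vector are assumptions, and the bytes go2vec emits may differ.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
)

// writeVectors is a hypothetical writer, not go2vec's implementation.
// Assumed layout: "<vocab size> <dimension>\n", then per word the token,
// a space, the little-endian float32 values, and a newline.
func writeVectors(w *bufio.Writer, words []string, vecs [][]float32) error {
	if len(words) != len(vecs) {
		return fmt.Errorf("got %d words but %d vectors", len(words), len(vecs))
	}
	dim := 0
	if len(vecs) > 0 {
		dim = len(vecs[0])
	}
	if _, err := fmt.Fprintf(w, "%d %d\n", len(words), dim); err != nil {
		return err
	}
	for i, word := range words {
		if len(vecs[i]) != dim {
			return fmt.Errorf("vector for %q has %d values, want %d", word, len(vecs[i]), dim)
		}
		if _, err := w.WriteString(word + " "); err != nil {
			return err
		}
		if err := binary.Write(w, binary.LittleEndian, vecs[i]); err != nil {
			return err
		}
		if err := w.WriteByte('\n'); err != nil {
			return err
		}
	}
	// Flush so that buffered bytes reach the underlying writer.
	return w.Flush()
}

// encodeSample serializes two toy words into memory.
func encodeSample() ([]byte, error) {
	var buf bytes.Buffer
	err := writeVectors(bufio.NewWriter(&buf),
		[]string{"a", "b"}, [][]float32{{1, 2}, {3, 4}})
	return buf.Bytes(), err
}

func main() {
	data, err := encodeSample()
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("%d bytes, header %q\n", len(data), data[:4])
}
```

Because the function takes a *bufio.Writer, forgetting to flush would silently truncate the output; the sketch flushes before returning for that reason.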

type IterFunc

type IterFunc func(word string, embedding []float32) bool

IterFunc is a function for iterating over word embeddings. The function should return 'false' if the iteration should be stopped.

type WordSimilarity

type WordSimilarity struct {
	Word       string
	Similarity float32
}

WordSimilarity stores the similarity of a word compared to a query word.

Directories

Path Synopsis
cmd
