word2vec

package
v0.0.0-...-22e7a19
Warning

This package is not in the latest version of its module.

Published: Oct 13, 2017 | License: Apache-2.0 | Imports: 18 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewHuffmanTree

func NewHuffmanTree(c *corpus.Corpus, dimension int, dt tensor.Dtype, eng tensor.Engine) (map[int]*Node, error)

NewHuffmanTree builds the Huffman tree for the corpus and returns a map from word ID to the corresponding Node.
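To make the idea concrete, here is a minimal sketch of Huffman tree construction from word frequencies. It is not this package's implementation: the `node` type, `buildHuffman`, and `depth` are hypothetical names, plain frequency slices stand in for `corpus.Corpus`, and no tensors are allocated. It assumes the usual scheme in which the two least frequent nodes are merged repeatedly, so that frequent words end up close to the root.

```go
package main

import (
	"container/heap"
	"fmt"
)

// node is a hypothetical stand-in for the package's Node type.
type node struct {
	parent *node
	code   int // 0 or 1: which branch of the parent leads to this node
	freq   int
	wordID int // -1 for internal nodes
}

// minHeap orders nodes by ascending frequency.
type minHeap []*node

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].freq < h[j].freq }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(*node)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := old[len(old)-1]
	*h = old[:len(old)-1]
	return n
}

// buildHuffman merges the two least frequent nodes until one root remains
// and returns a map from word ID to that word's leaf node.
func buildHuffman(freqs []int) map[int]*node {
	leaves := make(map[int]*node, len(freqs))
	h := make(minHeap, 0, len(freqs))
	for id, f := range freqs {
		n := &node{freq: f, wordID: id}
		leaves[id] = n
		h = append(h, n)
	}
	heap.Init(&h)
	for h.Len() > 1 {
		a := heap.Pop(&h).(*node)
		b := heap.Pop(&h).(*node)
		p := &node{freq: a.freq + b.freq, wordID: -1}
		a.parent, a.code = p, 0
		b.parent, b.code = p, 1
		heap.Push(&h, p)
	}
	return leaves
}

// depth counts the edges from n up to the root.
func depth(n *node) int {
	d := 0
	for ; n.parent != nil; n = n.parent {
		d++
	}
	return d
}

func main() {
	leaves := buildHuffman([]int{50, 25, 15, 10})
	for id := 0; id < len(leaves); id++ {
		fmt.Printf("word %d: depth %d\n", id, depth(leaves[id]))
	}
}
```

Returning a map keyed by word ID mirrors the documented return type and lets a caller jump straight from a word to its leaf.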

Types

type CBOW

type CBOW struct {
	*State
	// contains filtered or unexported fields
}

CBOW is the continuous bag-of-words (CBOW) variant of the Word2Vec model.

func NewCBOW

func NewCBOW(s *State) *CBOW

NewCBOW creates a *CBOW from the given State.

func (*CBOW) Train

func (c *CBOW) Train(f io.ReadCloser) error

Train calls Trainer with the CBOW trainOne function.
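The per-word CBOW step is unexported, so the following is only a sketch of the general CBOW idea: the vectors of the words around a position are averaged, and that average is used to predict the word at the position. The function `contextMean`, the `window` parameter, and the `[][]float64` embedding table are all hypothetical and stand in for the package's tensor-based types.

```go
package main

import "fmt"

// contextMean averages the embeddings of the words within `window` positions
// of idx, excluding the word at idx itself. Positions outside the sentence
// are skipped.
func contextMean(emb [][]float64, wordIDs []int, idx, window int) []float64 {
	mean := make([]float64, len(emb[0]))
	n := 0
	for j := idx - window; j <= idx+window; j++ {
		if j < 0 || j >= len(wordIDs) || j == idx {
			continue
		}
		for d, v := range emb[wordIDs[j]] {
			mean[d] += v
		}
		n++
	}
	if n == 0 {
		return mean // no context words: return the zero vector
	}
	for d := range mean {
		mean[d] /= float64(n)
	}
	return mean
}

func main() {
	emb := [][]float64{{0, 0}, {2, 4}, {4, 8}}
	wordIDs := []int{1, 0, 2}
	fmt.Println(contextMean(emb, wordIDs, 1, 1))
}
```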

type Embedding

type Embedding struct {
	// contains filtered or unexported fields
}

Embedding represents a word embedding. It holds a Tensor, and preslices it for additional performance gains.

type HierarchicalSoftmax

type HierarchicalSoftmax struct {
	MaxDepth int
	// contains filtered or unexported fields
}

HierarchicalSoftmax is one of the Word2Vec optimizers.

func NewHierarchicalSoftmax

func NewHierarchicalSoftmax(maxDepth int) *HierarchicalSoftmax

NewHierarchicalSoftmax creates a *HierarchicalSoftmax. The Huffman tree is NOT built yet.

func (*HierarchicalSoftmax) Init

func (hs *HierarchicalSoftmax) Init(c *corpus.Corpus, dimension int) (err error)

Init initializes the Huffman tree.

func (*HierarchicalSoftmax) Update

func (hs *HierarchicalSoftmax) Update(targetID int, contextVector, poolVector tensor.Tensor, learningRate float64) error

Update updates the word vector using the Huffman tree.

type NegativeSampling

type NegativeSampling struct {
	NegativeSampleSize int
	// contains filtered or unexported fields
}

NegativeSampling is one of the Word2Vec optimizers.

func NewNegativeSampling

func NewNegativeSampling(negativeSampleSize int) *NegativeSampling

NewNegativeSampling creates a *NegativeSampling. The negative vector is NOT built yet.

func (*NegativeSampling) Init

func (ns *NegativeSampling) Init(c *corpus.Corpus, dimension int) (err error)

Init initializes the negative vector.

func (*NegativeSampling) Update

func (ns *NegativeSampling) Update(targetID int, contextVector, poolVector tensor.Tensor, learningRate float64) error

Update updates the word vector using the negative vector.

type Node

type Node struct {
	Parent    *Node
	Code      int
	Value     int
	Vector    *model.SyncTensor
	CachePath Nodes
}

Node is a node of the Huffman tree, together with its vector.

func (*Node) GetPath

func (n *Node) GetPath() Nodes

GetPath returns the nodes on the path from the root to the word in the Huffman tree.
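One plausible way to obtain such a path, given the Parent and CachePath fields, is to follow parent pointers up to the root, reverse the result, and cache it. The sketch below does that with a hypothetical `node` type and `path` method. Whether the real GetPath includes the root or the word's own node is not stated here, so including both is an assumption.

```go
package main

import "fmt"

// node is a hypothetical stand-in for Node without the vector field.
type node struct {
	parent    *node
	code      int
	value     int
	cachePath []*node
}

// path returns the nodes from the root down to n, computing the slice on
// the first call and reusing the cached copy afterwards.
func (n *node) path() []*node {
	if n.cachePath != nil {
		return n.cachePath
	}
	var p []*node
	for cur := n; cur != nil; cur = cur.parent {
		p = append(p, cur)
	}
	// The walk collected leaf-to-root order; reverse it in place.
	for i, j := 0, len(p)-1; i < j; i, j = i+1, j-1 {
		p[i], p[j] = p[j], p[i]
	}
	n.cachePath = p
	return p
}

func main() {
	root := &node{value: 0}
	mid := &node{parent: root, code: 1, value: 1}
	leaf := &node{parent: mid, code: 0, value: 2}
	for _, n := range leaf.path() {
		fmt.Printf("value=%d code=%d\n", n.value, n.code)
	}
}
```

Caching pays off because every occurrence of a word in the corpus needs the same path.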

type Nodes

type Nodes []*Node

Nodes is a list of *Node. Its Len, Less, and Swap methods implement sort.Interface.

func (*Nodes) Len

func (n *Nodes) Len() int

func (*Nodes) Less

func (n *Nodes) Less(i, j int) bool

func (*Nodes) Swap

func (n *Nodes) Swap(i, j int)
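Len, Less, and Swap are the three methods of the standard `sort.Interface`, and here they are declared on the pointer type, so it is the pointer that must be passed to `sort.Sort`. The sketch below shows the pattern with a hypothetical `nodes` type. Ordering by a `value` field is an assumption made for the example, since the page does not document what Less compares.

```go
package main

import (
	"fmt"
	"sort"
)

type node struct{ value int }

// nodes mirrors the shape of Nodes: a slice of pointers with
// pointer-receiver sort methods.
type nodes []*node

func (n *nodes) Len() int           { return len(*n) }
func (n *nodes) Less(i, j int) bool { return (*n)[i].value < (*n)[j].value }
func (n *nodes) Swap(i, j int)      { (*n)[i], (*n)[j] = (*n)[j], (*n)[i] }

// sortedValues builds a nodes slice, sorts it, and returns the values.
func sortedValues(vals ...int) []int {
	ns := make(nodes, 0, len(vals))
	for _, v := range vals {
		ns = append(ns, &node{value: v})
	}
	sort.Sort(&ns) // &ns, not ns: the methods have pointer receivers
	out := make([]int, len(ns))
	for i, n := range ns {
		out[i] = n.value
	}
	return out
}

func main() {
	fmt.Println(sortedValues(3, 1, 2))
}
```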

type Optimizer

type Optimizer interface {
	Init(c *corpus.Corpus, dimension int) error
	Update(targetID int, contextVector, poolVector tensor.Tensor, learningRate float64) error
}

Optimizer is the interface for initializing after the corpus has been scanned once and for updating the word vectors.

type SkipGram

type SkipGram struct {
	*State
	// contains filtered or unexported fields
}

SkipGram is the skip-gram variant of the Word2Vec model.

func NewSkipGram

func NewSkipGram(s *State) *SkipGram

NewSkipGram creates a *SkipGram from the given State.

func (*SkipGram) Train

func (s *SkipGram) Train(f io.ReadCloser) error

Train calls Trainer with the SkipGram trainOne function.
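The per-word skip-gram step is unexported, so this is only a sketch of the general idea: for the word at a given position, each neighbor inside a window forms one (target, context) training pair. The `pair` type, `skipGramPairs`, and the `window` parameter are hypothetical and are not taken from this package.

```go
package main

import "fmt"

// pair is one skip-gram training example.
type pair struct{ target, context int }

// skipGramPairs pairs the word at idx with every word within `window`
// positions of it, skipping positions outside the sentence.
func skipGramPairs(wordIDs []int, idx, window int) []pair {
	var out []pair
	for j := idx - window; j <= idx+window; j++ {
		if j < 0 || j >= len(wordIDs) || j == idx {
			continue
		}
		out = append(out, pair{target: wordIDs[idx], context: wordIDs[j]})
	}
	return out
}

func main() {
	fmt.Println(skipGramPairs([]int{7, 8, 9}, 1, 1))
}
```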

type State

type State struct {
	*model.Config
	*corpus.Corpus
	// contains filtered or unexported fields
}

State stores the configuration common to all Word2Vec models.

func NewState

func NewState(config *model.Config, opt Optimizer,
	subsampleThreshold, theta float64, batchSize int) *State

NewState creates a *State.

func (*State) Preprocess

func (s *State) Preprocess(f io.ReadSeeker) (io.ReadCloser, error)

Preprocess scans the corpus once before Train to count word frequencies.

func (*State) Save

func (s *State) Save(outputPath string) error

Save saves the word vectors to outputPath.

func (*State) Trainer

func (s *State) Trainer(f io.ReadCloser, trainOne func(wordIDs []int, wordIndex int, lr float64) error) error

Trainer trains on a corpus. It assumes that Preprocess() has already been called.
