nlpbench

v0.0.0-...-5aa9c0b
Published: Jun 9, 2017 License: MIT Imports: 6 Imported by: 0

README

nlpbench

Companion code to blog article series on optimising algorithms and data structures in Go for machine learning and large data sets.

The blog article series can be found here: http://www.jamesbowman.me/post/optimising-machine-learning-algorithms/

The resulting Go library of machine learning algorithm implementations can be found here: http://github.com/james-bowman/nlp

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CountVectoriser1

type CountVectoriser1 struct {
	Vocabulary map[string]int
	// contains filtered or unexported fields
}

func NewCountVectoriser1

func NewCountVectoriser1(removeStopwords bool) *CountVectoriser1

func (*CountVectoriser1) Fit

func (v *CountVectoriser1) Fit(train ...string) *CountVectoriser1

func (*CountVectoriser1) FitTransform

func (v *CountVectoriser1) FitTransform(docs ...string) (*mat64.Dense, error)

func (*CountVectoriser1) Transform

func (v *CountVectoriser1) Transform(docs ...string) (*mat64.Dense, error)
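
The Fit/Transform pattern above can be illustrated with a minimal sketch. It is not the package's implementation: the `countVectoriser` type, the whitespace tokeniser, and the choice of terms as rows are assumptions made for illustration, plain slices stand in for `*mat64.Dense`, and the `removeStopwords` option is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// countVectoriser is a hypothetical, simplified stand-in for the package's
// CountVectoriser types. It returns plain slices rather than *mat64.Dense.
type countVectoriser struct {
	Vocabulary map[string]int // term -> row index in the count matrix
}

func newCountVectoriser() *countVectoriser {
	return &countVectoriser{Vocabulary: make(map[string]int)}
}

// tokenise lower-cases a document and splits it on whitespace.
// This tokenisation rule is an assumption, not taken from the package.
func tokenise(doc string) []string {
	return strings.Fields(strings.ToLower(doc))
}

// Fit assigns each previously unseen term the next free index.
func (v *countVectoriser) Fit(train ...string) *countVectoriser {
	for _, doc := range train {
		for _, term := range tokenise(doc) {
			if _, seen := v.Vocabulary[term]; !seen {
				v.Vocabulary[term] = len(v.Vocabulary)
			}
		}
	}
	return v
}

// Transform builds a term-by-document count matrix (rows are terms,
// columns are documents). Terms missing from the vocabulary are ignored.
func (v *countVectoriser) Transform(docs ...string) [][]float64 {
	counts := make([][]float64, len(v.Vocabulary))
	for i := range counts {
		counts[i] = make([]float64, len(docs))
	}
	for d, doc := range docs {
		for _, term := range tokenise(doc) {
			if i, ok := v.Vocabulary[term]; ok {
				counts[i][d]++
			}
		}
	}
	return counts
}

func main() {
	v := newCountVectoriser().Fit("the cat sat", "the dog sat down")
	m := v.Transform("the cat and the dog")
	fmt.Println(len(v.Vocabulary), m[v.Vocabulary["the"]][0])
}
```

Fitting on one set of documents and transforming another means unseen words such as `and` are simply dropped, which is why the vocabulary is kept as an exported field.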

type CountVectoriser2

type CountVectoriser2 struct {
	Vocabulary map[string]int
	// contains filtered or unexported fields
}

func NewCountVectoriser2

func NewCountVectoriser2(removeStopwords bool) *CountVectoriser2

func (*CountVectoriser2) Fit

func (v *CountVectoriser2) Fit(train ...string) *CountVectoriser2

func (*CountVectoriser2) FitTransform

func (v *CountVectoriser2) FitTransform(docs ...string) (*mat64.Dense, error)

func (*CountVectoriser2) Transform

func (v *CountVectoriser2) Transform(docs ...string) (*mat64.Dense, error)

type CountVectoriser3

type CountVectoriser3 struct {
	Vocabulary map[string]int
	// contains filtered or unexported fields
}

func NewCountVectoriser3

func NewCountVectoriser3(removeStopwords bool) *CountVectoriser3

func (*CountVectoriser3) Fit

func (v *CountVectoriser3) Fit(train ...string) *CountVectoriser3

func (*CountVectoriser3) FitTransform

func (v *CountVectoriser3) FitTransform(docs ...string) (*mat64.Dense, error)

func (*CountVectoriser3) Transform

func (v *CountVectoriser3) Transform(docs ...string) (*mat64.Dense, error)

type DOKCountVectoriser1

type DOKCountVectoriser1 struct {
	Vocabulary map[string]int
	// contains filtered or unexported fields
}

func NewDOKCountVectoriser1

func NewDOKCountVectoriser1(removeStopwords bool) *DOKCountVectoriser1

func (*DOKCountVectoriser1) Fit

func (*DOKCountVectoriser1) FitTransform

func (v *DOKCountVectoriser1) FitTransform(docs ...string) (*sparse.DOK, error)

func (*DOKCountVectoriser1) Transform

func (v *DOKCountVectoriser1) Transform(docs ...string) (*sparse.DOK, error)
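
For readers unfamiliar with the dictionary-of-keys (DOK) format that `sparse.DOK` is named after, the idea can be sketched as a map from (row, column) pairs to values. The `dok` type below is hypothetical and is not the `sparse.DOK` API; it only illustrates why the format suits incremental construction of mostly zero count matrices.

```go
package main

import "fmt"

// dok is a hypothetical dictionary-of-keys matrix. Only non-zero
// elements are stored, keyed by their (row, column) position.
type dok struct {
	rows, cols int
	elements   map[[2]int]float64
}

func newDOK(rows, cols int) *dok {
	return &dok{rows: rows, cols: cols, elements: make(map[[2]int]float64)}
}

// At returns the stored value, or zero for entries that were never set.
func (m *dok) At(i, j int) float64 {
	return m.elements[[2]int{i, j}]
}

// Set stores a non-zero value, or removes the entry when v is zero.
func (m *dok) Set(i, j int, v float64) {
	if i < 0 || i >= m.rows || j < 0 || j >= m.cols {
		panic("dok: index out of range")
	}
	key := [2]int{i, j}
	if v == 0 {
		delete(m.elements, key)
		return
	}
	m.elements[key] = v
}

// NNZ reports the number of stored (non-zero) elements.
func (m *dok) NNZ() int { return len(m.elements) }

func main() {
	m := newDOK(1000, 1000)
	m.Set(3, 7, 2)
	m.Set(3, 7, m.At(3, 7)+1) // increment a count in place
	fmt.Println(m.At(3, 7), m.At(0, 0), m.NNZ())
}
```

Memory use grows with the number of non-zero elements rather than with rows times columns, at the cost of a map lookup per access.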

type SparseTfidfTransformer

type SparseTfidfTransformer struct {
	// contains filtered or unexported fields
}

func (*SparseTfidfTransformer) Fit

func (*SparseTfidfTransformer) FitTransform

func (t *SparseTfidfTransformer) FitTransform(mat mat64.Matrix) (mat64.Matrix, error)

func (*SparseTfidfTransformer) Transform

func (t *SparseTfidfTransformer) Transform(mat mat64.Matrix) (mat64.Matrix, error)

type TfidfTransformer1

type TfidfTransformer1 struct {
	// contains filtered or unexported fields
}

func (*TfidfTransformer1) Fit

func (*TfidfTransformer1) FitTransform

func (t *TfidfTransformer1) FitTransform(mat mat64.Matrix) (*mat64.Dense, error)

func (*TfidfTransformer1) Transform

func (t *TfidfTransformer1) Transform(mat mat64.Matrix) (*mat64.Dense, error)

type TfidfTransformer2

type TfidfTransformer2 struct {
	// contains filtered or unexported fields
}

TfidfTransformer takes a raw term document matrix and weights each raw term frequency value according to how commonly the term occurs across all documents within the corpus. For example, a very commonly occurring word like `the` is likely to occur in all documents and so would be weighted down. More precisely, TfidfTransformer applies a tf-idf algorithm to the matrix, where each term frequency is multiplied by the inverse document frequency. Inverse document frequency is calculated as log(n/df), where df is the number of documents in which the term occurs and n is the total number of documents within the corpus. We add 1 to both n and df before division to prevent division by zero, so the value actually computed is log((1+n)/(1+df)).

func NewTfidfTransformer

func NewTfidfTransformer() *TfidfTransformer2

NewTfidfTransformer constructs a new TfidfTransformer.

func (*TfidfTransformer2) Fit

Fit takes a training term document matrix, counts term occurrences across all documents and constructs an inverse document frequency transform to apply to matrices in subsequent calls to Transform().

func (*TfidfTransformer2) FitTransform

func (t *TfidfTransformer2) FitTransform(mat mat64.Matrix) (*mat64.Dense, error)

FitTransform is exactly equivalent to calling Fit() followed by Transform() on the same matrix. This is a convenience for cases where separate training data is not being used to fit the model, i.e. the model is fitted on the fly to the test data.

func (*TfidfTransformer2) Transform

func (t *TfidfTransformer2) Transform(mat mat64.Matrix) (*mat64.Dense, error)

type TfidfTransformer3

type TfidfTransformer3 struct {
	// contains filtered or unexported fields
}

func (*TfidfTransformer3) Fit

Fit takes a training term document matrix, counts term occurrences across all documents and constructs an inverse document frequency transform to apply to matrices in subsequent calls to Transform().

func (*TfidfTransformer3) FitTransform

func (t *TfidfTransformer3) FitTransform(mat mat64.Matrix) (*mat64.Dense, error)

FitTransform is exactly equivalent to calling Fit() followed by Transform() on the same matrix. This is a convenience for cases where separate training data is not being used to fit the model, i.e. the model is fitted on the fly to the test data.

func (*TfidfTransformer3) Transform

func (t *TfidfTransformer3) Transform(mat mat64.Matrix) (*mat64.Dense, error)

type Transformer

type Transformer interface {
	Fit(mat64.Matrix) Transformer
	Transform(mat mat64.Matrix) (*mat64.Dense, error)
	FitTransform(mat mat64.Matrix) (*mat64.Dense, error)
}
