gonline

package module
v0.0.0-...-8e14ada
Published: Dec 24, 2017 License: MIT Imports: 16 Imported by: 1

README

gonline

A Go implementation of online machine learning algorithms

How to Install

$ go get github.com/tma15/gonline

How to Build

$ cd $GOPATH/src/github.com/tma15/gonline/gonline
$ go build

Supported Algorithms

  • Perceptron (p)
  • Passive Aggressive (pa)
  • Passive Aggressive I (pa1)
  • Passive Aggressive II (pa2)
  • Confidence Weighted (cw)
  • Adaptive Regularization of Weight Vectors (arow)
  • Adaptive Moment Estimation (adam)

The strings in parentheses are the values accepted by the -a option of gonline train.
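To illustrate the online learning setting these algorithms share, here is a minimal sketch of a multiclass perceptron over sparse feature maps. The names (weights, predict, update) are hypothetical, and this is not the package's implementation; ties are broken alphabetically for determinism.

```go
package main

import (
	"fmt"
	"sort"
)

// weights[label][feature] holds each feature's contribution to a label's score.
type weights map[string]map[string]float64

// predict returns the label with the highest dot product against x,
// breaking ties by alphabetical order for determinism.
func predict(w weights, x map[string]float64) string {
	labels := make([]string, 0, len(w))
	for l := range w {
		labels = append(labels, l)
	}
	sort.Strings(labels)
	best, bestScore := "", 0.0
	for i, label := range labels {
		score := 0.0
		for f, v := range x {
			score += w[label][f] * v
		}
		if i == 0 || score > bestScore {
			best, bestScore = label, score
		}
	}
	return best
}

// update applies the perceptron rule: on a mistake, add x to the true
// label's weight vector and subtract it from the predicted label's.
func update(w weights, x map[string]float64, y string) {
	if _, ok := w[y]; !ok {
		w[y] = map[string]float64{}
	}
	yhat := predict(w, x)
	if yhat == y {
		return
	}
	for f, v := range x {
		w[y][f] += v
		w[yhat][f] -= v
	}
}

func main() {
	w := weights{}
	for epoch := 0; epoch < 2; epoch++ {
		update(w, map[string]float64{"soccer": 1}, "sports")
		update(w, map[string]float64{"election": 1}, "politics")
	}
	fmt.Println(predict(w, map[string]float64{"soccer": 1})) // sports
}
```

The other algorithms in the list refine this same loop: PA scales the update by a margin-based step size, while CW, AROW, and ADAM additionally maintain per-feature second-order statistics.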

Usage

Template command of training:

$ ./gonline train -a <ALGORITHM> -m <MODELFILE> -t <TESTINGFILE> -i <ITERATION> <TRAININGFILE1> <TRAININGFILE2> ... <TRAININGFILEK>

To train a learner with the AROW algorithm:

$ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/news20.scale.bz2
$ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/news20.t.scale.bz2
$ bunzip2 news20.scale.bz2 news20.t.scale.bz2
$ time ./gonline train -a arow -m model -i 10 -t ./news20.t.scale -withoutshuffle ./news20.scale
algorithm: AROW
testfile ./news20.t.scale
training data will not be shuffled
epoch:1 test accuracy: 0.821438 (3280/3993)
epoch:2 test accuracy: 0.835212 (3335/3993)
epoch:3 test accuracy: 0.842725 (3365/3993)
epoch:4 test accuracy: 0.845980 (3378/3993)
epoch:5 test accuracy: 0.849236 (3391/3993)
epoch:6 test accuracy: 0.853243 (3407/3993)
epoch:7 test accuracy: 0.854746 (3413/3993)
epoch:8 test accuracy: 0.856749 (3421/3993)
epoch:9 test accuracy: 0.859254 (3431/3993)
epoch:10 test accuracy: 0.859755 (3433/3993)
./gonline train -a arow -m model -i 10 -t ./news20.t.scale -withoutshuffle   109.53s user 1.65s system 98% cpu 1:53.25 total

In practice, shuffling training data can improve accuracy.

On a machine with a multi-core CPU, you can make training faster than on a single core by using the following command when the amount of training data is large:

$ touch news20.scale.big
$ for i in 1 2 3 4 5; do cat news20.scale >> news20.scale.big; done
$ time ./gonline train -a arow -m model -i 10 -t ./news20.t.scale -withoutshuffle -p 4 -s ipm ./news20.scale.big
./gonline train -a arow -m model -i 10 -t ./news20.t.scale -withoutshuffle -p  291.76s user 12.25s system 179% cpu 2:49.49 total
$ time ./gonline train -a arow -m model -i 10 -t ./news20.t.scale -withoutshuffle -p 1 -s ipm ./news20.scale.big
./gonline train -a arow -m model -i 10 -t ./news20.t.scale -withoutshuffle -p  176.38s user 5.91s system 94% cpu 3:12.42 total

Here -s selects the training strategy, and ipm means training with Iterative Parameter Mixture; -p is the number of cores used for training. These experiments were conducted on a 1.7 GHz Intel Core i5. When the amount of training data is small, the training time will not be shortened.
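The Iterative Parameter Mixture strategy can be sketched as: split the data into shards, train one learner per shard in parallel from the same starting weights, average the per-shard weights, and use the average as the starting point of the next epoch. Below is a toy sketch with a single scalar "weight" per learner; the function names and the stand-in update rule are illustrative assumptions, not the package's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// trainShard performs one epoch of (toy) online training on a shard,
// starting from the broadcast weight w and returning the updated weight.
func trainShard(w float64, shard []float64) float64 {
	for _, x := range shard {
		w += 0.1 * (x - w) // stand-in for a real online update
	}
	return w
}

// ipmEpoch trains every shard in parallel from the same starting
// weight, then returns the average of the per-shard weights.
func ipmEpoch(w float64, shards [][]float64) float64 {
	results := make([]float64, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard []float64) {
			defer wg.Done()
			results[i] = trainShard(w, shard)
		}(i, shard)
	}
	wg.Wait()
	sum := 0.0
	for _, r := range results {
		sum += r
	}
	return sum / float64(len(results))
}

func main() {
	shards := [][]float64{{1, 1, 1}, {3, 3, 3}}
	w := 0.0
	for epoch := 0; epoch < 10; epoch++ {
		w = ipmEpoch(w, shards) // mix after every epoch
	}
	fmt.Printf("%.2f\n", w) // approaches the overall mean, 2
}
```

The goroutine-per-shard structure is why -p helps only on large data: each epoch pays a synchronization and averaging cost, which dominates when shards are small.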

You can see more command options using help option:

$ ./gonline train -h
Usage of train:
  -C=0.01: degree of aggressiveness for PA-I and PA-II
  -a="": algorithm for training {p, pa, pa1, pa2, cw, arow}
  -algorithm="": algorithm for training {p, pa, pa1, pa2, cw, arow}
  -eta=0.8: confidence parameter for Confidence Weighted
  -g=10: regularization parameter for AROW
  -i=1: number of iterations
  -m="": file name of model
  -model="": file name of model
  -p=4: number of cores for ipm (Iterative Parameter Mixture)
  -s="": training strategy {ipm}; default is training with single core
  -t="": file name of test data
  -withoutshuffle=false: does not shuffle the training data

Template command of testing:

$ ./gonline test -m <MODELFILE> <TESTINGFILE1> <TESTINGFILE2> ... <TESTINGFILEK>

To evaluate the learner:

$ ./gonline test -m model news20.t.scale
test accuracy: 0.859755 (3433/3993)

Benchmark

For every algorithm supported by gonline, a learner is fit for 10 iterations on the training data news20.scale and then evaluated on the test data news20.t.scale. The training data is not shuffled, and default hyperparameter values are used.

algorithm                                                 accuracy
Perceptron                                                0.798147
Passive Aggressive                                        0.769597
Passive Aggressive I                                      0.798147
Passive Aggressive II                                     0.801402
Confidence Weighted (many-constraints update where k=∞)   0.851741
AROW (the full version)                                   0.860255
ADAM                                                      0.846481

Evaluation is conducted using the following command:

$ ./gonline train -a <ALGORITHM> -m model -i 10 -t ./news20.t.scale -withoutshuffle ./news20.scale

For comparison, the accuracy of an SVM with a linear kernel, trained with libsvm:

$ svm-train -t 0 news20.scale
$ svm-predict news20.t.scale news20.scale.model out
Accuracy = 84.022% (3355/3993) (classification)

TODO: Tune hyperparameters for each algorithm using development data.

Data Format

The format of training and testing data is:

<label> <feature1>:<value1> <feature2>:<value2> ...

Feature names such as <feature1> and <feature2> may be strings as well as integers. For example, words such as soccer and baseball can be used as feature names in a text classification setting.
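A line in this format can be parsed into the map[string]float64 representation used throughout the API with a few lines of Go. This is a hedged sketch, not the package's own loader (see LoadData in the documentation below for the real entry point).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine splits one line of "<label> <feature>:<value> ..." into
// a label and a sparse feature map.
func parseLine(line string) (string, map[string]float64, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "", nil, fmt.Errorf("empty line")
	}
	label := fields[0]
	x := make(map[string]float64, len(fields)-1)
	for _, f := range fields[1:] {
		// Split on the last colon so feature names may contain colons.
		i := strings.LastIndex(f, ":")
		if i < 0 {
			return "", nil, fmt.Errorf("malformed feature %q", f)
		}
		v, err := strconv.ParseFloat(f[i+1:], 64)
		if err != nil {
			return "", nil, err
		}
		x[f[:i]] = v
	}
	return label, x, nil
}

func main() {
	label, x, err := parseLine("sports soccer:1 baseball:0.5")
	if err != nil {
		panic(err)
	}
	fmt.Println(label, x["soccer"], x["baseball"]) // sports 1 0.5
}
```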

References

  • Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz and Yoram Singer. "Online Passive-Aggressive Algorithms". JMLR. 2006.
  • Mark Dredze, Koby Crammer, and Fernando Pereira. "Confidence-Weighted Linear Classification". ICML. 2008.
  • Koby Crammer, Mark Dredze, and Alex Kulesza. "Multi-Class Confidence Weighted Algorithms". EMNLP. 2009.
  • Koby Crammer, Alex Kulesza, and Mark Dredze. "Adaptive Regularization of Weight Vectors". NIPS. 2009.
  • Koby Crammer, Alex Kulesza, and Mark Dredze. "Adaptive Regularization of Weight Vectors". Machine Learning. 2013.
  • Ryan McDonald, Keith Hall, and Gideon Mann. "Distributed Training Strategies for the Structured Perceptron". NAACL. 2010.
  • Diederik P. Kingma and Jimmy Lei Ba. "ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION". ICLR. 2015.

License

This software is released under the MIT License, see LICENSE.txt.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BroadCastModel

func BroadCastModel(avg_learner *LearnerInterface, learners *[]LearnerInterface)

func FitLearners

func FitLearners(learners *[]LearnerInterface, x *[]map[string]float64, y *[]string)

func LoadData

func LoadData(fname string) (*[]map[string]float64, *[]string)

func LoadFromStdin

func LoadFromStdin() ([]map[string]float64, []string)

func Max

func Max(x, y float64) float64

func Min

func Min(x, y float64) float64

func Normal_CDF

func Normal_CDF(mu, sigma float64) func(x float64) float64

Cumulative Distribution Function for the Normal distribution
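The standard closed form uses the error function, Phi(x) = (1 + erf((x-mu)/(sigma*sqrt(2)))) / 2. A minimal sketch of such a closure, matching the documented signature but not necessarily the package's exact implementation:

```go
package main

import (
	"fmt"
	"math"
)

// normalCDF returns a closure computing P(X <= x) for X ~ N(mu, sigma^2),
// via the error-function identity above.
func normalCDF(mu, sigma float64) func(x float64) float64 {
	return func(x float64) float64 {
		return 0.5 * (1 + math.Erf((x-mu)/(sigma*math.Sqrt2)))
	}
}

func main() {
	cdf := normalCDF(0, 1)
	fmt.Printf("%.3f\n", cdf(0)) // the median of N(0,1): 0.500
}
```

Confidence Weighted learning uses the inverse of this function (the probit) to turn its confidence parameter eta into an update threshold.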

func ShuffleData

func ShuffleData(x *[]map[string]float64, y *[]string)

Types

type Adam

type Adam struct {
	*Learner

	M [][]float64
	V [][]float64
	// contains filtered or unexported fields
}

ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION http://arxiv.org/pdf/1412.6980.pdf

func NewAdam

func NewAdam() *Adam

func (*Adam) Fit

func (this *Adam) Fit(x *[]map[string]float64, y *[]string)

func (*Adam) GetParams

func (this *Adam) GetParams() *[][][]float64

func (*Adam) Name

func (this *Adam) Name() string

type Arow

type Arow struct {
	*Learner

	Diag [][]float64
	// contains filtered or unexported fields
}

  • http://webee.technion.ac.il/people/koby/publications/arow_nips09.pdf
  • http://web.eecs.umich.edu/~kulesza/pubs/arow_mlj13.pdf

func NewArow

func NewArow(g float64) *Arow

func (*Arow) Fit

func (this *Arow) Fit(x *[]map[string]float64, y *[]string)

func (*Arow) GetNonZeroParams

func (this *Arow) GetNonZeroParams() *[][][]Param

func (*Arow) GetParams

func (this *Arow) GetParams() *[][][]float64

func (*Arow) Name

func (this *Arow) Name() string

func (*Arow) SetParams

func (this *Arow) SetParams(params *[][][]float64)

type CW

type CW struct {
	*Learner

	Diag [][]float64
	// contains filtered or unexported fields
}

  • http://www.cs.jhu.edu/~mdredze/publications/icml_variance.pdf
  • http://www.aclweb.org/anthology/D09-1052
  • http://www.jmlr.org/papers/volume13/crammer12a/crammer12a.pdf

func NewCW

func NewCW(eta float64) *CW

func (*CW) Fit

func (this *CW) Fit(x *[]map[string]float64, y *[]string)

func (*CW) GetParams

func (this *CW) GetParams() *[][][]float64

func (*CW) Name

func (this *CW) Name() string

func (*CW) SetParams

func (this *CW) SetParams(params *[][][]float64)

type Classifier

type Classifier struct {
	Weight    [][]float64
	FtDict    Dict
	LabelDict Dict
}

func LoadClassifier

func LoadClassifier(fname string) Classifier

func LoadClassifierBinary

func LoadClassifierBinary(fname string) Classifier

func NewClassifier

func NewClassifier() Classifier

func (*Classifier) Predict

func (this *Classifier) Predict(x *map[string]float64) int

func (*Classifier) PredictTopN

func (this *Classifier) PredictTopN(x *map[string]float64, n int) ([]int, []float64)

type Client

type Client struct {
}

func NewClient

func NewClient() Client

func (*Client) SendData

func (this *Client) SendData(host, port string, data *Data) *LearnerInterface

func (*Client) SendModel

func (this *Client) SendModel(host, port string, learner *LearnerInterface)

type Data

type Data struct {
	X *[]map[string]float64 `json:"x" msgpack:"x"`
	Y *[]string             `json:"y" msgpack:"y"`
}

func (*Data) GetBatch

func (this *Data) GetBatch(start, end int) *Data

type Dict

type Dict struct {
	Id2elem []string
	Elem2id map[string]int
}

func NewDict

func NewDict() Dict

func (*Dict) AddElem

func (this *Dict) AddElem(elem string)

func (*Dict) HasElem

func (this *Dict) HasElem(elem string) bool
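Given the exported fields, Dict reads as a bidirectional string/id mapping: Id2elem recovers an element from its id, and Elem2id assigns each distinct element the next free id. A plausible sketch of that behavior, inferred from the field names rather than taken from the package source:

```go
package main

import "fmt"

// Dict assigns a stable integer id to each distinct string element.
type Dict struct {
	Id2elem []string       // id -> element
	Elem2id map[string]int // element -> id
}

func NewDict() Dict {
	return Dict{Elem2id: map[string]int{}}
}

// AddElem registers elem with the next free id; adding an element
// that is already present is a no-op.
func (d *Dict) AddElem(elem string) {
	if _, ok := d.Elem2id[elem]; ok {
		return
	}
	d.Elem2id[elem] = len(d.Id2elem)
	d.Id2elem = append(d.Id2elem, elem)
}

// HasElem reports whether elem has been registered.
func (d *Dict) HasElem(elem string) bool {
	_, ok := d.Elem2id[elem]
	return ok
}

func main() {
	d := NewDict()
	d.AddElem("soccer")
	d.AddElem("baseball")
	d.AddElem("soccer") // duplicate, ignored
	fmt.Println(len(d.Id2elem), d.Elem2id["baseball"], d.HasElem("tennis"))
}
```

The learner holds one such dictionary for features (FtDict) and one for labels (LabelDict), which is what lets string feature names index into the [][]float64 weight matrices.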

type Feature

type Feature struct {
	Id   int
	Val  float64
	Name string
}

func NewFeature

func NewFeature(id int, val float64, name string) Feature

type Learner

type Learner struct {
	Weight    [][]float64
	FtDict    Dict
	LabelDict Dict
}

func (*Learner) Fit

func (this *Learner) Fit(*[]map[string]float64, *[]int)

func (*Learner) GetDics

func (this *Learner) GetDics() (*Dict, *Dict)

func (*Learner) GetNonZeroParams

func (this *Learner) GetNonZeroParams() *[][][]Param

func (*Learner) GetParam

func (this *Learner) GetParam() *[][]float64

func (*Learner) GetParams

func (this *Learner) GetParams() *[][][]float64

func (*Learner) Name

func (this *Learner) Name() string

func (*Learner) Save

func (this *Learner) Save(fname string)

func (*Learner) SaveBinary

func (this *Learner) SaveBinary(fname string)

func (*Learner) SetDics

func (this *Learner) SetDics(ftdict, labeldict *Dict)

func (*Learner) SetParam

func (this *Learner) SetParam(w *[][]float64)

func (*Learner) SetParams

func (this *Learner) SetParams(params *[][][]float64)

type LearnerInterface

type LearnerInterface interface {
	Name() string

	Fit(*[]map[string]float64, *[]string)
	Save(string)
	SaveBinary(string)
	GetParam() *[][]float64
	GetParams() *[][][]float64
	GetNonZeroParams() *[][][]Param
	GetDics() (*Dict, *Dict)
	SetParam(*[][]float64)
	SetParams(*[][][]float64)
	SetDics(*Dict, *Dict)
	// contains filtered or unexported methods
}

func AverageModels

func AverageModels(learners []LearnerInterface) *LearnerInterface

Repeat the following steps:

  1. for every pair of learners, calculate an averaged model;
  2. collect the averaged models into a slice.

Finally, return a single averaged model over all learners.
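The pairwise scheme can be sketched over plain weight slices. The helper names below are hypothetical and the real function operates on LearnerInterface values, but the reduction structure is the same: halve the number of models each round until one remains.

```go
package main

import "fmt"

// averagePair returns the element-wise mean of two weight vectors.
func averagePair(a, b []float64) []float64 {
	out := make([]float64, len(a))
	for i := range a {
		out[i] = (a[i] + b[i]) / 2
	}
	return out
}

// averageModels repeatedly pairs up models and averages each pair
// until a single model remains. An odd model is carried unchanged
// into the next round.
func averageModels(models [][]float64) []float64 {
	for len(models) > 1 {
		var next [][]float64
		for i := 0; i+1 < len(models); i += 2 {
			next = append(next, averagePair(models[i], models[i+1]))
		}
		if len(models)%2 == 1 {
			next = append(next, models[len(models)-1])
		}
		models = next
	}
	return models[0]
}

func main() {
	avg := averageModels([][]float64{{0, 4}, {2, 0}, {4, 8}, {2, 4}})
	fmt.Println(avg) // [2 4]
}
```

With a power-of-two number of models this equals the overall mean; with an odd count, the carried model is weighted more heavily than a flat mean would weight it.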

type LearnerServer

type LearnerServer struct {
	Learner LearnerInterface
	Host    string
	Port    string
}

func NewLearnerServer

func NewLearnerServer(host, port string) LearnerServer

func (*LearnerServer) Start

func (this *LearnerServer) Start()

type Margin

type Margin struct {
	Id  int
	Val float64
}

type Margins

type Margins []Margin

func (Margins) Len

func (this Margins) Len() int

func (Margins) Less

func (this Margins) Less(i, j int) bool

func (Margins) Swap

func (this Margins) Swap(i, j int)

type Model

type Model struct {
	Algorightm string         `json:"a" msgpack:"a"`
	Id2Feature []string       `json:"id2f" msgpack:"id2f"`
	Feature2Id map[string]int `json:"f2id" msgpack:"f2id"`
	Params     [][][]float64  `json:"params" msgpack:"params"`
	Id2Label   []string       `json:"id2y" msgpack:"id2y"`
	Label2Id   map[string]int `json:"y2id" msgpack:"y2id"`
}

type PA

type PA struct {
	*Learner
	C   float64 /* degree of aggressiveness */
	Tau func(float64, float64, float64) float64
}

http://www.jmlr.org/papers/volume7/crammer06a/crammer06a.pdf

func NewPA

func NewPA(mode string, C float64) *PA

func (*PA) Fit

func (this *PA) Fit(x *[]map[string]float64, y *[]string)

func (*PA) Name

func (this *PA) Name() string

type Param

type Param Feature

type Perceptron

type Perceptron struct {
	*Learner
}

func NewPerceptron

func NewPerceptron() *Perceptron

func (*Perceptron) Fit

func (this *Perceptron) Fit(x *[]map[string]float64, y *[]string)

func (*Perceptron) Name

func (this *Perceptron) Name() string
