gonline

v0.1.1
Published: Aug 14, 2015 License: MIT Imports: 10 Imported by: 0

README

gonline

A library of online machine learning algorithms written in Go.

How to Install

$go get github.com/tma15/gonline

How to Build

$cd $GOPATH/src/github.com/tma15/gonline/gonline
$go build

Supported Algorithms

  • Perceptron (p)
  • Passive Aggressive (pa)
  • Passive Aggressive I (pa1)
  • Passive Aggressive II (pa2)
  • Confidence Weighted (cw)
  • AROW (arow)

The abbreviation in parentheses is the value to pass to the -a option of gonline train.

Usage

Command template for training:

$./gonline train -a <ALGORITHM> -m <MODELFILE> -t <TESTINGFILE> -i <ITERATION> <TRAININGFILE1> <TRAININGFILE2> ... <TRAININGFILEK>

To train a learner with the AROW algorithm:

$wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/news20.scale.bz2
$wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/news20.t.scale.bz2
$bunzip2 news20.scale.bz2 news20.t.scale.bz2
$time ./gonline train -a arow -m model -i 10 -t ./news20.t.scale -withoutshuffle ./news20.scale
algorithm: AROW
testfile ./news20.t.scale
training data will not be shuffled
epoch:1 test accuracy: 0.821438 (3280/3993)
epoch:2 test accuracy: 0.835212 (3335/3993)
epoch:3 test accuracy: 0.842725 (3365/3993)
epoch:4 test accuracy: 0.845980 (3378/3993)
epoch:5 test accuracy: 0.849236 (3391/3993)
epoch:6 test accuracy: 0.853243 (3407/3993)
epoch:7 test accuracy: 0.854746 (3413/3993)
epoch:8 test accuracy: 0.856749 (3421/3993)
epoch:9 test accuracy: 0.859254 (3431/3993)
epoch:10 test accuracy: 0.859755 (3433/3993)
./gonline train -a arow -m model -i 10 -t ./news20.t.scale -withoutshuffle   109.53s user 1.65s system 98% cpu 1:53.25 total

In practice, shuffling training data can improve accuracy.

To see all command options, use the -h option:

$./gonline train -h
Usage of train:
  -C=0.01: degree of aggressiveness for PA-I and PA-II
  -a="": algorithm for training {p, pa, pa1, pa2, cw, arow}
  -algorithm="": algorithm for training {p, pa, pa1, pa2, cw, arow}
  -eta=0.8: confidence parameter for Confidence Weighted
  -g=10: regularization parameter for AROW
  -i=1: number of iterations
  -m="": file name of model
  -model="": file name of model
  -t="": file name of test data
  -withoutshuffle=false: doesn't shuffle the training data
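
The help output above lists short and long names for the same option (for example -a and -algorithm). A minimal sketch of how such aliases can be declared with Go's standard flag package follows. The trainOptions type and the parseTrainFlags helper are hypothetical and are not taken from gonline's source; only a subset of the options is shown.

```go
package main

import (
	"flag"
	"fmt"
)

// trainOptions holds a subset of the options listed in the help output.
// This type is an illustration, not part of gonline.
type trainOptions struct {
	Algorithm      string
	Model          string
	Iterations     int
	WithoutShuffle bool
}

// parseTrainFlags is a hypothetical helper that parses training options
// and returns the remaining positional arguments (the training files).
func parseTrainFlags(args []string) (trainOptions, []string, error) {
	var opts trainOptions
	fs := flag.NewFlagSet("train", flag.ContinueOnError)
	// Registering two names on the same variable gives a short and a long form.
	fs.StringVar(&opts.Algorithm, "a", "", "algorithm for training {p, pa, pa1, pa2, cw, arow}")
	fs.StringVar(&opts.Algorithm, "algorithm", "", "algorithm for training {p, pa, pa1, pa2, cw, arow}")
	fs.StringVar(&opts.Model, "m", "", "file name of model")
	fs.StringVar(&opts.Model, "model", "", "file name of model")
	fs.IntVar(&opts.Iterations, "i", 1, "number of iterations")
	fs.BoolVar(&opts.WithoutShuffle, "withoutshuffle", false, "doesn't shuffle the training data")
	if err := fs.Parse(args); err != nil {
		return opts, nil, err
	}
	return opts, fs.Args(), nil
}

func main() {
	args := []string{"-a", "arow", "-m", "model", "-i", "10", "-withoutshuffle", "./news20.scale"}
	opts, files, err := parseTrainFlags(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v %v\n", opts, files)
}
```

Because -withoutshuffle is a boolean flag, it does not consume the argument that follows it, so the training file can come directly after it as in the example command.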

Command template for testing:

$./gonline test -m <MODELFILE> <TESTINGFILE1> <TESTINGFILE2> ... <TESTINGFILEK>

To evaluate a trained learner:

$./gonline test -m model news20.t.scale
test accuracy: 0.859755 (3433/3993)
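
The reported accuracy is the number of correctly classified test examples divided by the total (3433/3993 in the run above). As a small illustration, the sketch below computes the same ratio from two label slices. The accuracy function is a hypothetical helper, and the output format merely imitates the line shown above.

```go
package main

import "fmt"

// accuracy returns the fraction of predictions that match the gold labels,
// together with the number of correct predictions.
func accuracy(gold, pred []string) (float64, int) {
	if len(gold) == 0 {
		return 0, 0
	}
	correct := 0
	for i := range gold {
		if i < len(pred) && gold[i] == pred[i] {
			correct++
		}
	}
	return float64(correct) / float64(len(gold)), correct
}

func main() {
	gold := []string{"sports", "politics", "sports", "tech"}
	pred := []string{"sports", "tech", "sports", "tech"}
	acc, correct := accuracy(gold, pred)
	fmt.Printf("test accuracy: %f (%d/%d)\n", acc, correct, len(gold))
}
```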

Benchmark

Each algorithm supported by gonline was trained for 10 iterations on the training data news20.scale and then evaluated on the test data news20.t.scale. The training data was not shuffled, and default values were used for all hyperparameters.

Algorithm                                                  Accuracy
Perceptron                                                 0.778613
Passive Aggressive                                         0.772101
Passive Aggressive I                                       0.792136
Passive Aggressive II                                      0.782870
Confidence Weighted (many-constraints update where k=∞)    0.852241
AROW (the full version)                                    0.859755

The evaluation was run with the following command:

$./gonline train -a <ALGORITHM> -m model -i 10 -t ./news20.t.scale -withoutshuffle ./news20.scale

For comparison, the accuracy of an SVM with a linear kernel, trained with libsvm:

$svm-train -t 0 news20.scale
$svm-predict news20.t.scale news20.scale.model out
Accuracy = 84.022% (3355/3993) (classification)

TODO: Tune the hyperparameters of each algorithm using development data.

Data Format

The format of training and testing data is:

<label> <feature1>:<value1> <feature2>:<value2> ...

Feature names such as <feature1> and <feature2> can be strings as well as integers. For example, words such as soccer and baseball can be used as feature names in a text classification setting.
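
To make the format concrete, here is a minimal sketch of a parser for one line of this format. The parseLine function is a hypothetical helper, not gonline's loader. Splitting each token on its last colon is an assumption made here so that feature names may themselves contain colons.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine splits "<label> <feature>:<value> ..." into a label and a
// feature map keyed by feature name.
func parseLine(line string) (string, map[string]float64, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return "", nil, fmt.Errorf("empty line")
	}
	features := make(map[string]float64, len(fields)-1)
	for _, f := range fields[1:] {
		// Split on the last colon (an assumption of this sketch).
		i := strings.LastIndex(f, ":")
		if i <= 0 {
			return "", nil, fmt.Errorf("malformed feature %q", f)
		}
		v, err := strconv.ParseFloat(f[i+1:], 64)
		if err != nil {
			return "", nil, fmt.Errorf("bad value in %q: %v", f, err)
		}
		features[f[:i]] = v
	}
	return fields[0], features, nil
}

func main() {
	label, features, err := parseLine("sports soccer:1 baseball:0.5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(label, features)
}
```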

References

  • Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz and Yoram Singer. "Online Passive-Aggressive Algorithms". JMLR. 2006.
  • Mark Dredze, Koby Crammer and Fernando Pereira. "Confidence-Weighted Linear Classification". ICML. 2008.
  • Koby Crammer, Mark Dredze and Alex Kulesza. "Multi-Class Confidence Weighted Algorithms". EMNLP. 2009.
  • Koby Crammer, Alex Kulesza and Mark Dredze. "Adaptive Regularization of Weight Vectors". NIPS. 2009.
  • Koby Crammer, Alex Kulesza, and Mark Dredze. "Adaptive Regularization of Weight Vectors". Machine Learning. 2013.

License

This software is released under the MIT License, see LICENSE.txt.

Documentation

Functions

func GetSortedFeatures

func GetSortedFeatures(x *map[string]float64) []string
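
The page does not document what GetSortedFeatures sorts by. One plausible reading, shown in the sketch below, is that it returns the feature names of an example in lexicographic order so that iteration over a map becomes deterministic. The sortedFeatureNames function is a hypothetical stand-in and takes the map by value for simplicity.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedFeatureNames returns the keys of x in lexicographic order.
// Sorting by name is an assumption; the original may order differently.
func sortedFeatureNames(x map[string]float64) []string {
	names := make([]string, 0, len(x))
	for name := range x {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	x := map[string]float64{"soccer": 1, "baseball": 0.5, "goal": 2}
	for _, name := range sortedFeatureNames(x) {
		fmt.Println(name, x[name])
	}
}
```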

func LoadData

func LoadData(fname string) (*[]map[string]float64, *[]string)

func LoadFromStdin

func LoadFromStdin() ([]map[string]float64, []string)

func Max

func Max(x, y float64) float64

func Min

func Min(x, y float64) float64

func Normal_CDF

func Normal_CDF(mu, sigma float64) func(x float64) float64

Cumulative Distribution Function for the Normal distribution
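
For reference, the normal CDF with mean mu and standard deviation sigma can be written with the error function as Φ(x) = ½ (1 + erf((x − mu) / (sigma √2))). The sketch below returns a closure with the same shape as the signature above. The normalCDF name is hypothetical, and whether the original treats its second argument as a standard deviation or a variance is not stated, so the standard deviation is assumed here.

```go
package main

import (
	"fmt"
	"math"
)

// normalCDF returns the CDF of a normal distribution with mean mu and
// standard deviation sigma (assumed to be > 0).
func normalCDF(mu, sigma float64) func(x float64) float64 {
	return func(x float64) float64 {
		return 0.5 * (1 + math.Erf((x-mu)/(sigma*math.Sqrt2)))
	}
}

func main() {
	cdf := normalCDF(0, 1)
	fmt.Println(cdf(0))    // probability mass below the mean
	fmt.Println(cdf(1.96)) // close to 0.975 for the standard normal
}
```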

func ShuffleData

func ShuffleData(x *[]map[string]float64, y *[]string)
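
When examples and labels live in two parallel slices, a shuffle has to apply the same permutation to both so that each example keeps its label. A minimal sketch using math/rand follows. The shuffleData function and its seed parameter are hypothetical and do not mirror the signature above.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffleData permutes x and y in place with the same permutation.
// It assumes len(x) == len(y). The seed makes the order reproducible.
func shuffleData(seed int64, x []map[string]float64, y []string) {
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(len(x), func(i, j int) {
		x[i], x[j] = x[j], x[i]
		y[i], y[j] = y[j], y[i]
	})
}

func main() {
	x := []map[string]float64{{"soccer": 1}, {"gpu": 1}, {"vote": 1}}
	y := []string{"sports", "tech", "politics"}
	shuffleData(42, x, y)
	for i := range x {
		fmt.Println(y[i], x[i])
	}
}
```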

Types

type Arow

type Arow struct {
	*Learner
	// contains filtered or unexported fields
}

  • http://webee.technion.ac.il/people/koby/publications/arow_nips09.pdf
  • http://web.eecs.umich.edu/~kulesza/pubs/arow_mlj13.pdf

func NewArow

func NewArow(gamma float64) *Arow

func (*Arow) Fit

func (this *Arow) Fit(x *[]map[string]float64, y *[]string)

func (*Arow) Name

func (this *Arow) Name() string
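
As a rough illustration of the AROW update rule from the papers linked above, the sketch below implements a binary learner with a diagonal covariance matrix. This is a deliberate simplification and not gonline's implementation, which is multi-class and is described in the benchmark as the full version. All names are hypothetical, and r is the regularization parameter from the paper, which presumably plays the role of gamma in NewArow.

```go
package main

import "fmt"

// arowBinary is a binary AROW learner with per-feature variances.
type arowBinary struct {
	r     float64            // regularization parameter (r in the paper)
	mean  map[string]float64 // weight means
	sigma map[string]float64 // variances; a missing entry means 1.0
}

func newArowBinary(r float64) *arowBinary {
	return &arowBinary{r: r, mean: map[string]float64{}, sigma: map[string]float64{}}
}

func (a *arowBinary) variance(f string) float64 {
	if v, ok := a.sigma[f]; ok {
		return v
	}
	return 1.0
}

// update processes one example with label y in {-1, +1} and reports
// whether the model changed.
func (a *arowBinary) update(x map[string]float64, y float64) bool {
	margin, confidence := 0.0, 0.0
	for f, v := range x {
		margin += a.mean[f] * v
		confidence += a.variance(f) * v * v
	}
	if margin*y >= 1 {
		return false // margin already satisfied, no update
	}
	beta := 1 / (confidence + a.r)
	alpha := (1 - margin*y) * beta
	for f, v := range x {
		s := a.variance(f)
		a.mean[f] += alpha * y * s * v // mean moves toward the correct side
		a.sigma[f] = s - beta*s*s*v*v  // variance shrinks for seen features
	}
	return true
}

func main() {
	a := newArowBinary(1.0)
	a.update(map[string]float64{"soccer": 1}, +1)
	a.update(map[string]float64{"gpu": 1}, -1)
	fmt.Println(a.mean, a.sigma)
}
```

A larger r makes each update smaller, which is why it acts as a regularizer.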

type CW

type CW struct {
	*Learner
	// contains filtered or unexported fields
}

  • http://www.cs.jhu.edu/~mdredze/publications/icml_variance.pdf
  • http://www.aclweb.org/anthology/D09-1052
  • http://www.jmlr.org/papers/volume13/crammer12a/crammer12a.pdf

func NewCW

func NewCW(eta float64) *CW

func (*CW) Fit

func (this *CW) Fit(x *[]map[string]float64, y *[]string)

func (*CW) Name

func (this *CW) Name() string

type Classifier

type Classifier struct {
	Weight    [][]float64
	FtDict    Dict
	LabelDict Dict
}

func LoadClassifier

func LoadClassifier(fname string) Classifier

func NewClassifier

func NewClassifier() Classifier

func (*Classifier) Predict

func (this *Classifier) Predict(x *map[string]float64) int
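
Predict returns an int, which suggests a label ID that can be mapped back to a label name through LabelDict. A minimal sketch of linear multi-class prediction follows, assuming the weight matrix is indexed as weight[labelID][featureID] and that unknown features are skipped. Both are assumptions, since the page does not document the layout, and the predict function is a hypothetical stand-in.

```go
package main

import "fmt"

// predict returns the index of the label whose weight row gives the
// highest dot product with x. Ties go to the lowest index.
func predict(weight [][]float64, ftID map[string]int, x map[string]float64) int {
	best, bestScore := -1, 0.0
	for label, w := range weight {
		score := 0.0
		for name, v := range x {
			// Features not seen during training contribute nothing.
			if id, ok := ftID[name]; ok && id < len(w) {
				score += w[id] * v
			}
		}
		if best == -1 || score > bestScore {
			best, bestScore = label, score
		}
	}
	return best
}

func main() {
	weight := [][]float64{{1, 0}, {0, 2}}
	ftID := map[string]int{"soccer": 0, "gpu": 1}
	labels := []string{"sports", "tech"}
	fmt.Println(labels[predict(weight, ftID, map[string]float64{"gpu": 1})])
}
```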

type Dict

type Dict struct {
	Id2elem []string
	Elem2id map[string]int
}

func NewDict

func NewDict() Dict

func (*Dict) AddElem

func (this *Dict) AddElem(elem string)

func (*Dict) HasElem

func (this *Dict) HasElem(elem string) bool
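
The two fields of Dict suggest a bidirectional mapping between strings (feature or label names) and dense integer IDs. The sketch below shows one way such a structure can work, with a lowercase dict type that is a hypothetical stand-in. Ignoring duplicate additions is an assumption, as the page does not say how AddElem treats them.

```go
package main

import "fmt"

// dict maps elements to dense integer IDs and back.
type dict struct {
	Id2elem []string
	Elem2id map[string]int
}

func newDict() dict {
	return dict{Elem2id: make(map[string]int)}
}

// hasElem reports whether elem already has an ID.
func (d *dict) hasElem(elem string) bool {
	_, ok := d.Elem2id[elem]
	return ok
}

// addElem assigns the next free ID to elem; duplicates are ignored
// (an assumption of this sketch).
func (d *dict) addElem(elem string) {
	if d.hasElem(elem) {
		return
	}
	d.Elem2id[elem] = len(d.Id2elem)
	d.Id2elem = append(d.Id2elem, elem)
}

func main() {
	d := newDict()
	for _, w := range []string{"soccer", "baseball", "soccer"} {
		d.addElem(w)
	}
	fmt.Println(d.Id2elem, d.Elem2id["baseball"], d.hasElem("tennis"))
}
```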

type Learner

type Learner struct {
	Weight    [][]float64
	FtDict    Dict
	LabelDict Dict
}

func (*Learner) Fit

func (this *Learner) Fit(*[]map[string]float64, *[]int)

func (*Learner) GetDics

func (this *Learner) GetDics() (*Dict, *Dict)

func (*Learner) GetParam

func (this *Learner) GetParam() *[][]float64

func (*Learner) Name

func (this *Learner) Name() string

func (*Learner) Save

func (this *Learner) Save(fname string)

func (*Learner) SetDics

func (this *Learner) SetDics(ftdict, labeldict *Dict)

func (*Learner) SetParam

func (this *Learner) SetParam(w *[][]float64)

type LearnerInterface

type LearnerInterface interface {
	Name() string

	Fit(*[]map[string]float64, *[]string)
	Save(string)
	GetParam() *[][]float64
	GetDics() (*Dict, *Dict)
	SetParam(*[][]float64)
	SetDics(*Dict, *Dict)
	// contains filtered or unexported methods
}

type PA

type PA struct {
	*Learner
	C   float64 /* degree of aggressiveness */
	Tau func(float64, float64, float64) float64
}

http://www.jmlr.org/papers/volume7/crammer06a/crammer06a.pdf

func NewPA

func NewPA(mode string, C float64) *PA

func (*PA) Fit

func (this *PA) Fit(x *[]map[string]float64, y *[]string)

func (*PA) Name

func (this *PA) Name() string
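
The three Passive Aggressive variants differ only in how the step size τ is computed from the loss, the squared norm of the example, and the aggressiveness parameter C (Crammer et al., 2006). The sketch below writes the three binary-case rules as functions with the same shape as the Tau field. The function names and the argument order (loss, squared norm, C) are assumptions, not taken from gonline.

```go
package main

import (
	"fmt"
	"math"
)

// Each function assumes sqNorm > 0 and, where C is used, c > 0.

// tauPA is the basic rule: tau = loss / ||x||^2. C is unused.
func tauPA(loss, sqNorm, c float64) float64 {
	return loss / sqNorm
}

// tauPA1 is PA-I: tau = min(C, loss / ||x||^2).
func tauPA1(loss, sqNorm, c float64) float64 {
	return math.Min(c, loss/sqNorm)
}

// tauPA2 is PA-II: tau = loss / (||x||^2 + 1/(2C)).
func tauPA2(loss, sqNorm, c float64) float64 {
	return loss / (sqNorm + 1/(2*c))
}

func main() {
	loss, sqNorm, c := 1.0, 4.0, 0.01
	fmt.Println(tauPA(loss, sqNorm, c), tauPA1(loss, sqNorm, c), tauPA2(loss, sqNorm, c))
}
```

With a small C such as the default 0.01, PA-I and PA-II take much smaller steps than plain PA, which makes them less sensitive to noisy labels.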

type Perceptron

type Perceptron struct {
	*Learner
}

func NewPerceptron

func NewPerceptron() *Perceptron

func (*Perceptron) Fit

func (this *Perceptron) Fit(x *[]map[string]float64, y *[]string)

func (*Perceptron) Name

func (this *Perceptron) Name() string
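
For orientation, here is a minimal sketch of one pass of the standard multi-class perceptron: predict with the current weights and, on a mistake, add the example to the weights of the true label and subtract it from the weights of the predicted label. The lowercase perceptron type is a hypothetical stand-in that stores weights in nested maps and requires all labels up front. It is not gonline's implementation.

```go
package main

import "fmt"

// perceptron keeps one weight vector per label.
type perceptron struct {
	labels  []string
	weights map[string]map[string]float64 // label -> feature -> weight
}

func newPerceptron(labels []string) *perceptron {
	p := &perceptron{labels: labels, weights: make(map[string]map[string]float64)}
	for _, l := range labels {
		p.weights[l] = make(map[string]float64)
	}
	return p
}

// predict returns the highest-scoring label; ties go to the earliest label.
func (p *perceptron) predict(x map[string]float64) string {
	best, bestScore := "", 0.0
	for i, l := range p.labels {
		score := 0.0
		for f, v := range x {
			score += p.weights[l][f] * v
		}
		if i == 0 || score > bestScore {
			best, bestScore = l, score
		}
	}
	return best
}

// fit runs one pass over the data. Every label in ys must be in p.labels.
func (p *perceptron) fit(xs []map[string]float64, ys []string) {
	for i, x := range xs {
		pred := p.predict(x)
		if pred == ys[i] {
			continue
		}
		for f, v := range x {
			p.weights[ys[i]][f] += v // reward the true label
			p.weights[pred][f] -= v  // penalize the wrong prediction
		}
	}
}

func main() {
	p := newPerceptron([]string{"sports", "tech"})
	xs := []map[string]float64{{"soccer": 1}, {"gpu": 1}}
	p.fit(xs, []string{"sports", "tech"})
	fmt.Println(p.predict(map[string]float64{"gpu": 1}))
}
```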
