golinear

package module
v1.0.0
Published: Aug 29, 2018 License: BSD-3-Clause Imports: 6 Imported by: 0

README

Introduction

golinear is a package for training and using linear classifiers in the Go programming language (golang).

Installation

To use this package, you need the liblinear library. On Mac OS X, you can install this library with Homebrew:

brew install liblinear

Ubuntu and Debian provide packages for liblinear. However, at the time of writing (July 2, 2014), these packages were severely outdated. This package requires version 1.9 or later.

The latest API-stable version (v1) can be installed with the go command:

go get gopkg.in/danieldk/golinear.v1

or included in your source code:

import "gopkg.in/danieldk/golinear.v1"

The package documentation is available at: http://godoc.org/gopkg.in/danieldk/golinear.v1

Plans

  1. Port classification to Go.
  2. Port training to Go.

We will take a pragmatic approach to porting code to Go: if the performance penalty is minor, ported code will be merged into the main branch. Otherwise, we will keep it around until the performance is good enough.

Examples

Examples for using golinear can be found at:

https://github.com/danieldk/golinear-examples

Documentation

Overview

Package golinear trains and applies linear classifiers.

The package is a binding against liblinear with a Go-ish interface. Trained models can be saved to and loaded from disk, to avoid the (potentially) costly training process.

A model is trained using a problem. A problem consists of training instances, where each training instance has a class label and a feature vector. The training procedure attempts to find one or more functions that separate the instances of two classes. This model can then predict the class of unseen instances.

Consider for instance that we would like to do sentiment analysis, using the following, humble, training corpus:

Positive: A beautiful album.
Negative: A crappy ugly album.

To represent this as a problem, we have to convert the classes (positive/negative) to integral class labels and extract features. In this case, we can simply label the classes as positive: 0, negative: 1. We will use the words as our features (a: 1, beautiful: 2, album: 3, crappy: 4, ugly: 5) and use booleans as our feature values. In other words, the sentences have the following feature vectors:

            1   2   3   4   5
          +---+---+---+---+---+
Positive: | 1 | 1 | 1 | 0 | 0 |
          +---+---+---+---+---+

          +---+---+---+---+---+
Negative: | 1 | 0 | 1 | 1 | 1 |
          +---+---+---+---+---+

We can now construct the problem using this representation:

problem := golinear.NewProblem()
problem.Add(golinear.TrainingInstance{0, golinear.FromDenseVector([]float64{1, 1, 1, 0, 0})})
problem.Add(golinear.TrainingInstance{1, golinear.FromDenseVector([]float64{1, 0, 1, 1, 1})})

The problem is used to train a linear classifier using a set of parameters to choose the type of solver, constraint violation cost, etc. We will use the default parameters, which train a L2-regularized L2-loss support vector classifier.

param := golinear.DefaultParameters()
model, err := golinear.TrainModel(param, problem)
if err != nil {
	log.Fatal(err)
}

Of course, now we would like to use this model to classify other sentences. For instance:

This is a beautiful book.

We map this sentence to the feature vector that we used during training, simply ignoring words that we did not encounter while training the model:

          +---+---+---+---+---+
????????: | 1 | 1 | 0 | 0 | 0 |
          +---+---+---+---+---+

The Predict method of the model is used to predict the label of this feature vector.

label := model.Predict(golinear.FromDenseVector([]float64{1, 1, 0, 0, 0}))

As expected, the model will predict the sentence to be positive (0).

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CrossValidation

func CrossValidation(problem *Problem, param Parameters, nFolds uint) ([]float64, error)

Perform cross validation. The instances in the problem are separated into the given number of folds. Each fold is sequentially evaluated using the model trained with the remaining folds. The slice that is returned contains the predicted instance classes.
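Since CrossValidation returns one predicted label per instance, in problem order, accuracy can be computed by comparing the returned slice against the gold labels. A minimal sketch of that comparison (the golinear call itself is elided; the `accuracy` helper is not part of the package):

```go
package main

import "fmt"

// accuracy computes the fraction of predictions that match the
// gold labels; the two slices must have equal length.
func accuracy(gold, predicted []float64) float64 {
	if len(gold) == 0 {
		return 0
	}
	correct := 0
	for i, g := range gold {
		if predicted[i] == g {
			correct++
		}
	}
	return float64(correct) / float64(len(gold))
}

func main() {
	// In real use, predicted would come from:
	//   predicted, err := golinear.CrossValidation(problem, param, 10)
	gold := []float64{0, 1, 1, 0}
	predicted := []float64{0, 1, 0, 0}
	fmt.Println(accuracy(gold, predicted)) // 0.75
}
```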

Types

type ClassWeight

type ClassWeight struct {
	Label int
	Value float64
}

type FeatureValue

type FeatureValue struct {
	Index int
	Value float64
}

Represents a feature and its value. The Index of a feature is used to uniquely identify the feature, and should start at 1.

type FeatureVector

type FeatureVector []FeatureValue

Sparse feature vector, represented as the list (slice) of non-zero features.

func FromDenseVector

func FromDenseVector(denseVector []float64) FeatureVector

Convert a dense feature vector, represented as a slice of feature values, to the sparse representation used by this package. The features will be numbered 1..len(denseVector). The following vectors will be equal:

golinear.FromDenseVector([]float64{0.2, 0.1, 0.3, 0.6})
golinear.FeatureVector{{1, 0.2}, {2, 0.1}, {3, 0.3}, {4, 0.6}}

type Model

type Model struct {
	// contains filtered or unexported fields
}

A Model holds a trained classifier and can be used to predict the class of a seen or unseen instance.

func LoadModel

func LoadModel(filename string) (*Model, error)

Load a previously saved model.

func TrainModel

func TrainModel(param Parameters, problem *Problem) (*Model, error)

Train an SVM using the given parameters and problem.

func (*Model) Bias

func (model *Model) Bias() float64

Extracts the bias of a two-class problem.

func (*Model) Labels

func (model *Model) Labels() []int

Get a slice with the class labels.

func (*Model) Predict

func (model *Model) Predict(nodes []FeatureValue) float64

Predict the label of an instance using the given model.

func (*Model) PredictDecisionValues

func (model *Model) PredictDecisionValues(nodes []FeatureValue) (float64, map[int]float64, error)

Predict the label of an instance. In contrast to Predict, it also returns the per-label decision values.

func (*Model) PredictDecisionValuesSlice

func (model *Model) PredictDecisionValuesSlice(nodes []FeatureValue) (float64, []float64, error)

Predict the label of an instance. In contrast to Predict, it also returns the per-label decision values. The PredictDecisionValues function is more user-friendly, but has the overhead of constructing a map. If you are only interested in the classes with the highest decision values, it may be better to use this function in conjunction with Labels().
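The pattern suggested here, pairing the decision-value slice with Labels() to find the strongest class, might look like the following sketch (`bestLabel` is a hypothetical helper, not part of golinear; values[i] is assumed to correspond to labels[i]):

```go
package main

import "fmt"

// bestLabel returns the label whose decision value is highest.
// It assumes labels and values are parallel slices, as returned
// by Labels() and PredictDecisionValuesSlice.
func bestLabel(labels []int, values []float64) int {
	best := 0
	for i := 1; i < len(values); i++ {
		if values[i] > values[best] {
			best = i
		}
	}
	return labels[best]
}

func main() {
	labels := []int{0, 1}          // e.g. model.Labels()
	values := []float64{0.8, -0.8} // e.g. from PredictDecisionValuesSlice
	fmt.Println(bestLabel(labels, values)) // 0
}
```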

func (*Model) PredictProbability

func (model *Model) PredictProbability(nodes []FeatureValue) (float64, map[int]float64, error)

Predict the label of an instance, given a model with probability information. This method returns the label of the predicted class and a map of class probabilities. Probability estimates are currently given for logistic regression only. If another solver is used, the probability of each class is zero.

func (*Model) PredictProbabilitySlice

func (model *Model) PredictProbabilitySlice(nodes []FeatureValue) (float64, []float64, error)

Predict the label of an instance, given a model with probability information. This method returns the label of the predicted class and a slice of class probabilities. Probability estimates are currently given for logistic regression only. If another solver is used, the probability of each class is zero.

The PredictProbability function is more user-friendly, but has the overhead of constructing a map. If you are only interested in the classes with the highest probabilities, it may be better to use this function in conjunction with Labels().

func (*Model) Save

func (model *Model) Save(filename string) error

Save the model to a file.

func (*Model) Weights

func (model *Model) Weights() []float64

Extracts the weight vector of a two-class problem.

func (*Model) WeightsMulti

func (model *Model) WeightsMulti() [][]float64

Extracts the weight vectors of a multi-class problem.

NOT IMPLEMENTED.

type Parameters

type Parameters struct {
	// The type of solver
	SolverType SolverType

	// The cost of constraints violation.
	Cost float64
	// The relative penalty for each class.
	RelCosts []ClassWeight
}

Parameters for training a linear model.

func DefaultParameters

func DefaultParameters() Parameters

type Problem

type Problem struct {
	// contains filtered or unexported fields
}

A problem is a set of instances and corresponding labels.

func NewProblem

func NewProblem() *Problem

func (*Problem) Add

func (problem *Problem) Add(trainInst TrainingInstance) error

func (*Problem) Bias

func (problem *Problem) Bias() float64

func (*Problem) Iterate

func (problem *Problem) Iterate(fun ProblemIterFunc)

Iterate over the training instances in a problem.

func (*Problem) SetBias

func (problem *Problem) SetBias(bias float64)

type ProblemIterFunc

type ProblemIterFunc func(instance *TrainingInstance) bool

Function prototype for iteration over problems. The function should return 'true' if the iteration should continue or 'false' otherwise.
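The "return false to stop" convention can be illustrated without liblinear. In this sketch, `iterate` is a stand-in for Problem.Iterate, and the simplified TrainingInstance type is local to the example:

```go
package main

import "fmt"

type TrainingInstance struct {
	Label    float64
	Features []float64
}

type ProblemIterFunc func(instance *TrainingInstance) bool

// iterate calls fun for each instance, stopping early when fun
// returns false, mirroring Problem.Iterate's contract.
func iterate(instances []TrainingInstance, fun ProblemIterFunc) {
	for i := range instances {
		if !fun(&instances[i]) {
			return
		}
	}
}

func main() {
	instances := []TrainingInstance{{Label: 0}, {Label: 1}, {Label: 1}}
	seen := 0
	iterate(instances, func(inst *TrainingInstance) bool {
		seen++
		return inst.Label == 0 // stop after the first non-zero label
	})
	fmt.Println(seen) // 2
}
```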

type SolverType

type SolverType struct {
	// contains filtered or unexported fields
}

func NewL1RL2LossSvc

func NewL1RL2LossSvc(epsilon float64) SolverType

L1-regularized L2-loss support vector classification.

func NewL1RL2LossSvcDefault

func NewL1RL2LossSvcDefault() SolverType

L1-regularized L2-loss support vector classification, epsilon = 0.01.

func NewL1RLogisticRegression

func NewL1RLogisticRegression(epsilon float64) SolverType

L1-regularized logistic regression.

func NewL1RLogisticRegressionDefault

func NewL1RLogisticRegressionDefault() SolverType

L1-regularized logistic regression, epsilon = 0.01.

func NewL2RL1LossSvRegressionDual

func NewL2RL1LossSvRegressionDual(epsilon float64) SolverType

L2-regularized L1-loss support vector regression (dual).

func NewL2RL1LossSvRegressionDualDefault

func NewL2RL1LossSvRegressionDualDefault(epsilon float64) SolverType

L2-regularized L1-loss support vector regression (dual), epsilon = 0.1.

func NewL2RL1LossSvcDual

func NewL2RL1LossSvcDual(epsilon float64) SolverType

L2-regularized L1-loss support vector classification (dual).

func NewL2RL1LossSvcDualDefault

func NewL2RL1LossSvcDualDefault() SolverType

L2-regularized L1-loss support vector classification (dual), epsilon = 0.1.

func NewL2RL2LossSvRegression

func NewL2RL2LossSvRegression(epsilon float64) SolverType

L2-regularized L2-loss support vector regression (primal).

func NewL2RL2LossSvRegressionDefault

func NewL2RL2LossSvRegressionDefault(epsilon float64) SolverType

L2-regularized L2-loss support vector regression (primal), epsilon = 0.001.

func NewL2RL2LossSvRegressionDual

func NewL2RL2LossSvRegressionDual(epsilon float64) SolverType

L2-regularized L2-loss support vector regression (dual).

func NewL2RL2LossSvRegressionDualDefault

func NewL2RL2LossSvRegressionDualDefault(epsilon float64) SolverType

L2-regularized L2-loss support vector regression (dual), epsilon = 0.1.

func NewL2RL2LossSvcDual

func NewL2RL2LossSvcDual(epsilon float64) SolverType

L2-regularized L2-loss support vector classification (dual).

func NewL2RL2LossSvcDualDefault

func NewL2RL2LossSvcDualDefault() SolverType

L2-regularized L2-loss support vector classification (dual), epsilon = 0.1.

func NewL2RL2LossSvcPrimal

func NewL2RL2LossSvcPrimal(epsilon float64) SolverType

L2-regularized L2-loss support vector classification (primal).

func NewL2RL2LossSvcPrimalDefault

func NewL2RL2LossSvcPrimalDefault() SolverType

L2-regularized L2-loss support vector classification (primal), epsilon = 0.01.

func NewL2RLogisticRegression

func NewL2RLogisticRegression(epsilon float64) SolverType

L2-regularized logistic regression (primal).

func NewL2RLogisticRegressionDefault

func NewL2RLogisticRegressionDefault() SolverType

L2-regularized logistic regression (primal), epsilon = 0.01.

func NewL2RLogisticRegressionDual

func NewL2RLogisticRegressionDual(epsilon float64) SolverType

L2-regularized logistic regression (dual).

func NewL2RLogisticRegressionDualDefault

func NewL2RLogisticRegressionDualDefault() SolverType

L2-regularized logistic regression (dual), epsilon = 0.1.

func NewMCSVMCS

func NewMCSVMCS(epsilon float64) SolverType

Support vector classification by Crammer and Singer.

func NewMCSVMCSDefault

func NewMCSVMCSDefault() SolverType

Support vector classification by Crammer and Singer, epsilon = 0.1.

type TrainingInstance

type TrainingInstance struct {
	Label    float64
	Features FeatureVector
}

Training instance, consisting of the label of the instance and its feature vector. In classification, the label is an integer indicating the class label. In regression, the label is the target value, which can be any real number. The label is not used for one-class SVMs.
