Documentation ¶
Overview ¶
Package golinear trains and applies linear classifiers.
The package is a binding against liblinear with a Go-ish interface. Trained models can be saved to and loaded from disk, to avoid the (potentially) costly training process.
A model is trained using a problem. A problem consists of training instances, where each training instance has a class label and a feature vector. The training procedure attempts to find one or more functions that separate the instances of two classes. The resulting model can then predict the class of unseen instances.
Consider for instance that we would like to do sentiment analysis, using the following, humble, training corpus:
Positive: A beautiful album.
Negative: A crappy ugly album.
To represent this as a problem, we have to convert the classes (positive/negative) to integral class labels and extract features. In this case, we can simply label the classes as positive: 0, negative: 1. We will use the words as our features (a: 1, beautiful: 2, album: 3, crappy: 4, ugly: 5) and use booleans as our feature values. In other words, the sentences will have the following feature vectors:
            1   2   3   4   5
          +---+---+---+---+---+
Positive: | 1 | 1 | 1 | 0 | 0 |
          +---+---+---+---+---+
          +---+---+---+---+---+
Negative: | 1 | 0 | 1 | 1 | 1 |
          +---+---+---+---+---+
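The mapping from sentences to these vectors can be sketched with the standard library alone. This is only an illustration: the `vocabulary` map and the `featurize` helper are hypothetical names and are not part of golinear, and the tokenization (lower-casing, splitting on whitespace, trimming punctuation) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// vocabulary maps each known word to its 1-based feature index,
// following the numbering used in the text above.
var vocabulary = map[string]int{
	"a": 1, "beautiful": 2, "album": 3, "crappy": 4, "ugly": 5,
}

// featurize turns a sentence into a dense boolean feature vector.
// Words that are not in the vocabulary are silently ignored.
func featurize(sentence string) []float64 {
	dense := make([]float64, len(vocabulary))
	for _, word := range strings.Fields(strings.ToLower(sentence)) {
		word = strings.Trim(word, ".,!?")
		if idx, ok := vocabulary[word]; ok {
			dense[idx-1] = 1 // feature indices are 1-based, slices are 0-based
		}
	}
	return dense
}

func main() {
	fmt.Println(featurize("A beautiful album."))   // [1 1 1 0 0]
	fmt.Println(featurize("A crappy ugly album.")) // [1 0 1 1 1]
}
```

The resulting slices have the shape that FromDenseVector accepts.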
We can now construct the problem using this representation:
problem := golinear.NewProblem()
problem.Add(golinear.TrainingInstance{0, golinear.FromDenseVector([]float64{1, 1, 1, 0, 0})})
problem.Add(golinear.TrainingInstance{1, golinear.FromDenseVector([]float64{1, 0, 1, 1, 1})})
The problem is used to train a linear classifier using a set of parameters to choose the type of solver, constraint violation cost, etc. We will use the default parameters, which train an L2-regularized L2-loss support vector classifier.
param := golinear.DefaultParameters()
model, err := golinear.TrainModel(param, problem)
if err != nil {
    log.Fatal(err)
}
Of course, now we would like to use this model to classify other sentences. For instance:
This is a beautiful book.
We map this sentence to the feature vector that we used during training, simply ignoring words that we did not encounter while training the model:
          +---+---+---+---+---+
????????: | 1 | 1 | 0 | 0 | 0 |
          +---+---+---+---+---+
The Predict method of the model is used to predict the label of this feature vector.
label := model.Predict(golinear.FromDenseVector([]float64{1, 1, 0, 0, 0}))
As expected, the model will predict the sentence to be positive (0).
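To give an intuition for why a linear model would label this vector as positive, the sketch below scores it with a weight vector. The weights are hand-picked for illustration and are not what liblinear would learn from the corpus above; the `score` and `classify` helpers are hypothetical, and the convention that a non-negative score means label 0 is an assumption of this sketch.

```go
package main

import "fmt"

// score computes the decision value of a linear model: the dot product
// of the weights and the dense feature vector, plus a bias term.
// Both slices are expected to have the same length.
func score(weights, features []float64, bias float64) float64 {
	sum := bias
	for i, w := range weights {
		sum += w * features[i]
	}
	return sum
}

// classify maps a decision value to a label. Assumption for this sketch:
// a non-negative score means label 0 (positive), a negative score label 1.
func classify(s float64) int {
	if s >= 0 {
		return 0
	}
	return 1
}

func main() {
	// Hypothetical weights for (a, beautiful, album, crappy, ugly).
	weights := []float64{0, 0.5, 0, -0.5, -0.5}
	unseen := []float64{1, 1, 0, 0, 0} // "This is a beautiful book."
	s := score(weights, unseen, 0)
	fmt.Println(s, classify(s)) // 0.5 0
}
```

Words shared by both classes ("a", "album") carry no weight here, so only "beautiful" contributes to the score of the unseen sentence.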
Index ¶
- func CrossValidation(problem *Problem, param Parameters, nFolds uint) ([]float64, error)
- type ClassWeight
- type FeatureValue
- type FeatureVector
- type Model
- func (model *Model) Bias() float64
- func (model *Model) Labels() []int
- func (model *Model) Predict(nodes []FeatureValue) float64
- func (model *Model) PredictDecisionValues(nodes []FeatureValue) (float64, map[int]float64, error)
- func (model *Model) PredictDecisionValuesSlice(nodes []FeatureValue) (float64, []float64, error)
- func (model *Model) PredictProbability(nodes []FeatureValue) (float64, map[int]float64, error)
- func (model *Model) PredictProbabilitySlice(nodes []FeatureValue) (float64, []float64, error)
- func (model *Model) Save(filename string) error
- func (model *Model) Weights() []float64
- func (model *Model) WeightsMulti() [][]float64
- type Parameters
- type Problem
- type ProblemIterFunc
- type SolverType
- func NewL1RL2LossSvc(epsilon float64) SolverType
- func NewL1RL2LossSvcDefault() SolverType
- func NewL1RLogisticRegression(epsilon float64) SolverType
- func NewL1RLogisticRegressionDefault() SolverType
- func NewL2RL1LossSvRegressionDual(epsilon float64) SolverType
- func NewL2RL1LossSvRegressionDualDefault(epsilon float64) SolverType
- func NewL2RL1LossSvcDual(epsilon float64) SolverType
- func NewL2RL1LossSvcDualDefault() SolverType
- func NewL2RL2LossSvRegression(epsilon float64) SolverType
- func NewL2RL2LossSvRegressionDefault(epsilon float64) SolverType
- func NewL2RL2LossSvRegressionDual(epsilon float64) SolverType
- func NewL2RL2LossSvRegressionDualDefault(epsilon float64) SolverType
- func NewL2RL2LossSvcDual(epsilon float64) SolverType
- func NewL2RL2LossSvcDualDefault() SolverType
- func NewL2RL2LossSvcPrimal(epsilon float64) SolverType
- func NewL2RL2LossSvcPrimalDefault() SolverType
- func NewL2RLogisticRegression(epsilon float64) SolverType
- func NewL2RLogisticRegressionDefault() SolverType
- func NewL2RLogisticRegressionDual(epsilon float64) SolverType
- func NewL2RLogisticRegressionDualDefault() SolverType
- func NewMCSVMCS(epsilon float64) SolverType
- func NewMCSVMCSDefault() SolverType
- type TrainingInstance
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CrossValidation ¶
func CrossValidation(problem *Problem, param Parameters, nFolds uint) ([]float64, error)
Perform cross validation. The instances in the problem are split into the given number of folds. Each fold is evaluated in turn using a model trained on the remaining folds. The returned slice contains the predicted class of each instance.
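The fold mechanism and a typical use of the returned predictions can be sketched as follows. The round-robin `foldOf` assignment is a simplification (an implementation may well shuffle the instances first), the `accuracy` helper is hypothetical, and the sketch assumes that the predictions are returned in the same order as the instances in the problem.

```go
package main

import "fmt"

// foldOf assigns instance i to one of nFolds folds in round-robin order.
// Simplification: no shuffling is done in this sketch.
func foldOf(i, nFolds int) int {
	return i % nFolds
}

// accuracy returns the fraction of predicted labels that equal the
// gold labels. Both slices are expected to have the same length.
func accuracy(predicted, gold []float64) float64 {
	if len(gold) == 0 {
		return 0
	}
	correct := 0
	for i := range gold {
		if predicted[i] == gold[i] {
			correct++
		}
	}
	return float64(correct) / float64(len(gold))
}

func main() {
	gold := []float64{0, 1, 0, 1, 0, 1}
	const nFolds = 3
	for fold := 0; fold < nFolds; fold++ {
		var train, test []int
		for i := range gold {
			if foldOf(i, nFolds) == fold {
				test = append(test, i)
			} else {
				train = append(train, i)
			}
		}
		fmt.Printf("fold %d: train on %v, evaluate %v\n", fold, train, test)
	}

	// Hypothetical predictions, standing in for the slice that
	// CrossValidation returns.
	predicted := []float64{0, 1, 1, 1, 0, 0}
	fmt.Println(accuracy(predicted, gold))
}
```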
Types ¶
type ClassWeight ¶
type FeatureValue ¶
Represents a feature and its value. The Index of a feature is used to uniquely identify the feature, and should start at 1.
type FeatureVector ¶
type FeatureVector []FeatureValue
Sparse feature vector, represented as the list (slice) of non-zero features.
func FromDenseVector ¶
func FromDenseVector(denseVector []float64) FeatureVector
Convert a dense feature vector, represented as a slice of feature values, to the sparse representation used by this package. The features will be numbered 1..len(denseVector). The following vectors will be equal:
golinear.FromDenseVector([]float64{0.2, 0.1, 0.3, 0.6})
golinear.FeatureVector{{1, 0.2}, {2, 0.1}, {3, 0.3}, {4, 0.6}}
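The conversion can be sketched without the package itself. The `featureValue` type and the `fromDense` helper below are local stand-ins, not golinear's own code; the field name `Value` is an assumption (the documentation only names `Index`), and so is the choice to keep every entry, including zeros.

```go
package main

import "fmt"

// featureValue is a local stand-in for a feature/value pair.
// The field name Value is an assumption of this sketch.
type featureValue struct {
	Index int
	Value float64
}

// fromDense numbers the entries of a dense vector 1..len(dense).
// Assumption: every entry is kept, including zeros.
func fromDense(dense []float64) []featureValue {
	sparse := make([]featureValue, 0, len(dense))
	for i, v := range dense {
		sparse = append(sparse, featureValue{Index: i + 1, Value: v})
	}
	return sparse
}

func main() {
	fmt.Println(fromDense([]float64{0.2, 0.1, 0.3, 0.6}))
	// [{1 0.2} {2 0.1} {3 0.3} {4 0.6}]
}
```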
type Model ¶
type Model struct {
// contains filtered or unexported fields
}
A Model holds a trained classifier and can be used to predict the class of a seen or unseen instance.
func TrainModel ¶
func TrainModel(param Parameters, problem *Problem) (*Model, error)
Train a linear model using the given parameters and problem.
func (*Model) Predict ¶
func (model *Model) Predict(nodes []FeatureValue) float64
Predict the label of an instance using the given model.
func (*Model) PredictDecisionValues ¶
func (model *Model) PredictDecisionValues(nodes []FeatureValue) (float64, map[int]float64, error)
Predict the label of an instance. In contrast to Predict, it also returns the per-label decision values.
func (*Model) PredictDecisionValuesSlice ¶
func (model *Model) PredictDecisionValuesSlice(nodes []FeatureValue) (float64, []float64, error)
Predict the label of an instance. In contrast to Predict, it also returns the per-label decision values. The PredictDecisionValues function is more user-friendly, but has the overhead of constructing a map. If you are only interested in the classes with the highest decision values, it may be better to use this function in conjunction with Labels().
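Combining the slice result with Labels() could look like the sketch below. It assumes a multi-class model in which the i-th decision value belongs to the i-th label; the `bestLabel` helper and the sample numbers are hypothetical and stand in for real model output.

```go
package main

import "fmt"

// bestLabel returns the label whose decision value is highest.
// Assumption: values[i] belongs to labels[i]. The boolean is false
// when the slices are empty or differ in length.
func bestLabel(labels []int, values []float64) (int, bool) {
	if len(labels) == 0 || len(labels) != len(values) {
		return 0, false
	}
	best := 0
	for i := 1; i < len(values); i++ {
		if values[i] > values[best] {
			best = i
		}
	}
	return labels[best], true
}

func main() {
	labels := []int{0, 1, 2}            // stand-in for model.Labels()
	values := []float64{-0.4, 1.3, 0.2} // hypothetical decision values
	label, ok := bestLabel(labels, values)
	fmt.Println(label, ok) // 1 true
}
```

No map is allocated here, which is the saving the slice variant is meant to offer.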
func (*Model) PredictProbability ¶
func (model *Model) PredictProbability(nodes []FeatureValue) (float64, map[int]float64, error)
Predict the label of an instance, given a model with probability information. This method returns the label of the predicted class and a map of class probabilities. Probability estimates are currently given for logistic regression only. If another solver is used, the probability of each class is zero.
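As background on why logistic regression can provide probabilities at all: such models conventionally pass the linear decision value through the logistic (sigmoid) function. The sketch below shows only that textbook relationship for the two-class case; it is not taken from this package's source, and the decision values are made up.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid maps a decision value to a probability in (0, 1).
func sigmoid(decision float64) float64 {
	return 1 / (1 + math.Exp(-decision))
}

func main() {
	// Hypothetical decision values for a two-class model.
	for _, d := range []float64{-2, 0, 2} {
		p := sigmoid(d)
		fmt.Printf("decision %+.1f -> p=%.3f, 1-p=%.3f\n", d, p, 1-p)
	}
}
```

A decision value of zero corresponds to a probability of 0.5, that is, a point on the decision boundary.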
func (*Model) PredictProbabilitySlice ¶
func (model *Model) PredictProbabilitySlice(nodes []FeatureValue) (float64, []float64, error)
Predict the label of an instance, given a model with probability information. This method returns the label of the predicted class and a slice of class probabilities. Probability estimates are currently given for logistic regression only. If another solver is used, the probability of each class is zero.
The PredictProbability function is more user-friendly, but has the overhead of constructing a map. If you are only interested in the classes with the highest probabilities, it may be better to use this function in conjunction with Labels().
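Selecting only the most probable classes from the slice result might look like this. The sketch assumes that the i-th probability belongs to the i-th entry of Labels(); `labelProb`, `topN`, and the sample numbers are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// labelProb pairs a class label with its probability.
type labelProb struct {
	Label int
	Prob  float64
}

// topN returns the n most probable labels, highest probability first.
// Assumption: probs[i] belongs to labels[i].
func topN(labels []int, probs []float64, n int) []labelProb {
	pairs := make([]labelProb, 0, len(labels))
	for i, l := range labels {
		if i < len(probs) {
			pairs = append(pairs, labelProb{Label: l, Prob: probs[i]})
		}
	}
	sort.Slice(pairs, func(a, b int) bool { return pairs[a].Prob > pairs[b].Prob })
	if n < 0 {
		n = 0
	}
	if n > len(pairs) {
		n = len(pairs)
	}
	return pairs[:n]
}

func main() {
	labels := []int{0, 1, 2}           // stand-in for model.Labels()
	probs := []float64{0.2, 0.7, 0.1}  // hypothetical probabilities
	fmt.Println(topN(labels, probs, 2)) // [{1 0.7} {0 0.2}]
}
```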
type Parameters ¶
type Parameters struct {
    // The type of solver
    SolverType SolverType

    // The cost of constraints violation.
    Cost float64

    // The relative penalty for each class.
    RelCosts []ClassWeight
}
Parameters for training a linear model.
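For orientation, the role of the cost parameter can be seen in the primal objective that liblinear documents for L2-regularized L2-loss support vector classification. The formula below assumes that Cost is passed through unchanged as C, with training instances x_i, labels y_i in {-1, +1}, and l instances in total; how this binding maps class labels and RelCosts onto it is not stated here.

```latex
\min_{w} \quad \frac{1}{2} w^{T} w \;+\; C \sum_{i=1}^{l} \max\left(0,\; 1 - y_i\, w^{T} x_i\right)^{2}
```

A larger C penalizes violated constraints more heavily relative to the regularization term, so the model fits the training data more closely.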
func DefaultParameters ¶
func DefaultParameters() Parameters
type Problem ¶
type Problem struct {
// contains filtered or unexported fields
}
A problem is a set of instances and corresponding labels.
func NewProblem ¶
func NewProblem() *Problem
func (*Problem) Add ¶
func (problem *Problem) Add(trainInst TrainingInstance) error
func (*Problem) Iterate ¶
func (problem *Problem) Iterate(fun ProblemIterFunc)
Iterate over the training instances in a problem.
type ProblemIterFunc ¶
type ProblemIterFunc func(instance *TrainingInstance) bool
Function prototype for iteration over problems. The function should return 'true' if the iteration should continue or 'false' otherwise.
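The early-stopping contract of such a callback can be sketched with local stand-in types. `instance`, `iterFunc`, `iterate`, and `firstWithLabel` are hypothetical and only mirror the shape of TrainingInstance, ProblemIterFunc, and Iterate; they are not the package's own implementation.

```go
package main

import "fmt"

// instance and iterFunc are local stand-ins for TrainingInstance and
// ProblemIterFunc.
type instance struct {
	Label float64
}

type iterFunc func(inst *instance) bool

// iterate calls fun for each instance until fun returns false.
func iterate(instances []instance, fun iterFunc) {
	for i := range instances {
		if !fun(&instances[i]) {
			return
		}
	}
}

// firstWithLabel returns the position of the first instance with the
// given label, or -1 if there is none.
func firstWithLabel(instances []instance, label float64) int {
	pos, found := 0, -1
	iterate(instances, func(inst *instance) bool {
		if inst.Label == label {
			found = pos
			return false // stop iterating
		}
		pos++
		return true // continue with the next instance
	})
	return found
}

func main() {
	data := []instance{{0}, {0}, {1}, {0}}
	fmt.Println(firstWithLabel(data, 1)) // 2
}
```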
type SolverType ¶
type SolverType struct {
// contains filtered or unexported fields
}
func NewL1RL2LossSvc ¶
func NewL1RL2LossSvc(epsilon float64) SolverType
L1-regularized L2-loss support vector classification.
func NewL1RL2LossSvcDefault ¶
func NewL1RL2LossSvcDefault() SolverType
L1-regularized L2-loss support vector classification, epsilon = 0.01.
func NewL1RLogisticRegression ¶
func NewL1RLogisticRegression(epsilon float64) SolverType
L1-regularized logistic regression.
func NewL1RLogisticRegressionDefault ¶
func NewL1RLogisticRegressionDefault() SolverType
L1-regularized logistic regression, epsilon = 0.01.
func NewL2RL1LossSvRegressionDual ¶
func NewL2RL1LossSvRegressionDual(epsilon float64) SolverType
L2-regularized L1-loss support vector regression (dual).
func NewL2RL1LossSvRegressionDualDefault ¶
func NewL2RL1LossSvRegressionDualDefault(epsilon float64) SolverType
L2-regularized L1-loss support vector regression (dual), epsilon = 0.1.
func NewL2RL1LossSvcDual ¶
func NewL2RL1LossSvcDual(epsilon float64) SolverType
L2-regularized L1-loss support vector classification (dual).
func NewL2RL1LossSvcDualDefault ¶
func NewL2RL1LossSvcDualDefault() SolverType
L2-regularized L1-loss support vector classification (dual), epsilon = 0.1.
func NewL2RL2LossSvRegression ¶
func NewL2RL2LossSvRegression(epsilon float64) SolverType
L2-regularized L2-loss support vector regression (primal).
func NewL2RL2LossSvRegressionDefault ¶
func NewL2RL2LossSvRegressionDefault(epsilon float64) SolverType
L2-regularized L2-loss support vector regression (primal), epsilon = 0.001.
func NewL2RL2LossSvRegressionDual ¶
func NewL2RL2LossSvRegressionDual(epsilon float64) SolverType
L2-regularized L2-loss support vector regression (dual).
func NewL2RL2LossSvRegressionDualDefault ¶
func NewL2RL2LossSvRegressionDualDefault(epsilon float64) SolverType
L2-regularized L2-loss support vector regression (dual), epsilon = 0.1.
func NewL2RL2LossSvcDual ¶
func NewL2RL2LossSvcDual(epsilon float64) SolverType
L2-regularized L2-loss support vector classification (dual).
func NewL2RL2LossSvcDualDefault ¶
func NewL2RL2LossSvcDualDefault() SolverType
L2-regularized L2-loss support vector classification (dual), epsilon = 0.1.
func NewL2RL2LossSvcPrimal ¶
func NewL2RL2LossSvcPrimal(epsilon float64) SolverType
L2-regularized L2-loss support vector classification (primal).
func NewL2RL2LossSvcPrimalDefault ¶
func NewL2RL2LossSvcPrimalDefault() SolverType
L2-regularized L2-loss support vector classification (primal), epsilon = 0.01.
func NewL2RLogisticRegression ¶
func NewL2RLogisticRegression(epsilon float64) SolverType
L2-regularized logistic regression (primal).
func NewL2RLogisticRegressionDefault ¶
func NewL2RLogisticRegressionDefault() SolverType
L2-regularized logistic regression (primal), epsilon = 0.01.
func NewL2RLogisticRegressionDual ¶
func NewL2RLogisticRegressionDual(epsilon float64) SolverType
L2-regularized logistic regression (dual).
func NewL2RLogisticRegressionDualDefault ¶
func NewL2RLogisticRegressionDualDefault() SolverType
L2-regularized logistic regression (dual), epsilon = 0.1.
func NewMCSVMCS ¶
func NewMCSVMCS(epsilon float64) SolverType
Support vector classification by Crammer and Singer.
func NewMCSVMCSDefault ¶
func NewMCSVMCSDefault() SolverType
Support vector classification by Crammer and Singer, epsilon = 0.1.
type TrainingInstance ¶
type TrainingInstance struct {
    Label    float64
    Features FeatureVector
}
Training instance, consisting of the label of the instance and its feature vector. In classification, the label is an integer indicating the class label. In regression, the label is the target value, which can be any real number. The label is not used for one-class SVMs.