ml

package
v1.1.3 Latest
Warning

This package is not in the latest version of its module.

Published: May 13, 2020 License: BSD-3-Clause Imports: 14 Imported by: 0

README

Gosl. ml. Machine Learning

More information is available in the documentation of this package.

Package ml implements functions for developing Machine Learning algorithms. One goal is to handle large problems. Nonetheless, this package is still in the early stages of development.

This package was started on the basis of the teachings of Prof. Andrew Ng [1,2].

White papers

  1. Machine Learning

TODO

  1. Implement Support Vector Machines
  2. Implement Artificial Neural Networks
  3. Implement concurrency and add an option to run code in parallel

Linear and Logistic Regression

The Regression interface defines the functions that LinReg and LogReg must implement so that they can be trained with the GraDescReg gradient-descent solver or plotted with PlotterReg.

The ParamsReg structure holds the Theta and Bias parameters separately rather than in a single vector, as is customary in other packages. ParamsReg is an Observable (from package utl), so it notifies its Observers of changes.

The Data structure holds the X matrix of data (nSamples by nFeatures), either raw or mapped according to a mapping rule. The Data structure is also Observable.

The Stat structure reads Data and computes basic statistics. It is an Observer of Data and is therefore notified of data changes.
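The Observable/Observer relationship described above can be illustrated with a small, standard-library-only sketch. The types Observable, Table and Mean below are hypothetical stand-ins for utl.Observable, Data and Stat, and the AddObserver method name is an assumption; only NotifyUpdate is named in this documentation.

```go
package main

import "fmt"

// Observer is anything that wants to be told when an Observable changes.
type Observer interface {
	Update()
}

// Observable keeps a list of observers (hypothetical stand-in for utl.Observable).
type Observable struct {
	observers []Observer
}

// AddObserver registers an observer (assumed method name).
func (o *Observable) AddObserver(obs Observer) {
	o.observers = append(o.observers, obs)
}

// NotifyUpdate calls Update on every registered observer.
func (o *Observable) NotifyUpdate() {
	for _, obs := range o.observers {
		obs.Update()
	}
}

// Table plays the role of Data: it owns values and can notify observers.
type Table struct {
	Observable
	Y []float64
}

// Mean plays the role of Stat: it recomputes itself when notified.
type Mean struct {
	table *Table
	Value float64
}

// NewMean registers the new object as an observer of t and computes the first value.
func NewMean(t *Table) *Mean {
	m := &Mean{table: t}
	t.AddObserver(m)
	m.Update()
	return m
}

// Update recomputes the mean of the observed values.
func (m *Mean) Update() {
	m.Value = 0
	if len(m.table.Y) == 0 {
		return
	}
	sum := 0.0
	for _, y := range m.table.Y {
		sum += y
	}
	m.Value = sum / float64(len(m.table.Y))
}

func main() {
	table := &Table{Y: []float64{1, 2, 3}}
	mean := NewMean(table)
	fmt.Println("before change:", mean.Value)
	table.Y[2] = 9       // changing the data directly does not refresh the mean ...
	table.NotifyUpdate() // ... until the observers are notified
	fmt.Println("after notify:", mean.Value)
}
```

This is why the package asks callers to use NotifyUpdate (or the Set... methods) after modifying values: dependent objects only refresh when they are notified.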

A simple Linear Regression can be carried out using the following code (see more in the Examples folder or the t_???_test.go test files):

// data
XYraw := [][]float64{
    {0.99, 90.01},
    {1.02, 89.05},
    ...
    {1.43, 94.98},
    {0.95, 87.33},
}
data := ml.NewDataGivenRawXY(XYraw)

// parameters
params := ml.NewParamsReg(data.Nfeatures)

// model
model := ml.NewLinReg(data, params, "reg01")

// train using analytical solution
model.Train()

// ----------------------- plotting --------------------------

// clear plotting area
plt.Reset(true, &plt.A{WidthPt: 400, Dpi: 150, Prop: 1.5})

// plot data x-y
plt.Subplot(2, 1, 1)
pp := ml.NewPlotterReg(data, params, model, nil)
pp.DataY(0)

// plot model x-y
pp.ModelY(0, 0.8, 1.6)

// plot contour of cost function
plt.Subplot(2, 1, 2)
pp.ContourCost(-1, 0, 0, 100, 0, 70)

// save figure
plt.Save("/tmp/gosl", "ml_simple01")

Output of some examples

Simple

Prof A Ng's Test 1

Prof A Ng's Test 2

K-means Clustering

Prof A Ng's Test 1

References

[1] Ng A, CS229 Machine Learning, Stanford, https://see.stanford.edu/Course/CS229

[2] Ng A, Coursera https://www.coursera.org/learn/machine-learning

Documentation

Overview

Package ml implements Machine Learning algorithms

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Data

type Data struct {
	utl.Observable // can notify others of changes here via data.NotifyUpdate()

	// input
	Nsamples  int        // number of data points (samples). number of rows in X and Y
	Nfeatures int        // number of features. number of columns in X
	X         *la.Matrix // [nSamples][nFeatures] X values
	Y         la.Vector  // [nSamples] Y values [optional]

	// access
	Stat *Stat // statistics about this data
}

Data holds data in matrix format; e.g. for regression computations

Example:
       _          _                                     _   _
      |  -1  0 -3  |                                   |  0  |
      |  -2  3  3  |                       (optional)  |  1  |
  X = |   3  1  4  |                               Y = |  1  |
      |  -4  5  0  |                                   |  0  |
      |_  1 -8  5 _|(nSamples x nFeatures)             |_ 1 _|(nSamples)

NOTE: remember to call data.NotifyUpdate() after changing X or Y components

func NewData

func NewData(nSamples, nFeatures int, useY, allocate bool) (o *Data)

NewData returns a new object to hold ML data

Input:
  nSamples  -- number of data samples (rows in X)
  nFeatures -- number of features (columns in X)
  useY      -- use y data vector
  allocate  -- allocates X (and Y); otherwise,
               X and Y must be set using Set() method
Output:
  new object

func NewDataGivenRawX

func NewDataGivenRawX(Xraw [][]float64) (o *Data)

NewDataGivenRawX returns a new object with data set from raw X values

Input:
  Xraw -- [nSamples][nFeatures] table with x values (NO y values)
Output:
  new object

func NewDataGivenRawXY

func NewDataGivenRawXY(XYraw [][]float64) (o *Data)

NewDataGivenRawXY returns a new object with data set from raw XY values

Input:
  XYraw -- [nSamples][nFeatures+1] table with x and y raw values,
           where the last column contains y-values
Output:
  new object
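The XYraw layout described above (features first, y in the last column) can be illustrated with a standard-library sketch. The helper splitXY is hypothetical and works on plain slices; it is not the package's implementation, which stores X in a *la.Matrix. It assumes every row has at least one column.

```go
package main

import "fmt"

// splitXY separates a raw [nSamples][nFeatures+1] table into the X rows and
// the y values taken from the last column of each row.
func splitXY(XYraw [][]float64) (X [][]float64, Y []float64) {
	X = make([][]float64, len(XYraw))
	Y = make([]float64, len(XYraw))
	for i, row := range XYraw {
		nFeatures := len(row) - 1
		X[i] = append([]float64(nil), row[:nFeatures]...) // copy, do not alias the input
		Y[i] = row[nFeatures]
	}
	return
}

func main() {
	X, Y := splitXY([][]float64{
		{0.99, 90.01},
		{1.02, 89.05},
	})
	fmt.Println("X =", X)
	fmt.Println("Y =", Y)
}
```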

func (*Data) GetCopy

func (o *Data) GetCopy() (p *Data)

GetCopy returns a deep copy of this object

func (*Data) Set

func (o *Data) Set(X *la.Matrix, Y la.Vector)

Set sets the X matrix and the Y vector [optional] and notifies observers

Input:
  X -- x values
  Y -- y values [optional]

type DataMapper

type DataMapper interface {
	Map(x, xRaw la.Vector)                        // maps xRaw into x
	GetMapped(XYraw [][]float64, useY bool) *Data // returns new data with mapped X values
	NumOriginalFeatures() int                     // returns the number of original features
	NumExtraFeatures() int                        // returns the number of added features
}

DataMapper maps features into an expanded set of features

type Kmeans

type Kmeans struct {
	Classes   []int       // [nSamples] indices of classes of each sample
	Centroids []la.Vector // [nClasses][nFeatures] coordinates of centroids
	Nmembers  []int       // [nClasses] number of members in each class
	// contains filtered or unexported fields
}

Kmeans implements the K-means model (Observer of Data)

func NewKmeans

func NewKmeans(data *Data, nClasses int) (o *Kmeans)

NewKmeans returns a new K-means model

func (*Kmeans) ComputeCentroids

func (o *Kmeans) ComputeCentroids()

ComputeCentroids updates centroids based on new class information (from FindClosestCentroids)

func (*Kmeans) FindClosestCentroids

func (o *Kmeans) FindClosestCentroids()

FindClosestCentroids finds closest centroids to each sample

func (*Kmeans) Nclasses

func (o *Kmeans) Nclasses() int

Nclasses returns the number of classes

func (*Kmeans) SetCentroids

func (o *Kmeans) SetCentroids(Xc [][]float64)

SetCentroids sets centroids; e.g. trial centroids

Xc -- [nClass][nFeatures]

func (*Kmeans) Train

func (o *Kmeans) Train(nMaxIt int, tolNormChange float64) (nIter int)

Train trains model
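The usual K-means iteration alternates the two steps named above: assign each sample to its closest centroid, then move each centroid to the mean of its members. The following standard-library sketch shows that loop on plain slices. The function names and the stopping rule (Euclidean norm of the centroid change below a tolerance) are assumptions made for illustration, not the package's code.

```go
package main

import (
	"fmt"
	"math"
)

// closest returns, for each sample, the index of the nearest centroid
// (squared Euclidean distance).
func closest(X, centroids [][]float64) []int {
	classes := make([]int, len(X))
	for i, x := range X {
		best := math.Inf(1)
		for k, c := range centroids {
			d := 0.0
			for j := range x {
				d += (x[j] - c[j]) * (x[j] - c[j])
			}
			if d < best {
				best, classes[i] = d, k
			}
		}
	}
	return classes
}

// recompute moves each centroid to the mean of its members and returns the
// member counts; a centroid without members is left where it is.
func recompute(X [][]float64, classes []int, centroids [][]float64) []int {
	nMembers := make([]int, len(centroids))
	sums := make([][]float64, len(centroids))
	for k := range sums {
		sums[k] = make([]float64, len(centroids[k]))
	}
	for i, x := range X {
		k := classes[i]
		nMembers[k]++
		for j := range x {
			sums[k][j] += x[j]
		}
	}
	for k := range centroids {
		if nMembers[k] == 0 {
			continue
		}
		for j := range centroids[k] {
			centroids[k][j] = sums[k][j] / float64(nMembers[k])
		}
	}
	return nMembers
}

// train repeats both steps until the centroids move by less than tol
// or nMaxIt iterations have been performed.
func train(X, centroids [][]float64, nMaxIt int, tol float64) (classes []int, nIter int) {
	for nIter < nMaxIt {
		nIter++
		classes = closest(X, centroids)
		old := make([][]float64, len(centroids))
		for k, c := range centroids {
			old[k] = append([]float64(nil), c...)
		}
		recompute(X, classes, centroids)
		change := 0.0
		for k := range centroids {
			for j := range centroids[k] {
				change += math.Pow(centroids[k][j]-old[k][j], 2)
			}
		}
		if math.Sqrt(change) < tol {
			break
		}
	}
	return
}

func main() {
	X := [][]float64{{0, 0}, {0, 2}, {10, 0}, {10, 2}}
	centroids := [][]float64{{1, 1}, {9, 1}} // trial centroids
	classes, nIter := train(X, centroids, 10, 1e-8)
	fmt.Println(classes, centroids, nIter)
}
```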

func (*Kmeans) Update

func (o *Kmeans) Update()

Update performs updates after data has been changed (as an Observer)

type LinReg

type LinReg struct {
	ParamsReg // import ParamsReg
	// contains filtered or unexported fields
}

LinReg implements a linear regression model

func NewLinReg

func NewLinReg(data *Data) (o *LinReg)

NewLinReg returns a new LinReg object

data -- X,y data

func (*LinReg) Cost

func (o *LinReg) Cost() (c float64)

Cost returns the cost c(x;θ,b)

Input:
  data -- X,y data
  params -- θ and b
  x -- vector of features
Output:
  c -- total cost (model error)

func (*LinReg) Gradients

func (o *LinReg) Gradients(dCdθ la.Vector) (dCdb float64)

Gradients returns ∂C/∂θ and ∂C/∂b

Output:
  dCdθ -- ∂C/∂θ
  dCdb -- ∂C/∂b

func (*LinReg) Predict

func (o *LinReg) Predict(x la.Vector) (y float64)

Predict returns the model evaluation @ {x;θ,b}

Input:
  x -- vector of features
Output:
  y -- model prediction y(x)

func (*LinReg) Train

func (o *LinReg) Train()

Train finds θ and b using closed-form solution

Input:
  data -- X,y data
Output:
  params -- θ and b
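For a single feature, the closed-form least-squares solution reduces to two short formulas: θ = Sxy/Sxx and b = mean(y) - θ·mean(x). The sketch below uses only the standard library and hypothetical helper names (fitLine, cost). It illustrates the idea rather than the package's implementation, which handles any number of features, and the cost shown (half the mean squared error, without regularization) is the customary choice from [1] and an assumption here.

```go
package main

import "fmt"

// fitLine returns θ and b minimising Σ (θ·x[i] + b - y[i])² for one feature.
func fitLine(x, y []float64) (theta, bias float64) {
	m := float64(len(x))
	var meanX, meanY float64
	for i := range x {
		meanX += x[i] / m
		meanY += y[i] / m
	}
	var sxy, sxx float64
	for i := range x {
		sxy += (x[i] - meanX) * (y[i] - meanY)
		sxx += (x[i] - meanX) * (x[i] - meanX)
	}
	theta = sxy / sxx // not defined if all x values are equal (sxx == 0)
	bias = meanY - theta*meanX
	return
}

// cost returns half of the mean squared error of the line (θ, b).
func cost(x, y []float64, theta, bias float64) (c float64) {
	for i := range x {
		e := theta*x[i] + bias - y[i]
		c += e * e
	}
	return c / (2 * float64(len(x)))
}

func main() {
	x := []float64{0, 1, 2, 3}
	y := []float64{1, 3, 5, 7} // points on the line y = 2x + 1
	theta, bias := fitLine(x, y)
	fmt.Println("theta =", theta, "bias =", bias, "cost =", cost(x, y, theta, bias))
}
```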

func (*LinReg) TrainNumerical

func (o *LinReg) TrainNumerical(θini la.Vector, bini float64, method string, saveHist bool, control dbf.Params) (minCost float64, hist *opt.History)

TrainNumerical trains model using numerical optimizer

θini -- initial (trial) θ values
bini -- initial (trial) bias
method -- method/kind of numerical solver. e.g. conjgrad, powel, graddesc
saveHist -- save history
control -- parameters to numerical solver. See package 'opt'

type LogReg

type LogReg struct {
	ParamsReg // import ParamsReg
	// contains filtered or unexported fields
}

LogReg implements a logistic regression model (Observer of Data)

func NewLogReg

func NewLogReg(data *Data) (o *LogReg)

NewLogReg returns a new LogReg object

data -- X,y data

func (*LogReg) AllocateGradient

func (o *LogReg) AllocateGradient() (dCdθ la.Vector)

AllocateGradient allocates the object needed to compute Gradients

func (*LogReg) AllocateHessian

func (o *LogReg) AllocateHessian() (d, v la.Vector, D, H *la.Matrix)

AllocateHessian allocates the objects needed to compute the Hessian

func (*LogReg) Cost

func (o *LogReg) Cost() (c float64)

Cost returns the cost c(x;θ,b)

Input:
  data -- X,y data
  params -- θ and b
  x -- vector of features
Output:
  c -- total cost (model error)

func (*LogReg) Gradients

func (o *LogReg) Gradients(dCdθ la.Vector) (dCdb float64)

Gradients returns ∂C/∂θ and ∂C/∂b

Output:
  dCdθ -- ∂C/∂θ
  dCdb -- ∂C/∂b

func (*LogReg) Hessian

func (o *LogReg) Hessian(d, v la.Vector, D, H *la.Matrix) (w float64)

Hessian computes the Hessian matrix and other partial derivatives

Input (used if d != nil; otherwise these four objects are allocated here):
  d -- [nSamples]  d[i] = g(l[i]) * [ 1 - g(l[i]) ]  auxiliary vector
  v -- [nFeatures] v = ∂²C/∂θ∂b second order partial derivative
  D -- [nSamples][nFeatures]  D[i][j] = d[i]*X[i][j]  auxiliary matrix
  H -- [nFeatures][nFeatures]  H = ∂²C/∂θ² Hessian matrix

Output, either new objects or pointers to the input ones:
  dNew := d   (allocated here if d == nil)
  vNew := v   (allocated here if v == nil)
  Dnew := D   (allocated here if D == nil)
  Hnew := H   (allocated here if H == nil)
  w -- w = ∂²C/∂b² second order partial derivative
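The quantities listed above can be written out as follows. This is a sketch under stated assumptions: the cost C is taken as the usual logistic cost averaged over the m = nSamples samples, without regularization, g is the sigmoid function, l_i = θᵀx_i + b, and o is a vector of ones (the notation used in Stat.SumVars). The package may scale or regularize these terms differently.

```latex
\begin{aligned}
g(l) &= \frac{1}{1 + e^{-l}}, \qquad l_i = \theta^T x_i + b \\
d_i &= g(l_i)\,\bigl[1 - g(l_i)\bigr], \qquad D_{ij} = d_i\, X_{ij} \\
H &= \frac{\partial^2 C}{\partial \theta^2} = \frac{1}{m}\, X^T D \\
v &= \frac{\partial^2 C}{\partial \theta\, \partial b} = \frac{1}{m}\, D^T o \\
w &= \frac{\partial^2 C}{\partial b^2} = \frac{1}{m}\, o^T d
\end{aligned}
```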

func (*LogReg) Predict

func (o *LogReg) Predict(x la.Vector) (y float64)

Predict returns the model evaluation @ {x;θ,b}

Input:
  x -- vector of features
Output:
  y -- model prediction y(x)

func (*LogReg) Train

func (o *LogReg) Train()

Train finds θ and b using Newton's method

Input:
  data -- X,y data
Output:
  params -- θ and b

func (*LogReg) TrainNumerical

func (o *LogReg) TrainNumerical(θini la.Vector, bini float64, method string, saveHist bool, control dbf.Params) (minCost float64, hist *opt.History)

TrainNumerical trains model using numerical optimizer

θini -- initial (trial) θ values
bini -- initial (trial) bias
method -- method/kind of numerical solver. e.g. conjgrad, powel, graddesc
saveHist -- save history
control -- parameters to numerical solver. See package 'opt'

func (*LogReg) Update

func (o *LogReg) Update()

Update performs updates after data has been changed (as an Observer)

type LogRegMulti

type LogRegMulti struct {
	// contains filtered or unexported fields
}

LogRegMulti implements a logistic regression model for multiple classes (Observer of data)

func NewLogRegMulti

func NewLogRegMulti(data *Data) (o *LogRegMulti)

NewLogRegMulti returns a new object

NOTE: the y-vector in data must have values in [0, nClass-1]

func (*LogRegMulti) GetFunctionsForPlotting

func (o *LogRegMulti) GetFunctionsForPlotting() (ffcn fun.Sv, ffcns []fun.Sv)

GetFunctionsForPlotting returns functions for plotting

func (*LogRegMulti) Nclasses

func (o *LogRegMulti) Nclasses() int

Nclasses returns the number of classes

func (*LogRegMulti) Predict

func (o *LogRegMulti) Predict(x la.Vector) (class int, probs []float64)

Predict returns the model evaluation @ {x;θ,b}

Input:
  x -- vector of features
Output:
  class -- class with the highest probability
  probs -- probabilities
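A common way to obtain a class and per-class probabilities from several binary logistic models is the one-versus-all scheme: evaluate one model per class and pick the class with the highest probability. The sketch below assumes that scheme (suggested by Plotter.ModelClassOneVsAll, but not confirmed by this documentation) and uses hypothetical names on plain slices.

```go
package main

import (
	"fmt"
	"math"
)

// predictClass evaluates one binary logistic model per class and returns
// the class with the highest probability together with all probabilities.
// thetas[k] and biases[k] are the parameters of the model for class k.
func predictClass(thetas [][]float64, biases []float64, x []float64) (class int, probs []float64) {
	probs = make([]float64, len(thetas))
	for k, theta := range thetas {
		z := biases[k]
		for j := range x {
			z += theta[j] * x[j]
		}
		probs[k] = 1 / (1 + math.Exp(-z))
		if probs[k] > probs[class] {
			class = k
		}
	}
	return
}

func main() {
	thetas := [][]float64{{1}, {-1}} // class 0 prefers positive x, class 1 negative x
	biases := []float64{0, 0}
	class, probs := predictClass(thetas, biases, []float64{2})
	fmt.Println(class, probs)
}
```

Note that the probabilities come from independent binary models, so they do not need to sum to one.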

func (*LogRegMulti) SetLambda

func (o *LogRegMulti) SetLambda(lambda float64)

SetLambda sets the regularization parameter

func (*LogRegMulti) Train

func (o *LogRegMulti) Train()

Train finds the parameters using Newton's method

func (*LogRegMulti) TrainNumerical

func (o *LogRegMulti) TrainNumerical(method string, saveHist bool, control dbf.Params) (minCosts []float64, hists []*opt.History)

TrainNumerical trains model using numerical optimizer

method -- method/kind of numerical solver. e.g. conjgrad, powel, graddesc
saveHist -- save history
control -- parameters to numerical solver. See package 'opt'

func (*LogRegMulti) Update

func (o *LogRegMulti) Update()

Update performs updates after data has been changed (as an Observer)

type ParamsReg

type ParamsReg struct {
	utl.Observable // notifies interested parties
	// contains filtered or unexported fields
}

ParamsReg holds the θ and b parameters for regression computations

NOTE: Since ParamsReg is an Observable, the internal values
      should only be changed by calling the Set... methods since
      these methods will notify the Observers

func (*ParamsReg) AccessBias

func (o *ParamsReg) AccessBias() (ptb *float64)

AccessBias returns access (pointer) to b

func (*ParamsReg) AccessThetas

func (o *ParamsReg) AccessThetas() (θ la.Vector)

AccessThetas returns access (slice) to θ

func (*ParamsReg) Backup

func (o *ParamsReg) Backup()

Backup creates an internal copy of parameters

func (*ParamsReg) GetBias

func (o *ParamsReg) GetBias() (b float64)

GetBias gets a copy of b

func (*ParamsReg) GetDegree

func (o *ParamsReg) GetDegree() (p int)

GetDegree gets a copy of p

func (*ParamsReg) GetLambda

func (o *ParamsReg) GetLambda() (λ float64)

GetLambda gets a copy of λ

func (*ParamsReg) GetParam

func (o *ParamsReg) GetParam(i int) (value float64)

GetParam returns either θ or b (use negative indices for b)

i -- index of θ or -1 for bias

func (*ParamsReg) GetTheta

func (o *ParamsReg) GetTheta(i int) (θi float64)

GetTheta returns the value of θ[i]

func (*ParamsReg) GetThetas

func (o *ParamsReg) GetThetas() (θ la.Vector)

GetThetas gets a copy of θ

func (*ParamsReg) Init

func (o *ParamsReg) Init(nFeatures int)

Init initializes ParamsReg with nFeatures (number of features)

func (*ParamsReg) Restore

func (o *ParamsReg) Restore(skipNotification bool)

Restore restores an internal copy of parameters and notifies observers

func (*ParamsReg) SetBias

func (o *ParamsReg) SetBias(b float64)

SetBias sets b and notifies observers

func (*ParamsReg) SetDegree

func (o *ParamsReg) SetDegree(p int)

SetDegree sets p and notifies observers

func (*ParamsReg) SetJSON

func (o *ParamsReg) SetJSON(jsonString string)

SetJSON sets parameters from JSON string and notifies observers

func (*ParamsReg) SetLambda

func (o *ParamsReg) SetLambda(λ float64)

SetLambda sets λ and notifies observers

func (*ParamsReg) SetParam

func (o *ParamsReg) SetParam(i int, value float64)

SetParam sets either θ or b (use negative indices for b). Notifies observers

i -- index of θ or -1 for bias
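The index convention shared by GetParam and SetParam (a non-negative index addresses θ[i], a negative index addresses the bias) makes it possible to loop over all parameters uniformly, starting at -1. The sketch below illustrates the convention with a hypothetical params type; it omits the observer notification that the real Set... methods perform.

```go
package main

import "fmt"

// params is a hypothetical stand-in holding θ and b separately.
type params struct {
	theta []float64
	bias  float64
}

// get returns θ[i] for i >= 0 and the bias for negative i.
func (p *params) get(i int) float64 {
	if i < 0 {
		return p.bias
	}
	return p.theta[i]
}

// set assigns θ[i] for i >= 0 and the bias for negative i.
func (p *params) set(i int, value float64) {
	if i < 0 {
		p.bias = value
		return
	}
	p.theta[i] = value
}

func main() {
	p := &params{theta: []float64{0.5, 1.5}, bias: 3}
	// visit the bias first (i = -1), then every θ component
	for i := -1; i < len(p.theta); i++ {
		p.set(i, 2*p.get(i))
	}
	fmt.Println(p.bias, p.theta)
}
```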

func (*ParamsReg) SetParams

func (o *ParamsReg) SetParams(θ la.Vector, b float64)

SetParams sets θ and b and notifies observers

func (*ParamsReg) SetTheta

func (o *ParamsReg) SetTheta(i int, θi float64)

SetTheta sets one component of θ and notifies observers

func (*ParamsReg) SetThetas

func (o *ParamsReg) SetThetas(θ la.Vector)

SetThetas sets the whole vector θ and notifies observers

type Plotter

type Plotter struct {

	// constants
	NumPointsModelY int // number of points for ModelY()
	NumPointsModelC int // number of points for ModelC()

	// arguments: data
	ArgsDataY     *plt.A         // args for data y
	ArgsBinClassY map[int]*plt.A // maps y classes [0 or 1] to plot arguments
	ArgsClassesY  map[int]*plt.A // maps y classes [0, 1, 2, ...] to plot arguments

	// arguments: centroids
	ArgsCentroids   *plt.A // args for centroids
	ArgsCentroCirc1 *plt.A // args for circle highlighting centroids
	ArgsCentroCirc2 *plt.A // args for circle highlighting centroids

	// arguments: model
	ArgsModelY *plt.A // arguments for x-y model line
	ArgsModelC *plt.A // arguments for ContourModel
	// contains filtered or unexported fields
}

Plotter plots results from Machine Learning models

func NewPlotter

func NewPlotter(data *Data, mapper DataMapper) (o *Plotter)

NewPlotter returns a new plotter

mapper -- data mapper [may be nil]

func (*Plotter) Centroids

func (o *Plotter) Centroids(centroids []la.Vector)

Centroids plots centroids of classes

func (*Plotter) DataClass

func (o *Plotter) DataClass(nClass, iFeature, jFeature int, classes []int)

DataClass plots data classes

classes -- use given classes instead of data.Y

func (*Plotter) DataY

func (o *Plotter) DataY(iFeature int)

DataY plots data x[iFeature] versus data y values

func (*Plotter) ModelC

func (o *Plotter) ModelC(model fun.Sv, iFeature, jFeature int, level float64, ximin, ximax, xjmin, xjmax float64)

ModelC plots the contour defined by the model f({x}) with varying x[iFeature] and x[jFeature]

func (*Plotter) ModelClass

func (o *Plotter) ModelClass(model fun.Sv, nClass, iFeature, jFeature int, ximin, ximax, xjmin, xjmax float64)

ModelClass plots contour indicating model Classes

func (*Plotter) ModelClassOneVsAll

func (o *Plotter) ModelClassOneVsAll(models []fun.Sv, iFeature, jFeature int, ximin, ximax, xjmin, xjmax float64)

ModelClassOneVsAll plots each Model prediction using 1 = this model, 0 = other models

func (*Plotter) ModelY

func (o *Plotter) ModelY(model fun.Sv, iFeature int, xmin, xmax float64)

ModelY plots model y values

type PolyDataMapper

type PolyDataMapper struct {
	// contains filtered or unexported fields
}

PolyDataMapper maps features to expanded polynomial

func NewPolyDataMapper

func NewPolyDataMapper(nOriFeatures, iFeature, jFeature, degree int) (o *PolyDataMapper)

NewPolyDataMapper returns a new object

func (*PolyDataMapper) GetMapped

func (o *PolyDataMapper) GetMapped(XYraw [][]float64, useY bool) (data *Data)

GetMapped returns a new Regression data with mapped/augmented X values

func (*PolyDataMapper) Map

func (o *PolyDataMapper) Map(x, xRaw la.Vector)

Map maps xRaw into x and ignores y[:] = xyRaw[len(xyRaw)-1]

Input:
  xRaw -- array with x values
Output:
  x -- pre-allocated vector such that len(x) = nFeatures

func (*PolyDataMapper) NumExtraFeatures

func (o *PolyDataMapper) NumExtraFeatures() int

NumExtraFeatures returns the number of extra features added by this mapper

func (*PolyDataMapper) NumOriginalFeatures

func (o *PolyDataMapper) NumOriginalFeatures() int

NumOriginalFeatures returns the number of original features, before mapping/augmentation

type Stat

type Stat struct {
	MinX  []float64 // [nFeatures] min x values
	MaxX  []float64 // [nFeatures] max x values
	SumX  []float64 // [nFeatures] sum of x values
	MeanX []float64 // [nFeatures] mean of x values
	SigX  []float64 // [nFeatures] standard deviations of x
	DelX  []float64 // [nFeatures] difference: max(x) - min(x)
	MinY  float64   // min of y values
	MaxY  float64   // max of y values
	SumY  float64   // sum of y values
	MeanY float64   // mean of y values
	SigY  float64   // standard deviation of y
	DelY  float64   // difference: max(y) - min(y)
	// contains filtered or unexported fields
}

Stat holds statistics about data

NOTE: Stat is an Observer of Data; thus, data.NotifyUpdate() will recompute stat

func NewStat

func NewStat(data *Data) (o *Stat)

NewStat returns a new Stat object

func (*Stat) CopyInto

func (o *Stat) CopyInto(p *Stat)

CopyInto copies stat into p

func (*Stat) SumVars

func (o *Stat) SumVars() (s la.Vector, t float64)

SumVars computes the sums along the columns of X and y

Output:
  t -- scalar t = oᵀy  sum of columns of the y vector: t = Σ_i^m o_i y_i
  s -- vector s = Xᵀo  sum of columns of the X matrix: s_j = Σ_i^m o_i X_ij  [nFeatures]
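Since o is a vector of ones, s = Xᵀo is simply the vector of column sums of X and t = oᵀy is the sum of the y values. A standard-library sketch on plain slices follows (sumVars is a hypothetical name; y may be nil, since Y is optional).

```go
package main

import "fmt"

// sumVars returns s[j] = Σ_i X[i][j] and t = Σ_i y[i].
// If y is nil, t is zero.
func sumVars(X [][]float64, y []float64) (s []float64, t float64) {
	if len(X) > 0 {
		s = make([]float64, len(X[0]))
	}
	for i, row := range X {
		for j, v := range row {
			s[j] += v
		}
		if y != nil {
			t += y[i]
		}
	}
	return
}

func main() {
	X := [][]float64{{1, 2}, {3, 4}, {5, 6}}
	y := []float64{1, 0, 1}
	s, t := sumVars(X, y)
	fmt.Println("s =", s, "t =", t)
}
```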

func (*Stat) Update

func (o *Stat) Update()

Update computes statistics for given data (an Observer of Data)

Directories

Path Synopsis
Package imgd (image-data) adds functionality to process image-data, e.g.
