randomforest

Version: v2.0.0-...-a7fc2a4
Published: Mar 10, 2022 License: Apache-2.0 Imports: 10 Imported by: 0

README

GoDoc: https://godoc.org/github.com/malaschitz/randomForest

This fork adds saving and loading functions; see the Saving/Loading section.

Test:

go test ./... -cover -coverpkg=.  

randomForest

A Random Forest implementation in Go.

Simple Random Forest

	xData := [][]float64{}
	yData := []int{}
	for i := 0; i < 1000; i++ {
		x := []float64{rand.Float64(), rand.Float64(), rand.Float64(), rand.Float64()}
		y := int(x[0] + x[1] + x[2] + x[3])
		xData = append(xData, x)
		yData = append(yData, y)
	}
	forest := randomforest.Forest{}
	forest.Data = randomforest.ForestData{X: xData, Class: yData}
	forest.Train(1000)
	//test
	fmt.Println("Vote", forest.Vote([]float64{0.1, 0.1, 0.1, 0.1})) 
	fmt.Println("Vote", forest.Vote([]float64{0.9, 0.9, 0.9, 0.9}))
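
Vote returns a []float64 rather than a single class number. The sketch below shows one way to turn such a vector into a class, assuming that index i holds the score for class i. That layout is an assumption, not something this README states, and the helper name predictedClass is hypothetical and not part of the package.

```go
package main

import "fmt"

// predictedClass returns the index of the largest entry in votes,
// or -1 for an empty slice. It assumes index i holds the score of class i.
// On ties the lowest index wins.
func predictedClass(votes []float64) int {
	best := -1
	for i, v := range votes {
		if best == -1 || v > votes[best] {
			best = i
		}
	}
	return best
}

func main() {
	// Hypothetical vote vector for a four-class problem; not real library output.
	votes := []float64{0.05, 0.80, 0.10, 0.05}
	fmt.Println("predicted class:", predictedClass(votes)) // predicted class: 1
}
```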

Extremely Randomized Trees

	forest.TrainX(1000)	

Deep Forest

Deep forest inspired by https://arxiv.org/abs/1705.07366

	dForest := forest.BuildDeepForest()
	dForest.Train(20, 100, 1000) // 20 small forests with 100 trees each help to build a deep forest with 1000 trees

Continuous Random Forest

Continuous Random Forest is intended for data where new data keeps arriving (forex, weather, user logs, ...). New data creates new trees, and the oldest trees are removed.

	forest := randomforest.Forest{}
	data := []float64{rand.Float64(), rand.Float64()}
	res := 1 // result
	// AddDataRow adds the new row, trims the oldest row if there are more than 1000 rows,
	// and calculates 10 new trees, removing the oldest trees if there are more than 2000 trees.
	forest.AddDataRow(data, res, 1000, 10, 2000)
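
The trimming of old rows described above can be sketched as a sliding window over a slice. This is only an illustration of the idea, not the package's code: trimOldest is a hypothetical helper, and the rule that max < 1 means unlimited is taken from the AddDataRow documentation. The same pattern would apply to a slice of trees.

```go
package main

import "fmt"

// trimOldest returns at most max of the most recently appended rows.
// As in the AddDataRow description, max < 1 means "no limit".
func trimOldest(rows [][]float64, max int) [][]float64 {
	if max < 1 || len(rows) <= max {
		return rows
	}
	return rows[len(rows)-max:]
}

func main() {
	var rows [][]float64
	for i := 1; i <= 5; i++ {
		rows = append(rows, []float64{float64(i)})
		rows = trimOldest(rows, 3) // keep only the 3 newest rows
	}
	fmt.Println(rows) // [[3] [4] [5]]
}
```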

Boruta Algorithm for feature selection

The Boruta algorithm was developed as a package for the R language. It is one of the most effective feature selection algorithms. There is a paper in the Journal of Statistical Software.

The Boruta algorithm uses a random forest to select important features.

	xData := ... //data
	yData := ... //labels
	selectedFeatures, _ := randomforest.BorutaDefault(xData, yData)
	// or randomforest.Boruta(xData, yData, 100, 20, 0.05, true, true)

The /examples directory contains an example using the MNIST database. The picture shows the selected features (495 of 784) from the images.

[Image: boruta 05]

Saving/Loading

Saving

Saves the forest structure to a binary file.

File name format:

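forest-UUID[:8]+sha256.bin

	// Save takes the target folder and a compress flag and returns the file name.
	fileName, err := forest.Save("saved/", false)
	if err != nil {
		t.Error(err)
		return
	}
	fmt.Println("saved to", fileName)

Loading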

Loads the forest structure from a binary file:

	forest, err := randomforest.LoadForest("saved/forestTest.bin")
	if err != nil {
		return
	}

	fmt.Println("Vote", forest.Vote([]float64{0.9, 0.9, 0.9, 0.9}))

Documentation

Variables

var (
	NumWorkers = runtime.NumCPU() // max number of concurrent goroutines during training
)
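
A limit such as NumWorkers is commonly enforced with a buffered channel used as a counting semaphore. The sketch below shows that general pattern. It is an assumption about how such a cap can be applied, not this package's training code; runBounded and the squaring job are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runBounded calls job(i) for every i in [0, n), running at most
// `workers` jobs concurrently.
func runBounded(n, workers int, job func(i int)) {
	if workers < 1 {
		workers = 1
	}
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` jobs are already running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when the job ends
			job(i)
		}(i)
	}
	wg.Wait()
}

func main() {
	results := make([]int, 8)
	runBounded(len(results), runtime.NumCPU(), func(i int) {
		results[i] = i * i // stand-in for building tree i; each job writes its own slot
	})
	fmt.Println(results) // [0 1 4 9 16 25 36 49]
}
```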

Functions

func Boruta

func Boruta(x [][]float64, class []int, trees int, cycles int, threshold float64, recursive bool, verbose bool) ([]int, map[int]int)

func BorutaDefault

func BorutaDefault(x [][]float64, class []int) ([]int, map[int]int)

Boruta is an algorithm for selecting important features with a Random Forest. It was developed in the R language.

x [][]float64 - data for the random forest. At least three features (columns) are required.
class []int - classes for the random forest (0, 1, ...).
trees int - number of trees used by the Boruta algorithm. A large number of trees is not needed (50-200).
cycles int - number of cycles of the Boruta algorithm (20-50).
threshold float64 - threshold for selecting features (0.05).
recursive bool - the algorithm repeats the process until all features are important.
verbose bool - prints the progress of the Boruta algorithm.

Types

type Branch

type Branch struct {
	Attribute        int
	Value            float64
	IsLeaf           bool
	LeafValue        []float64
	Gini             float64
	GiniGain         float64
	Size             int
	Branch0, Branch1 *Branch
	Depth            int
}

Branch is the tree structure of branches.
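
To illustrate how a structure like this is typically used for prediction, the sketch below walks from the root to a leaf. It uses a local node type that mirrors only the relevant Branch fields, so it runs without the package. The split rule (x[Attribute] < Value goes to Branch0, otherwise Branch1) is an assumption for illustration and may differ from the package's actual rule; leafFor is a hypothetical helper.

```go
package main

import "fmt"

// node mirrors the Branch fields needed for prediction.
type node struct {
	Attribute        int
	Value            float64
	IsLeaf           bool
	LeafValue        []float64
	Branch0, Branch1 *node
}

// leafFor walks from root to a leaf and returns its LeafValue.
// Assumed rule: x[Attribute] < Value goes to Branch0, otherwise Branch1.
// It returns nil if a missing child is reached.
func leafFor(root *node, x []float64) []float64 {
	n := root
	for n != nil && !n.IsLeaf {
		if x[n.Attribute] < n.Value {
			n = n.Branch0
		} else {
			n = n.Branch1
		}
	}
	if n == nil {
		return nil
	}
	return n.LeafValue
}

func main() {
	// A hand-built tree with a single split on attribute 0 at 0.5.
	root := &node{
		Attribute: 0, Value: 0.5,
		Branch0: &node{IsLeaf: true, LeafValue: []float64{1, 0}},
		Branch1: &node{IsLeaf: true, LeafValue: []float64{0, 1}},
	}
	fmt.Println(leafFor(root, []float64{0.1})) // [1 0]
	fmt.Println(leafFor(root, []float64{0.9})) // [0 1]
}
```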

type DeepForest

type DeepForest struct {
	Forest         *Forest
	ForestDeep     Forest
	Groves         []Forest
	NGroves        int
	NFeatures      int
	NTrees         int
	RandomFeatures [][]int
	ResultFeatures [][]float64
	Results        []float64
}

DeepForest is a deep forest implementation consisting of a standard forest (Forest), mini forests (Groves), and the final ForestDeep (Forest + Groves).

func BytesToDeepForest

func BytesToDeepForest(bytes []byte) (*DeepForest, error)

func LoadDeepForest

func LoadDeepForest(path string) (*DeepForest, error)

func (*DeepForest) Save

func (deepforest *DeepForest) Save(folder string, compress bool) (string, error)

func (*DeepForest) ToBytes

func (deepForest *DeepForest) ToBytes(compress bool) ([]byte, error)

func (*DeepForest) Train

func (dForest *DeepForest) Train(groves int, trees int, deepTrees int)

Train trains the DeepForest. The parameters are the number of groves, the number of trees in each grove, and the number of trees in the final deep forest.

func (*DeepForest) Vote

func (dForest *DeepForest) Vote(x []float64) []float64

Vote returns the result of the DeepForest.

type Forest

type Forest struct {
	Data              ForestData // database for calculate trees
	Trees             []Tree     // all generated trees
	Features          int        // number of attributes
	Classes           int        // number of classes
	LeafSize          int        // leaf size
	MFeatures         int        // attributes for choose proper split
	NTrees            int        // number of trees
	NSize             int        // len of data
	MaxDepth          int        // max depth of forest
	FeatureImportance []float64  //stats of FeatureImportance
}

Forest is the base type for the whole forest, holding the database, the properties of the forest, and the trees.

func BytesToForest

func BytesToForest(bytes []byte) (*Forest, error)

func LoadForest

func LoadForest(path string) (*Forest, error)

func (*Forest) AddDataRow

func (forest *Forest) AddDataRow(data []float64, class int, max int, newTrees int, maxTrees int)

AddDataRow adds a new data row.

data: new data row
class: result
max: maximum number of data rows. The first (oldest) row is removed if there are more. If max < 1, the number is unlimited.
newTrees: number of trees calculated after the data row is added
maxTrees: maximum number of trees

This feature supports the Continuous Random Forest.

func (*Forest) BuildDeepForest

func (forest *Forest) BuildDeepForest() DeepForest

BuildDeepForest creates a DeepForest from a Forest.

func (*Forest) PrintFeatureImportance

func (forest *Forest) PrintFeatureImportance()

PrintFeatureImportance prints the list of features.

func (*Forest) Save

func (forest *Forest) Save(folder string, compress bool) (string, error)

Save saves the state of the forest to a file.

func (*Forest) ToBytes

func (forest *Forest) ToBytes(compress bool) ([]byte, error)

func (*Forest) Train

func (forest *Forest) Train(trees int)

Train runs the training process. The parameter is the number of trees to calculate.

func (*Forest) TrainX

func (forest *Forest) TrainX(trees int)

TrainX trains the forest using extremely randomized trees.

func (*Forest) Vote

func (forest *Forest) Vote(x []float64) []float64

Vote is used to calculate the class in an existing forest.

func (*Forest) WeightVote

func (forest *Forest) WeightVote(x []float64) []float64

WeightVote uses the validation weight for the result.
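
One plausible reading of a validation-weighted vote is that each tree's class scores are scaled by a per-tree weight before they are summed. The sketch below illustrates that idea with plain slices. It is an assumption for illustration, not the package's implementation: weightedVote is a hypothetical helper, the weights stand in for Tree.Validation, and normalizing the result to sum to 1 is a choice made here. It expects one weight per tree.

```go
package main

import "fmt"

// weightedVote sums the per-tree class scores, scaling each tree's scores
// by its weight, and normalizes the result to sum to 1 when the total is
// positive. leafScores[t] holds tree t's scores; weights[t] is its weight.
func weightedVote(leafScores [][]float64, weights []float64, classes int) []float64 {
	out := make([]float64, classes)
	total := 0.0
	for t, scores := range leafScores {
		for c := 0; c < classes && c < len(scores); c++ {
			out[c] += weights[t] * scores[c]
			total += weights[t] * scores[c]
		}
	}
	if total > 0 {
		for c := range out {
			out[c] /= total
		}
	}
	return out
}

func main() {
	// Hypothetical leaf outputs of three trees for one input, two classes.
	leafScores := [][]float64{{1, 0}, {0, 1}, {0, 1}}
	weights := []float64{0.9, 0.3, 0.3} // stand-ins for Tree.Validation
	fmt.Println(weightedVote(leafScores, weights, 2))
}
```

In this example an unweighted majority would pick class 1 (two trees against one), while the weighted sum favors class 0 because the first tree carries more weight.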

type ForestData

type ForestData struct {
	X     [][]float64 // All data are float64 numbers
	Class []int       // Result should be int numbers 0,1,2,..
}

ForestData contains the database.

type Tree

type Tree struct {
	Root       Branch
	Validation float64
}

Tree is one random tree in the forest, with its root Branch and a validation number.

Directories

Path        Synopsis
img
tests
generator   Package generator creates test data for machine learning
