validation

package
v0.0.0-...-ebe581b

Published: Mar 22, 2024 License: Apache-2.0 Imports: 9 Imported by: 2

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func KFoldsSplit

func KFoldsSplit(fileRows [][]string, k int) ([][][]string, error)

KFoldsSplit divides the file into `k` parts directly. k is the number of parts and must be 5 or 10. The first row of `fileRows` contains only the feature names and is kept in every returned part.
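The behavior described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `kFolds` is a hypothetical stand-in, and both the contiguous fold boundaries and the minimum row count check are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// kFolds is a hypothetical stand-in for KFoldsSplit. It keeps the header row
// in every fold and spreads the data rows over k contiguous folds.
func kFolds(fileRows [][]string, k int) ([][][]string, error) {
	if k != 5 && k != 10 {
		return nil, errors.New("k must be 5 or 10")
	}
	// Assumption: every fold needs at least one data row.
	if len(fileRows) < k+1 {
		return nil, errors.New("not enough data rows for k folds")
	}
	header, data := fileRows[0], fileRows[1:]
	folds := make([][][]string, k)
	for i := 0; i < k; i++ {
		// Integer arithmetic yields fold sizes that differ by at most one row.
		start, end := i*len(data)/k, (i+1)*len(data)/k
		fold := [][]string{header}
		fold = append(fold, data[start:end]...)
		folds[i] = fold
	}
	return folds, nil
}

func main() {
	rows := [][]string{{"id", "x"}}
	for i := 1; i <= 12; i++ {
		rows = append(rows, []string{fmt.Sprint(i), "v"})
	}
	folds, err := kFolds(rows, 5)
	if err != nil {
		panic(err)
	}
	for i, f := range folds {
		fmt.Printf("fold %d: %d rows including header\n", i, len(f))
	}
}
```

Every fold starts with the header row, so each one can be written back out as a standalone file.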

func LooSplit

func LooSplit(fileRows [][]string, idName string) ([][][]string, error)

LooSplit sorts the file rows by the IDs extracted from the file via `idName`, then places each row in its own subset (leave-one-out).
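A leave-one-out split of this kind might look like the sketch below. `looSplit` is a hypothetical helper; comparing IDs as plain strings and repeating the header in every subset are assumptions, and rows are assumed to have as many columns as the header.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// looSplit is a hypothetical stand-in for LooSplit: it sorts the data rows by
// the ID column, then returns one subset per row (each with the header).
func looSplit(fileRows [][]string, idName string) ([][][]string, error) {
	if len(fileRows) < 2 {
		return nil, errors.New("need a header row and at least one data row")
	}
	header := fileRows[0]
	idCol := -1
	for i, name := range header {
		if name == idName {
			idCol = i
			break
		}
	}
	if idCol < 0 {
		return nil, fmt.Errorf("id feature %q not found in header", idName)
	}
	// Copy before sorting so the caller's slice keeps its order.
	data := append([][]string(nil), fileRows[1:]...)
	// Assumption: IDs are compared lexicographically as strings.
	sort.SliceStable(data, func(a, b int) bool { return data[a][idCol] < data[b][idCol] })
	folds := make([][][]string, 0, len(data))
	for _, row := range data {
		folds = append(folds, [][]string{header, row})
	}
	return folds, nil
}

func main() {
	rows := [][]string{{"id", "x"}, {"c", "3"}, {"a", "1"}, {"b", "2"}}
	folds, err := looSplit(rows, "id")
	if err != nil {
		panic(err)
	}
	for i, f := range folds {
		fmt.Println(i, f[1][0])
	}
}
```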

func ShuffleKFoldsSplit

func ShuffleKFoldsSplit(fileRows [][]string, idName string, k int, seed string) ([][][]string, error)

ShuffleKFoldsSplit sorts the file rows by the IDs extracted from the file via `idName`, shuffles the sorted rows using `seed`, then divides the file into `k` parts. k is the number of parts and must be 5 or 10.
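One plausible reason to sort before shuffling is that the result then depends only on the set of IDs and the seed, not on the input order. The sketch below illustrates that idea. `seededShuffle` is a hypothetical helper, and deriving the random source from an FNV hash of the seed string is an assumption; the package may map `seed` to a random source differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"sort"
)

// seededShuffle is a hypothetical helper: it sorts the IDs, then shuffles
// them with a random source derived from the seed string.
func seededShuffle(ids []string, seed string) []string {
	out := append([]string(nil), ids...)
	sort.Strings(out) // canonical order, independent of input order

	// Assumption: hash the seed string to obtain a numeric seed.
	h := fnv.New64a()
	h.Write([]byte(seed)) // Write on a hash.Hash never returns an error
	rng := rand.New(rand.NewSource(int64(h.Sum64())))

	rng.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	a := seededShuffle([]string{"c", "a", "b", "d"}, "my-seed")
	b := seededShuffle([]string{"d", "b", "a", "c"}, "my-seed")
	// Both calls see the same sorted IDs and the same seed,
	// so they produce the same shuffled order.
	fmt.Println(a)
	fmt.Println(b)
}
```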

func ShuffleSplit

func ShuffleSplit(fileRows [][]string, idName string, percents int, seed string) ([2][][]string, error)

ShuffleSplit sorts the file rows by the IDs extracted from the file via `idName`, shuffles the sorted rows using `seed`, then divides the file into two parts based on `percents`, which gives the percentage of rows assigned to the first returned part.

func Split

func Split(fileRows [][]string, percents int) ([2][][]string, error)

Split divides the file into two parts directly based on `percents`, which gives the percentage of rows assigned to the first returned part. The first row of `fileRows` contains only the feature names and is kept in both returned parts.
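As a rough sketch of a percentage split, the hypothetical `splitByPercent` below keeps the header in both parts. Rounding down and rejecting splits that leave one part empty are assumptions, not documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// splitByPercent is a hypothetical stand-in for Split. The first part gets
// roughly `percents` percent of the data rows; both parts keep the header.
func splitByPercent(fileRows [][]string, percents int) ([2][][]string, error) {
	var parts [2][][]string
	if percents <= 0 || percents >= 100 {
		return parts, errors.New("percents must be between 1 and 99")
	}
	if len(fileRows) < 3 {
		return parts, errors.New("need a header row and at least two data rows")
	}
	header, data := fileRows[0], fileRows[1:]
	n := len(data) * percents / 100 // assumption: round down
	if n == 0 || n == len(data) {
		return parts, errors.New("percents leaves one part empty")
	}
	parts[0] = append([][]string{header}, data[:n]...)
	parts[1] = append([][]string{header}, data[n:]...)
	return parts, nil
}

func main() {
	rows := [][]string{{"id", "x"}}
	for i := 1; i <= 10; i++ {
		rows = append(rows, []string{fmt.Sprint(i), "v"})
	}
	parts, err := splitByPercent(rows, 80)
	if err != nil {
		panic(err)
	}
	// 8 data rows plus header in the first part, 2 plus header in the second.
	fmt.Println(len(parts[0]), len(parts[1]))
}
```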

Types

type BinClassValidation

type BinClassValidation interface {
	// Splitter divides the data set into several subsets using a strategy (such as k-folds or LOO),
	// and holds out one subset as the validation set and the others as the training set.
	Splitter

	// SetPredictOut stores the predicted probabilities for the prediction set to which `idx` refers.
	SetPredictOut(idx int, predProbas []float64) error

	// GetAllPredictOuts returns all prediction results that have been stored.
	GetAllPredictOuts() map[int][]string

	// GetAccuracy returns the classification accuracy.
	// idx is the index of the prediction set (and of the validation set) in the split folds.
	GetAccuracy(idx int) (float64, error)

	// GetAllAccuracy returns the classification accuracy of every split fold,
	// followed by the mean and standard deviation of those scores.
	GetAllAccuracy() (map[int]float64, float64, float64, error)

	// GetReport returns JSON bytes containing precision, recall, F1, true positives,
	// false positives, true negatives and false negatives for each class, plus accuracy.
	GetReport(idx int) ([]byte, error)

	// GetOverallReport returns, for every split fold, JSON bytes containing precision, recall, F1,
	// true positives, false positives, true negatives and false negatives for each class, plus accuracy.
	GetOverallReport() (map[int][]byte, error)

	// GetROCAndAUC returns JSON bytes containing the ROC curve points and the AUC.
	GetROCAndAUC(idx int) ([]byte, error)

	// GetAllROCAndAUC returns a map containing the ROC and AUC JSON bytes of every split fold.
	GetAllROCAndAUC() (map[int][]byte, error)
}

BinClassValidation performs validation for the binary classification case.
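The per-fold accuracy and its summary statistics could be computed along the lines below. Both functions are hypothetical: treating a probability at or above `threshold` as the positive class, and using the population (not sample) standard deviation, are assumptions rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// accuracy is a hypothetical helper: a sample is predicted positive when its
// probability is >= threshold (assumption), then compared with the true label.
func accuracy(predProbas []float64, labels []string, posClass string, threshold float64) (float64, error) {
	if len(predProbas) != len(labels) || len(labels) == 0 {
		return 0, errors.New("predictions and labels must be non-empty and equal in length")
	}
	correct := 0
	for i, p := range predProbas {
		predictedPos := p >= threshold
		actualPos := labels[i] == posClass
		if predictedPos == actualPos {
			correct++
		}
	}
	return float64(correct) / float64(len(labels)), nil
}

// meanStd returns the mean and population standard deviation of per-fold
// scores. Assumption: the package may use the sample deviation instead.
func meanStd(scores map[int]float64) (mean, std float64) {
	if len(scores) == 0 {
		return 0, 0
	}
	n := float64(len(scores))
	for _, s := range scores {
		mean += s
	}
	mean /= n
	for _, s := range scores {
		std += (s - mean) * (s - mean)
	}
	return mean, math.Sqrt(std / n)
}

func main() {
	acc, err := accuracy([]float64{0.9, 0.2, 0.6, 0.4}, []string{"pos", "neg", "neg", "pos"}, "pos", 0.5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("fold accuracy: %.2f\n", acc)

	mean, std := meanStd(map[int]float64{0: 0.8, 1: 0.9, 2: 1.0})
	fmt.Printf("mean=%.3f std=%.3f\n", mean, std)
}
```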

func NewBinClassValidation

func NewBinClassValidation(file [][]string, label string, idName string,
	posClass string, negClass string, threshold float64) (BinClassValidation, error)

NewBinClassValidation creates a BinClassValidation instance to handle binary classification validation.

file contains all rows of a file; its first row contains only the feature names, and the remaining rows contain the feature values.

idName denotes which feature is the ID used for sample alignment.

label denotes the name of the label feature.

posClass denotes the name of the positive class and must be one feature name in `file`.

negClass denotes the name of the negative class and may be set to an empty string.

type RegressionValidation

type RegressionValidation interface {
	// Splitter divides the data set into several subsets using a strategy (such as k-folds or LOO),
	// and holds out one subset as the validation set and the others as the training set.
	Splitter

	// SetPredictOut stores the prediction outcomes for the prediction set to which `idx` refers.
	SetPredictOut(idx int, yPred []float64) error

	// GetAllPredictOuts returns all prediction results that have been stored.
	GetAllPredictOuts() map[int][]float64

	// GetRMSE returns the RMSE over the validation set to which `idx` refers.
	GetRMSE(idx int) (float64, error)

	// GetAllRMSE returns the RMSE of every split fold,
	// followed by the mean and standard deviation of those scores.
	GetAllRMSE() (map[int]float64, float64, float64, error)
}

RegressionValidation performs validation for the regression case.
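For reference, RMSE is the square root of the mean squared difference between true and predicted values. The hypothetical `rmse` helper below shows the standard formula; it is not taken from the package source.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// rmse is a hypothetical helper computing sqrt(mean((yTrue - yPred)^2)).
func rmse(yTrue, yPred []float64) (float64, error) {
	if len(yTrue) != len(yPred) || len(yTrue) == 0 {
		return 0, errors.New("inputs must be non-empty and equal in length")
	}
	sum := 0.0
	for i := range yTrue {
		d := yTrue[i] - yPred[i]
		sum += d * d
	}
	return math.Sqrt(sum / float64(len(yTrue))), nil
}

func main() {
	score, err := rmse([]float64{1, 2, 3}, []float64{1, 2, 5})
	if err != nil {
		panic(err)
	}
	fmt.Printf("RMSE: %.4f\n", score)
}
```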

func NewRegressionValidation

func NewRegressionValidation(file [][]string, label string, idName string) (RegressionValidation, error)

NewRegressionValidation creates a RegressionValidation instance to handle regression validation.

file contains all rows of a file; its first row contains only the feature names, and the remaining rows contain the feature values.

idName denotes which feature is the ID used for sample alignment.

type Splitter

type Splitter interface {
	// Split divides the file into two parts directly,
	// based on `percents`, which gives the percentage assigned to the first part.
	Split(percents int) error

	// ShuffleSplit shuffles the rows with `seed`,
	// then divides the file into two parts
	// based on `percents`, which gives the percentage assigned to the first part.
	ShuffleSplit(percents int, seed string) error

	// KFoldsSplit divides the file into `k` parts directly.
	// k is the number of parts and must be 5 or 10.
	KFoldsSplit(k int) error

	// ShuffleKFoldsSplit shuffles the sorted rows with `seed`,
	// then divides the file into `k` parts.
	// k is the number of parts and must be 5 or 10.
	ShuffleKFoldsSplit(k int, seed string) error

	// LooSplit sorts the file rows by the IDs extracted from the file via `idName`,
	// then places each row in its own subset.
	LooSplit() error

	// GetAllFolds returns all folds produced by a split.
	// It only succeeds after a split has been performed.
	GetAllFolds() ([][][]string, error)

	// GetTrainSet holds out the subset referred to by `idxHO`
	// and returns the remaining subsets as the training set.
	GetTrainSet(idxHO int) ([][]string, error)

	// GetPredictSet returns the subset referred to by `idx`
	// as the prediction set (without the label feature).
	GetPredictSet(idx int) ([][]string, error)

	// GetValidSet returns the subset referred to by `idx`
	// as the validation set.
	GetValidSet(idx int) ([][]string, error)
}

Splitter divides the data set into several subsets using a strategy (such as k-folds or LOO), and holds out one subset as the validation set and the others as the training set.
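The hold-out idea can be sketched as follows, assuming each fold starts with the header row. `trainSet` and `predictSet` are hypothetical helpers that mirror what GetTrainSet and GetPredictSet are described as doing; the package's own handling of headers and missing label columns may differ.

```go
package main

import "fmt"

// trainSet is a hypothetical helper: it merges every fold except idxHO,
// emitting the header row only once.
func trainSet(folds [][][]string, idxHO int) ([][]string, error) {
	if idxHO < 0 || idxHO >= len(folds) {
		return nil, fmt.Errorf("hold-out index %d out of range", idxHO)
	}
	var train [][]string
	for i, fold := range folds {
		if i == idxHO || len(fold) == 0 {
			continue
		}
		if train == nil {
			train = append(train, fold[0]) // header, taken from the first kept fold
		}
		train = append(train, fold[1:]...)
	}
	return train, nil
}

// predictSet is a hypothetical helper: it copies a fold without the label
// column. If the label is not in the header, rows are returned unchanged.
func predictSet(fold [][]string, label string) [][]string {
	if len(fold) == 0 {
		return nil
	}
	col := -1
	for i, name := range fold[0] {
		if name == label {
			col = i
			break
		}
	}
	out := make([][]string, 0, len(fold))
	for _, row := range fold {
		if col < 0 || col >= len(row) {
			out = append(out, row)
			continue
		}
		r := make([]string, 0, len(row)-1)
		r = append(r, row[:col]...)
		r = append(r, row[col+1:]...)
		out = append(out, r)
	}
	return out
}

func main() {
	header := []string{"id", "x", "y"}
	folds := [][][]string{
		{header, {"1", "a", "0"}, {"2", "b", "1"}},
		{header, {"3", "c", "0"}, {"4", "d", "1"}},
		{header, {"5", "e", "0"}, {"6", "f", "1"}},
	}
	train, err := trainSet(folds, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println("training rows:", train)
	fmt.Println("prediction rows:", predictSet(folds[1], "y"))
}
```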

func NewSplitter

func NewSplitter(file [][]string, idName string, label string) Splitter

NewSplitter creates a Splitter instance.

file contains all rows of a file; its first row contains only the feature names, and the remaining rows contain the feature values.

idName denotes which feature is the ID used for sample alignment.

label denotes the name of the label feature.
