gorse: github.com/zhenghaoz/gorse/core

package core

import "github.com/zhenghaoz/gorse/core"

Package core provides core components for gorse.

Core components include:

* Dataset: used to train and test models.
* Splitter: used to split dataset.
* Evaluator: evaluate models.
* Validation: cross validation.

Index

Package Files

built_in.go data.go doc.go evaluator.go model.go ranking.go splitter.go validation.go

Variables

var (
    GorseDir   string
    DataSetDir string
    TempDir    string
)

The data directories.

func Copy

func Copy(dst, src interface{}) error

Copy copies an object from src to dst.

func EvaluateAUC

func EvaluateAUC(estimator ModelInterface, testSet, excludeSet DataSetInterface) float64

EvaluateAUC evaluates a model by AUC.
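As a rough illustration of the metric, the sketch below computes AUC for a single user as the fraction of (positive, negative) item pairs in which the positive item receives the higher score. The helper pairwiseAUC and its inputs are hypothetical and not part of this package; how EvaluateAUC samples negatives and aggregates over users is not stated here.

```go
package main

import "fmt"

// pairwiseAUC is a hypothetical helper (not part of package core).
// It returns the fraction of (positive, negative) score pairs in which the
// positive item is scored strictly higher; ties count as half a win.
func pairwiseAUC(positiveScores, negativeScores []float64) float64 {
	total := len(positiveScores) * len(negativeScores)
	if total == 0 {
		return 0 // assumption: report 0 when no pair can be formed
	}
	wins := 0.0
	for _, p := range positiveScores {
		for _, q := range negativeScores {
			switch {
			case p > q:
				wins++
			case p == q:
				wins += 0.5
			}
		}
	}
	return wins / float64(total)
}

func main() {
	// Two held-out positives scored 0.9 and 0.4, three negatives.
	// Five of the six pairs are ordered correctly.
	fmt.Println(pairwiseAUC([]float64{0.9, 0.4}, []float64{0.5, 0.3, 0.1}))
}
```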

func EvaluateRank

func EvaluateRank(estimator ModelInterface, testSet DataSetInterface, excludeSet DataSetInterface, n int, metrics ...RankMetric) []float64

EvaluateRank evaluates a model in top-n tasks.

func EvaluateRating

func EvaluateRating(estimator ModelInterface, testSet DataSetInterface, metrics ...RatingMetric) []float64

EvaluateRating evaluates a model in rating prediction tasks.

func Items

func Items(dataSet ...DataSetInterface) map[int]bool

Items gets all items from the test set and the training set.

func LoadEntityFromCSV

func LoadEntityFromCSV(filePath string, fieldSep string, tagSep string, header bool,
    names []string, index int) []map[string]interface{}

LoadEntityFromCSV loads entities (items or users) from a CSV file.

func MAE

func MAE(groundTruth []float64, prediction []float64) float64

MAE is mean absolute error.
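A minimal sketch of the mean absolute error, written as a standalone function for illustration. The name meanAbsoluteError and its handling of empty or mismatched input are assumptions, not the package's behavior.

```go
package main

import (
	"fmt"
	"math"
)

// meanAbsoluteError is a hypothetical stand-in for an MAE metric: it averages
// |groundTruth[i] - prediction[i]| over all samples.
func meanAbsoluteError(groundTruth, prediction []float64) float64 {
	if len(groundTruth) == 0 || len(groundTruth) != len(prediction) {
		return math.NaN() // assumption: undefined for empty or mismatched input
	}
	sum := 0.0
	for i := range groundTruth {
		sum += math.Abs(groundTruth[i] - prediction[i])
	}
	return sum / float64(len(groundTruth))
}

func main() {
	// Absolute errors are 0.5, 0 and 1, so the mean is 0.5.
	fmt.Println(meanAbsoluteError([]float64{3, 4, 5}, []float64{2.5, 4, 6})) // prints 0.5
}
```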

func MAP

func MAP(targetSet *base.MarginalSubSet, rankList []int) float64

MAP means Mean Average Precision. For background on mAP, see http://sdsawtelle.github.io/blog/output/mean-average-precision-MAP-for-recommender-systems.html
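The per-user quantity behind MAP is average precision. The sketch below shows one common definition, assuming binary relevance and normalizing by the number of relevant items; averagePrecision and the map-based target set are hypothetical stand-ins for the package's types.

```go
package main

import "fmt"

// averagePrecision is a hypothetical helper. relevant must only hold true
// values, because its length is used as the number of relevant items.
// Each hit at 1-based rank k contributes precision@k; the sum is divided by
// the number of relevant items.
func averagePrecision(relevant map[int]bool, rankList []int) float64 {
	if len(relevant) == 0 {
		return 0
	}
	hits, sum := 0, 0.0
	for i, item := range rankList {
		if relevant[item] {
			hits++
			sum += float64(hits) / float64(i+1)
		}
	}
	return sum / float64(len(relevant))
}

func main() {
	relevant := map[int]bool{1: true, 3: true}
	// Hits at ranks 1 and 3 give (1/1 + 2/3) / 2.
	fmt.Printf("%.4f\n", averagePrecision(relevant, []int{1, 2, 3})) // prints 0.8333
}
```

Averaging this value over all users gives the mean average precision.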

func MRR

func MRR(targetSet *base.MarginalSubSet, rankList []int) float64

MRR means Mean Reciprocal Rank.

The mean reciprocal rank is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer: 1 for first place, 1/2 for second place, 1/3 for third place, and so on. The mean reciprocal rank is the average of the reciprocal ranks of results for a sample of queries Q:

MRR = \frac{1}{|Q|} \sum^{|Q|}_{i=1} \frac{1}{rank_i}
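The definition above can be sketched as follows. Both helpers are hypothetical and use a plain map as the set of correct answers; a list with no correct answer is assumed to contribute 0.

```go
package main

import "fmt"

// reciprocalRank is a hypothetical helper: it returns 1/rank of the first
// relevant item in rankList (ranks start at 1), or 0 if none is found.
func reciprocalRank(relevant map[int]bool, rankList []int) float64 {
	for i, item := range rankList {
		if relevant[item] {
			return 1 / float64(i+1)
		}
	}
	return 0
}

// meanReciprocalRank averages the reciprocal ranks of several queries.
func meanReciprocalRank(relevant []map[int]bool, rankLists [][]int) float64 {
	if len(rankLists) == 0 || len(relevant) != len(rankLists) {
		return 0 // assumption: 0 for empty or mismatched input
	}
	sum := 0.0
	for q := range rankLists {
		sum += reciprocalRank(relevant[q], rankLists[q])
	}
	return sum / float64(len(rankLists))
}

func main() {
	relevant := []map[int]bool{{1: true}, {2: true}, {3: true}}
	rankLists := [][]int{{1, 2, 3}, {1, 2, 3}, {1, 2, 3}}
	// First correct answers sit at ranks 1, 2 and 3: (1 + 1/2 + 1/3) / 3.
	fmt.Printf("%.4f\n", meanReciprocalRank(relevant, rankLists)) // prints 0.6111
}
```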

func NDCG

func NDCG(targetSet *base.MarginalSubSet, rankList []int) float64

NDCG means Normalized Discounted Cumulative Gain.
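A sketch of NDCG under the assumption of binary relevance (gain 1 for an item in the target set, 0 otherwise) and a log2 position discount. The helper ndcg and its map-based target set are hypothetical; the package's exact gain and truncation rules are not stated here.

```go
package main

import (
	"fmt"
	"math"
)

// ndcg is a hypothetical helper using binary relevance. relevant must only
// hold true values, because its length is used as the number of relevant items.
func ndcg(relevant map[int]bool, rankList []int) float64 {
	// DCG: each hit at 0-based position i contributes 1/log2(i+2).
	dcg := 0.0
	for i, item := range rankList {
		if relevant[item] {
			dcg += 1 / math.Log2(float64(i)+2)
		}
	}
	// IDCG: the DCG of an ideal list that places all relevant items first.
	idcg := 0.0
	for i := 0; i < len(relevant) && i < len(rankList); i++ {
		idcg += 1 / math.Log2(float64(i)+2)
	}
	if idcg == 0 {
		return 0
	}
	return dcg / idcg
}

func main() {
	relevant := map[int]bool{1: true, 2: true}
	fmt.Printf("%.4f\n", ndcg(relevant, []int{1, 2, 3})) // ideal order
	fmt.Printf("%.4f\n", ndcg(relevant, []int{1, 3, 2})) // second hit pushed down
}
```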

func Neighbors

func Neighbors(dataSet DataSetInterface, itemId int, n int, similarity base.FuncSimilarity) ([]int, []float64)

Neighbors finds the N nearest neighbors of an item. It returns an unordered slice of items (sparse ID) and the corresponding similarities.

func Popular

func Popular(dataSet DataSetInterface, n int) ([]int, []float64)

Popular finds popular items in the dataset.

func Precision

func Precision(targetSet *base.MarginalSubSet, rankList []int) float64

Precision is the fraction of relevant items among the recommended items.

\frac{|\{relevant documents\} \cap \{retrieved documents\}|}{|\{retrieved documents\}|}

func RMSE

func RMSE(groundTruth []float64, prediction []float64) float64

RMSE is root mean square error.
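A minimal sketch of the root mean square error as a standalone function. The name rootMeanSquareError and its handling of empty or mismatched input are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// rootMeanSquareError is a hypothetical stand-in for an RMSE metric: the
// square root of the mean of squared differences.
func rootMeanSquareError(groundTruth, prediction []float64) float64 {
	if len(groundTruth) == 0 || len(groundTruth) != len(prediction) {
		return math.NaN() // assumption: undefined for empty or mismatched input
	}
	sum := 0.0
	for i := range groundTruth {
		diff := groundTruth[i] - prediction[i]
		sum += diff * diff
	}
	return math.Sqrt(sum / float64(len(groundTruth)))
}

func main() {
	// Every prediction is off by exactly 1, so the RMSE is 1.
	fmt.Println(rootMeanSquareError([]float64{1, 2, 3}, []float64{2, 3, 4})) // prints 1
}
```

Because errors are squared before averaging, a single large error raises RMSE more than it raises the mean absolute error.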

func Recall

func Recall(targetSet *base.MarginalSubSet, rankList []int) float64

Recall is the fraction of relevant items that have been recommended over the total amount of relevant items.

\frac{|\{relevant documents\} \cap \{retrieved documents\}|}{|\{relevant documents\}|}
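Precision and recall share the same numerator (the number of recommended items that are relevant) and differ only in the denominator. The sketch below computes both at once; precisionRecall and the map-based target set are hypothetical stand-ins for the package's types.

```go
package main

import "fmt"

// precisionRecall is a hypothetical helper. relevant must only hold true
// values, because its length is used as the number of relevant items.
func precisionRecall(relevant map[int]bool, rankList []int) (precision, recall float64) {
	hits := 0
	for _, item := range rankList {
		if relevant[item] {
			hits++
		}
	}
	if len(rankList) > 0 {
		precision = float64(hits) / float64(len(rankList))
	}
	if len(relevant) > 0 {
		recall = float64(hits) / float64(len(relevant))
	}
	return precision, recall
}

func main() {
	relevant := map[int]bool{1: true, 2: true, 5: true, 7: true}
	// Two of the three recommended items are relevant; two of four relevant items are found.
	p, r := precisionRecall(relevant, []int{1, 3, 5})
	fmt.Printf("precision=%.4f recall=%.4f\n", p, r) // prints precision=0.6667 recall=0.5000
}
```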

func Top

func Top(items map[int]bool, userId int, n int, exclude *base.MarginalSubSet, model ModelInterface) ([]int, []float64)

Top gets the top-n ranked items for a user, together with their scores.
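The idea of a top-n ranking can be sketched without the package's types. In the hypothetical helper below, a score map stands in for the model's predictions for one user and a plain map stands in for the exclude set; neither reflects how Top is actually implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// topN is a hypothetical helper (not part of package core). scores stands in
// for model predictions of one user; exclude holds items to skip, such as
// items the user has already rated.
func topN(scores map[int]float64, exclude map[int]bool, n int) ([]int, []float64) {
	ids := make([]int, 0, len(scores))
	for id := range scores {
		if !exclude[id] {
			ids = append(ids, id)
		}
	}
	// Sort by descending score; break ties by item ID so the output is deterministic.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	if n < 0 {
		n = 0
	}
	if len(ids) > n {
		ids = ids[:n]
	}
	ranked := make([]float64, len(ids))
	for i, id := range ids {
		ranked[i] = scores[id]
	}
	return ids, ranked
}

func main() {
	scores := map[int]float64{10: 4.5, 20: 3.0, 30: 4.8, 40: 2.1}
	ids, ranked := topN(scores, map[int]bool{30: true}, 2)
	fmt.Println(ids, ranked) // prints [10 20] [4.5 3]
}
```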

type CrossValidateResult

type CrossValidateResult struct {
    TestScore []float64
    TestCosts []float64
}

CrossValidateResult contains the result of cross validation.

func CrossValidate

func CrossValidate(model ModelInterface, dataSet DataSetInterface, splitter Splitter, seed int64,
    options *base.RuntimeOptions, evaluators ...CrossValidationEvaluator) []CrossValidateResult

CrossValidate evaluates a model by k-fold cross validation.

func (CrossValidateResult) MeanAndMargin

func (sv CrossValidateResult) MeanAndMargin() (float64, float64)

MeanAndMargin returns the mean and the margin of cross validation scores.
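This page does not define "margin". One plausible reading, sketched below purely as an assumption, is the largest absolute deviation of any fold's score from the mean, so that every score lies within mean plus or minus margin. The helper meanAndMargin is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// meanAndMargin is a hypothetical helper. It returns the mean of scores and
// the largest absolute deviation from that mean (the assumed "margin").
func meanAndMargin(scores []float64) (mean, margin float64) {
	if len(scores) == 0 {
		return 0, 0
	}
	for _, s := range scores {
		mean += s
	}
	mean /= float64(len(scores))
	for _, s := range scores {
		margin = math.Max(margin, math.Abs(s-mean))
	}
	return mean, margin
}

func main() {
	// Mean is 3; the score 6 lies furthest from it, at distance 3.
	mean, margin := meanAndMargin([]float64{1, 2, 6})
	fmt.Printf("%.1f +/- %.1f\n", mean, margin) // prints 3.0 +/- 3.0
}
```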

type CrossValidationEvaluator

type CrossValidationEvaluator func(estimator ModelInterface, testSet, trainSet DataSetInterface) (scores, costs []float64)

CrossValidationEvaluator is the evaluator for cross-validation.

func NewRankEvaluator

func NewRankEvaluator(n int, metrics ...RankMetric) CrossValidationEvaluator

NewRankEvaluator creates an evaluator for personalized ranking cross-validation.

func NewRatingEvaluator

func NewRatingEvaluator(metrics ...RatingMetric) CrossValidationEvaluator

NewRatingEvaluator creates an evaluator for rating prediction cross-validation.

type DataSet

type DataSet struct {
    // contains filtered or unexported fields
}

DataSet contains preprocessed data structures for recommendation models.

func LoadDataFromBuiltIn

func LoadDataFromBuiltIn(dataSetName string) *DataSet

LoadDataFromBuiltIn loads a built-in data set. Currently supported:

ml-100k   - MovieLens 100K
ml-1m     - MovieLens 1M
ml-10m    - MovieLens 10M
ml-20m    - MovieLens 20M
netflix   - Netflix
filmtrust - FilmTrust
epinions  - Epinions

func LoadDataFromCSV

func LoadDataFromCSV(fileName string, sep string, hasHeader bool) *DataSet

LoadDataFromCSV loads data from a CSV file. The CSV file should have the following layout:

[optional header]
<userId 1> <sep> <itemId 1> <sep> <rating 1> <sep> <extras>
<userId 2> <sep> <itemId 2> <sep> <rating 2> <sep> <extras>
<userId 3> <sep> <itemId 3> <sep> <rating 3> <sep> <extras>
...

For example, the `u.data` file from MovieLens 100K looks like:

196\t242\t3\t881250949
186\t302\t3\t891717742
22\t377\t1\t878887116
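To make the layout concrete, the sketch below parses a single line of this form with the standard library. The helper parseRatingLine is hypothetical and only illustrates the field order; it is not how LoadDataFromCSV is implemented.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRatingLine is a hypothetical helper. It splits one line on sep and
// reads the first three fields as user ID, item ID and rating; any extra
// fields (such as a timestamp) are ignored.
func parseRatingLine(line, sep string) (userID, itemID int, rating float64, err error) {
	fields := strings.Split(line, sep)
	if len(fields) < 3 {
		return 0, 0, 0, fmt.Errorf("expected at least 3 fields, got %d", len(fields))
	}
	if userID, err = strconv.Atoi(strings.TrimSpace(fields[0])); err != nil {
		return 0, 0, 0, err
	}
	if itemID, err = strconv.Atoi(strings.TrimSpace(fields[1])); err != nil {
		return 0, 0, 0, err
	}
	if rating, err = strconv.ParseFloat(strings.TrimSpace(fields[2]), 64); err != nil {
		return 0, 0, 0, err
	}
	return userID, itemID, rating, nil
}

func main() {
	user, item, rating, err := parseRatingLine("196\t242\t3\t881250949", "\t")
	fmt.Println(user, item, rating, err) // prints 196 242 3 <nil>
}
```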

func LoadDataFromNetflix

func LoadDataFromNetflix(fileName string, _ string, _ bool) *DataSet

LoadDataFromNetflix loads data from the Netflix dataset. The file should be:

<itemId 1>:
<userId 1>, <rating 1>, <date>
<userId 2>, <rating 2>, <date>
<userId 3>, <rating 3>, <date>
...
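The block layout above (an item header followed by that item's ratings) can be read with a small state machine. The sketch below is a hypothetical parser over an in-memory string with made-up sample values; it ignores the date field and is not the package's loader.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// rating is a hypothetical record type for this sketch.
type rating struct {
	UserID, ItemID int
	Value          float64
}

// parseNetflix is a hypothetical helper. A line ending in ":" starts a new
// item block; every other non-empty line is "<userId>,<rating>,<date>".
func parseNetflix(text string) ([]rating, error) {
	var out []rating
	itemID := -1 // no item header seen yet
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		if strings.HasSuffix(line, ":") {
			id, err := strconv.Atoi(strings.TrimSuffix(line, ":"))
			if err != nil {
				return nil, err
			}
			itemID = id
			continue
		}
		fields := strings.Split(line, ",")
		if len(fields) < 2 || itemID < 0 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		userID, err := strconv.Atoi(strings.TrimSpace(fields[0]))
		if err != nil {
			return nil, err
		}
		value, err := strconv.ParseFloat(strings.TrimSpace(fields[1]), 64)
		if err != nil {
			return nil, err
		}
		out = append(out, rating{UserID: userID, ItemID: itemID, Value: value})
	}
	return out, scanner.Err()
}

func main() {
	// Made-up sample data in the layout described above.
	ratings, err := parseNetflix("1:\n101, 3, 2005-09-06\n102, 5, 2005-05-13\n2:\n101, 4, 2005-10-19\n")
	fmt.Println(len(ratings), err)
}
```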

func NewDataSet

func NewDataSet(userIDs, itemIDs []int, ratings []float64) *DataSet

NewDataSet creates a data set.

func (*DataSet) Count

func (set *DataSet) Count() int

Count returns the number of ratings.

func (*DataSet) FeatureCount

func (set *DataSet) FeatureCount() int

FeatureCount returns the number of additional features.

func (*DataSet) Get

func (set *DataSet) Get(i int) (int, int, float64)

Get gets the i-th record as <user ID, item ID, rating>.

func (*DataSet) GetWithIndex

func (set *DataSet) GetWithIndex(i int) (int, int, float64)

GetWithIndex gets the i-th record by <user index, item index, rating>.

func (*DataSet) GlobalMean

func (set *DataSet) GlobalMean() float64

GlobalMean computes the global mean of ratings.

func (*DataSet) Item

func (set *DataSet) Item(itemId int) *base.MarginalSubSet

Item returns the subset of an item.

func (*DataSet) ItemByIndex

func (set *DataSet) ItemByIndex(itemIndex int) *base.MarginalSubSet

ItemByIndex gets ratings of an item by index.

func (*DataSet) ItemCount

func (set *DataSet) ItemCount() int

ItemCount returns the number of items.

func (*DataSet) ItemFeatures

func (set *DataSet) ItemFeatures() []*base.SparseVector

ItemFeatures returns additional features of items.

func (*DataSet) ItemIndexer

func (set *DataSet) ItemIndexer() *base.Indexer

ItemIndexer returns the item indexer.

func (*DataSet) Items

func (set *DataSet) Items() []*base.MarginalSubSet

Items returns the subsets of all items.

func (*DataSet) SetItemFeature

func (set *DataSet) SetItemFeature(items []map[string]interface{}, features []string, idName string)

SetItemFeature sets features of items.

func (*DataSet) SetUserFeatures

func (set *DataSet) SetUserFeatures(users []map[string]interface{}, features []string, idName string)

SetUserFeatures sets features of users.

func (*DataSet) SubSet

func (set *DataSet) SubSet(subset []int) DataSetInterface

SubSet returns a subset of current dataset.

func (*DataSet) User

func (set *DataSet) User(userId int) *base.MarginalSubSet

User returns the subset of a user.

func (*DataSet) UserByIndex

func (set *DataSet) UserByIndex(userIndex int) *base.MarginalSubSet

UserByIndex gets ratings of a user by index.

func (*DataSet) UserCount

func (set *DataSet) UserCount() int

UserCount returns the number of users.

func (*DataSet) UserFeatures

func (set *DataSet) UserFeatures() []*base.SparseVector

UserFeatures returns additional features of users.

func (*DataSet) UserIndexer

func (set *DataSet) UserIndexer() *base.Indexer

UserIndexer returns the user indexer.

func (*DataSet) Users

func (set *DataSet) Users() []*base.MarginalSubSet

Users returns the subsets of all users.

type DataSetInterface

type DataSetInterface interface {
    // GlobalMean returns the global mean of ratings in the dataset.
    GlobalMean() float64
    // Count returns the number of ratings in the dataset.
    Count() int
    // UserCount returns the number of users in the dataset.
    UserCount() int
    // ItemCount returns the number of items in the dataset.
    ItemCount() int
    // FeatureCount returns the number of additional features.
    FeatureCount() int
    // Get i-th rating by (user ID, item ID, rating).
    Get(i int) (int, int, float64)
    // GetWithIndex gets i-th rating by (user index, item index, rating).
    GetWithIndex(i int) (int, int, float64)
    // UserIndexer returns the user indexer.
    UserIndexer() *base.Indexer
    // ItemIndexer returns the item indexer.
    ItemIndexer() *base.Indexer
    // SubSet gets a subset of current dataset.
    SubSet(subset []int) DataSetInterface
    // Users returns subsets of users.
    Users() []*base.MarginalSubSet
    // Items returns subsets of items.
    Items() []*base.MarginalSubSet
    // UserFeatures returns additional features of users.
    UserFeatures() []*base.SparseVector
    // ItemFeatures returns additional features of items.
    ItemFeatures() []*base.SparseVector
    // User returns the subset of a user.
    User(userId int) *base.MarginalSubSet
    // Item returns the subset of an item.
    Item(itemId int) *base.MarginalSubSet
    // UserByIndex returns the subset of a user by the index.
    UserByIndex(userIndex int) *base.MarginalSubSet
    // ItemByIndex returns the subset of an item by the index.
    ItemByIndex(itemIndex int) *base.MarginalSubSet
}

DataSetInterface is the interface for a dataset object.

func NewSubSet

func NewSubSet(dataSet *DataSet, subset []int) DataSetInterface

NewSubSet creates a subset of a dataset.

func Split

func Split(data DataSetInterface, testRatio float64) (train, test DataSetInterface)

Split splits a dataset into a training set and a test set according to the given test ratio.
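A ratio split can be sketched on record indices alone. The helper below is hypothetical: it takes an explicit seed (which Split's signature does not expose) and returns index slices rather than datasets, so it only illustrates the idea of shuffling and cutting at the test ratio.

```go
package main

import (
	"fmt"
	"math/rand"
)

// splitIndices is a hypothetical helper. It shuffles the record indices
// 0..n-1 with a seeded generator and assigns the first n*testRatio of them
// (rounded down) to the test set and the rest to the training set.
func splitIndices(n int, testRatio float64, seed int64) (train, test []int) {
	perm := rand.New(rand.NewSource(seed)).Perm(n)
	testSize := int(float64(n) * testRatio)
	if testSize < 0 {
		testSize = 0
	}
	if testSize > n {
		testSize = n
	}
	return perm[testSize:], perm[:testSize]
}

func main() {
	train, test := splitIndices(10, 0.2, 0)
	fmt.Println(len(train), len(test)) // prints 8 2
}
```

Index slices of this kind have the same shape as the subset argument of SubSet(subset []int) listed on this page.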

type ModelInterface

type ModelInterface interface {
    // Set parameters.
    SetParams(params base.Params)
    // Get parameters.
    GetParams() base.Params
    // Predict the rating given by a user (userId) to an item (itemId).
    Predict(userId, itemId int) float64
    // Fit a model with a train set and parameters.
    Fit(trainSet DataSetInterface, options *base.RuntimeOptions)
}

ModelInterface is the interface for all models. Any model in this package should implement it.

type ModelSelectionResult

type ModelSelectionResult struct {
    BestScore  float64
    BestCost   float64
    BestParams base.Params
    BestIndex  int
    CVResults  []CrossValidateResult
    AllParams  []base.Params
}

ModelSelectionResult contains the result of a grid search.

func GridSearchCV

func GridSearchCV(estimator ModelInterface, dataSet DataSetInterface, paramGrid ParameterGrid,
    splitter Splitter, seed int64, options *base.RuntimeOptions, evaluators ...CrossValidationEvaluator) []ModelSelectionResult

GridSearchCV finds the best parameters for a model.

func RandomSearchCV

func RandomSearchCV(estimator ModelInterface, dataSet DataSetInterface, paramGrid ParameterGrid,
    splitter Splitter, trial int, seed int64, options *base.RuntimeOptions, evaluators ...CrossValidationEvaluator) []ModelSelectionResult

RandomSearchCV searches hyper-parameters by random sampling.

type ParameterGrid

type ParameterGrid map[base.ParamName][]interface{}

ParameterGrid contains candidate parameter values for grid search.
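A grid search evaluates every combination of the candidate values. The sketch below enumerates those combinations for a grid keyed by plain strings; expandGrid, the string keys and the parameter names "lr" and "nFactors" are hypothetical stand-ins for base.ParamName values and do not reflect how GridSearchCV iterates internally.

```go
package main

import (
	"fmt"
	"sort"
)

// expandGrid is a hypothetical helper. It returns the Cartesian product of
// the candidate values, one map per combination. Parameter names are visited
// in sorted order so the result order is deterministic.
func expandGrid(grid map[string][]interface{}) []map[string]interface{} {
	names := make([]string, 0, len(grid))
	for name := range grid {
		names = append(names, name)
	}
	sort.Strings(names)
	combos := []map[string]interface{}{{}}
	for _, name := range names {
		var next []map[string]interface{}
		for _, partial := range combos {
			for _, value := range grid[name] {
				combo := make(map[string]interface{}, len(partial)+1)
				for k, v := range partial {
					combo[k] = v
				}
				combo[name] = value
				next = append(next, combo)
			}
		}
		combos = next
	}
	return combos
}

func main() {
	grid := map[string][]interface{}{
		"lr":       {0.01, 0.1},
		"nFactors": {10, 20, 50},
	}
	combos := expandGrid(grid)
	fmt.Println(len(combos)) // prints 6
}
```

The number of combinations is the product of the candidate counts, which is why a random search over a fixed number of trials can be cheaper for large grids.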

type RankMetric

type RankMetric func(targetSet *base.MarginalSubSet, rankList []int) float64

RankMetric is used by evaluators in personalized ranking tasks.

type RatingMetric

type RatingMetric func(groundTruth []float64, prediction []float64) float64

RatingMetric is used by evaluators in rating prediction tasks.

type Splitter

type Splitter func(set DataSetInterface, seed int64) ([]DataSetInterface, []DataSetInterface)

Splitter splits data into train sets and test sets.

func NewKFoldSplitter

func NewKFoldSplitter(k int) Splitter

NewKFoldSplitter creates a k-fold splitter.
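In k-fold splitting, the shuffled records are cut into k test folds and each fold is paired with the remaining records as its training set. The sketch below works on record indices only; kFoldIndices is hypothetical and the way fold boundaries are rounded is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// kFoldIndices is a hypothetical helper. It shuffles indices 0..n-1 and cuts
// them into k contiguous test folds; each training fold is the complement.
func kFoldIndices(n, k int, seed int64) (trainFolds, testFolds [][]int) {
	if k <= 0 {
		return nil, nil
	}
	perm := rand.New(rand.NewSource(seed)).Perm(n)
	for i := 0; i < k; i++ {
		begin, end := n*i/k, n*(i+1)/k
		test := append([]int{}, perm[begin:end]...)
		train := append(append([]int{}, perm[:begin]...), perm[end:]...)
		trainFolds = append(trainFolds, train)
		testFolds = append(testFolds, test)
	}
	return trainFolds, testFolds
}

func main() {
	trainFolds, testFolds := kFoldIndices(10, 3, 0)
	for i := range testFolds {
		// Fold sizes are 7/3, 7/3 and 6/4 for n=10, k=3.
		fmt.Println(len(trainFolds[i]), len(testFolds[i]))
	}
}
```

Returning two parallel slices mirrors the shape of the Splitter type, which returns one slice of training sets and one slice of test sets.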

func NewRatioSplitter

func NewRatioSplitter(repeat int, testRatio float64) Splitter

NewRatioSplitter creates a ratio splitter.

func NewUserLOOSplitter

func NewUserLOOSplitter(repeat int) Splitter

NewUserLOOSplitter creates a per-user leave-one-out data splitter.

type SubSet

type SubSet struct {
    *DataSet // the existing dataset.
    // contains filtered or unexported fields
}

SubSet creates a subset index over an existing dataset.

func (*SubSet) Count

func (set *SubSet) Count() int

Count returns the number of ratings.

func (*SubSet) Get

func (set *SubSet) Get(i int) (int, int, float64)

Get gets the i-th record as <user ID, item ID, rating>.

func (*SubSet) GetWithIndex

func (set *SubSet) GetWithIndex(i int) (int, int, float64)

GetWithIndex gets the i-th record by <user index, item index, rating>.

func (*SubSet) GlobalMean

func (set *SubSet) GlobalMean() float64

GlobalMean computes the global mean of ratings.

func (*SubSet) Item

func (set *SubSet) Item(itemId int) *base.MarginalSubSet

Item returns the ratings subset of an item.

func (*SubSet) ItemByIndex

func (set *SubSet) ItemByIndex(itemIndex int) *base.MarginalSubSet

ItemByIndex gets ratings of an item by index.

func (*SubSet) Items

func (set *SubSet) Items() []*base.MarginalSubSet

Items returns the subsets of all items.

func (*SubSet) SubSet

func (set *SubSet) SubSet(indices []int) DataSetInterface

SubSet returns a subset of current dataset.

func (*SubSet) User

func (set *SubSet) User(userId int) *base.MarginalSubSet

User returns the ratings subset of a user.

func (*SubSet) UserByIndex

func (set *SubSet) UserByIndex(userIndex int) *base.MarginalSubSet

UserByIndex gets ratings of a user by index.

func (*SubSet) Users

func (set *SubSet) Users() []*base.MarginalSubSet

Users returns the subsets of all users.

Package core imports 19 packages and is imported by 6 packages. Updated 2019-06-03.