gorse: github.com/zhenghaoz/gorse/model

package model

import "github.com/zhenghaoz/gorse/model"

Package model provides models for item rating and ranking.

There are two kinds of models: rating models and ranking models. Although a rating model can be used for ranking (and vice versa), its performance is not guaranteed and the results may not be meaningful.

* Item rating models include: Random, Baseline, SVD(optimizer=Regression), SVD++, NMF, KNN, SlopeOne, CoClustering
* Item ranking models include: ItemPop, WRMF, SVD(optimizer=BPR)

Package Files

base.go co_clustering.go doc.go fm.go knn.go slope_one.go svd.go

type Base

type Base struct {
    Params      base.Params   // Hyper-parameters
    UserIndexer *base.Indexer // Users' ID set
    ItemIndexer *base.Indexer // Items' ID set
    // contains filtered or unexported fields
}

Base model must be included by every recommendation model. Hyper-parameters, ID sets, random generator and fitting options are managed by the Base model.

func (*Base) Fit

func (model *Base) Fit(trainSet core.DataSet, options *base.RuntimeOptions)

Fit has not been implemented.

func (*Base) GetParams

func (model *Base) GetParams() base.Params

GetParams returns all hyper-parameters.

func (*Base) Init

func (model *Base) Init(trainSet core.DataSetInterface)

Init initializes the Base model. It must be called at the beginning of Fit.

func (*Base) Predict

func (model *Base) Predict(userId, itemId int) float64

Predict has not been implemented.

func (*Base) SetParams

func (model *Base) SetParams(params base.Params)

SetParams sets hyper-parameters for the Base model.

type BaseLine

type BaseLine struct {
    Base
    UserBias   []float64 // b_u
    ItemBias   []float64 // b_i
    GlobalBias float64   // mu
    // contains filtered or unexported fields
}

BaseLine predicts the rating for given user and item by

\hat{r}_{ui} = b_{ui} = μ + b_u + b_i

If user u is unknown, then the bias b_u is assumed to be zero. The same applies for item i with b_i. Hyper-parameters:

Reg         - The regularization parameter of the cost function that is
            optimized. Default is 0.02.
Lr          - The learning rate of SGD. Default is 0.005.
NEpochs     - The number of iterations of the SGD procedure. Default is 20.
RandomState - The random seed. Default is 0.
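
The prediction rule above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the package's implementation: the `baseline` type and its map-based fields are hypothetical stand-ins for the fitted GlobalBias, UserBias and ItemBias values, and the numbers are made up.

```go
package main

import "fmt"

// baseline is a hypothetical stand-in for a fitted baseline model.
type baseline struct {
	globalBias float64         // mu
	userBias   map[int]float64 // b_u, keyed by user ID
	itemBias   map[int]float64 // b_i, keyed by item ID
}

// predict returns mu + b_u + b_i. A missing map key yields 0, which matches
// the rule that the bias of an unknown user or item is assumed to be zero.
func (b baseline) predict(userID, itemID int) float64 {
	return b.globalBias + b.userBias[userID] + b.itemBias[itemID]
}

func main() {
	m := baseline{
		globalBias: 3.5,
		userBias:   map[int]float64{1: 0.25},
		itemBias:   map[int]float64{10: -0.5},
	}
	fmt.Println(m.predict(1, 10)) // known user and item: prints 3.25
	fmt.Println(m.predict(2, 10)) // unknown user, b_u = 0: prints 3
}
```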

func NewBaseLine

func NewBaseLine(params base.Params) *BaseLine

NewBaseLine creates a baseline model.

func (*BaseLine) Fit

func (baseLine *BaseLine) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the BaseLine model.

func (*BaseLine) Predict

func (baseLine *BaseLine) Predict(userId, itemId int) float64

Predict by the BaseLine model.

func (*BaseLine) SetParams

func (baseLine *BaseLine) SetParams(params base.Params)

SetParams sets hyper-parameters for the BaseLine model.

type CoClustering

type CoClustering struct {
    Base
    GlobalMean       float64     // A^{global}
    UserMeans        []float64   // A^{R}
    ItemMeans        []float64   // A^{C}
    UserClusters     []int       // ρ(i)
    ItemClusters     []int       // γ(j)
    UserClusterMeans []float64   // A^{RC}
    ItemClusterMeans []float64   // A^{CC}
    CoClusterMeans   [][]float64 // A^{COC}
    // contains filtered or unexported fields
}

CoClustering [5] is a collaborative filtering approach based on a weighted co-clustering algorithm that clusters users and items simultaneously.

Let U={u_i}^m_{i=1} be the set of users such that |U|=m and P={p_j}^n_{j=1} be the set of items such that |P|=n. Let A be the m x n ratings matrix such that A_{ij} is the rating of the user u_i for the item p_j. The approximate matrix \hat{A}_{ij} is given by

\hat{A}_{ij} = A^{COC}_{gh} + (A^R_i - A^{RC}_g) + (A^C_j - A^{CC}_h)

where g=ρ(i), h=γ(j) and A^R_i, A^C_j are the average ratings of user u_i and item p_j, and A^{COC}_{gh}, A^{RC}_g and A^{CC}_h are the average ratings of the corresponding co-cluster, user-cluster and item-cluster respectively.

Hyper-parameters:

NEpochs       - The number of iterations of the optimization procedure. Default is 20.
NUserClusters - The number of user clusters. Default is 3.
NItemClusters - The number of item clusters. Default is 3.
RandomState   - The random seed. Default is 0.
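
As an illustration of the approximation formula, the following sketch evaluates \hat{A}_{ij} from precomputed means. The `cocMeans` type is a hypothetical container that mirrors the exported fields of CoClustering; the cluster assignments and mean values are invented for the example and are not produced by the fitting procedure.

```go
package main

import "fmt"

// cocMeans is a hypothetical container for the statistics used at prediction time.
type cocMeans struct {
	userMeans        []float64   // A^R_i
	itemMeans        []float64   // A^C_j
	userClusters     []int       // g = rho(i)
	itemClusters     []int       // h = gamma(j)
	userClusterMeans []float64   // A^{RC}_g
	itemClusterMeans []float64   // A^{CC}_h
	coClusterMeans   [][]float64 // A^{COC}_{gh}
}

// predict computes A^{COC}_{gh} + (A^R_i - A^{RC}_g) + (A^C_j - A^{CC}_h).
func (c cocMeans) predict(i, j int) float64 {
	g, h := c.userClusters[i], c.itemClusters[j]
	return c.coClusterMeans[g][h] +
		(c.userMeans[i] - c.userClusterMeans[g]) +
		(c.itemMeans[j] - c.itemClusterMeans[h])
}

func main() {
	c := cocMeans{
		userMeans:        []float64{3.5, 2.5},
		itemMeans:        []float64{4.0, 3.0},
		userClusters:     []int{0, 1},
		itemClusters:     []int{1, 0},
		userClusterMeans: []float64{3.25, 2.5},
		itemClusterMeans: []float64{3.5, 3.75},
		coClusterMeans:   [][]float64{{3.0, 4.0}, {2.0, 3.5}},
	}
	fmt.Println(c.predict(0, 0)) // 4.0 + 0.25 + 0.25: prints 4.5
	fmt.Println(c.predict(1, 1)) // 2.0 + 0.0 - 0.5: prints 1.5
}
```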

func NewCoClustering

func NewCoClustering(params base.Params) *CoClustering

NewCoClustering creates a CoClustering model.

func (*CoClustering) Fit

func (coc *CoClustering) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the CoClustering model.

func (*CoClustering) Predict

func (coc *CoClustering) Predict(userId, itemId int) float64

Predict by the CoClustering model.

func (*CoClustering) SetParams

func (coc *CoClustering) SetParams(params base.Params)

SetParams sets hyper-parameters for the CoClustering model.

type FM

type FM struct {
    Base
    UserFeatures []*base.SparseVector
    ItemFeatures []*base.SparseVector
    // Model parameters
    GlobalBias float64     // w_0
    Bias       []float64   // w_i
    Factors    [][]float64 // v_i

    // Fallback model
    UserRatings []*base.MarginalSubSet
    ItemPop     *ItemPop
    // contains filtered or unexported fields
}

FM is an implementation of the factorization machine [12]. The prediction is given by

\hat y(x) = w_0 + \sum^n_{i=1} w_i x_i + \sum^n_{i=1} \sum^n_{j=i+1} <v_i, v_j>x_i x_j

Hyper-parameters:

Reg        - The regularization parameter of the cost function that is
             optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.005.
NFactors   - The number of latent factors. Default is 100.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
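
To make the prediction equation concrete, here is a small sketch that evaluates it directly with a double loop over feature pairs. The function name `fmPredict` and the dense input vector are assumptions made for illustration; the FM type itself stores sparse feature vectors, and this is not its actual code.

```go
package main

import "fmt"

// fmPredict evaluates w_0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j
// for a dense input x. w0, w and v play the roles of GlobalBias, Bias and Factors.
func fmPredict(w0 float64, w []float64, v [][]float64, x []float64) float64 {
	y := w0
	// Linear terms.
	for i, xi := range x {
		y += w[i] * xi
	}
	// Pairwise interaction terms, skipping pairs where a feature is zero.
	for i := 0; i < len(x); i++ {
		for j := i + 1; j < len(x); j++ {
			if x[i] == 0 || x[j] == 0 {
				continue
			}
			dot := 0.0
			for f := range v[i] {
				dot += v[i][f] * v[j][f]
			}
			y += dot * x[i] * x[j]
		}
	}
	return y
}

func main() {
	w := []float64{0.5, 0.25, 2}
	v := [][]float64{{1, 0}, {0.5, 0.5}, {2, 2}}
	x := []float64{1, 1, 0} // only the first two features are active
	fmt.Println(fmPredict(1, w, v, x)) // 1 + 0.75 + 0.5: prints 2.25
}
```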

func NewFM

func NewFM(params base.Params) *FM

NewFM creates a factorization machine.

func (*FM) Fit

func (fm *FM) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the factorization machine.

func (*FM) Predict

func (fm *FM) Predict(userId int, itemId int) float64

Predict by the factorization machine.

func (*FM) SetParams

func (fm *FM) SetParams(params base.Params)

SetParams sets hyper-parameters of the factorization machine.

type ItemPop

type ItemPop struct {
    Base
    Pop []float64
}

ItemPop recommends items by their popularity. The popularity of an item is defined as the occurrence frequency of the item in the training data set.
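
A minimal sketch of this popularity count follows. It assumes that item IDs are dense indices in [0, nItems) and that popularity is the raw occurrence count; the helper name `itemPopularity` is hypothetical and the package may normalize or index items differently.

```go
package main

import "fmt"

// itemPopularity counts how often each item ID occurs in the training data.
// itemIDs holds one entry per observed (user, item) interaction.
func itemPopularity(itemIDs []int, nItems int) []float64 {
	pop := make([]float64, nItems)
	for _, id := range itemIDs {
		pop[id]++
	}
	return pop
}

func main() {
	pop := itemPopularity([]int{0, 2, 2, 1, 2}, 3)
	fmt.Println(pop) // prints [1 1 3]
}
```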

func NewItemPop

func NewItemPop(params base.Params) *ItemPop

NewItemPop creates an ItemPop model.

func (*ItemPop) Fit

func (pop *ItemPop) Fit(set core.DataSetInterface, options *base.RuntimeOptions)

Fit the ItemPop model.

func (*ItemPop) Predict

func (pop *ItemPop) Predict(userId, itemId int) float64

Predict by the ItemPop model.

type KNN

type KNN struct {
    Base
    GlobalMean   float64
    SimMatrix    [][]float64
    LeftRatings  []*base.MarginalSubSet
    RightRatings []*base.MarginalSubSet
    UserRatings  []*base.MarginalSubSet
    LeftMean     []float64 // Centered KNN: user (item) Mean
    StdDev       []float64 // KNN with Z Score: user (item) standard deviation
    Bias         []float64 // KNN Baseline: Bias
    // contains filtered or unexported fields
}

KNN for collaborative filtering. Hyper-parameters:

Type       - The type of KNN ('Basic', 'Centered', 'ZScore', 'Baseline').
             Default is 'Basic'.
Similarity - The similarity function. Default is MSD.
UserBased  - Whether the model is user-based (true) or item-based (false). Default is true.
K          - The maximum number of neighbors used to predict the rating. Default is 40.
MinK       - The minimum number of neighbors required to predict the rating. Default is 1.
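
The roles of K and MinK can be illustrated with a sketch of the 'Basic' variant as it is commonly defined: a similarity-weighted average over at most K neighbors, with a fallback when fewer than MinK neighbors are available. The `neighbor` type, the `knnBasicPredict` name, the restriction to positive similarities and the fallback to the global mean are all assumptions for this example, not confirmed details of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// neighbor pairs a similarity with the rating that neighbor contributed.
type neighbor struct {
	sim    float64
	rating float64
}

// knnBasicPredict averages the ratings of the k most similar neighbors,
// weighted by similarity. It returns globalMean when fewer than minK
// positively similar neighbors exist.
func knnBasicPredict(neighbors []neighbor, k, minK int, globalMean float64) float64 {
	cands := make([]neighbor, 0, len(neighbors))
	for _, n := range neighbors {
		if n.sim > 0 {
			cands = append(cands, n)
		}
	}
	// Most similar neighbors first.
	sort.Slice(cands, func(a, b int) bool { return cands[a].sim > cands[b].sim })
	if len(cands) > k {
		cands = cands[:k]
	}
	if len(cands) < minK || len(cands) == 0 {
		return globalMean
	}
	var num, den float64
	for _, n := range cands {
		num += n.sim * n.rating
		den += n.sim
	}
	return num / den
}

func main() {
	ns := []neighbor{{0.5, 4}, {0.5, 2}, {0.25, 1}}
	fmt.Println(knnBasicPredict(ns, 2, 1, 3.5)) // top two neighbors: prints 3
	fmt.Println(knnBasicPredict(ns, 2, 3, 3.5)) // too few neighbors: prints 3.5
}
```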

func NewKNN

func NewKNN(params base.Params) *KNN

NewKNN creates a KNN model.

func (*KNN) Fit

func (knn *KNN) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the KNN model.

func (*KNN) Predict

func (knn *KNN) Predict(userId, itemId int) float64

Predict by the KNN model.

func (*KNN) SetParams

func (knn *KNN) SetParams(params base.Params)

SetParams sets hyper-parameters for the KNN model.

type NMF

type NMF struct {
    Base
    GlobalMean float64     // the global mean of ratings
    UserFactor [][]float64 // p_u
    ItemFactor [][]float64 // q_i
    // contains filtered or unexported fields
}

NMF [3] is matrix factorization (MF) with non-negative latent factors. Non-negativity is critically important during the MF process because it ensures good representativeness of the learnt model. Hyper-parameters:

Reg      - The regularization parameter of the cost function that is
           optimized. Default is 0.06.
NFactors - The number of latent factors. Default is 15.
NEpochs  - The number of iterations of the SGD procedure. Default is 50.
InitLow  - The lower bound of initial random latent factors. Default is 0.
InitHigh - The upper bound of initial random latent factors. Default is 1.

func NewNMF

func NewNMF(params base.Params) *NMF

NewNMF creates an NMF model.

func (*NMF) Fit

func (nmf *NMF) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the NMF model.

func (*NMF) Predict

func (nmf *NMF) Predict(userId int, itemId int) float64

Predict by the NMF model.

func (*NMF) SetParams

func (nmf *NMF) SetParams(params base.Params)

SetParams sets hyper-parameters of the NMF model.

type SVD

type SVD struct {
    Base
    // Model parameters
    UserFactor [][]float64 // p_u
    ItemFactor [][]float64 // q_i
    UserBias   []float64   // b_u
    ItemBias   []float64   // b_i
    GlobalMean float64     // mu

    // Fallback model
    UserRatings []*base.MarginalSubSet
    ItemPop     *ItemPop
    // contains filtered or unexported fields
}

SVD is the matrix factorization algorithm popularized by Simon Funk during the Netflix Prize. The prediction \hat{r}_{ui} is given by:

\hat{r}_{ui} = μ + b_u + b_i + q_i^Tp_u

If user u is unknown, then the bias b_u and the factors p_u are assumed to be zero. The same applies for item i with b_i and q_i. Hyper-parameters:

UseBias    - Whether to include the bias terms in the SVD model. Default is true.
Reg        - The regularization parameter of the cost function that is
             optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.005.
NFactors   - The number of latent factors. Default is 100.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
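
The prediction rule, including the treatment of unknown users and items, can be sketched as follows. The `svdParams` type and its map-based storage are hypothetical; the SVD type itself keeps slices indexed by internal IDs, and the values here are invented.

```go
package main

import "fmt"

// svdParams is a hypothetical holder for fitted SVD parameters.
type svdParams struct {
	globalMean float64           // mu
	userBias   map[int]float64   // b_u
	itemBias   map[int]float64   // b_i
	userFactor map[int][]float64 // p_u
	itemFactor map[int][]float64 // q_i
}

// predict returns mu + b_u + b_i + q_i^T p_u. Biases of unknown users or
// items default to zero, and the dot product is skipped when either factor
// vector is missing, which is equivalent to treating it as a zero vector.
func (m svdParams) predict(userID, itemID int) float64 {
	pred := m.globalMean + m.userBias[userID] + m.itemBias[itemID]
	p, knownUser := m.userFactor[userID]
	q, knownItem := m.itemFactor[itemID]
	if knownUser && knownItem {
		for f := range p {
			pred += q[f] * p[f]
		}
	}
	return pred
}

func main() {
	m := svdParams{
		globalMean: 3,
		userBias:   map[int]float64{1: 0.5},
		itemBias:   map[int]float64{7: -0.25},
		userFactor: map[int][]float64{1: {1, 0.5}},
		itemFactor: map[int][]float64{7: {0.5, 2}},
	}
	fmt.Println(m.predict(1, 7)) // 3 + 0.5 - 0.25 + 1.5: prints 4.75
	fmt.Println(m.predict(2, 7)) // unknown user: prints 2.75
}
```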

func NewSVD

func NewSVD(params base.Params) *SVD

NewSVD creates an SVD model.

func (*SVD) Fit

func (svd *SVD) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the SVD model.

func (*SVD) Predict

func (svd *SVD) Predict(userId int, itemId int) float64

Predict by the SVD model.

func (*SVD) SetParams

func (svd *SVD) SetParams(params base.Params)

SetParams sets hyper-parameters of the SVD model.

type SVDpp

type SVDpp struct {
    Base
    TrainSet   core.DataSetInterface
    UserFactor [][]float64 // p_u
    ItemFactor [][]float64 // q_i
    ImplFactor [][]float64 // y_i
    UserBias   []float64   // b_u
    ItemBias   []float64   // b_i
    GlobalMean float64     // mu
    // contains filtered or unexported fields
}

SVDpp (SVD++) [10] is an extension of SVD taking into account implicit interactions. The predicted \hat{r}_{ui} is:

\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u}y_j\right)

Where the y_j terms are a new set of item factors that capture implicit interactions. Here, an implicit rating describes the fact that a user u rated an item j, regardless of the rating value. If user u is unknown, then the bias b_u and the factors p_u are assumed to be zero. The same applies for item i with b_i, q_i and y_i. Hyper-parameters:

Reg        - The regularization parameter of the cost function that is
             optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.007.
NFactors   - The number of latent factors. Default is 20.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
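
The following sketch evaluates the SVD++ prediction for one user and item. The function `svdppPredict` and its argument layout are illustrative assumptions: `implicit` holds the y_j vectors of the items in I_u, and an empty slice stands for a user with no recorded interactions.

```go
package main

import (
	"fmt"
	"math"
)

// svdppPredict returns mu + b_u + b_i + q_i^T (p_u + |I_u|^{-1/2} sum_{j in I_u} y_j).
func svdppPredict(mu, bu, bi float64, p, q []float64, implicit [][]float64) float64 {
	// z = p_u + |I_u|^{-1/2} * sum of the implicit factors y_j.
	z := append([]float64(nil), p...)
	if n := len(implicit); n > 0 {
		norm := 1 / math.Sqrt(float64(n))
		for _, y := range implicit {
			for f := range z {
				z[f] += norm * y[f]
			}
		}
	}
	pred := mu + bu + bi
	for f := range q {
		pred += q[f] * z[f]
	}
	return pred
}

func main() {
	p := []float64{1, 0}
	q := []float64{0.5, 1}
	// Four rated items, so |I_u|^{-1/2} = 0.5.
	implicit := [][]float64{{1, 0}, {1, 0}, {0, 1}, {0, 1}}
	fmt.Println(svdppPredict(3, 0.5, 0.25, p, q, implicit)) // prints 5.75
	fmt.Println(svdppPredict(3, 0.5, 0.25, p, q, nil))      // no implicit term: prints 4.25
}
```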

func NewSVDpp

func NewSVDpp(params base.Params) *SVDpp

NewSVDpp creates an SVD++ model.

func (*SVDpp) Fit

func (svd *SVDpp) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the SVD++ model.

func (*SVDpp) Predict

func (svd *SVDpp) Predict(userId int, itemId int) float64

Predict by the SVD++ model.

func (*SVDpp) SetParams

func (svd *SVDpp) SetParams(params base.Params)

SetParams sets hyper-parameters of the SVD++ model.

type SlopeOne

type SlopeOne struct {
    Base
    GlobalMean  float64                // Mean of ratings in training set
    UserRatings []*base.MarginalSubSet // Ratings by each user
    UserMeans   []float64              // Mean of each user's ratings
    Dev         [][]float64            // Deviations
}

SlopeOne [4] predicts ratings with predictors of the form f(x) = x + b, where b is precomputed as the average difference between the ratings of one item and another over the users who rated both.

First, deviations between pairs of items are computed. Given a training set χ, and any two items j and i with ratings u_j and u_i respectively in some user evaluation u (annotated as u∈S_{j,i}(χ)), the average deviation of item i with respect to item j is computed by:

dev_{j,i} = \sum_{u∈S_{j,i}(χ)} \frac{u_j - u_i}{card(S_{j,i}(χ))}

The computation of deviations can be parallelized.

In the prediction stage, given that dev_{j,i} + u_i is a prediction for u_j given u_i, a reasonable predictor is the average of all such predictions:

P(u)_j = \frac{1}{card(R_j)} \sum_{i∈R_j} (dev_{j,i} + u_i)

where R_j = {i | i ∈ S(u), i \ne j, card(S_{j,i}(χ)) > 0} is the set of all relevant items and S(u) is the set of items rated in u.
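
Both stages can be sketched on a toy data set. The helpers `slopeOneDeviations` and `slopeOnePredict` are hypothetical names, each user evaluation is modelled as a map from item ID to rating, and item IDs are assumed to be dense indices in [0, nItems). This is a direct transcription of the two formulas above, not the package's implementation.

```go
package main

import "fmt"

// slopeOneDeviations returns dev[j][i] and count[j][i] = card(S_{j,i}).
func slopeOneDeviations(users []map[int]float64, nItems int) (dev [][]float64, count [][]int) {
	dev = make([][]float64, nItems)
	count = make([][]int, nItems)
	for j := range dev {
		dev[j] = make([]float64, nItems)
		count[j] = make([]int, nItems)
	}
	// Accumulate u_j - u_i over every user who rated both items.
	for _, u := range users {
		for j, uj := range u {
			for i, ui := range u {
				if i != j {
					dev[j][i] += uj - ui
					count[j][i]++
				}
			}
		}
	}
	// Turn the sums into averages.
	for j := range dev {
		for i := range dev[j] {
			if count[j][i] > 0 {
				dev[j][i] /= float64(count[j][i])
			}
		}
	}
	return dev, count
}

// slopeOnePredict averages dev[j][i] + u_i over the relevant items R_j.
// The second result is false when R_j is empty.
func slopeOnePredict(u map[int]float64, j int, dev [][]float64, count [][]int) (float64, bool) {
	sum, n := 0.0, 0
	for i, ui := range u {
		if i != j && count[j][i] > 0 {
			sum += dev[j][i] + ui
			n++
		}
	}
	if n == 0 {
		return 0, false
	}
	return sum / float64(n), true
}

func main() {
	users := []map[int]float64{
		{0: 5, 1: 3, 2: 2},
		{0: 3, 1: 4},
		{1: 2, 2: 5}, // this user has not rated item 0
	}
	dev, count := slopeOneDeviations(users, 3)
	fmt.Println(dev[0][1], dev[0][2]) // prints 0.5 3
	pred, ok := slopeOnePredict(users[2], 0, dev, count)
	fmt.Println(pred, ok) // ((0.5+2) + (3+5)) / 2: prints 5.25 true
}
```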

func NewSlopOne

func NewSlopOne(params base.Params) *SlopeOne

NewSlopOne creates a SlopeOne model.

func (*SlopeOne) Fit

func (so *SlopeOne) Fit(trainSet core.DataSetInterface, setters *base.RuntimeOptions)

Fit the SlopeOne model.

func (*SlopeOne) Predict

func (so *SlopeOne) Predict(userId, itemId int) float64

Predict by the SlopeOne model.

type WRMF

type WRMF struct {
    Base
    // Model parameters
    UserFactor *mat.Dense // p_u
    ItemFactor *mat.Dense // q_i

    // Fallback model
    UserRatings []*base.MarginalSubSet
    ItemPop     *ItemPop
    // contains filtered or unexported fields
}

WRMF [7] is Weighted Regularized Matrix Factorization, which exploits unique properties of implicit feedback datasets. It treats the data as an indication of positive and negative preference associated with vastly varying confidence levels. This leads to a factor model that is especially tailored for implicit feedback recommenders. The authors also proposed a scalable optimization procedure, which scales linearly with the data size. Hyper-parameters:

NFactors   - The number of latent factors. Default is 10.
NEpochs    - The number of training epochs. Default is 50.
InitMean   - The mean of initial latent factors. Default is 0.
InitStdDev - The standard deviation of initial latent factors. Default is 0.1.
Reg        - The strength of regularization.

func NewWRMF

func NewWRMF(params base.Params) *WRMF

NewWRMF creates a WRMF model.

func (*WRMF) Fit

func (mf *WRMF) Fit(set core.DataSetInterface, options *base.RuntimeOptions)

Fit the WRMF model.

func (*WRMF) Predict

func (mf *WRMF) Predict(userId, itemId int) float64

Predict by the WRMF model.

func (*WRMF) SetParams

func (mf *WRMF) SetParams(params base.Params)

SetParams sets hyper-parameters for the WRMF model.

Package model imports 8 packages and is imported by 4 packages. Updated 2019-06-04.