sbr-go: github.com/maciejkula/sbr-go

package sbr

import "github.com/maciejkula/sbr-go"

A recommender system package for Go.

Sbr implements cutting-edge sequence-based recommenders: for every user, we examine what they have interacted with up to now to predict what they are going to consume next.

Implemented models:

- LSTM: a model that uses an LSTM network over the sequence of a user's interactions to predict their next action;
- EWMA: a model that uses a simpler exponentially-weighted average of past actions to predict future interactions.

Which model performs the best will depend on your dataset. The EWMA model is much quicker to fit, and will probably be a good starting point.
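To make the EWMA idea concrete, here is a minimal sketch that folds a chronological sequence of item embeddings into one user vector and scores a candidate item with a dot product. The function names, the alpha parameter and the exact update rule are illustrative assumptions, not the package's internals.

```go
package main

import "fmt"

// ewmaUserVector folds a chronological sequence of item embeddings into a
// single user representation. alpha in (0, 1] controls how quickly older
// interactions are forgotten. The name and update rule are assumptions made
// for illustration, not sbr-go internals.
func ewmaUserVector(history [][]float32, alpha float32) []float32 {
	if len(history) == 0 {
		return nil
	}
	user := make([]float32, len(history[0]))
	copy(user, history[0])
	for _, emb := range history[1:] {
		for d := range user {
			user[d] = alpha*emb[d] + (1-alpha)*user[d]
		}
	}
	return user
}

// dot scores a candidate item embedding against the user representation.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	// Made-up two-dimensional embeddings, earliest interaction first.
	history := [][]float32{{1, 0}, {0, 1}, {0, 1}}
	user := ewmaUserVector(history, 0.5)
	fmt.Println(user, dot(user, []float32{0, 1}))
}
```

Recent interactions dominate the user vector, which is why such a model needs no recurrent network and is cheap to fit.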

Usage

You can fit a model on the Movielens 100K dataset in about 10 seconds using the following (taken from https://github.com/maciejkula/sbr-go/blob/master/examples/movielens/main.go):

    package main

    import (
        "fmt"
        "math/rand"

        sbr "github.com/maciejkula/sbr-go"
    )

    func main() {
        // Download the Movielens 100K dataset.
        data, err := sbr.GetMovielens()
        if err != nil {
            panic(err)
        }
        fmt.Printf("Loaded movielens data: %v users and %v items for a total of %v interactions\n",
            data.NumUsers(), data.NumItems(), data.Len())

        // Split into test and train.
        rng := rand.New(rand.NewSource(42))
        train, test := sbr.TrainTestSplit(data, 0.2, rng)
        fmt.Printf("Train len %v, test len %v\n", train.Len(), test.Len())

        // Instantiate the model. It must be freed once no longer in use.
        model := sbr.NewImplicitLSTMModel(train.NumItems())
        defer model.Free()

        // Set the hyperparameters.
        model.ItemEmbeddingDim = 32
        model.LearningRate = 0.16
        model.L2Penalty = 0.0004
        model.NumEpochs = 10
        model.NumThreads = 1

        // Set the random seed.
        var randomSeed [16]byte
        for idx := range randomSeed {
            randomSeed[idx] = 42
        }
        model.RandomSeed = randomSeed

        // Fit the model.
        fmt.Printf("Fitting the model...\n")
        loss, err := model.Fit(&train)
        if err != nil {
            panic(err)
        }

        // And evaluate.
        fmt.Printf("Evaluating the model...\n")
        mrr, err := model.MRRScore(&test)
        if err != nil {
            panic(err)
        }
        fmt.Printf("Loss %v, MRR: %v\n", loss, mrr)
    }

Installation

Run

go get github.com/maciejkula/sbr-go

followed by

make

in the installation directory. This will download the package's native dependencies. On both OSX and Linux, the resulting binaries are fully statically linked, and you can deploy them like any other Go binary.

Package Files

data.go sbr.go

Constants

const (
    // Bayesian personalised ranking loss.
    BPR Loss = 0
    // Pairwise hinge loss.
    Hinge Loss = 1
    // WARP loss. More accurate in most cases than
    // the other loss functions at the expense of
    // fitting speed.
    WARP Loss = 2
    // ADAM optimizer.
    Adam Optimizer = 0
    // Adagrad optimizer.
    Adagrad Optimizer = 1
)
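As a rough illustration of what the two pairwise losses measure, the sketch below computes the textbook BPR and hinge losses for a single (positive, negative) score pair. These are the standard definitions, with a hinge margin of 1 assumed; the package's exact formulation may differ. WARP is left out because it additionally involves sampling negatives.

```go
package main

import (
	"fmt"
	"math"
)

// bprLoss is the Bayesian personalised ranking loss for one
// (positive, negative) score pair: -log(sigmoid(pos - neg)).
func bprLoss(pos, neg float64) float64 {
	return math.Log1p(math.Exp(-(pos - neg)))
}

// hingeLoss is the pairwise hinge loss with an assumed margin of 1:
// max(0, 1 - (pos - neg)).
func hingeLoss(pos, neg float64) float64 {
	return math.Max(0, 1-(pos-neg))
}

func main() {
	// Positive item scored well above the negative: small losses.
	fmt.Println(bprLoss(2, 0), hingeLoss(2, 0))
	// Positive item scored below the negative: large losses.
	fmt.Println(bprLoss(0, 2), hingeLoss(0, 2))
}
```

Both losses shrink as the positive item's score rises above the negative item's score; the hinge loss reaches exactly zero once the margin is met, while BPR only approaches zero.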

type ImplicitEWMAModel

type ImplicitEWMAModel struct {
    // Number of items in the model.
    NumItems int
    // Maximum sequence length to consider. Setting
    // this to lower values will yield models that
    // are faster to train and evaluate, but have
    // a shorter memory.
    MaxSequenceLength int
    // Dimension of item embeddings. Setting this to
    // higher values will yield models that are slower
    // to fit but are potentially more expressive (at
    // the risk of overfitting).
    ItemEmbeddingDim int
    // Initial learning rate.
    LearningRate float32
    // L2 penalty.
    L2Penalty float32
    // Number of threads to use for training.
    NumThreads int
    // Number of epochs to use for training. To run more epochs,
    // call the fit method multiple times.
    NumEpochs int
    // Type of loss function to use.
    Loss Loss
    // Optimizer to use.
    Optimizer  Optimizer
    RandomSeed [16]byte
    // contains filtered or unexported fields
}

An implicit-feedback EWMA-based sequence model.

func NewImplicitEWMAModel

func NewImplicitEWMAModel(numItems int) *ImplicitEWMAModel

Build a new model with a capacity to represent a certain number of items. In order to avoid leaking memory, the model must be freed using its Free method once no longer in use.

func (*ImplicitEWMAModel) Fit

func (self *ImplicitEWMAModel) Fit(data *Interactions) (float32, error)

Fit the model on the supplied data, returning the loss value after fitting. Calling this multiple times will resume training.

func (*ImplicitEWMAModel) Free

func (self *ImplicitEWMAModel) Free()

Free the memory associated with the underlying model.

Unlike other methods of the model, calling Free is _not_ thread safe. Use an external synchronisation method when freeing a model used from multiple goroutines.
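One possible way to provide that external synchronisation is a read-write mutex: predictions hold the read lock, and Free holds the write lock. The sketch below uses a hypothetical stand-in type (fakeModel) and a hypothetical wrapper (guardedModel) rather than the real model, so treat it as a pattern, not as part of the package.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeModel stands in for an sbr model. It is hypothetical and only
// records whether Free has been called.
type fakeModel struct{ freed bool }

func (m *fakeModel) Predict() float32 { return 1 }
func (m *fakeModel) Free()            { m.freed = true }

// guardedModel serialises Free against in-flight predictions.
type guardedModel struct {
	mu    sync.RWMutex
	model *fakeModel
}

// Predict takes a read lock, so many goroutines can predict concurrently.
func (g *guardedModel) Predict() (float32, error) {
	g.mu.RLock()
	defer g.mu.RUnlock()
	if g.model == nil {
		return 0, fmt.Errorf("model already freed")
	}
	return g.model.Predict(), nil
}

// Free takes the write lock, which waits for readers to finish, and frees
// the underlying model at most once.
func (g *guardedModel) Free() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.model != nil {
		g.model.Free()
		g.model = nil
	}
}

func main() {
	g := &guardedModel{model: &fakeModel{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Predict()
		}()
	}
	wg.Wait()
	g.Free()
	_, err := g.Predict()
	fmt.Println(err)
}
```

Setting the pointer to nil after freeing turns a use-after-free into an ordinary error value.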

func (*ImplicitEWMAModel) MRRScore

func (self *ImplicitEWMAModel) MRRScore(data *Interactions) (float32, error)

Compute the mean reciprocal rank score of the model on supplied interaction data.

Higher MRR values reflect better predictive performance of the model. The score is calculated by taking all but the last interaction of each user as their history, then predicting the last item they went on to see.
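For intuition, mean reciprocal rank can be computed as in the following sketch: for each user, rank the held-out item among all scored candidates, take one over that rank, and average over users. The helper names and the tie-handling rule are assumptions; the package computes the score internally and may differ in details.

```go
package main

import "fmt"

// reciprocalRank returns 1/rank of the target item, where rank 1 means the
// target has the highest score. Ties are resolved in the target's favour.
func reciprocalRank(scores []float32, target int) float32 {
	rank := 1
	for i, s := range scores {
		if i != target && s > scores[target] {
			rank++
		}
	}
	return 1 / float32(rank)
}

// meanReciprocalRank averages reciprocal ranks over users. scores[u] holds
// the scores of every candidate item for user u, and targets[u] is the
// index of the item that user actually interacted with last.
func meanReciprocalRank(scores [][]float32, targets []int) float32 {
	if len(scores) == 0 {
		return 0
	}
	var total float32
	for u := range scores {
		total += reciprocalRank(scores[u], targets[u])
	}
	return total / float32(len(scores))
}

func main() {
	// Made-up scores for two users over three candidate items.
	scores := [][]float32{
		{0.9, 0.1, 0.3}, // target 0 is ranked first
		{0.2, 0.8, 0.5}, // target 0 is ranked third
	}
	fmt.Println(meanReciprocalRank(scores, []int{0, 0}))
}
```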

func (*ImplicitEWMAModel) MarshalBinary

func (self *ImplicitEWMAModel) MarshalBinary() ([]byte, error)

Serialize the model into a byte array. Satisfies the encoding.BinaryMarshaler interface.

func (*ImplicitEWMAModel) Predict

func (self *ImplicitEWMAModel) Predict(interactionHistory []int, itemsToScore []int) ([]float32, error)

Make predictions. Provides scores for itemsToScore for a user who has seen interactionHistory items. Items in the history argument should be arranged chronologically, from the earliest seen item to the latest seen item.

Returns a slice of scores for the supplied items, where a higher score indicates a better recommendation.
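Since the scores come back in the same order as itemsToScore, turning them into a recommendation list is a matter of sorting. The helper below is a hypothetical sketch (topK is not part of the package) that returns the k highest-scoring item ids.

```go
package main

import (
	"fmt"
	"sort"
)

// topK returns up to k item ids ordered by descending score. items and
// scores are parallel slices, like the itemsToScore argument and the
// result of Predict.
func topK(items []int, scores []float32, k int) []int {
	order := make([]int, len(items))
	for i := range order {
		order[i] = i
	}
	// Sort positions by score, highest first; stable to keep ties in
	// their original order.
	sort.SliceStable(order, func(a, b int) bool {
		return scores[order[a]] > scores[order[b]]
	})
	if k > len(order) {
		k = len(order)
	}
	top := make([]int, k)
	for i := range top {
		top[i] = items[order[i]]
	}
	return top
}

func main() {
	items := []int{10, 20, 30, 40}
	scores := []float32{0.1, 0.7, 0.4, 0.9} // made-up scores
	fmt.Println(topK(items, scores, 2))
}
```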

func (*ImplicitEWMAModel) UnmarshalBinary

func (self *ImplicitEWMAModel) UnmarshalBinary(data []byte) error

Deserialize the model from a byte array. Satisfies the encoding.BinaryUnmarshaler interface.
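Because the models satisfy the standard encoding interfaces, persistence helpers can be written against those interfaces rather than against a concrete model type. The sketch below shows one way to save to and load from a file; toyModel, saveModel and loadModel are hypothetical names used only for illustration, and toyModel's four-byte format has nothing to do with the real serialization format.

```go
package main

import (
	"encoding"
	"encoding/binary"
	"fmt"
	"os"
	"path/filepath"
)

// toyModel is a hypothetical stand-in that implements the same two
// interfaces as the sbr models.
type toyModel struct{ NumItems uint32 }

func (m *toyModel) MarshalBinary() ([]byte, error) {
	buf := make([]byte, 4)
	binary.LittleEndian.PutUint32(buf, m.NumItems)
	return buf, nil
}

func (m *toyModel) UnmarshalBinary(data []byte) error {
	if len(data) != 4 {
		return fmt.Errorf("expected 4 bytes, got %d", len(data))
	}
	m.NumItems = binary.LittleEndian.Uint32(data)
	return nil
}

// saveModel writes any BinaryMarshaler to path.
func saveModel(path string, m encoding.BinaryMarshaler) error {
	data, err := m.MarshalBinary()
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

// loadModel reads path into any BinaryUnmarshaler.
func loadModel(path string, m encoding.BinaryUnmarshaler) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	return m.UnmarshalBinary(data)
}

func main() {
	path := filepath.Join(os.TempDir(), "toy-model.bin")
	if err := saveModel(path, &toyModel{NumItems: 1682}); err != nil {
		panic(err)
	}
	var restored toyModel
	if err := loadModel(path, &restored); err != nil {
		panic(err)
	}
	fmt.Println(restored.NumItems)
}
```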

type ImplicitLSTMModel

type ImplicitLSTMModel struct {
    // Number of items in the model.
    NumItems int
    // Maximum sequence length to consider. Setting
    // this to lower values will yield models that
    // are faster to train and evaluate, but have
    // a shorter memory.
    MaxSequenceLength int
    // Dimension of item embeddings. Setting this to
    // higher values will yield models that are slower
    // to fit but are potentially more expressive (at
    // the risk of overfitting).
    ItemEmbeddingDim int
    // Initial learning rate.
    LearningRate float32
    // L2 penalty.
    L2Penalty float32
    // Whether the LSTM should use coupled forget and update
    // gates, yielding a model that's faster to train.
    Coupled bool
    // Number of threads to use for training.
    NumThreads int
    // Number of epochs to use for training. To run more epochs,
    // call the fit method multiple times.
    NumEpochs int
    // Type of loss function to use.
    Loss Loss
    // Optimizer to use.
    Optimizer  Optimizer
    RandomSeed [16]byte
    // contains filtered or unexported fields
}

An implicit-feedback LSTM-based sequence model.
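To illustrate what coupling the forget and update gates usually means, the sketch below contrasts the standard LSTM cell-state update with a coupled variant in which the forget gate is derived from the input gate. This is the common textbook formulation and is assumed here; the package does not document its exact equations, and the function names are made up.

```go
package main

import "fmt"

// cellUpdate is the standard LSTM cell-state update with a separate forget
// gate f and input (update) gate i, applied to candidate value g.
func cellUpdate(cPrev, f, i, g float32) float32 {
	return f*cPrev + i*g
}

// coupledCellUpdate ties the forget gate to the input gate (f = 1 - i), so
// only one gate has to be computed and learned per step.
func coupledCellUpdate(cPrev, i, g float32) float32 {
	return cellUpdate(cPrev, 1-i, i, g)
}

func main() {
	// With i = 0.25 the new state keeps 75% of the old state and
	// takes 25% of the candidate.
	fmt.Println(coupledCellUpdate(2, 0.25, 4))
}
```

Dropping one gate removes a set of weights and one gate computation per step, which is consistent with the faster training the Coupled field describes.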

func NewImplicitLSTMModel

func NewImplicitLSTMModel(numItems int) *ImplicitLSTMModel

Build a new model with a capacity to represent a certain number of items. In order to avoid leaking memory, the model must be freed using its Free method once no longer in use.

func (*ImplicitLSTMModel) Fit

func (self *ImplicitLSTMModel) Fit(data *Interactions) (float32, error)

Fit the model on the supplied data, returning the loss value after fitting. Calling this multiple times will resume training.

func (*ImplicitLSTMModel) Free

func (self *ImplicitLSTMModel) Free()

Free the memory associated with the underlying model.

Unlike other methods of the model, calling Free is _not_ thread safe. Use an external synchronisation method when freeing a model used from multiple goroutines.

func (*ImplicitLSTMModel) MRRScore

func (self *ImplicitLSTMModel) MRRScore(data *Interactions) (float32, error)

Compute the mean reciprocal rank score of the model on supplied interaction data.

Higher MRR values reflect better predictive performance of the model. The score is calculated by taking all but the last interaction of each user as their history, then predicting the last item they went on to see.

func (*ImplicitLSTMModel) MarshalBinary

func (self *ImplicitLSTMModel) MarshalBinary() ([]byte, error)

Serialize the model into a byte array. Satisfies the encoding.BinaryMarshaler interface.

func (*ImplicitLSTMModel) Predict

func (self *ImplicitLSTMModel) Predict(interactionHistory []int, itemsToScore []int) ([]float32, error)

Make predictions. Provides scores for itemsToScore for a user who has seen interactionHistory items. Items in the history argument should be arranged chronologically, from the earliest seen item to the latest seen item.

Returns a slice of scores for the supplied items, where a higher score indicates a better recommendation.

func (*ImplicitLSTMModel) UnmarshalBinary

func (self *ImplicitLSTMModel) UnmarshalBinary(data []byte) error

Deserialize the model from a byte array. Satisfies the encoding.BinaryUnmarshaler interface.

type Indexer

type Indexer struct {
    // contains filtered or unexported fields
}

Helper for translating user and item ids into contiguous indices.

func NewIndexer

func NewIndexer() Indexer

Build a new indexer.

func (*Indexer) Add

func (self *Indexer) Add(id string) int

Add a new id to the indexer, returning its model index.

func (*Indexer) GetId

func (self *Indexer) GetId(idx int) (string, bool)

Get the id from a model index.
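The idea behind such an indexer can be sketched with a map and a slice: each distinct string id receives the next unused integer, and the slice maps indices back to ids. toyIndexer below is a hypothetical re-implementation for illustration; in particular, returning the existing index when an id is added twice is an assumption about the intended behaviour.

```go
package main

import "fmt"

// toyIndexer is a hypothetical re-implementation of the idea behind
// Indexer: each distinct string id gets the next unused integer index.
type toyIndexer struct {
	idToIdx map[string]int
	idxToId []string
}

func newToyIndexer() *toyIndexer {
	return &toyIndexer{idToIdx: make(map[string]int)}
}

// Add returns the index for id, assigning a new one on first sight.
func (ix *toyIndexer) Add(id string) int {
	if idx, ok := ix.idToIdx[id]; ok {
		return idx
	}
	idx := len(ix.idxToId)
	ix.idToIdx[id] = idx
	ix.idxToId = append(ix.idxToId, id)
	return idx
}

// GetId maps an index back to its id; the bool reports whether it exists.
func (ix *toyIndexer) GetId(idx int) (string, bool) {
	if idx < 0 || idx >= len(ix.idxToId) {
		return "", false
	}
	return ix.idxToId[idx], true
}

func main() {
	ix := newToyIndexer()
	for _, id := range []string{"movie-a", "movie-b", "movie-a"} {
		fmt.Println(id, ix.Add(id))
	}
	fmt.Println(ix.GetId(1))
	fmt.Println(ix.GetId(5))
}
```

Contiguous indices matter because the models are built with a fixed item capacity (the numItems argument), so item ids passed to them need to fall in a dense integer range.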

type Interactions

type Interactions struct {
    // contains filtered or unexported fields
}

Contains interactions for training the model.

func GetMovielens

func GetMovielens() (*Interactions, error)

Download and return the Movielens 100K dataset.

func NewInteractions

func NewInteractions(numUsers int, numItems int) Interactions

Construct new empty interactions.

func TrainTestSplit

func TrainTestSplit(data *Interactions, testFraction float64, rng *rand.Rand) (Interactions, Interactions)

Split the interaction data into training and test sets. The data is split so that there is no overlap between users in training and test sets, making performance evaluation reflect the model's performance on entirely new users.

Returns the training and test data, in that order.
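A user-level split of this kind can be sketched as follows: decide once per user whether they belong to the test set, then route all of that user's interactions accordingly. The interaction type and splitByUser function are hypothetical, and drawing one random number per user is an assumed strategy, not necessarily how the package does it.

```go
package main

import (
	"fmt"
	"math/rand"
)

// interaction is a hypothetical (user, item, timestamp) record.
type interaction struct{ user, item, timestamp int }

// splitByUser assigns every user wholly to train or test, so the two sets
// share no users. draw must return values in [0, 1); roughly testFraction
// of users end up in the test set.
func splitByUser(data []interaction, numUsers int, testFraction float64, draw func() float64) (train, test []interaction) {
	inTest := make([]bool, numUsers)
	for u := range inTest {
		inTest[u] = draw() < testFraction
	}
	for _, x := range data {
		if inTest[x.user] {
			test = append(test, x)
		} else {
			train = append(train, x)
		}
	}
	return train, test
}

func main() {
	// Made-up interactions for three users.
	data := []interaction{{0, 5, 1}, {0, 7, 2}, {1, 5, 1}, {2, 9, 1}, {2, 3, 2}}
	rng := rand.New(rand.NewSource(42))
	train, test := splitByUser(data, 3, 0.2, rng.Float64)
	fmt.Println(len(train), len(test))
}
```

Passing the random source in, as TrainTestSplit does with its rng argument, keeps the split reproducible for a fixed seed.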

func (*Interactions) Append

func (self *Interactions) Append(userId int, itemId int, timestamp int)

Add a (user, item, timestamp) triple to the dataset.

func (*Interactions) Len

func (self *Interactions) Len() int

Return the number of interactions.

func (*Interactions) NumItems

func (self *Interactions) NumItems() int

Get the total number of distinct items in the data.

func (*Interactions) NumUsers

func (self *Interactions) NumUsers() int

Get the total number of distinct users in the data.

type Loss

type Loss int

type Optimizer

type Optimizer int

Directories

Path    Synopsis
build
examples/movielens

Package sbr imports 16 packages and is imported by 1 package. Updated 2018-07-30.