reco

command

v0.0.0-...-3e71865 Latest Latest Go to latest Published: May 21, 2014 License: BSD-3-Clause Imports: 19 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/akualab/occult

Links

Open Source Insights

README ¶

Example: Predict Movie Ratings using Collaborative Filtering

This examples implements various simple collaborative filtering algorithms. The purpose of the example is to show how to use Occult to implement machine learning algorithms.

Install

Install leveldb which is used to store the data set. I tested in MacOSX: brew install leveldb

# -u will update existing package, -x shows what's going on.
go get -u -x  github.com/akualab/occult/examples/reco

Finally, run the example on a single node:

cd MY_GOPATH/src/github.com/akualab/occult/examples/reco/
go build
reco

Now, let's try to run on two nodes. Open two terminals on the same machine and type in sequence:

# Starts a server with id node 1.
reco -config=reco-cluster.yaml -node=1 -server -v=2 -logtostderr

# Starts a client with id node 0.
reco -config=reco-cluster.yaml -node=0 -v=2 -logtostderr

Each node will start a server. Once the server is up, each node tries to establish a connection to the remote nodes. Finally, when all remote nodes are connected, the client (node=0) will start running the processors and assigning work to the nodes.

The config file reco-config.yaml provides the network address of the nodes. (Later we will add a cluster coordination layer to manage the nodes.)

app:
  name: "cf"
  cache_cap: 1000
cluster:
  name: "local"
  nodes:
    - id: 0
      addr: ":7000"
    - id: 1
      addr: ":7001"

To get a more verbose log, set v to a higher number. The option -v=5 will dump a lot of data. See glog for more logging command options.

The server node (node=1) waits for work requests from the client. The client (node=0) coordinates the work according to the router algorithm. For now, we use a trivial round-robin router that assigns block of keys to each node. Note that the client is also doing work in this case. Should be possible to set up nodes to be servers or clients or both. Even multiple clients running the same program should work.

To remove the data set, run: rm -rf out.

Walkthrough

reco.go

This file has the entry point for the program main() and calls the top level functions:

Download data set (if first time).
Build the leveldb databases (for train and test).
Train model.
Evaluate predictions on a disjoint data set.

train.go

This file shows you how to use occult to implement the trainer.

Reads data from database as needed.
Computes various aggregations.
Trains a simple matrix factorization algorithms using gradient descent.
Returns the model in data struct CF.

NOTE: the processor output is of type occult.Value which is an interface. To communicate across nodes, we send the values using GOB encoding. For GOB to work with interfaces, we need to register these custom types. (There is no error when the type is not registered. Something that can result in many hours of debugging!)

eval.go

Runs the evaluation using the CF data struct trained by trainer and prints results.

cf.go

Defines various data structures and methods that are used to train and evaluate results.

data.go

Low level details to download data, build DB, and query values.

Compare concurrent computation (single node).

The Processor method MapAll() executes the process using runtime.NumCPU() workers.

Task: ratings distribution

Host: MacBook Air

Concurrent   Num Procs     Num Workers    Time
No           2             NA             1.85 s
Yes          2             2              1.20 s (35% speedup)

Test train times using LRU cache. As you can see, without the cache, the processing would be about two orders of magnitude slower. (Not a fair comparison, though, global values are computed in a first pass reading all the data, with no cache the program needs to re-compute global values every time they are needed.) The advantage, however, is that adding more cache saves you from having to optimize manually.

Notes:
- leveldb cache disabled to simulate slow data access.
- cache size is based on num elements (not memory usage)

Chunk Size   Cache Cap     Time
200          c <= 300      ~5 minutes
200          c >= 400      ~7 seconds
100          400           ~5 minutes

Experiment on Intel i7 with 4 cores (8 logical cores):

num_workers block_size  go_max_proc num_nodes   time    improv.
----------------------------------------------------------------
1           1           1           1           5.4     baseline
2           1           1           1           4.4     18%
4           1           1           1           5.4
2           1           2           1           4.5
2           1           4           1           4.5
4           1           4           1           4.5
8           1           8           1           4.6
----------------------------------------------------------------
1           1           1           2           8.6     -6%
2           1           1           2           7.6
2           1           2           2           5.5
4           1           4           2           3.9
8           1           8           2           3.2
8           4           8           2           3.1
8           8           8           2           3.0*    44%
8           12          8           2           3.1
----------------------------------------------------------------

num_workers: concurrent workers when executing a map operation.
block_size: number of values sent as a batch across nodes in the cluster.
go_max_proc: num processes in Go runtime.
num_nodes: compare single node vs cluster with two nodes on the same host.
time: time that takes to compute the correct result (excluding initialization)
best performance on single node is 4.4 due to using concurrent workers.
adding a node and running with baseline config degrades perf. by 6%
best perf. on cluster is 3.0 or 44% better than baseline.
these improvements are on top of using caching.

Conclusion: results look encouraging. Still need to test on multiple hosts to see how it scales. Need a more compute intensive task with a big data set stored on a real data store.

References

Documentation ¶

Overview ¶

Implement various collaborative filtering algorithms [1] and evaluate using the movie lense data set [2] [1] http://www.stanford.edu/~lmackey/papers/cf_slides-pml09.pdf [2] http://grouplens.org/datasets/movielens/ See README.md for details.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL