Neural Turing Machines

Package ntm implements the Neural Turing Machine architecture as described in A. Graves, G. Wayne, and I. Danihelka, "Neural Turing Machines", arXiv preprint arXiv:1410.5401, 2014.

Using this package along with its subpackages, the "copy", "repeatcopy" and "ngram" tasks mentioned in the paper were verified. For each of these tasks, the successfully trained models are saved under filenames of the form "seedA_B", where A is the seed provided to rand.Seed during training, and B is the iteration at which the trained weights converged.

Reproducing results in the paper

The following sections detail the steps for verifying the results in the paper. All commands are assumed to be run in the $GOPATH/src/github.com/fumin/ntm folder.

Copy

Train

To start training, run go run copytask/train/main.go, which commences training and also starts a web server for tracking progress. To print debug information about the training process, run curl http://localhost:8082/PrintDebug; run it a second time to turn debug output off. To track the cross-entropy loss during training, run curl http://localhost:8082/Loss. To save the trained weights to disk, run curl http://localhost:8082/Weights > weights.
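
The weights endpoint can also be fetched programmatically. Below is a minimal Go sketch, assuming the trainer started above is still running and serving /Weights on localhost:8082; it simply saves the raw response body to a file.

// saveweights fetches the current weights from the training server and
// writes them to disk. It assumes the trainer from copytask/train/main.go
// is running and serving /Weights on localhost:8082.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://localhost:8082/Weights")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("weights")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatal(err)
	}
}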

Testing

To test the weights saved in the previous training step, run go run copytask/test/main.go -weightsFile=weights. Alternatively, you can specify one of the successfully trained weights in the copytask/test folder, such as the file copytask/test/seed2_19000. Running the above command starts a web server which can be accessed at http://localhost:9000/. Below are screenshots of the web page showing the testing results for a test case of length 20. The first figure shows the input, output, and predictions of the NTM, and the second figure shows the addressing weights of the memory head.

The figure below shows the results for a test case of length 120. As mentioned in the paper, the NTM performs quite well in this case even though it was only trained on sequences whose length is at most 20.

Repeat copy

To experiment on the repeat copy task, follow the steps of the copy task, but change the package from copytask to repeatcopy.

In this task, I deviated from the paper a bit in an attempt to see whether NTMs could generalize to unseen repeat numbers. In particular, the paper's way of representing the repeat number as a scalar normalized to [0, 1] seems a bit artificial, so I took a different approach and encoded the repeat number as a sequence of binary inputs (see the sketch below). The reasoning behind this approach is that by distributing the encoding through time, there is no upper limit on the repeat number, and given NTMs' relatively strong memorization abilities, distributing the encoding through time should not pose too big a problem. In addition, I gave the NTMs two memory heads instead of one as in the paper for these repeat copy tasks. However, in the end the NTM was still not able to generalize well on the repeat number.
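
To make the idea concrete, the sketch below shows one possible way to spread a repeat count over time as binary input vectors. It is only an illustration of the encoding idea described above; the actual format used in the repeatcopy package may differ.

// repeatCountBits is a hypothetical helper illustrating the idea of
// encoding a repeat count as a sequence of binary inputs, one bit per
// time step (most significant bit first), so that there is no fixed
// upper bound on the representable repeat number.
func repeatCountBits(repeat, vectorSize int) [][]float64 {
	// Collect the bits of the repeat count, most significant bit first.
	var bits []float64
	for ; repeat > 0; repeat >>= 1 {
		bits = append([]float64{float64(repeat & 1)}, bits...)
	}
	if len(bits) == 0 {
		bits = []float64{0}
	}
	// Emit one input vector per bit; the bit occupies the first channel
	// and the remaining channels are left at zero.
	seq := make([][]float64, len(bits))
	for i, b := range bits {
		v := make([]float64, vectorSize)
		v[0] = b
		seq[i] = v
	}
	return seq
}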

Below, I first show the results on the test case with repeat number 7 and length 7. For this test case, we see that the NTM solves it perfectly, emitting the end signal unambiguously at the last time instant. Moreover, we see that the NTM solves it by assigning the first memory head the responsibility of keeping count of the repeat times, and the second memory head the responsibility of replaying the input sequence.

Next, we test how the NTM generalizes to configurations unseen during training. The figure below shows the results of generalizing the repeat number to 15. We see that the NTM fails on this generalization.

The figure below shows the results of generalizing the sequence length to 15. We see that the NTM does a fairly good job, as mentioned in the paper.

Dynamic N-grams

To experiment on the dynamic n-grams task, follow the steps of the copy task, but change the package from copytask to ngram.

The figure below shows the results of this task. We see that the bits-per-sequence loss is 133, which is close to the theoretical optimum given by the Bayesian analysis in the paper. Moreover, by observing that the memory weights for the same 5-bit prefix remain the same throughout the entire testing sequence, we verified the paper's claim that the NTM solves this task by emulating the optimal Bayesian approach of keeping track of transition counts.
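
For reference, the optimal Bayesian estimator that the paper compares against predicts the next bit from the counts of ones and zeros previously observed after the same 5-bit context, as P(1) = (N1 + 0.5) / (N1 + N0 + 1). A small sketch of that estimator (not part of this package) is:

// bayesOptimal returns the probability that the next bit is 1, given
// that n1 ones and n0 zeros have been observed so far following the
// current 5-bit context. It implements the optimal estimator described
// in the paper and is shown here only for reference.
func bayesOptimal(n1, n0 int) float64 {
	return (float64(n1) + 0.5) / (float64(n1+n0) + 1)
}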

Acrostic generation

I applied NTMs to automatically generate acrostics. An acrostic is a poem in which the first word of each line spells out a message. Acrostics have a rich history in ancient China, where literary inquisitions were severe and common, and they continue to enjoy much popularity in today's Chinese societies such as Taiwan. The example below shows an acrostic carrying the message "vote to remove Senator 蔡正元 on the 14th", referring to the Senator's recall election on 2015/02/14.

The poem on the left is generated by a combination of a 2-gram model and a set of hand-crafted rules and features, whereas the poem on the right is generated by an NTM whose only learning material is the training corpus. Those who read classical Chinese should notice that, compared to the 2-gram poem on the left, the poem generated by the NTM is grammatically more correct and reads more like one written by a real person.

The NTMs in this experiment were trained on the Tang poetry collection 全唐詩. The vocabulary is limited to the 3000 most frequent characters in the collection, while the rest are designated as unknown. During training, the network first receives instructions about what the keywords are and where they should appear, and is then asked to produce the full poem with no further input. In the example below, the top row shows the inputs and the bottom row the outputs. One of the instructions in the top input row is for the character 鄉 to appear at the fourth position of the first line.

After training, the NTM achieves a bits-per-character of 7.6600, which is comparable to the 2-gram entropy estimate of 7.4451 on the same corpus over the same 3000-character vocabulary. Moreover, the NTM is able to almost perfectly emit the "linefeed" character at the specified positions, suggesting that it has learned long-range dependencies that exceed the capabilities of a 2-gram model.

More details about this experiment can be found in the slides of this talk.

Below are instructions for using this code to generate acrostics with NTMs; they assume we are already in the "poem" folder (cd poem). To train an NTM to do acrostics, run go run train/main.go as in the steps above for the copy and repeat tasks. To generate acrostics using your trained model or one that comes with this package, run go run test/main.go -weightsFile=test/h1Size512_numHeads8_n128_m32/seed9_78100_5p6573, substituting the -weightsFile option with a different file if desired.

Testing

To run the tests of this package, run go test -test.v.

Documentation

Overview

Package ntm implements the Neural Turing Machine architecture as described in A. Graves, G. Wayne, and I. Danihelka, "Neural Turing Machines", arXiv preprint arXiv:1410.5401, 2014.

Using this package along with its subpackages, the "copy", "repeatcopy" and "ngram" tasks mentioned in the paper were verified. For each of these tasks, the successfully trained models are saved under filenames of the form "seedA_B", where A is the seed provided to rand.Seed during training, and B is the iteration at which the trained weights converged.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func HeadWeights

func HeadWeights(machines []*NTM) [][][]float64

HeadWeights returns the addressing weights of all memory heads across time. The top-level elements correspond to the heads; the second-level elements correspond to time instants.

func NewEmptyController1

func NewEmptyController1(xSize, ySize, h1Size, numHeads, n, m int) *controller1

NewEmptyController1 returns a new controller1, which is a single-layer feedforward network. The returned controller1 is empty in that all its network weights are initialized to 0.

func Predictions

func Predictions(machines []*NTM) [][]float64

Predictions returns the predictions of a NTM across time.

func Sigmoid

func Sigmoid(x float64) float64

Sigmoid computes 1 / (1 + math.Exp(-x))
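
The doc comment pins the definition down exactly; in code it is simply:

// sigmoid mirrors the definition above: 1 / (1 + e^(-x)).
func sigmoid(x float64) float64 {
	return 1 / (1 + math.Exp(-x))
}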

func Sprint2

func Sprint2(t [][]float64) string

Sprint2 pretty prints a 2 dimensional tensor.

Types

type Controller

type Controller interface {
	// Heads returns the emitted memory heads.
	Heads() []*Head
	// YVal returns the values of the output of the Controller.
	YVal() []float64
	// YGrad returns the gradients of the output of the Controller.
	YGrad() []float64

	// Forward creates a new Controller which shares the same internal weights,
	// and performs a forward pass whose results can be retrieved by Heads and YVal.
	Forward(reads []*memRead, x []float64) Controller
	// Backward performs a backward pass,
	// assuming the gradients on Heads and Y are already set.
	Backward()

	// Wtm1BiasVal returns the values of the bias of the previous weight.
	// The layout is |-- 1st head weights (size memoryN) --|-- 2nd head --|-- ... --|
	// The length of the returned slice is numHeads * memoryN.
	Wtm1BiasVal() []float64
	Wtm1BiasGrad() []float64

	// Mtm1BiasVal returns the values of the bias of the memory bank.
	// The returned matrix is in row major order.
	Mtm1BiasVal() []float64
	Mtm1BiasGrad() []float64

	// WeightsVal returns the values of all weights.
	WeightsVal() []float64
	// WeightsGrad returns the gradients of all weights.
	WeightsGrad() []float64
	// WeightsDesc returns the description of the i-th weight.
	WeightsDesc(i int) string

	// NumHeads returns the number of memory heads of a controller.
	NumHeads() int
	// MemoryN returns the number of vectors of the memory bank of a controller.
	MemoryN() int
	// MemoryM returns the size of a vector in the memory bank of a controller.
	MemoryM() int
}

The Controller interface is implemented by NTM controller networks that wish to operate with memory banks in a NTM.

type DensityModel

type DensityModel interface {
	// Model sets the value and gradient of Units of the output layer.
	Model(t int, yHVal []float64, yHGrad []float64)

	// Loss is the loss definition of this model.
	Loss(output [][]float64) float64
}

A DensityModel is a model of how the last layer of a network gets transformed into the final output.
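
As a sketch of how this interface might be satisfied, the toy model below treats each output unit as an independent value with a squared-error loss. It is only an illustration of the interface shape; it assumes (the package docs do not spell this out) that Model receives the raw output values in yHVal at time t and writes the loss gradients with respect to them into yHGrad. It is not how LogisticModel or MultinomialModel below are implemented.

// squaredErrorModel is a toy DensityModel used only to illustrate the
// interface; the semantics assumed for Model are described above.
type squaredErrorModel struct {
	// Y is the target output at each time step.
	Y [][]float64
}

func (m *squaredErrorModel) Model(t int, yHVal []float64, yHGrad []float64) {
	for i := range yHVal {
		// Gradient of 0.5*(yHVal[i]-Y[t][i])^2 with respect to yHVal[i].
		yHGrad[i] = yHVal[i] - m.Y[t][i]
	}
}

func (m *squaredErrorModel) Loss(output [][]float64) float64 {
	loss := 0.0
	for t, yt := range output {
		for i, v := range yt {
			d := v - m.Y[t][i]
			loss += 0.5 * d * d
		}
	}
	return loss
}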

type Head struct {
	Wtm1 *refocus // the weights at time t-1
	M    int      // size of a row in the memory
	// contains filtered or unexported fields
}

A Head is a read/write head on a memory bank. It carries information that is required to operate on a memory bank according to the NTM architecture.

func NewHead

func NewHead(m int) *Head

NewHead creates a new memory head.

func (*Head) AddGrad

func (h *Head) AddGrad() []float64

func (*Head) AddVal

func (h *Head) AddVal() []float64

AddVal returns the add vector of a memory head.

func (*Head) BetaGrad

func (h *Head) BetaGrad() *float64

func (*Head) BetaVal

func (h *Head) BetaVal() *float64

BetaVal returns the key strength of the content addressing step.

func (*Head) EraseGrad

func (h *Head) EraseGrad() []float64

func (*Head) EraseVal

func (h *Head) EraseVal() []float64

EraseVal returns the erase vector of a memory head.

func (*Head) GGrad

func (h *Head) GGrad() *float64

func (*Head) GVal

func (h *Head) GVal() *float64

GVal returns the degree to which we want to choose content-based addressing over location-based addressing.

func (*Head) GammaGrad

func (h *Head) GammaGrad() *float64

func (*Head) GammaVal

func (h *Head) GammaVal() *float64

GammaVal returns the degree to which the addressing weights are sharpened.

func (*Head) KGrad

func (h *Head) KGrad() []float64

func (*Head) KVal

func (h *Head) KVal() []float64

KVal returns a head's key vector, which is the target data in the content addressing step.

func (*Head) SGrad

func (h *Head) SGrad() *float64

func (*Head) SVal

func (h *Head) SVal() *float64

SVal returns a value indicating how much the weightings are rotated in the location-based addressing step.

type LogisticModel

type LogisticModel struct {
	// Y is the strength of the output unit at each time step.
	Y [][]float64
}

A LogisticModel models its outputs as logistic sigmoids.

func (*LogisticModel) Loss

func (m *LogisticModel) Loss(output [][]float64) float64

Loss returns the cross entropy loss.

func (*LogisticModel) Model

func (m *LogisticModel) Model(t int, yHVal []float64, yHGrad []float64)

Model sets the values and gradients of the output units.

type MultinomialModel

type MultinomialModel struct {
	// Y is the class of the output at each time step.
	Y []int
}

A MultinomialModel models its outputs as following the multinomial distribution.

func (*MultinomialModel) Loss

func (m *MultinomialModel) Loss(output [][]float64) float64

func (*MultinomialModel) Model

func (m *MultinomialModel) Model(t int, yHVal []float64, yHGrad []float64)

Model sets the values and gradients of the output units.

type NTM

type NTM struct {
	Controller Controller
	// contains filtered or unexported fields
}

A NTM is a Neural Turing Machine as described in A. Graves, G. Wayne, and I. Danihelka, "Neural Turing Machines", arXiv preprint arXiv:1410.5401, 2014.

func ForwardBackward

func ForwardBackward(c Controller, in [][]float64, out DensityModel) []*NTM

ForwardBackward computes a controller's prediction and gradients with respect to the given ground truth input and output values.
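
A minimal usage sketch, based only on the signatures documented on this page. The sizes, the toy data, and the assumption that the target output has the same number of time steps as the input are placeholders, not recommendations.

package main

import (
	"fmt"

	"github.com/fumin/ntm"
)

func main() {
	// A toy 2-step input sequence of 4 units; the target is assumed here
	// to have the same shape as the input.
	x := [][]float64{
		{1, 0, 1, 0},
		{0, 1, 0, 1},
	}
	model := &ntm.LogisticModel{Y: x}

	// Placeholder sizes: xSize=4, ySize=4, h1Size=64, numHeads=1, n=16, m=8.
	c := ntm.NewEmptyController1(4, 4, 64, 1, 16, 8)

	machines := ntm.ForwardBackward(c, x, model)
	fmt.Println(ntm.Predictions(machines)) // predictions across time
	fmt.Println(model.Loss(ntm.Predictions(machines)))
}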

func MakeEmptyNTM

func MakeEmptyNTM(c Controller) *NTM

MakeEmptyNTM makes a NTM with its memory and head weights set to their bias values, based on the controller.

func NewNTM

func NewNTM(old *NTM, x []float64) *NTM

NewNTM creates a new NTM.

type RMSProp

type RMSProp struct {
	C Controller
	N []float64
	G []float64
	D []float64
}

RMSProp implements the rmsprop algorithm. The detailed updating equations are given in Graves, Alex (2013). Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850.

func NewRMSProp

func NewRMSProp(c Controller) *RMSProp

func (*RMSProp) Train

func (r *RMSProp) Train(x [][]float64, y DensityModel, a, b, c, d float64) []*NTM
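
The roles of the four numeric parameters of Train are not documented on this page; in Graves (2013) rmsprop is parameterized by a decay rate, a momentum, a learning rate, and a small constant, but the mapping to a, b, c, d here is an assumption. A hedged training-loop sketch:

package main

import "github.com/fumin/ntm"

// nextTrainingCase is a hypothetical helper standing in for task-specific
// data generation (for example, the random sequences of the copy task).
func nextTrainingCase() ([][]float64, ntm.DensityModel) {
	x := [][]float64{{1, 0, 1, 0}, {0, 1, 0, 1}}
	return x, &ntm.LogisticModel{Y: x}
}

func main() {
	c := ntm.NewEmptyController1(4, 4, 64, 1, 16, 8) // placeholder sizes
	trainer := ntm.NewRMSProp(c)
	for i := 0; i < 10000; i++ {
		x, model := nextTrainingCase()
		// The values below are placeholders only; consult the package
		// source for the actual meaning of Train's a, b, c, d.
		trainer.Train(x, model, 0.95, 0.9, 1e-4, 1e-4)
	}
}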

type SGDMomentum

type SGDMomentum struct {
	C     Controller
	PrevD []float64
}

SGDMomentum implements stochastic gradient descent with momentum.

func NewSGDMomentum

func NewSGDMomentum(c Controller) *SGDMomentum

func (*SGDMomentum) Train

func (s *SGDMomentum) Train(x [][]float64, y DensityModel, alpha, mt float64) []*NTM

type Unit

type Unit struct {
	Val  float64 // value at node
	Grad float64 // gradient at node
}

A Unit is a node in a neural network, containing fields that are essential to efficiently compute gradients in the backward pass of a stochastic gradient descent training process.

func (Unit) String

func (u Unit) String() string
