Published: Jun 20, 2013 License: MIT Imports: 3 Imported by: 0

README

SPSA in Go

Simultaneous Perturbation Stochastic Approximation (SPSA) is a global optimizer for continuous loss functions of any dimension. It has a strong theoretical foundation and very few knobs that must be tuned. While it doesn't get the press of Genetic Algorithms or other Evolutionary Computing methods, it performs on the same level and rests on a stronger foundation.

Documentation

See godoc or the examples

Using SPSA

SPSA minimizes a loss function, so maximization problems must negate their fitness function. It can optionally use a constraint mapper that runs after each round of SPSA. SPSA is an iterative (recursive) optimization algorithm that can be viewed as a stochastic extension of gradient descent. Like the Finite Difference (FDSA) approximation to the gradient, SPSA uses simultaneous perturbation to estimate the gradient of the loss function being minimized. However, it uses only two loss measurements per iteration regardless of the dimension of the parameter vector, whereas FDSA uses 2p, where p is that dimension. Surprisingly, this shortcut has no ill effect on the asymptotic convergence rate and only a small effect on the convergence criteria.
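The two-measurement gradient estimate described above can be sketched as follows. This is a minimal standalone illustration, not the package's source; the function and variable names here are my own.

```go
package main

import (
	"fmt"
	"math/rand"
)

// spGradient estimates the gradient of L at theta from a single
// simultaneous perturbation: two loss measurements total, regardless
// of the dimension of theta (FDSA would need 2*len(theta)).
func spGradient(L func([]float64) float64, theta []float64, c float64) []float64 {
	p := len(theta)
	delta := make([]float64, p)
	plus := make([]float64, p)
	minus := make([]float64, p)
	for i := range theta {
		// Bernoulli +/- 1 perturbation, the asymptotically optimal choice.
		if rand.Intn(2) == 0 {
			delta[i] = 1
		} else {
			delta[i] = -1
		}
		plus[i] = theta[i] + c*delta[i]
		minus[i] = theta[i] - c*delta[i]
	}
	diff := L(plus) - L(minus) // the only two loss measurements
	g := make([]float64, p)
	for i := range g {
		g[i] = diff / (2 * c * delta[i])
	}
	return g
}

func main() {
	// Quadratic loss; its true gradient at theta is 2*theta.
	quad := func(v []float64) float64 {
		s := 0.0
		for _, x := range v {
			s += x * x
		}
		return s
	}
	fmt.Println(len(spGradient(quad, []float64{1, 2, 3}, 0.01))) // prints 3
}
```

Each call evaluates the loss exactly twice, which is what makes SPSA so economical in high dimensions.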

Since SPSA uses a small amount of randomness in its gradient estimate, it also produces some noise. This noise is the good kind, however: it promotes SPSA to a global optimizer with the same convergence rate as FDSA!

SPSA has fewer tuning knobs than many other global optimization methods, but it still requires some work. The most important parameter is a in the a(k) = a / (k + 1 + A) ^ alpha gain sequence (see tuning below); it can wildly affect results. Since SPSA uses two function measurements per iteration, pass in N / 2 as the number of rounds if you wish to limit the total number of function measurements to N.

For more information on SPSA, please see Spall's papers from his website.

Tuning

To tune SPSA, I suggest the Semiautomatic Tuning method by Spall introduced in his book Introduction to Stochastic Search and Optimization (ISSO). There are three main knobs: the two gain sequences (a and c) and the perturbation distribution, delta. Delta is best chosen as Bernoulli +/- 1, as that is asymptotically optimal (it must not be Normal or Uniform, see ISSO). There are rules about the properties of the gain sequences. The standard form works well, and follows the forms:

a(k) = a / (k + 1 + A) ^ alpha
c(k) = c / (k + 1) ^ gamma

Here, the values alpha and gamma are tied together; the asymptotically optimal choice is alpha = 1 and gamma = 1/6, but the values alpha = .602 and gamma = .101 work well for finite cases. A should be about 10% of the total iterations, and c should be approximately the standard deviation of the loss function in question. a is the most volatile parameter, and must be tuned carefully. In general, one should pick a such that the first step is not too large, so as not to send the algorithm in the wrong direction.
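As a sketch, the standard-form gain values above can be computed directly. The function names here are illustrative standalone versions, not the package's StandardAk/StandardCk (which return channels):

```go
package main

import (
	"fmt"
	"math"
)

// ak computes the standard-form step-size gain a_k = a / (k + 1 + A)^alpha.
func ak(a, A, alpha float64, k int) float64 {
	return a / math.Pow(float64(k)+1+A, alpha)
}

// ck computes the standard-form perturbation gain c_k = c / (k + 1)^gamma.
func ck(c, gamma float64, k int) float64 {
	return c / math.Pow(float64(k)+1, gamma)
}

func main() {
	// For 1000 planned iterations, A is ~10% of the total, per the
	// semiautomatic tuning guidelines above.
	for _, k := range []int{0, 100, 999} {
		fmt.Printf("k=%-4d a_k=%.5f c_k=%.5f\n", k, ak(1, 100, 0.602, k), ck(0.1, 0.101, k))
	}
}
```

Both sequences decay toward zero, with a_k shrinking faster than c_k, so steps get smaller while the gradient estimate stays informative.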

References

  • SPSA Website
  • Introduction to Stochastic Search and Optimization. James Spall. Wiley 2003.

License

The MIT License found in the LICENSE file.

Documentation

Overview

Package spsa provides the Simultaneous Perturbation Stochastic Approximation method.

Much of the notation is taken from Introduction to Stochastic Search and Optimization (ISSO), James Spall's book on stochastic optimization (Wiley, 2003).

Example Usage:

This example uses the core optimization api with access to all the tunable knobs.

spsa := &SPSA{
	L: AbsoluteSum, // Loss Function
	C: NoConstraints,
	Theta: Vector{1,1,1,1,1},
	Ak: StandardAk(1, 100, .602),
	Ck: StandardCk(.1, .101),
	Delta: Bernoulli{1},
}
theta := spsa.Run(1000)

This example uses the helper function Optimize which shortens the boilerplate with default options.

theta := Optimize(AbsoluteSum/*Loss function*/, Vector{1,1,1,1,1}/*Theta0*/, 100/*n*/, 1/*a*/, .1/*c*/)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AbsoluteSum

func AbsoluteSum(v Vector) (a float64)

A basic absolute-sum loss function, used for testing.

func Rosenbrock

func Rosenbrock(v Vector) (a float64)

The Rosenbrock function, a standard non-convex optimization benchmark.

Types

type Bernoulli

type Bernoulli struct {
	// contains filtered or unexported fields
}

The Bernoulli +/- r distribution.

func (Bernoulli) Sample

func (b Bernoulli) Sample() float64

type BoundedConstraints

type BoundedConstraints []Bounds

An array of bounds on an array of variables. This object's Constrain function can be used as a ConstraintFunction for SPSA.

func (BoundedConstraints) Constrain

func (bc BoundedConstraints) Constrain(theta Vector) Vector

Constrain theta by mapping each value into its bounded domain. (in place)
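The clamping that Constrain performs can be sketched standalone. The types below mirror the package's Bounds/BoundedConstraints but are an independent illustration over plain []float64, not the package's source:

```go
package main

import "fmt"

// Bounds is a [Lower, Upper] interval for one coordinate.
type Bounds struct{ Lower, Upper float64 }

// BoundedConstraints holds one Bounds per coordinate of theta.
type BoundedConstraints []Bounds

// Constrain clamps each coordinate of theta into its interval, in place.
func (bc BoundedConstraints) Constrain(theta []float64) []float64 {
	for i, b := range bc {
		if theta[i] < b.Lower {
			theta[i] = b.Lower
		} else if theta[i] > b.Upper {
			theta[i] = b.Upper
		}
	}
	return theta
}

func main() {
	bc := BoundedConstraints{{0, 1}, {-1, 1}}
	fmt.Println(bc.Constrain([]float64{2, -3})) // prints [1 -1]
}
```

Passing a method value like bc.Constrain wherever a ConstraintFunction is expected is the intended usage pattern.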

type Bounds

type Bounds struct {
	Lower, Upper float64
}

A pair of bounds on a variable. The first is the lower bound and the second is the upper bound.

type ConstraintFunction

type ConstraintFunction func(Vector) Vector

Map the parameter vector to a constrained version of itself.

type GainSequence

type GainSequence <-chan float64

Gain sequences are infinite iterators of floats. They must follow the conditions specified in ISSO.

func StandardAk

func StandardAk(a, A, alpha float64) GainSequence

Create an infinite iterator of a_k gain values in standard form. Standard form is a_k = a / (k + 1 + A) ^ alpha. Semiautomatic tuning says that A should be roughly 10% of the iterations, and alpha should be .602. However, if running for a very long time, alpha = 1.0 might be more advantageous.

func StandardCk

func StandardCk(c, gamma float64) GainSequence

Create an infinite iterator of c_k gain values in standard form. Standard form is c_k = c / (k + 1) ^ gamma. Semiautomatic tuning says that c = sqrt(Var(L(x))), where L is the loss function being optimized. Gamma has restrictions based on alpha (see ISSO), but for best results in finite samples, use gamma = .101.

type LossFunction

type LossFunction func(Vector) float64

A loss function maps a vector to a real number. It will be minimized by SPSA. (Negate a maximization objective to use it as a loss function.)

type PerturbationDistribution

type PerturbationDistribution interface {
	Sample() float64
}

A perturbation distribution is used to simultaneously perturb the optimization criteria to approximate the loss function's gradient. It must have special properties, the most restrictive being that E[1/X] is bounded. This rules out the uniform and normal distributions. The asymptotically optimal distribution is Bernoulli +/- 1.
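A minimal sketch of a distribution satisfying this interface is Bernoulli +/- r, mirroring the package's Bernoulli type. Its field is unexported, so the field name here is an assumption for illustration:

```go
package main

import (
	"fmt"
	"math/rand"
)

// bernoulli samples +r or -r with equal probability. Because |1/X| = 1/r
// is constant, E[1/X] is trivially bounded, satisfying the requirement
// for a valid perturbation distribution.
type bernoulli struct{ r float64 }

func (b bernoulli) Sample() float64 {
	if rand.Intn(2) == 0 {
		return b.r
	}
	return -b.r
}

func main() {
	b := bernoulli{1}
	x := b.Sample()
	fmt.Println(x == 1 || x == -1) // prints true
}
```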

type SPSA

type SPSA struct {
	// The parameter vector in question. Initialize with Theta0 starting point.
	Theta Vector

	L      LossFunction
	Ak, Ck GainSequence
	Delta  PerturbationDistribution
	C      ConstraintFunction
}

An instance of the SPSA optimization algorithm. Initialize all the parameters at object instantiation.

func (*SPSA) Run

func (spsa *SPSA) Run(rounds int) Vector

Helper function to run many rounds of SPSA and return the current Theta value.

type SegmentedUniform

type SegmentedUniform struct {
	// contains filtered or unexported fields
}

The segmented/mirrored uniform distribution. Samples with equal probability all real numbers in [a,b] U [-b,-a] where 0 < a < b.

func (SegmentedUniform) Sample

func (su SegmentedUniform) Sample() float64
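
One way to sample from [a,b] U [-b,-a] is to draw uniformly from [a,b] and flip the sign with probability 1/2. This is an illustrative standalone sketch, not the package's implementation:

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampleSegmentedUniform draws uniformly from [a,b] U [-b,-a], 0 < a < b,
// by sampling the positive segment and mirroring it half the time.
// Excluding a neighborhood of zero keeps E[1/X] bounded, which is the
// key requirement for a perturbation distribution.
func sampleSegmentedUniform(a, b float64) float64 {
	x := a + rand.Float64()*(b-a)
	if rand.Intn(2) == 0 {
		x = -x
	}
	return x
}

func main() {
	x := sampleSegmentedUniform(0.5, 1.5)
	fmt.Println((x >= 0.5 && x <= 1.5) || (x >= -1.5 && x <= -0.5)) // prints true
}
```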

type Vector

type Vector []float64

A simple real vector type for better readability. All operations are out-of-place.

func NoConstraints

func NoConstraints(a Vector) Vector

A ConstraintFunction that is just the identity mapper.

func Optimize

func Optimize(L LossFunction, theta0 Vector, n int, a, c float64, C ...ConstraintFunction) Vector

A helper function to optimize a loss function using SPSA using mostly default options. It uses standard ak and ck gain sequences, bernoulli +/- 1 perturbation distribution and n rounds. The constraint function is optional.

func SampleN

func SampleN(n int, d PerturbationDistribution) Vector

Draw n samples from distribution d, returned as a Vector.

func (Vector) Add

func (a Vector) Add(b Vector) Vector

Add a and b. (out of place)

func (Vector) Copy

func (a Vector) Copy() Vector

Copy a to a new vector.

func (Vector) Mean

func (a Vector) Mean() (m float64)

Mean of a

func (Vector) MeanSquare

func (a Vector) MeanSquare() (x float64)

Mean square of a

func (Vector) Scale

func (a Vector) Scale(s float64) Vector

Scale a by s. Returns the new vector. (out of place)

func (Vector) String

func (a Vector) String() (s string)

String form

func (Vector) Subtract

func (a Vector) Subtract(b Vector) Vector

Subtract b from a. (out of place)

func (Vector) Sum

func (a Vector) Sum() (s float64)

Sum a

func (Vector) Var

func (a Vector) Var() (x float64)

Variance of a
