markov

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 28, 2019 License: MIT Imports: 4 Imported by: 0

README

Markov Chains in Go

Go Report Card GoDoc

If you don't know what a Markov Chain is I recommend reading up on it and checking out this visual explanation. To put it simply, Markov Chain represent probabilistic state changes inside a Finite-State-Machine. The state transition probabilities can be easily represented as a matrix but in practice (software) that leaves lots of entries being 0 and taking up memory, so a more elegant solution is a nested hash map.

In the code above the following are used:

// represents a grouping of individual words
// eg: []string{"I", "am", "Alex"}, this can be extracted
// from an original string of any form or shape:
// "I am Alex", "I   am  Alex", "I:am:Alex"
// and it's all up to the caller to split their strings into sequences
type Sequence []string
// a pairs represents a possible transition between a sequence of n words
// and the next (single) word
// the Current sequence must be of an equal lenght to the chain pair size
// meaning you can't have some transitions for 2-grouped words and 1-grouped words
type Pair struct {
	Current Sequence
	Next    string
}
// by having a
type transitionMap map[string]int
// and then nested inside
frequencyMatrix map[string]transitionMap
// we generate our mapping of all encountered
// sequences to their respective next word
// and the number of times this occurs

Once your wrap your head around these structures, the rest of the functions are easy to understand.

Examples

The text below was generated on a 2-paired sequence chain with a student movie script as input.

You know I can't KEN I understand. KEN opens the door to
a soaking wet KEN, who stands on the pink scissors and
picks them up, toying with them. KEN glances around for
his wallet. DEBRA (Comforting) There's nothing to be
embarrassed about you know. Lots of people it happens.
KEN grunts. KEN Yeah. The bird chirps from its cage in
the corner. KEN tries to scream but is unable to pull in
oxygen. The pressure in his ears begins to burst as the
bathtub overflows. 29. A figure appears above him,
blurry through the water.

Download & Install

If you have Go installed, you can simply run:

go get cpl.li/go/markov/markov-cli

Usage

I provided an example main function with stdin input and basic flag parsing for generating n words from the input data.

package main

import (
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"strings"

	"cpl.li/go/markov"
)

func main() {
	// handle flags
	maxWords := flag.Int("words", 100, "max words to generate (default 100)")
	pairSize := flag.Int("pairs", 2, "size of a word pair (default 2)")
	flag.Parse()

	c := NewChain(*pairSize) // create markov chain

	// read stdin
	data, err := ioutil.ReadAll(os.Stdin)
	if err != nil {
		log.Fatal(err)
	}

	// give data as sequence to chain model
	c.Add(strings.Fields(string(data)))

	b := c.NewBuilder(nil)             // create builder on top of chain
	b.Generate(*maxWords - c.PairSize) // generate new words
	fmt.Println(b.String())            // print end product
}

Documentation

Overview

Package markov provides a markov chain implementation which allows you to "train" a model using any form of text as input. The markov chain will split the text sequence into pairs and then generate the transition mapping.

A Builder implementation also exists, this can be generated on top of a chain in order to generate a continuous flow of new words.

MIT License Copyright (c) 2019 Alexandru-Paul Copil

Example (Basic)

This example shows a general usecase for the Markov Chain and the builder. It takes input from `stdin` and trains the markov chain then generates a given number of words nd prints out the fully generated string. The flags can configure the max number of words to generate and the sequence pairing to be used when "training" the markov chain.

package main

import (
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"strings"

	"cpl.li/go/markov"
)

func main() {
	// handle flags
	maxWords := flag.Int("words", 100, "max words to generate (default 100)")
	pairSize := flag.Int("pairs", 2, "size of a word pair (default 2)")
	flag.Parse()

	c := markov.NewChain(*pairSize) // create markov chain

	// read stdin
	data, err := ioutil.ReadAll(os.Stdin)
	if err != nil {
		log.Fatal(err)
	}

	// give data as sequence to chain model
	c.Add(strings.Fields(string(data)))

	b := c.NewBuilder(nil)             // create builder on top of chain
	b.Generate(*maxWords - c.PairSize) // generate new words
	fmt.Println(b.String())            // print end product
}
Output:

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

Builder spawns from a Markov chain and using the generated tokens from it, creates sequences of words.

func (*Builder) Generate

func (b *Builder) Generate(count int) int

Generate will tell the builder to poll the markov chain for at most `count` new words to append to the builders sequence. The function will return the real number of generated words, 0 meaning no new words could be generated.

func (*Builder) String

func (b *Builder) String() string

String will return the word sequence as a single string of all words separated by spaces.

type Chain

type Chain struct {
	PairSize int
	// contains filtered or unexported fields
}

Chain represents a Markov chain composed of given length pairs extracted from provided sequences.

func NewChain

func NewChain(pairSize int) *Chain

NewChain generates a Chain with pairs of given length.

func (*Chain) Add

func (c *Chain) Add(sequence Sequence)

Add adds the transition counts to the chain for a given sequence of words.

func (*Chain) NewBuilder

func (c *Chain) NewBuilder(seed Sequence) Builder

NewBuilder creates a Markov sequence builder form the current chain.

func (*Chain) Next

func (c *Chain) Next(seed Sequence) string

Next will give you the next possible token for a certain sequence based on a random weighted decision.

func (*Chain) TransitionProbability

func (c *Chain) TransitionProbability(p Pair) (float64, error)

TransitionProbability returns the probability of transition between the current and next state of a pair.

type Pair

type Pair struct {
	Current Sequence
	Next    string
}

Pair represents a state transition between a set of 1 or more words and the next word in the Sequence.

type Sequence

type Sequence []string

Sequence represents a set of 1 or more words used to build a Markov chain.

func (Sequence) Pairs

func (s Sequence) Pairs(size int) []Pair

Pairs extracts pairs with states of given size transitioning to new words.

func (Sequence) String

func (s Sequence) String() string

String returns the words of the sequence as a single string formed of space separeted words.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL