randtxt

Version: v1.0.1
Published: Oct 14, 2018 | License: Apache-2.0 | Imports: 8 | Imported by: 0

README

randtxt

Generates random text from Markov chains of tagged source text.

An example chain, derived from Plato's Ion, is included:

$ go get github.com/pboyd/randtxt
$ go run github.com/pboyd/randtxt/cmd/gentext -chain $GOPATH/src/github.com/pboyd/randtxt/testfiles/ion/trigram.mkv

Have you already forgotten what you were saying? A rhapsode ought to interpret the mind of the poet. For the rhapsode ought to interpret the mind of the poet. For the poet is a light and winged and holy thing, and there is Phanosthenes of Andros, and Heraclides of Clazomenae, whom they have also appointed to the command of their armies and to other offices, although aliens, after they had shown their merit. And will they not choose Ion the Ephesian to be their general, and honour him, if he prove himself worthy?

To build a chain, use the Stanford POS Tagger to generate tagged text, then run cmd/readtsv. For example:

$ go run github.com/pboyd/randtxt/cmd/readtsv -chain output.mkv $GOPATH/src/github.com/pboyd/randtxt/testfiles/ion/tagged.tsv

I wrote about the design here.

License

This package is released under the terms of the Apache 2.0 license. See LICENSE.TXT.

Documentation

Overview

Package randtxt contains a random text generator.

Constants

This section is empty.

Variables

var PennTreebankTagSet = pennTreebankTagSet{}

PennTreebankTagSet is a TagSet for the English Penn Treebank tagset, as used by the Stanford POS tagger.

More details:

https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
https://nlp.stanford.edu/software/tagger.shtml

Functions

This section is empty.

Types

type Generator

type Generator struct {
	// TagSet holds the language- and tagset-specific rules. This should
	// match the TagSet used when the model was built.
	TagSet TagSet
	// contains filtered or unexported fields
}

Generator generates random text from a model built by ModelBuilder.

func NewGenerator

func NewGenerator(chain markov.Chain) (*Generator, error)

NewGenerator returns a new Generator. It returns an error if the chain has an unrecognized format.

func (*Generator) Paragraph

func (g *Generator) Paragraph(min, max int) (string, error)

Paragraph returns a paragraph containing between "min" and "max" sentences.

func (*Generator) WriteParagraph added in v1.0.1

func (g *Generator) WriteParagraph(out io.Writer, min, max int) error

WriteParagraph writes a paragraph of random text to "out". The paragraph will contain between "min" and "max" sentences.
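As a rough usage sketch, several paragraphs can be written to one io.Writer by calling WriteParagraph in a loop. Only the WriteParagraph signature below comes from this page. The paragraphWriter interface, stubGenerator, writeParagraphs and renderParagraphs are hypothetical names introduced for illustration, and the stub stands in for a real *Generator so the example runs without a chain file. Whether WriteParagraph emits a trailing newline is not stated here, so the sketch assumes it does not and adds the blank line itself.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// paragraphWriter is a hypothetical interface matching the documented
// WriteParagraph signature. A *randtxt.Generator should satisfy it.
type paragraphWriter interface {
	WriteParagraph(out io.Writer, min, max int) error
}

// stubGenerator is a stand-in for a real generator. It always writes
// exactly "min" copies of a fixed sentence.
type stubGenerator struct{}

func (stubGenerator) WriteParagraph(out io.Writer, min, max int) error {
	text := strings.TrimSpace(strings.Repeat("Lorem ipsum. ", min))
	_, err := io.WriteString(out, text)
	return err
}

// writeParagraphs writes "count" paragraphs to out, separated by a
// blank line, and stops at the first error.
func writeParagraphs(out io.Writer, g paragraphWriter, count, min, max int) error {
	for i := 0; i < count; i++ {
		if i > 0 {
			if _, err := io.WriteString(out, "\n\n"); err != nil {
				return err
			}
		}
		if err := g.WriteParagraph(out, min, max); err != nil {
			return err
		}
	}
	return nil
}

// renderParagraphs collects the same output into a string.
func renderParagraphs(g paragraphWriter, count, min, max int) (string, error) {
	var sb strings.Builder
	err := writeParagraphs(&sb, g, count, min, max)
	return sb.String(), err
}

func main() {
	if err := writeParagraphs(os.Stdout, stubGenerator{}, 2, 2, 4); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println()
}
```

Passing a generator returned by NewGenerator in place of stubGenerator{} should work the same way, since the helper depends only on the method signature.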

type ModelBuilder

type ModelBuilder struct {
	TagSet TagSet
	// contains filtered or unexported fields
}

ModelBuilder builds a model that Generator can use.

func NewModelBuilder

func NewModelBuilder(chain markov.WriteChain, ngramSize int) *ModelBuilder

NewModelBuilder creates a ModelBuilder instance.

The model will be written to "chain".

ngramSize is the number of words to include in each ngram. It must be greater than 1.

See cmd/readtsv for an example.
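To make the ngramSize parameter concrete, the sketch below lists the sliding windows of n consecutive words in a short sequence. It is only an illustration of what an n-gram is. The ngrams function is a hypothetical helper and does not show how ModelBuilder stores n-grams in the chain.

```go
package main

import "fmt"

// ngrams returns every run of n consecutive words, in order.
// It returns nil when n < 2 (mirroring the documented constraint on
// ngramSize) or when there are fewer than n words.
func ngrams(words []string, n int) [][]string {
	if n < 2 || len(words) < n {
		return nil
	}
	out := make([][]string, 0, len(words)-n+1)
	for i := 0; i+n <= len(words); i++ {
		out = append(out, words[i:i+n])
	}
	return out
}

func main() {
	words := []string{"the", "poet", "is", "a", "light", "thing"}
	for _, g := range ngrams(words, 3) {
		fmt.Println(g)
	}
	// Prints:
	// [the poet is]
	// [poet is a]
	// [is a light]
	// [a light thing]
}
```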

func (*ModelBuilder) Feed

func (b *ModelBuilder) Feed(sources ...<-chan Tag) error

Feed reads tags from one or more channels and writes them to the output chain.
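Feed takes receive-only channels of Tag, so a caller needs something that produces them. A minimal sketch of such a producer follows. It assumes the input has one "word<TAB>POS" pair per line, which is a guess at the tagged.tsv layout rather than something this page states. Tag is redeclared locally so the example runs on its own, and parseLine, tagsFromTSV and collect are hypothetical helpers. The sketch also assumes the consumer expects the channel to be closed once the input ends.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Tag is a local copy of the documented Tag struct.
type Tag struct {
	Text string
	POS  string
}

// parseLine splits one "word<TAB>POS" line (assumed format).
// It reports false for lines that do not have exactly two non-empty fields.
func parseLine(line string) (Tag, bool) {
	fields := strings.Split(line, "\t")
	if len(fields) != 2 || fields[0] == "" || fields[1] == "" {
		return Tag{}, false
	}
	return Tag{Text: fields[0], POS: fields[1]}, true
}

// tagsFromTSV sends each well-formed line on the returned channel and
// closes the channel when the reader is exhausted. Malformed lines are
// skipped and scanner errors are ignored to keep the sketch short.
func tagsFromTSV(r io.Reader) <-chan Tag {
	ch := make(chan Tag)
	go func() {
		defer close(ch)
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			if tag, ok := parseLine(scanner.Text()); ok {
				ch <- tag
			}
		}
	}()
	return ch
}

// collect drains the channel into a slice, which is handy for testing.
func collect(input string) []Tag {
	var tags []Tag
	for tag := range tagsFromTSV(strings.NewReader(input)) {
		tags = append(tags, tag)
	}
	return tags
}

func main() {
	for _, tag := range collect("Ion\tNNP\nsings\tVBZ\n\n.\t.\n") {
		fmt.Printf("%s/%s\n", tag.Text, tag.POS)
	}
}
```

With the real package, the channel would carry randtxt.Tag values and could be passed as one of the sources to Feed.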

type Tag

type Tag struct {
	Text string

	// POS is the part of speech tag for the text.
	POS string
}

Tag represents a single tagged word.

func (Tag) IsZero

func (t Tag) IsZero() bool

IsZero reports whether the tag is the zero value.

func (Tag) String

func (t Tag) String() string

String returns the tag in "Text/POS" form.
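The two methods can be pictured with a small stand-alone sketch. Tag is redeclared locally here so the example compiles without the package, and the method bodies are assumptions based only on the descriptions above, not the package's source.

```go
package main

import "fmt"

// Tag is a local stand-in for the documented Tag struct.
type Tag struct {
	Text string
	POS  string
}

// IsZero reports whether both fields are empty (assumed behavior).
func (t Tag) IsZero() bool {
	return t == (Tag{})
}

// String renders the tag in "Text/POS" form.
func (t Tag) String() string {
	return t.Text + "/" + t.POS
}

func main() {
	tag := Tag{Text: "rhapsode", POS: "NN"}
	fmt.Println(tag)            // prints "rhapsode/NN"
	fmt.Println(tag.IsZero())   // prints "false"
	fmt.Println(Tag{}.IsZero()) // prints "true"
}
```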

type TagSet

type TagSet interface {
	// Join returns the text from "tag" prepended with the separator that
	// should be between "prev" and "tag".
	//
	// "prev" is the zero tag at the beginning of the text.
	Join(tag, prev Tag) string

	// Normalize converts "tag" to a consistent form. If the returned tag
	// text is blank the tag is ignored.
	Normalize(tag, prev Tag) Tag
}

TagSet contains code specific to a language and tagset.
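For a sense of what an implementation might look like, here is a deliberately small TagSet. It is a hypothetical example and does not reproduce the rules in PennTreebankTagSet. Tag and TagSet are redeclared locally so the code runs on its own, and simpleTagSet and render are names made up for this sketch. Join puts a space before every word except at the start of the text and before the Penn Treebank punctuation tags ",", "." and ":". Normalize only trims whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// Tag and TagSet are local copies of the documented declarations.
type Tag struct {
	Text string
	POS  string
}

type TagSet interface {
	Join(tag, prev Tag) string
	Normalize(tag, prev Tag) Tag
}

// simpleTagSet is a hypothetical, minimal TagSet.
type simpleTagSet struct{}

// Join omits the separator at the start of the text and before
// punctuation, and uses a single space everywhere else.
func (simpleTagSet) Join(tag, prev Tag) string {
	if prev == (Tag{}) {
		return tag.Text
	}
	switch tag.POS {
	case ",", ".", ":":
		return tag.Text
	}
	return " " + tag.Text
}

// Normalize trims surrounding whitespace. A tag whose text ends up
// blank is meant to be ignored by the caller.
func (simpleTagSet) Normalize(tag, prev Tag) Tag {
	tag.Text = strings.TrimSpace(tag.Text)
	return tag
}

// render shows how a caller might combine the two methods.
func render(ts TagSet, tags []Tag) string {
	var sb strings.Builder
	var prev Tag
	for _, t := range tags {
		t = ts.Normalize(t, prev)
		if t.Text == "" {
			continue // blank after normalization: skip it
		}
		sb.WriteString(ts.Join(t, prev))
		prev = t
	}
	return sb.String()
}

func main() {
	tags := []Tag{
		{Text: "The", POS: "DT"},
		{Text: "  ", POS: "NN"},
		{Text: "poet", POS: "NN"},
		{Text: "sings", POS: "VBZ"},
		{Text: ".", POS: "."},
	}
	fmt.Println(render(simpleTagSet{}, tags)) // prints "The poet sings."
}
```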

Directories

Path Synopsis
cmd
