prose.v2: gopkg.in/jdkato/prose.v2

package prose

import "gopkg.in/jdkato/prose.v2"

Package prose is a repository of packages related to text processing, including tokenization, part-of-speech tagging, and named-entity extraction.

Package Files

data.go doc.go document.go extract.go model.go segment.go tag.go tokenize.go types.go utilities.go words.go

func Asset

func Asset(name string) ([]byte, error)

Asset loads and returns the asset for the given name. It returns an error if the asset could not be found or could not be loaded.

func AssetDir

func AssetDir(name string) ([]string, error)

AssetDir returns the file names below a certain directory embedded in the file by go-bindata. For example if you run go-bindata on data/... and data contains the following hierarchy:

data/
  foo.txt
  img/
    a.png
    b.png

then

AssetDir("data") would return []string{"foo.txt", "img"}
AssetDir("data/img") would return []string{"a.png", "b.png"}
AssetDir("foo.txt") and AssetDir("notexist") would return an error
AssetDir("") will return []string{"data"}
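The lookup rules above can be sketched with the standard library alone. This is an illustration of the documented behavior, not the code that go-bindata generates; the names assetNames and listDir are hypothetical, and the asset table is hard-coded to match the example hierarchy.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// assetNames is a hypothetical stand-in for the embedded asset table.
var assetNames = []string{"data/foo.txt", "data/img/a.png", "data/img/b.png"}

// listDir returns the immediate children of dir, mimicking the documented
// AssetDir behavior. It is an illustration, not go-bindata's generated code.
func listDir(dir string) ([]string, error) {
	prefix := ""
	if dir != "" {
		prefix = strings.TrimSuffix(dir, "/") + "/"
	}
	seen := map[string]bool{}
	for _, name := range assetNames {
		if !strings.HasPrefix(name, prefix) {
			continue
		}
		// Keep only the first path element below the requested directory.
		rest := strings.TrimPrefix(name, prefix)
		seen[strings.SplitN(rest, "/", 2)[0]] = true
	}
	if len(seen) == 0 {
		// Files and unknown names have no children, so they are errors.
		return nil, fmt.Errorf("asset directory %q not found", dir)
	}
	children := make([]string, 0, len(seen))
	for child := range seen {
		children = append(children, child)
	}
	sort.Strings(children)
	return children, nil
}

func main() {
	for _, dir := range []string{"data", "data/img", "", "foo.txt", "notexist"} {
		children, err := listDir(dir)
		fmt.Printf("%q -> %v, err=%v\n", dir, children, err)
	}
}
```

The sketch sorts the children so the output is deterministic; the ordering returned by the real AssetDir is not stated in this page.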

func AssetInfo

func AssetInfo(name string) (os.FileInfo, error)

AssetInfo loads and returns the asset info for the given name. It returns an error if the asset could not be found or could not be loaded.

func AssetNames

func AssetNames() []string

AssetNames returns the names of the assets.

func MustAsset

func MustAsset(name string) []byte

MustAsset is like Asset but panics when Asset would return an error. It simplifies safe initialization of global variables.

func RestoreAsset

func RestoreAsset(dir, name string) error

RestoreAsset restores an asset under the given directory.

func RestoreAssets

func RestoreAssets(dir, name string) error

RestoreAssets restores an asset under the given directory recursively.

type DataSource

type DataSource func(model *Model)

DataSource provides training data to a Model.

func UsingEntities

func UsingEntities(data []EntityContext) DataSource

UsingEntities returns a DataSource that trains a named-entity recognizer (NER) from labeled data.

type DocOpt

type DocOpt func(doc *Document, opts *DocOpts)

A DocOpt represents a setting that changes the document creation process.

For example, it might disable named-entity extraction:

doc, err := prose.NewDocument("...", prose.WithExtraction(false))

func UsingModel

func UsingModel(model *Model) DocOpt

UsingModel specifies the Model to use when creating the Document.

func WithExtraction

func WithExtraction(include bool) DocOpt

WithExtraction can enable (the default) or disable named-entity extraction.

func WithSegmentation

func WithSegmentation(include bool) DocOpt

WithSegmentation can enable (the default) or disable sentence segmentation.

func WithTagging

func WithTagging(include bool) DocOpt

WithTagging can enable (the default) or disable POS tagging.

func WithTokenization

func WithTokenization(include bool) DocOpt

WithTokenization can enable (the default) or disable tokenization.

type DocOpts

type DocOpts struct {
    Extract  bool // If true, include named-entity extraction
    Segment  bool // If true, include segmentation
    Tag      bool // If true, include POS tagging
    Tokenize bool // If true, include tokenization
}

DocOpts controls the Document creation process.
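DocOpt and DocOpts follow the functional-options pattern: each option is a function that adjusts the settings before the document is built. The sketch below reproduces that pattern with local stand-in types (docOpts, docOpt, withExtraction, withSegmentation, and resolve are all hypothetical names). It assumes every stage starts enabled, as the "(the default)" wording suggests, and it omits the *Document argument that the real DocOpt receives.

```go
package main

import "fmt"

// docOpts and docOpt are local stand-ins for prose's DocOpts and DocOpt;
// the real DocOpt also receives the *Document being built.
type docOpts struct {
	Extract, Segment, Tag, Tokenize bool
}

type docOpt func(opts *docOpts)

// withExtraction mirrors the shape of WithExtraction.
func withExtraction(include bool) docOpt {
	return func(opts *docOpts) { opts.Extract = include }
}

// withSegmentation mirrors the shape of WithSegmentation.
func withSegmentation(include bool) docOpt {
	return func(opts *docOpts) { opts.Segment = include }
}

// resolve starts from the assumed defaults (every stage enabled) and
// applies each option in order, so later options win.
func resolve(opts ...docOpt) docOpts {
	base := docOpts{Extract: true, Segment: true, Tag: true, Tokenize: true}
	for _, apply := range opts {
		apply(&base)
	}
	return base
}

func main() {
	fmt.Printf("%+v\n", resolve())
	fmt.Printf("%+v\n", resolve(withExtraction(false), withSegmentation(false)))
}
```

Because options are applied in order, passing the same option twice leaves the last value in effect.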

type Document

type Document struct {
    Model *Model
    Text  string
    // contains filtered or unexported fields
}

A Document represents a parsed body of text.

func NewDocument

func NewDocument(text string, opts ...DocOpt) (*Document, error)

NewDocument creates a Document according to the user-specified options.

For example,

doc, err := prose.NewDocument("...")

func (*Document) Entities

func (doc *Document) Entities() []Entity

Entities returns `doc`'s entities.

func (*Document) Sentences

func (doc *Document) Sentences() []Sentence

Sentences returns `doc`'s sentences.

func (*Document) Tokens

func (doc *Document) Tokens() []Token

Tokens returns `doc`'s tokens.

type Entity

type Entity struct {
    Text  string // The entity's actual content.
    Label string // The entity's label.
}

An Entity represents an individual named entity.

type EntityContext

type EntityContext struct {
    // Is this a correct entity?
    //
    // Some annotation software, e.g. Prodigy, includes entities "rejected" by
    // its user. This allows us to handle those cases.
    Accept bool

    Spans []LabeledEntity // The entity locations relative to `Text`.
    Text  string          // The sentence containing the entities.
}

EntityContext represents text containing named entities.

type LabeledEntity

type LabeledEntity struct {
    Start int
    End   int
    Label string
}

LabeledEntity represents an externally labeled named entity.
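Spans locate entities by offset into Text. A minimal sketch of how a span maps back to its substring follows, using local copies of the two structs. This page does not say whether offsets count bytes or runes, or whether End is inclusive, so the sketch assumes byte offsets with an exclusive End. The helper spanText and the sample sentence are illustrative only.

```go
package main

import "fmt"

// labeledEntity and entityContext are local copies of the documented structs.
type labeledEntity struct {
	Start int
	End   int
	Label string
}

type entityContext struct {
	Accept bool
	Spans  []labeledEntity
	Text   string
}

// spanText returns the substring a span points at.
// Assumption: Start is inclusive, End is exclusive, both are byte offsets.
func spanText(ctx entityContext, span labeledEntity) (string, error) {
	if span.Start < 0 || span.End > len(ctx.Text) || span.Start >= span.End {
		return "", fmt.Errorf("span [%d, %d) is out of range for text of length %d",
			span.Start, span.End, len(ctx.Text))
	}
	return ctx.Text[span.Start:span.End], nil
}

func main() {
	ctx := entityContext{
		Accept: true,
		Text:   "Windows 10 is an operating system",
		Spans:  []labeledEntity{{Start: 0, End: 10, Label: "PRODUCT"}},
	}
	for _, span := range ctx.Spans {
		text, err := spanText(ctx, span)
		fmt.Println(text, span.Label, err)
	}
}
```

Validating spans this way before training catches annotation exports whose offsets do not line up with the sentence text.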

type Model

type Model struct {
    Name string
    // contains filtered or unexported fields
}

A Model holds the structures and data used internally by prose.

func ModelFromData

func ModelFromData(name string, sources ...DataSource) *Model

ModelFromData creates a new Model from user-provided training data.

func ModelFromDisk

func ModelFromDisk(path string) *Model

ModelFromDisk loads a Model from the user-provided location.

func (*Model) Write

func (m *Model) Write(path string) error

Write saves a Model to the user-provided location.

type Sentence

type Sentence struct {
    Text string // The sentence's text.
}

A Sentence represents a segmented portion of text.

type Token

type Token struct {
    Tag   string // The token's part-of-speech tag.
    Text  string // The token's actual content.
    Label string // The token's IOB label.
}

A Token represents an individual token of text such as a word or punctuation symbol.
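Each Token carries an IOB (inside, outside, beginning) label. The sketch below shows one way such labels can be folded into entities, using local stand-in types. It assumes labels of the form "B-GPE", "I-GPE", and "O", and it joins token texts with single spaces; neither detail is specified on this page, and groupEntities is a hypothetical helper rather than part of prose.

```go
package main

import (
	"fmt"
	"strings"
)

// token and entity are local stand-ins for prose's Token and Entity.
type token struct {
	Tag, Text, Label string
}

type entity struct {
	Text, Label string
}

// groupEntities merges runs of B-/I- labeled tokens into entities.
// Assumption: labels look like "B-GPE", "I-GPE", or "O".
// An I- token that does not continue an open entity is ignored.
func groupEntities(tokens []token) []entity {
	var out []entity
	var words []string
	label := ""
	flush := func() {
		if len(words) > 0 {
			out = append(out, entity{Text: strings.Join(words, " "), Label: label})
		}
		words, label = nil, ""
	}
	for _, tok := range tokens {
		switch {
		case strings.HasPrefix(tok.Label, "B-"):
			flush() // a new entity begins; close any open one
			words, label = []string{tok.Text}, strings.TrimPrefix(tok.Label, "B-")
		case strings.HasPrefix(tok.Label, "I-") && len(words) > 0 &&
			strings.TrimPrefix(tok.Label, "I-") == label:
			words = append(words, tok.Text)
		default:
			flush()
		}
	}
	flush()
	return out
}

func main() {
	tokens := []token{
		{Tag: "PRP", Text: "I", Label: "O"},
		{Tag: "VBP", Text: "live", Label: "O"},
		{Tag: "IN", Text: "in", Label: "O"},
		{Tag: "NNP", Text: "New", Label: "B-GPE"},
		{Tag: "NNP", Text: "York", Label: "I-GPE"},
		{Tag: ".", Text: ".", Label: "O"},
	}
	fmt.Printf("%+v\n", groupEntities(tokens))
}
```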

type TupleSlice

type TupleSlice [][][]string

TupleSlice is a slice of tuples in the form (words, tags).

func ReadTagged

func ReadTagged(text, sep string) TupleSlice

ReadTagged converts pre-tagged input into a TupleSlice suitable for training.

Code:

tagged := "Pierre|NNP Vinken|NNP ,|, 61|CD years|NNS"
fmt.Println(ReadTagged(tagged, "|"))

Output:

[[[Pierre Vinken , 61 years] [NNP NNP , CD NNS]]]
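To make the (words, tags) layout concrete, here is a standard-library sketch that produces the same shape for the input above. It is not the package's implementation: readTagged and tupleSlice are local stand-ins, and the sketch assumes one sentence per line with whitespace-separated "word<sep>tag" tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// tupleSlice is a local stand-in with the same underlying type as TupleSlice.
type tupleSlice [][][]string

// readTagged is a sketch of the conversion, assuming one sentence per line
// and whitespace-separated "word<sep>tag" tokens.
func readTagged(text, sep string) tupleSlice {
	var out tupleSlice
	for _, line := range strings.Split(text, "\n") {
		var words, tags []string
		for _, tok := range strings.Fields(line) {
			// Split on the last separator so a word may contain sep itself.
			i := strings.LastIndex(tok, sep)
			if i < 0 {
				continue // skip tokens without a tag
			}
			words = append(words, tok[:i])
			tags = append(tags, tok[i+len(sep):])
		}
		if len(words) > 0 {
			// Each sentence is a pair: index 0 holds words, index 1 holds tags.
			out = append(out, [][]string{words, tags})
		}
	}
	return out
}

func main() {
	tagged := "Pierre|NNP Vinken|NNP ,|, 61|CD years|NNS"
	fmt.Println(readTagged(tagged, "|"))
}
```

Each element of the result is one sentence, so words[i] and tags[i] within a pair always describe the same token.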

func (TupleSlice) Len

func (t TupleSlice) Len() int

Len returns the number of tuples in a TupleSlice.

func (TupleSlice) Swap

func (t TupleSlice) Swap(i, j int)

Swap switches the ith and jth elements in a TupleSlice.

Package prose imports 22 packages and is imported by 1 package. Updated 2018-10-23.