summarize

package
v0.0.0-...-c24611c
Warning

This package is not in the latest version of its module.

Published: Jul 29, 2017 License: MIT Imports: 8 Imported by: 0

Documentation

Overview

Package summarize implements functions for analyzing readability and usage statistics of text.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Syllables

func Syllables(word string) int

Syllables returns the number of syllables in the string word.

NOTE: This function expects a word (not raw text) as input.
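To illustrate why the input should be a single word, here is a minimal sketch that splits raw text into words before counting. The helper countVowelGroups is hypothetical: it is a naive vowel-run heuristic, not the algorithm Syllables uses, and it will miscount many English words.

```go
package main

import (
	"fmt"
	"strings"
)

// countVowelGroups is a hypothetical, naive estimator: it counts runs of
// consecutive vowels (a, e, i, o, u, y) in a single word. It is NOT the
// algorithm behind Syllables and is only meant to show per-word input.
func countVowelGroups(word string) int {
	count := 0
	inGroup := false
	for _, r := range strings.ToLower(word) {
		if strings.ContainsRune("aeiouy", r) {
			if !inGroup {
				count++
			}
			inGroup = true
		} else {
			inGroup = false
		}
	}
	return count
}

func main() {
	raw := "banana split text"
	// Split the raw text first, then count one word at a time.
	for _, w := range strings.Fields(raw) {
		fmt.Printf("%s: %d\n", w, countVowelGroups(w))
	}
}
```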

Types

type Assessment

type Assessment struct {
	// assessments returning an estimated grade level
	AutomatedReadability float64
	ColemanLiau          float64
	FleschKincaid        float64
	GunningFog           float64
	SMOG                 float64

	// mean & standard deviation of the above estimated grade levels
	MeanGradeLevel   float64
	StdDevGradeLevel float64

	// assessments returning non-grade numerical scores
	DaleChall   float64
	ReadingEase float64
}

An Assessment provides comprehensive access to a Document's metrics.
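The MeanGradeLevel and StdDevGradeLevel fields summarize the five grade-level estimates. A minimal sketch of that kind of summary follows. The helper meanStdDev is hypothetical, and the use of the population (rather than sample) standard deviation is an assumption; the package may compute it differently.

```go
package main

import (
	"fmt"
	"math"
)

// meanStdDev returns the mean and the population standard deviation of xs.
// Assumption: population form (divide by n); the package may use another.
func meanStdDev(xs []float64) (mean, sd float64) {
	if len(xs) == 0 {
		return 0, 0
	}
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		sd += (x - mean) * (x - mean)
	}
	sd = math.Sqrt(sd / float64(len(xs)))
	return mean, sd
}

func main() {
	// Illustrative grade levels only, not output from a real Document.
	grades := []float64{9.1, 10.4, 8.7, 11.2, 9.9}
	m, s := meanStdDev(grades)
	fmt.Printf("mean=%.2f stddev=%.2f\n", m, s)
}
```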

type Document

type Document struct {
	Content         string         // Actual text
	NumCharacters   float64        // Number of Characters
	NumComplexWords float64        // PolysylWords without common suffixes
	NumPolysylWords float64        // Number of words with > 2 syllables
	NumSentences    float64        // Number of sentences
	NumSyllables    float64        // Number of syllables
	NumWords        float64        // Number of words
	Sentences       []Sentence     // the Document's sentences
	WordFrequency   map[string]int // [word]frequency

	SentenceTokenizer tokenize.ProseTokenizer
	WordTokenizer     tokenize.ProseTokenizer
}

A Document represents a collection of text to be analyzed.

A Document's calculations depend on its word and sentence tokenizers. You can use the defaults by invoking NewDocument, choose another implementation from the tokenize package, or supply your own, as long as it implements the ProseTokenizer interface. For example,

d := Document{Content: ..., WordTokenizer: ..., SentenceTokenizer: ...}
d.Initialize()

func NewDocument

func NewDocument(text string) *Document

NewDocument constructs a Document from the string text and calculates the data necessary for computing readability and usage statistics.

This is a convenience wrapper around the Document initialization process that defaults to using a WordBoundaryTokenizer and a PunktSentenceTokenizer as its word and sentence tokenizers, respectively.

func (*Document) Assess

func (d *Document) Assess() *Assessment

Assess returns an Assessment for the Document d.

func (*Document) AutomatedReadability

func (d *Document) AutomatedReadability() float64

AutomatedReadability computes the automated readability index score (https://en.wikipedia.org/wiki/Automated_readability_index).
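For reference, the published automated readability index formula can be sketched as below. The function ari and its parameters are hypothetical names; how the package derives the counts from a Document is not shown here.

```go
package main

import "fmt"

// ari applies the standard automated readability index formula:
// 4.71*(characters/words) + 0.5*(words/sentences) - 21.43.
// Hypothetical helper; callers must pass non-zero words and sentences.
func ari(chars, words, sentences float64) float64 {
	return 4.71*(chars/words) + 0.5*(words/sentences) - 21.43
}

func main() {
	// Illustrative counts: 500 characters, 100 words, 5 sentences.
	fmt.Printf("%.2f\n", ari(500, 100, 5))
}
```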

func (*Document) ColemanLiau

func (d *Document) ColemanLiau() float64

ColemanLiau computes the Coleman–Liau index score (https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index).
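The published Coleman–Liau formula works on letters and sentences per 100 words. A minimal sketch, with hypothetical names, follows; whether the package counts letters only or all characters is an assumption left open here.

```go
package main

import "fmt"

// colemanLiau applies the standard formula 0.0588*L - 0.296*S - 15.8,
// where L is letters per 100 words and S is sentences per 100 words.
// Hypothetical helper; words must be non-zero.
func colemanLiau(letters, words, sentences float64) float64 {
	l := letters / words * 100
	s := sentences / words * 100
	return 0.0588*l - 0.296*s - 15.8
}

func main() {
	// Illustrative counts: 500 letters, 100 words, 5 sentences.
	fmt.Printf("%.2f\n", colemanLiau(500, 100, 5))
}
```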

func (*Document) DaleChall

func (d *Document) DaleChall() float64

DaleChall computes the Dale–Chall score (https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula).
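The new Dale–Chall formula combines the percentage of difficult words with the mean sentence length, adding a constant when more than 5% of the words are difficult. The sketch below uses hypothetical names and takes the difficult-word count as an input; how the package decides which words are difficult is not covered here.

```go
package main

import "fmt"

// daleChall applies the published formula:
// 0.1579*(percent difficult words) + 0.0496*(words/sentences),
// plus 3.6365 when the percentage of difficult words exceeds 5.
// Hypothetical helper; words and sentences must be non-zero.
func daleChall(difficult, words, sentences float64) float64 {
	pct := difficult / words * 100
	score := 0.1579*pct + 0.0496*(words/sentences)
	if pct > 5 {
		score += 3.6365
	}
	return score
}

func main() {
	// Illustrative counts: 10 difficult words, 100 words, 5 sentences.
	fmt.Printf("%.4f\n", daleChall(10, 100, 5))
}
```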

func (*Document) FleschKincaid

func (d *Document) FleschKincaid() float64

FleschKincaid computes the Flesch–Kincaid grade level (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
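The published Flesch–Kincaid grade-level formula can be sketched as follows. The function name and parameters are hypothetical; the counts would correspond to a Document's NumWords, NumSentences, and NumSyllables if the package follows the standard formula, which is an assumption.

```go
package main

import "fmt"

// fleschKincaidGrade applies the standard grade-level formula:
// 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59.
// Hypothetical helper; words and sentences must be non-zero.
func fleschKincaidGrade(words, sentences, syllables float64) float64 {
	return 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
}

func main() {
	// Illustrative counts: 100 words, 5 sentences, 150 syllables.
	fmt.Printf("%.2f\n", fleschKincaidGrade(100, 5, 150))
}
```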

func (*Document) FleschReadingEase

func (d *Document) FleschReadingEase() float64

FleschReadingEase computes the Flesch reading-ease score (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
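Unlike the grade-level metrics, the reading-ease score rises as text gets easier. A minimal sketch of the published formula, using hypothetical names, is shown below.

```go
package main

import "fmt"

// readingEase applies the standard Flesch reading-ease formula:
// 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
// Hypothetical helper; words and sentences must be non-zero.
func readingEase(words, sentences, syllables float64) float64 {
	return 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
}

func main() {
	// Illustrative counts: 100 words, 5 sentences, 150 syllables.
	fmt.Printf("%.3f\n", readingEase(100, 5, 150))
}
```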

func (*Document) GunningFog

func (d *Document) GunningFog() float64

GunningFog computes the Gunning Fog index score (https://en.wikipedia.org/wiki/Gunning_fog_index).
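The published Gunning fog formula weighs mean sentence length against the share of complex words. The sketch below uses hypothetical names; it assumes the complex-word count plays the role of a Document's NumComplexWords.

```go
package main

import "fmt"

// gunningFog applies the standard formula:
// 0.4 * ((words/sentences) + 100*(complexWords/words)).
// Hypothetical helper; words and sentences must be non-zero.
func gunningFog(words, sentences, complexWords float64) float64 {
	return 0.4 * ((words / sentences) + 100*(complexWords/words))
}

func main() {
	// Illustrative counts: 100 words, 5 sentences, 10 complex words.
	fmt.Printf("%.2f\n", gunningFog(100, 5, 10))
}
```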

func (*Document) Initialize

func (d *Document) Initialize()

Initialize calculates the data necessary for computing readability and usage statistics.

func (*Document) MeanWordLength

func (d *Document) MeanWordLength() float64

MeanWordLength returns the mean number of characters per word.

func (*Document) SMOG

func (d *Document) SMOG() float64

SMOG computes the SMOG grade (https://en.wikipedia.org/wiki/SMOG).
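The published SMOG formula normalizes the polysyllabic word count to a 30-sentence sample. A minimal sketch with hypothetical names follows; it assumes the polysyllable count corresponds to a Document's NumPolysylWords.

```go
package main

import (
	"fmt"
	"math"
)

// smog applies the standard formula:
// 1.0430*sqrt(polysyllables*30/sentences) + 3.1291.
// Hypothetical helper; sentences must be non-zero.
func smog(polysyllables, sentences float64) float64 {
	return 1.0430*math.Sqrt(polysyllables*30/sentences) + 3.1291
}

func main() {
	// Illustrative counts: 48 polysyllabic words across 90 sentences.
	fmt.Printf("%.4f\n", smog(48, 90))
}
```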

func (*Document) WordDensity

func (d *Document) WordDensity() map[string]float64

WordDensity returns a map of each word and its density.
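As a rough illustration, the sketch below assumes that density means a word's frequency divided by the total number of words. That definition and the helper name density are assumptions, not taken from the package.

```go
package main

import "fmt"

// density converts a word-frequency map (like WordFrequency) into a map of
// relative frequencies. Assumption: density = count / total word count.
func density(freq map[string]int) map[string]float64 {
	total := 0
	for _, n := range freq {
		total += n
	}
	out := make(map[string]float64, len(freq))
	if total == 0 {
		return out // avoid dividing by zero for an empty document
	}
	for w, n := range freq {
		out[w] = float64(n) / float64(total)
	}
	return out
}

func main() {
	d := density(map[string]int{"the": 2, "cat": 1, "sat": 1})
	fmt.Println(d["the"], d["cat"], d["sat"])
}
```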

type Sentence

type Sentence struct {
	Text   string // the actual text
	Length int    // the number of words
	Words  []Word // the words in this sentence
}

A Sentence represents a single sentence in a Document.

type Word

type Word struct {
	Text      string // the actual text
	Syllables int    // the number of syllables
}

A Word represents a single word in a Document.
