summarize


Documentation

Overview

Package summarize implements utilities for computing readability scores, usage statistics, and TL;DR summaries of text.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Syllables

func Syllables(word string) int

Syllables returns the number of syllables in the string word.

NOTE: This function expects a word (not raw text) as input.
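
For example (the exact counts depend on the package's syllable heuristic, so treat the values in the comments as likely, not guaranteed):

fmt.Println(Syllables("communication")) // likely 5
fmt.Println(Syllables("cat"))           // likely 1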

Types

type Assessment

type Assessment struct {
	// assessments returning an estimated grade level
	AutomatedReadability float64
	ColemanLiau          float64
	FleschKincaid        float64
	GunningFog           float64
	SMOG                 float64
	LIX                  float64

	// mean & standard deviation of the above estimated grade levels
	MeanGradeLevel   float64
	StdDevGradeLevel float64

	// assessments returning non-grade numerical scores
	DaleChall   float64
	ReadingEase float64
}

An Assessment provides comprehensive access to a Document's metrics.

type Document

type Document struct {
	Content         string         // Actual text
	NumCharacters   float64        // Number of Characters
	NumComplexWords float64        // Number of polysyllabic words without common suffixes
	NumParagraphs   float64        // Number of paragraphs
	NumPolysylWords float64        // Number of words with > 2 syllables
	NumSentences    float64        // Number of sentences
	NumSyllables    float64        // Number of syllables
	NumWords        float64        // Number of words
	NumLongWords    float64        // Number of long words
	Sentences       []Sentence     // the Document's sentences
	WordFrequency   map[string]int // [word]frequency

	SentenceTokenizer tokenize.ProseTokenizer
	WordTokenizer     tokenize.ProseTokenizer
}

A Document represents a collection of text to be analyzed.

A Document's calculations depend on its word and sentence tokenizers. You can use the defaults by invoking NewDocument, choose another implementation from the tokenize package, or use your own (as long as it implements the ProseTokenizer interface). For example,

d := Document{Content: ..., WordTokenizer: ..., SentenceTokenizer: ...}
d.Initialize()
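
A slightly fuller sketch of the same pattern. The constructor names below are assumptions about the tokenize package's API (this page only names the WordBoundaryTokenizer and PunktSentenceTokenizer types); substitute whatever ProseTokenizer implementations you actually use:

d := Document{
	Content:           "Some text to analyze.",
	WordTokenizer:     tokenize.NewWordBoundaryTokenizer(),     // assumed constructor name
	SentenceTokenizer: tokenize.NewPunktSentenceTokenizer(),    // assumed constructor name
}
d.Initialize()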

func NewDocument

func NewDocument(text string) *Document

NewDocument is a Document constructor that takes a string as an argument. It then calculates the data necessary for computing readability and usage statistics.

This is a convenience wrapper around the Document initialization process that defaults to using a WordBoundaryTokenizer and a PunktSentenceTokenizer as its word and sentence tokenizers, respectively.

func (*Document) Assess

func (d *Document) Assess() *Assessment

Assess returns an Assessment for the Document d.
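
For example, assessing a small document end to end (the printed values depend on the input text):

d := NewDocument("This is the first sentence. Here is a second, slightly longer one.")
a := d.Assess()
fmt.Println(a.MeanGradeLevel, a.StdDevGradeLevel, a.ReadingEase)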

func (*Document) AutomatedReadability

func (d *Document) AutomatedReadability() float64

AutomatedReadability computes the automated readability index score (https://en.wikipedia.org/wiki/Automated_readability_index).
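
The method presumably follows the standard ARI definition; expressed against a Document's counts, that formula looks like the sketch below (an assumption, not necessarily byte-for-byte what the package computes):

d := NewDocument("One short sentence. Another, slightly longer sentence follows it.")
// Standard automated readability index (assumed formula):
ari := 4.71*(d.NumCharacters/d.NumWords) +
	0.5*(d.NumWords/d.NumSentences) - 21.43
fmt.Println(ari, d.AutomatedReadability()) // should roughly agree if the standard formula is used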

func (*Document) ColemanLiau

func (d *Document) ColemanLiau() float64

ColemanLiau computes the Coleman–Liau index score (https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index).

func (*Document) DaleChall

func (d *Document) DaleChall() float64

DaleChall computes the Dale–Chall score (https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula).

func (*Document) FleschKincaid

func (d *Document) FleschKincaid() float64

FleschKincaid computes the Flesch–Kincaid grade level (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).

func (*Document) FleschReadingEase

func (d *Document) FleschReadingEase() float64

FleschReadingEase computes the Flesch reading-ease score (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
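
As with the other scores, the textbook formula written against the Document's counts is sketched below (an assumption; the exact implementation isn't shown on this page):

d := NewDocument("Reading ease rewards short sentences and short words.")
// Standard Flesch reading-ease formula (assumed):
ease := 206.835 - 1.015*(d.NumWords/d.NumSentences) -
	84.6*(d.NumSyllables/d.NumWords)
fmt.Println(ease, d.FleschReadingEase()) // should roughly agree if the standard formula is used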

func (*Document) GunningFog

func (d *Document) GunningFog() float64

GunningFog computes the Gunning Fog index score (https://en.wikipedia.org/wiki/Gunning_fog_index).

func (*Document) Initialize

func (d *Document) Initialize()

Initialize calculates the data necessary for computing readability and usage statistics.

func (*Document) Keywords

func (d *Document) Keywords() map[string]int

Keywords returns a Document's words in the form

map[word]count

omitting stop words and normalizing case.
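
For example (the counts in the comment are illustrative):

d := NewDocument("Go is expressive. Go is concise. Go compiles quickly.")
for word, count := range d.Keywords() {
	fmt.Println(word, count) // e.g., "go" 3; stop words such as "is" are dropped
}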

func (*Document) LIX added in v1.1.1

func (d *Document) LIX() float64

LIX computes the LIX readability measure (https://en.wikipedia.org/wiki/Lix_(readability_test)).

func (*Document) MeanWordLength

func (d *Document) MeanWordLength() float64

MeanWordLength returns the mean number of characters per word.

func (*Document) SMOG

func (d *Document) SMOG() float64

SMOG computes the SMOG grade (https://en.wikipedia.org/wiki/SMOG).

func (*Document) Summary

func (d *Document) Summary(n int) []RankedParagraph

Summary returns a Document's n highest ranked paragraphs according to keyword frequency.
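
A short sketch of pulling a two-paragraph TL;DR out of a longer document (the input text is a placeholder, and the blank-line paragraph breaks are an assumption about how paragraphs are delimited):

text := "First paragraph about one topic.\n\nSecond paragraph about another.\n\nThird paragraph, mostly repeating the first topic."
d := NewDocument(text)
for _, p := range d.Summary(2) {
	for _, s := range p.Sentences {
		fmt.Println(p.Rank, s.Text)
	}
}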

func (*Document) WordDensity

func (d *Document) WordDensity() map[string]float64

WordDensity returns a map of each word and its density.
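
Density here presumably means each word's share of the total word count (an assumption; the page does not define it further). For example:

d := NewDocument("Short test. Short test again.")
for word, density := range d.WordDensity() {
	fmt.Println(word, density) // e.g., "short" ≈ 0.4 if it accounts for 2 of 5 words
}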

type RankedParagraph

type RankedParagraph struct {
	Sentences []Sentence
	Position  int // the zero-based position within a Document
	Rank      int
}

A RankedParagraph is a paragraph ranked by its number of keywords.

type Sentence

type Sentence struct {
	Text      string // the actual text
	Length    int    // the number of words
	Words     []Word // the words in this sentence
	Paragraph int
}

A Sentence represents a single sentence in a Document.

type Word

type Word struct {
	Text      string // the actual text
	Syllables int    // the number of syllables
}

A Word represents a single word in a Document.
