prose: github.com/jdkato/prose/summarize

package summarize

import "github.com/jdkato/prose/summarize"

Package summarize implements utilities for computing readability scores, usage statistics, and TL;DR summaries of text.

Package Files

easy.go readability.go stop.go summarize.go syllables.go usage.go

func Syllables

func Syllables(word string) int

Syllables returns the number of syllables in the string word.

NOTE: This function expects a word (not raw text) as input.

type Assessment

type Assessment struct {
    // assessments returning an estimated grade level
    AutomatedReadability float64
    ColemanLiau          float64
    FleschKincaid        float64
    GunningFog           float64
    SMOG                 float64
    LIX                  float64

    // mean & standard deviation of the above estimated grade levels
    MeanGradeLevel   float64
    StdDevGradeLevel float64

    // assessments returning non-grade numerical scores
    DaleChall   float64
    ReadingEase float64
}

An Assessment provides comprehensive access to a Document's metrics.

type Document

type Document struct {
    Content         string         // Actual text
    NumCharacters   float64        // Number of Characters
    NumComplexWords float64        // PolysylWords without common suffixes
    NumParagraphs   float64        // Number of paragraphs
    NumPolysylWords float64        // Number of words with > 2 syllables
    NumSentences    float64        // Number of sentences
    NumSyllables    float64        // Number of syllables
    NumWords        float64        // Number of words
    NumLongWords    float64        // Number of long words
    Sentences       []Sentence     // the Document's sentences
    WordFrequency   map[string]int // [word]frequency

    SentenceTokenizer tokenize.ProseTokenizer
    WordTokenizer     tokenize.ProseTokenizer
}

A Document represents a collection of text to be analyzed.

A Document's calculations depend on its word and sentence tokenizers. You can use the defaults by invoking NewDocument, choose another implementation from the tokenize package, or use your own (as long as it implements the ProseTokenizer interface). For example,

d := Document{Content: ..., WordTokenizer: ..., SentenceTokenizer: ...}
d.Initialize()

func NewDocument

func NewDocument(text string) *Document

NewDocument is a Document constructor that takes a string as an argument. It then calculates the data necessary for computing readability and usage statistics.

This is a convenience wrapper around the Document initialization process that defaults to using a WordBoundaryTokenizer and a PunktSentenceTokenizer as its word and sentence tokenizers, respectively.

func (*Document) Assess

func (d *Document) Assess() *Assessment

Assess returns an Assessment for the Document d.

func (*Document) AutomatedReadability

func (d *Document) AutomatedReadability() float64

AutomatedReadability computes the automated readability index score (https://en.wikipedia.org/wiki/Automated_readability_index).

func (*Document) ColemanLiau

func (d *Document) ColemanLiau() float64

ColemanLiau computes the Coleman–Liau index score (https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index).

func (*Document) DaleChall

func (d *Document) DaleChall() float64

DaleChall computes the Dale–Chall score (https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula).

func (*Document) FleschKincaid

func (d *Document) FleschKincaid() float64

FleschKincaid computes the Flesch–Kincaid grade level (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).

func (*Document) FleschReadingEase

func (d *Document) FleschReadingEase() float64

FleschReadingEase computes the Flesch reading-ease score (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).

func (*Document) GunningFog

func (d *Document) GunningFog() float64

GunningFog computes the Gunning Fog index score (https://en.wikipedia.org/wiki/Gunning_fog_index).

func (*Document) Initialize

func (d *Document) Initialize()

Initialize calculates the data necessary for computing readability and usage statistics.

func (*Document) Keywords

func (d *Document) Keywords() map[string]int

Keywords returns a Document's words in the form

map[word]count

omitting stop words and normalizing case.

func (*Document) LIX

func (d *Document) LIX() float64

LIX computes the LIX readability measure (https://en.wikipedia.org/wiki/Lix_(readability_test)).

func (*Document) MeanWordLength

func (d *Document) MeanWordLength() float64

MeanWordLength returns the mean number of characters per word.

func (*Document) SMOG

func (d *Document) SMOG() float64

SMOG computes the SMOG grade (https://en.wikipedia.org/wiki/SMOG).

func (*Document) Summary

func (d *Document) Summary(n int) []RankedParagraph

Summary returns a Document's n highest ranked paragraphs according to keyword frequency.

func (*Document) WordDensity

func (d *Document) WordDensity() map[string]float64

WordDensity returns a map of each word to its density, i.e., its frequency relative to the Document's total word count.

type RankedParagraph

type RankedParagraph struct {
    Sentences []Sentence
    Position  int // the zero-based position within a Document
    Rank      int
}

A RankedParagraph is a paragraph ranked by its number of keywords.

type Sentence

type Sentence struct {
    Text      string // the actual text
    Length    int    // the number of words
    Words     []Word // the words in this sentence
    Paragraph int
}

A Sentence represents a single sentence in a Document.

type Word

type Word struct {
    Text      string // the actual text
    Syllables int    // the number of syllables
}

A Word represents a single word in a Document.

Package summarize imports 9 packages and is imported by 5 packages. Updated 2019-08-19.