analysis

package
v1.0.21 Latest
Warning: This package is not in the latest version of its module.
Published: Mar 2, 2022 License: AGPL-3.0 Imports: 16 Imported by: 0

Documentation

Constants

const (
	Noun uint32 = 1 << iota //  NOUN  СУЩ      noun
	AdjF                    //  ADJF  ПРИЛ     adjective (full form)
	AdjS                    //  ADJS  КР_ПРИЛ  adjective (short form)
	Comp                    //  COMP  КОМП     comparative
	Verb                    //  VERB  ГЛ       verb (finite form)
	Infn                    //  INFN  ИНФ      verb (infinitive)
	PrtF                    //  PRTF  ПРИЧ     participle (full form)
	PrtS                    //  PRTS  КР_ПРИЧ  participle (short form)
	Grnd                    //  GRND  ДЕЕПР    adverbial participle (gerund)
	Numr                    //  NUMR  ЧИСЛ     numeral
	Advb                    //  ADVB  Н        adverb
	Npro                    //  NPRO  МС       pronoun (noun-like)
	Pred                    //  PRED  ПРЕДК    predicative
	Prep                    //  PREP  ПР       preposition
	Conj                    //  CONJ  СОЮЗ     conjunction
	Prcl                    //  PRCL  ЧАСТ     particle
	Intj                    //  INTJ  МЕЖД     interjection

	Adj      = AdjF | AdjS
	Prt      = PrtF | PrtS
	VerbPlus = Verb | Infn | PrtF | PrtS | Grnd
	Unknown  = AdjF | AdjS | Comp | Verb | Infn | PrtF | PrtS | Grnd | Numr | Advb | Npro | Pred | Prep | Conj | Prcl | Intj
)
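Because each part-of-speech constant occupies its own bit, the combined masks (Adj, Prt, VerbPlus) can be tested with a bitwise AND. The sketch below is only an illustration: it copies the first few constants locally so that it compiles on its own, and hasAny is a hypothetical helper that is not part of the package.

```go
package main

import "fmt"

// Local copy of some part-of-speech flags so the sketch is self-contained;
// in real code these would come from the analysis package.
const (
	Noun uint32 = 1 << iota
	AdjF
	AdjS
	Comp
	Verb
	Infn
	PrtF
	PrtS
	Grnd

	Adj      = AdjF | AdjS
	VerbPlus = Verb | Infn | PrtF | PrtS | Grnd
)

// hasAny reports whether tag shares at least one flag with mask.
// Hypothetical helper, not part of the package.
func hasAny(tag, mask uint32) bool {
	return tag&mask != 0
}

func main() {
	fmt.Println(hasAny(AdjS, Adj))      // true: short adjectives are adjectives
	fmt.Println(hasAny(Grnd, VerbPlus)) // true: gerunds count as verb forms
	fmt.Println(hasAny(Noun, VerbPlus)) // false: nouns are not verb forms
}
```

The same pattern works for any mask built with the | operator, including Unknown.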

Variables

This section is empty.

Functions

func MI

func MI(size, ab, amulb, span int) float64

Reference: https://www.english-corpora.org/mutualInformation.asp

As described there, Mutual Information is calculated as follows:

MI = log ( (AB * sizeCorpus) / (A * B * span) ) / log (2)

Suppose we are calculating the MI for the collocate color near purple in BNC.

A = frequency of node word (e.g. purple): 1262
B = frequency of collocate (e.g. color): 115
AB = frequency of collocate near the node word (e.g. color near purple): 24
sizeCorpus = size of corpus (# words; in this case the BNC): 96,263,399
span = span of words (e.g. 3 to left and 3 to right of node word): 6
log (2) is literally the log10 of the number 2: .30103

MI = 11.37 = log ( (24 * 96,263,399) / (1262 * 115 * 6) ) / .30103
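The worked example above can be checked with a few lines of Go. This is a minimal sketch of the quoted formula, not the package's implementation: mutualInformation is a hypothetical function that takes A and B separately, whereas the package's MI takes a single amulb argument, which presumably holds the product A * B. Dividing a base-10 logarithm by log10(2) is the same as taking a base-2 logarithm, so math.Log2 is used directly.

```go
package main

import (
	"fmt"
	"math"
)

// mutualInformation is a hypothetical stand-in for the quoted formula:
// MI = log2((AB * sizeCorpus) / (A * B * span)).
func mutualInformation(sizeCorpus, ab, a, b, span int) float64 {
	// Convert to float64 before multiplying to avoid integer overflow.
	num := float64(ab) * float64(sizeCorpus)
	den := float64(a) * float64(b) * float64(span)
	return math.Log2(num / den)
}

func main() {
	// Values from the "color near purple" example in the BNC.
	mi := mutualInformation(96263399, 24, 1262, 115, 6)
	fmt.Printf("%.2f\n", mi) // 11.37
}
```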

func MI3

func MI3(size, ab, amulb, span int) float64

Types

type Collocation

type Collocation struct {
	W1, W2 string
	Freq   int
	Score  float64
}

type DB

type DB struct {
	// contains filtered or unexported fields
}

func New

func New(dsn string) (_ *DB, err error)

func (*DB) Add

func (db *DB) Add(text string) error

func (*DB) Close

func (db *DB) Close() error

func (*DB) Collocations

func (db *DB) Collocations(pat1, pat2 string, left, right int, scoreFunc ScoreFunc) ([]Collocation, error)

func (*DB) Drop

func (db *DB) Drop() error

func (*DB) Ngrams

func (db *DB) Ngrams(n int, scoreFunc ScoreFunc) ([]NgramWithScore, error)

func (*DB) Norms

func (db *DB) Norms(minFrequency, minLength int) ([]WordWithFrequency, error)

func (*DB) WordCount

func (db *DB) WordCount() (int, error)

func (*DB) WordFrequencyMap

func (db *DB) WordFrequencyMap() (map[string]int, error)

func (*DB) Words

func (db *DB) Words(minFrequency, minLength int) ([]WordWithFrequency, error)

type NgramWithScore

type NgramWithScore struct {
	Ngram string
	Freq  int
	Score float64
}

type ScoreFunc

type ScoreFunc func(int, int, int, int) float64
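Since MI and MI3 both have the signature func(int, int, int, int) float64, any function of that shape can presumably be passed wherever a ScoreFunc is expected. The sketch below declares a local ScoreFunc type with the same signature and a hypothetical scorer, positiveMI, that clamps negative scores to zero. The argument order (size, ab, amulb, span) and the reading of amulb as the product A * B are assumptions taken from the MI signature, not documented behavior.

```go
package main

import (
	"fmt"
	"math"
)

// ScoreFunc mirrors the signature documented for the package type.
type ScoreFunc func(int, int, int, int) float64

// positiveMI is a hypothetical scorer: the MI formula with negative
// values clamped to zero. It assumes amulb is the product A * B.
func positiveMI(size, ab, amulb, span int) float64 {
	if ab <= 0 || amulb <= 0 || span <= 0 || size <= 0 {
		return 0 // no co-occurrence data, so no association score
	}
	num := float64(ab) * float64(size)
	den := float64(amulb) * float64(span)
	return math.Max(0, math.Log2(num/den))
}

// Compile-time check that positiveMI has the ScoreFunc signature.
var _ ScoreFunc = positiveMI

func main() {
	fmt.Printf("%.2f\n", positiveMI(96263399, 24, 1262*115, 6))
	fmt.Printf("%.2f\n", positiveMI(1000, 1, 500*500, 6))
}
```

With the inputs shown, the first call reproduces the MI example for "color near purple", and the second is clamped to zero because the pair co-occurs less often than chance would predict.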

type WordScanner

type WordScanner struct {
	// contains filtered or unexported fields
}

func NewWordScanner

func NewWordScanner(r io.Reader) *WordScanner

func (*WordScanner) Err

func (s *WordScanner) Err() error

func (*WordScanner) Scan

func (s *WordScanner) Scan() bool

func (*WordScanner) Word

func (s *WordScanner) Word() (int64, int64, string)

type WordWithFrequency

type WordWithFrequency struct {
	Word string
	Freq int
}
