nlp

package
v0.0.0-...-a268b5b Latest
Warning

This package is not in the latest version of its module.

Published: May 2, 2022 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type EnglishWordsCounterPQ

type EnglishWordsCounterPQ struct {
	Map      map[string]*priority.Item
	Priority *priority.Queue

	sync.Mutex
}

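EnglishWordsCounterPQ splits text into words and counts their frequencies. It is intended for English text and is based on a heap (priority queue).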

func (*EnglishWordsCounterPQ) AddSentence

func (c *EnglishWordsCounterPQ) AddSentence(s string)

func (*EnglishWordsCounterPQ) PopMostCommon

func (c *EnglishWordsCounterPQ) PopMostCommon() string
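
The heap-based approach described for EnglishWordsCounterPQ can be sketched as follows. This is a minimal illustration, not the package's implementation: the `priority` package is not shown on this page, so the sketch substitutes the standard library's `container/heap`. The names `wordItem`, `wordHeap`, `heapCounter`, and `newHeapCounter`, the tokenization rule, the alphabetical tie-break, and the empty-string return on an empty counter are all assumptions made for the example.

```go
package main

import (
	"container/heap"
	"fmt"
	"strings"
	"sync"
	"unicode"
)

// wordItem is a hypothetical stand-in for priority.Item.
type wordItem struct {
	word  string
	count int
	index int // current position in the heap, kept up to date by Swap/Push
}

// wordHeap is a max-heap ordered by count (hypothetical stand-in for priority.Queue).
type wordHeap []*wordItem

func (h wordHeap) Len() int { return len(h) }
func (h wordHeap) Less(i, j int) bool {
	if h[i].count != h[j].count {
		return h[i].count > h[j].count // higher count comes out first
	}
	return h[i].word < h[j].word // assumed tie-break: alphabetical
}
func (h wordHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}
func (h *wordHeap) Push(x interface{}) {
	it := x.(*wordItem)
	it.index = len(*h)
	*h = append(*h, it)
}
func (h *wordHeap) Pop() interface{} {
	old := *h
	n := len(old)
	it := old[n-1]
	old[n-1] = nil
	*h = old[:n-1]
	return it
}

// heapCounter pairs a word-to-item map with the heap, guarded by a mutex.
type heapCounter struct {
	mu    sync.Mutex
	items map[string]*wordItem
	queue wordHeap
}

func newHeapCounter() *heapCounter {
	return &heapCounter{items: make(map[string]*wordItem)}
}

// AddSentence lowercases s, splits it on non-letter runes, and counts each word.
func (c *heapCounter) AddSentence(s string) {
	words := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, w := range words {
		if it, ok := c.items[w]; ok {
			it.count++
			heap.Fix(&c.queue, it.index) // restore heap order after the update
			continue
		}
		it := &wordItem{word: w, count: 1}
		c.items[w] = it
		heap.Push(&c.queue, it)
	}
}

// PopMostCommon removes and returns the most frequent word, or "" if none remain.
func (c *heapCounter) PopMostCommon() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.queue.Len() == 0 {
		return ""
	}
	it := heap.Pop(&c.queue).(*wordItem)
	delete(c.items, it.word)
	return it.word
}

func main() {
	c := newHeapCounter()
	c.AddSentence("the cat and the dog and the bird")
	fmt.Println(c.PopMostCommon(), c.PopMostCommon()) // prints "the and"
}
```

Keeping each item's heap index alongside the map entry lets an existing word's count be updated with `heap.Fix` in O(log n) rather than re-sorting everything.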

type EnglishWordsCounterQS

type EnglishWordsCounterQS struct {
	Map  map[string]int // [word]: idx_in_List
	List wordsList

	sync.Mutex
}

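EnglishWordsCounterQS is a counter based on a sequential list and quicksort. In benchmarks, its time and space performance is worse than that of EnglishWordsCounterPQ.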

func (*EnglishWordsCounterQS) AddSentence

func (c *EnglishWordsCounterQS) AddSentence(s string)

func (*EnglishWordsCounterQS) PopMostCommon

func (c *EnglishWordsCounterQS) PopMostCommon() string
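
A list-and-sort counter in the spirit of EnglishWordsCounterQS might look like the sketch below. It is an illustration under assumptions rather than the package's code: the unexported `wordsList` type is not shown on this page, so the sketch uses a plain slice and the standard library's `sort.Slice` in place of a hand-written quicksort. The names `wordCount`, `listCounter`, and `newListCounter`, the whitespace tokenization, and the tie-break are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// wordCount is a hypothetical list element: a word and its frequency.
type wordCount struct {
	word  string
	count int
}

// listCounter mirrors the documented layout: a map from word to its index
// in the list, plus the list itself.
type listCounter struct {
	mu    sync.Mutex
	index map[string]int // word -> position in list
	list  []wordCount
}

func newListCounter() *listCounter {
	return &listCounter{index: make(map[string]int)}
}

// AddSentence lowercases s, splits it on whitespace, and counts each word.
func (c *listCounter) AddSentence(s string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, w := range strings.Fields(strings.ToLower(s)) {
		if i, ok := c.index[w]; ok {
			c.list[i].count++
			continue
		}
		c.index[w] = len(c.list)
		c.list = append(c.list, wordCount{word: w, count: 1})
	}
}

// PopMostCommon sorts the list, removes the most frequent word, and returns it.
// It returns "" when the counter is empty.
func (c *listCounter) PopMostCommon() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.list) == 0 {
		return ""
	}
	// Sort ascending by count so the most common word ends up last
	// and can be removed by shrinking the slice.
	sort.Slice(c.list, func(i, j int) bool {
		if c.list[i].count != c.list[j].count {
			return c.list[i].count < c.list[j].count
		}
		return c.list[i].word > c.list[j].word // assumed tie-break
	})
	top := c.list[len(c.list)-1]
	c.list = c.list[:len(c.list)-1]
	// Sorting moved elements around, so the word -> index map must be rebuilt.
	delete(c.index, top.word)
	for i, wc := range c.list {
		c.index[wc.word] = i
	}
	return top.word
}

func main() {
	c := newListCounter()
	c.AddSentence("the cat and the dog and the bird")
	fmt.Println(c.PopMostCommon(), c.PopMostCommon()) // prints "the and"
}
```

In this sketch every pop pays for a full sort plus an index rebuild, which is one plausible reason a list-based counter would trail a heap-based one.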

type WordsCounter

type WordsCounter interface {
	// AddSentence splits the sentence into words and counts them.
	AddSentence(s string)
	// PopMostCommon returns the most frequent word.
	PopMostCommon() string
}

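WordsCounter counts word frequencies.

Only English implementations exist at present. EnglishWordsCounterPQ is recommended because it performs better in the benchmark (about 12% less time per operation, with nearly identical memory use):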

$  go test -bench=.  -benchmem
goos: darwin
goarch: amd64
pkg: spotifyplaylist/nlp
cpu: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
BenchmarkEnglishWordCounterPQ-4    33    35152096 ns/op    7236090 B/op    57066 allocs/op
BenchmarkEnglishWordCounterQS-4    28    39783905 ns/op    7251620 B/op    57171 allocs/op

func NewEnglishWordCounterPQ

func NewEnglishWordCounterPQ() WordsCounter

func NewEnglishWordCounterQS

func NewEnglishWordCounterQS() WordsCounter
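
Because both constructors return a WordsCounter, calling code can be written against the interface and either implementation swapped in. The sketch below shows one way to drain the top N words. To stay self-contained it redeclares the interface locally and uses a hypothetical `mapCounter` stand-in instead of importing the package; in real use the value would come from NewEnglishWordCounterPQ(). The helper `topN` is hypothetical, and treating an empty string as "no words left" is an assumption, since this page does not document what PopMostCommon returns on an empty counter.

```go
package main

import (
	"fmt"
	"strings"
)

// WordsCounter is redeclared here to mirror the documented interface.
type WordsCounter interface {
	AddSentence(s string)
	PopMostCommon() string
}

// mapCounter is a hypothetical stand-in implementation used only so this
// example runs on its own; it scans the whole map on every pop.
type mapCounter map[string]int

func (m mapCounter) AddSentence(s string) {
	for _, w := range strings.Fields(strings.ToLower(s)) {
		m[w]++
	}
}

func (m mapCounter) PopMostCommon() string {
	best, bestN := "", 0
	for w, n := range m {
		if n > bestN || (n == bestN && w < best) {
			best, bestN = w, n
		}
	}
	delete(m, best)
	return best
}

// topN feeds every sentence to c and then pops up to n words,
// most frequent first. It stops early on "" (assumed to mean "empty").
func topN(c WordsCounter, sentences []string, n int) []string {
	for _, s := range sentences {
		c.AddSentence(s)
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		w := c.PopMostCommon()
		if w == "" {
			break
		}
		out = append(out, w)
	}
	return out
}

func main() {
	got := topN(mapCounter{}, []string{"la la la", "da da", "la di da"}, 2)
	fmt.Println(got) // prints "[la da]"
}
```

Note that PopMostCommon removes the word it returns, so a counter drained this way cannot be queried for the same words again.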
