analyse

package v0.0.0-...-36c17a1
Published: Dec 3, 2022 License: AGPL-3.0 Imports: 11 Imported by: 0

Documentation

Overview

Package analyse is the Golang implementation of Jieba's analyse module.

Example (ExtractTags)
var t TagExtracter
// Load the segmentation dictionary and the IDF table before extracting tags.
if err := t.LoadDictionaryAt("../dict.txt"); err != nil {
	panic(err)
}
if err := t.LoadIdfAt("idf.txt"); err != nil {
	panic(err)
}

sentence := "这是一个伸手不见五指的黑夜。我叫孙悟空,我爱北京,我爱Python和C++。"
segments := t.ExtractTags(sentence, 5)
fmt.Printf("Top %d tags:", len(segments))
for _, segment := range segments {
	fmt.Printf(" %s /", segment.Text())
}
Output:

Top 5 tags: Python / C++ / 伸手不见五指 / 孙悟空 / 黑夜 /
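
Each Segment returned by ExtractTags also carries a weight, the score used to rank the tags. A small variation of the loop above prints it alongside the text:

for _, segment := range segments {
	fmt.Printf("%s %f\n", segment.Text(), segment.Weight())
}
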
Example (TextRank)
t, err := NewTextRankerAt("../dict.txt")
if err != nil {
	panic(err)
}

sentence := "此外,公司拟对全资子公司吉林欧亚置业有限公司增资4.3亿元,增资后,吉林欧亚置业注册资本由7000万元增加到5亿元。吉林欧亚置业主要经营范围为房地产开发及百货零售等业务。目前在建吉林欧亚城市商业综合体项目。2013年,实现营业收入0万元,实现净利润-139.13万元。"

result := t.TextRank(sentence, 10)
for _, segment := range result {
	fmt.Printf("%s %f\n", segment.Text(), segment.Weight())
}
Output:

吉林 1.000000
欧亚 0.878078
置业 0.562048
实现 0.520906
收入 0.384284
增资 0.360591
子公司 0.353132
城市 0.307509
全资 0.306324
商业 0.306138

Constants

This section is empty.

Variables

var DefaultStopWordMap = map[string]int{
	"the":   1,
	"of":    1,
	"is":    1,
	"and":   1,
	"to":    1,
	"in":    1,
	"that":  1,
	"we":    1,
	"for":   1,
	"an":    1,
	"are":   1,
	"by":    1,
	"be":    1,
	"as":    1,
	"on":    1,
	"with":  1,
	"can":   1,
	"if":    1,
	"from":  1,
	"which": 1,
	"you":   1,
	"it":    1,
	"this":  1,
	"then":  1,
	"at":    1,
	"have":  1,
	"all":   1,
	"not":   1,
	"one":   1,
	"has":   1,
	"or":    1,
}

DefaultStopWordMap contains a default set of common English stop words.
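
Because DefaultStopWordMap is an exported plain map keyed by word, membership can be checked with an ordinary lookup:

_, isStop := DefaultStopWordMap["the"] // "the" appears in the default list above
fmt.Println(isStop)                    // prints "true"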

Functions

This section is empty.

Types

type Idf

type Idf struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

Idf represents a thread-safe dictionary for all words with their IDFs (Inverse Document Frequency).
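
A minimal sketch of the lookup side of this type: a freshly created Idf contains no words, and Frequency reports that through its second return value. In normal use the dictionary is filled via TagExtracter.LoadIdf / LoadIdfAt rather than by hand.

idf := NewIdf()
if _, ok := idf.Frequency("北京"); !ok {
	fmt.Println("no IDF loaded for this word yet")
}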

func NewIdf

func NewIdf() *Idf

NewIdf creates a new Idf instance.

func (*Idf) AddToken

func (i *Idf) AddToken(token dictionary.Token)

AddToken adds a new word with its IDF into the dictionary.

func (*Idf) Frequency

func (i *Idf) Frequency(key string) (float64, bool)

Frequency returns the IDF of the given word.

func (*Idf) Load

func (i *Idf) Load(tokens ...dictionary.Token)

Load loads all tokens into its dictionary.

type Segment

type Segment struct {
	// contains filtered or unexported fields
}

Segment represents a word with weight.

func (Segment) Text

func (s Segment) Text() string

Text returns the segment's text.

func (Segment) Weight

func (s Segment) Weight() float64

Weight returns the segment's weight.

type Segments

type Segments []Segment

Segments represents a slice of Segment.
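
Len, Less and Swap below make Segments satisfy sort.Interface, so a result can be re-sorted with the standard sort package. A sketch (the sort direction defined by Less is not documented here, so sort.Reverse is shown for flipping it; t and sentence are assumed to be set up as in the examples above):

segments := t.ExtractTags(sentence, 10)
sort.Sort(segments)               // order defined by Segments.Less
sort.Sort(sort.Reverse(segments)) // opposite order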

func (Segments) Len

func (ss Segments) Len() int

func (Segments) Less

func (ss Segments) Less(i, j int) bool

func (Segments) Swap

func (ss Segments) Swap(i, j int)

type StopWord

type StopWord struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

StopWord is a thread-safe dictionary for all stop words.

func NewStopWord

func NewStopWord() *StopWord

NewStopWord creates a new StopWord with default stop words.

func (*StopWord) AddToken

func (s *StopWord) AddToken(token dictionary.Token)

AddToken adds a token into the StopWord dictionary.

func (*StopWord) IsStopWord

func (s *StopWord) IsStopWord(word string) bool

IsStopWord checks whether the given word is a stop word.
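
A minimal sketch combining the calls above; per NewStopWord's documentation the dictionary starts out populated with the default stop words, so a word such as "the" is expected to be reported as a stop word:

sw := NewStopWord()
fmt.Println(sw.IsStopWord("the")) // expected: true
fmt.Println(sw.IsStopWord("北京")) // expected: false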

func (*StopWord) Load

func (s *StopWord) Load(tokens ...dictionary.Token)

Load loads all tokens into the StopWord dictionary.

type TagExtracter

type TagExtracter struct {
	// contains filtered or unexported fields
}

TagExtracter is used to extract tags from a sentence.

func (*TagExtracter) ExtractTags

func (t *TagExtracter) ExtractTags(sentence string, topK int) (tags Segments)

ExtractTags extracts the topK keywords from a sentence.

func (*TagExtracter) LoadDictionary

func (t *TagExtracter) LoadDictionary(file io.Reader) (err error)

LoadDictionary reads from the given reader and creates a new dictionary.

func (*TagExtracter) LoadDictionaryAt

func (t *TagExtracter) LoadDictionaryAt(file string) (err error)

LoadDictionaryAt reads the given file and creates a new dictionary.

func (*TagExtracter) LoadIdf

func (t *TagExtracter) LoadIdf(file io.Reader) error

LoadIdf reads from the given reader and creates a new Idf dictionary.

func (*TagExtracter) LoadIdfAt

func (t *TagExtracter) LoadIdfAt(fileName string) error

LoadIdfAt reads the given file and creates a new Idf dictionary.

func (*TagExtracter) LoadStopWords

func (t *TagExtracter) LoadStopWords(file io.Reader) error

LoadStopWords reads from the given reader and creates a new StopWord dictionary.

func (*TagExtracter) LoadStopWordsAt

func (t *TagExtracter) LoadStopWordsAt(file string) error

LoadStopWordsAt reads the given file and creates a new StopWord dictionary.
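
A sketch of loading all three resources before extracting tags, with error handling; the file names, stop_words.txt in particular, are placeholders for whatever dictionary files are available:

var t TagExtracter
if err := t.LoadDictionaryAt("../dict.txt"); err != nil {
	panic(err)
}
if err := t.LoadIdfAt("idf.txt"); err != nil {
	panic(err)
}
if err := t.LoadStopWordsAt("stop_words.txt"); err != nil { // placeholder path
	panic(err)
}
tags := t.ExtractTags("我爱北京天安门", 3)
for _, tag := range tags {
	fmt.Println(tag.Text(), tag.Weight())
}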

type TextRanker

type TextRanker posseg.Segmenter

TextRanker is used to extract tags from a sentence.

func NewTextRanker

func NewTextRanker(file io.Reader) (*TextRanker, error)

NewTextRanker reads from the given reader and creates a new dictionary for TextRanker.

func NewTextRankerAt

func NewTextRankerAt(file string) (*TextRanker, error)

NewTextRankerAt reads the given file and creates a new dictionary for TextRanker.

func (*TextRanker) TextRank

func (t *TextRanker) TextRank(sentence string, topK int) Segments

TextRank extracts keywords from a sentence using the TextRank algorithm. Parameter topK specifies the maximum number of top keywords to return.

func (*TextRanker) TextRankWithPOS

func (t *TextRanker) TextRankWithPOS(sentence string, topK int, allowPOS []string) Segments

TextRankWithPOS extracts keywords from a sentence using the TextRank algorithm. Parameter allowPOS specifies a customized list of allowed POS (part-of-speech) tags.
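
A sketch of a POS-filtered call. The tags "ns", "n", "vn" and "v" (place names, nouns, verbal nouns, verbs) are common jieba part-of-speech tags, used here as an assumed filter rather than a documented default:

t, err := NewTextRankerAt("../dict.txt")
if err != nil {
	panic(err)
}
sentence := "吉林欧亚置业注册资本由7000万元增加到5亿元。"
result := t.TextRankWithPOS(sentence, 5, []string{"ns", "n", "vn", "v"})
for _, segment := range result {
	fmt.Printf("%s %f\n", segment.Text(), segment.Weight())
}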
