analyse

package v0.0.0-...-36c17a1
Published: Dec 3, 2022 License: AGPL-3.0 Imports: 11 Imported by: 0

Documentation

Overview

Package analyse is the Golang implementation of Jieba's analyse module.

Example (ExtractTags)
var t TagExtracter
// Load the segmentation dictionary and the IDF table before extracting tags.
if err := t.LoadDictionaryAt("../dict.txt"); err != nil {
	panic(err)
}
if err := t.LoadIdfAt("idf.txt"); err != nil {
	panic(err)
}

sentence := "这是一个伸手不见五指的黑夜。我叫孙悟空,我爱北京,我爱Python和C++。"
segments := t.ExtractTags(sentence, 5)
fmt.Printf("Top %d tags:", len(segments))
for _, segment := range segments {
	fmt.Printf(" %s /", segment.Text())
}
Output:

Top 5 tags: Python / C++ / 伸手不见五指 / 孙悟空 / 黑夜 /
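
Each Segment returned by ExtractTags also carries a weight, the score used to rank the tags. A small variation of the loop above prints it alongside the text:

for _, segment := range segments {
	fmt.Printf("%s %f\n", segment.Text(), segment.Weight())
}
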
Example (TextRank)
t, err := NewTextRankerAt("../dict.txt")
if err != nil {
	panic(err)
}

sentence := "此外,公司拟对全资子公司吉林欧亚置业有限公司增资4.3亿元,增资后,吉林欧亚置业注册资本由7000万元增加到5亿元。吉林欧亚置业主要经营范围为房地产开发及百货零售等业务。目前在建吉林欧亚城市商业综合体项目。2013年,实现营业收入0万元,实现净利润-139.13万元。"

result := t.TextRank(sentence, 10)
for _, segment := range result {
	fmt.Printf("%s %f\n", segment.Text(), segment.Weight())
}
Output:

吉林 1.000000
欧亚 0.878078
置业 0.562048
实现 0.520906
收入 0.384284
增资 0.360591
子公司 0.353132
城市 0.307509
全资 0.306324
商业 0.306138

Constants

This section is empty.

Variables

var DefaultStopWordMap = map[string]int{
	"the":   1,
	"of":    1,
	"is":    1,
	"and":   1,
	"to":    1,
	"in":    1,
	"that":  1,
	"we":    1,
	"for":   1,
	"an":    1,
	"are":   1,
	"by":    1,
	"be":    1,
	"as":    1,
	"on":    1,
	"with":  1,
	"can":   1,
	"if":    1,
	"from":  1,
	"which": 1,
	"you":   1,
	"it":    1,
	"this":  1,
	"then":  1,
	"at":    1,
	"have":  1,
	"all":   1,
	"not":   1,
	"one":   1,
	"has":   1,
	"or":    1,
}

DefaultStopWordMap contains a default set of common English stop words.
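
Because DefaultStopWordMap is an exported plain map keyed by word, membership can be checked with an ordinary lookup:

_, isStop := DefaultStopWordMap["the"] // "the" appears in the default list above
fmt.Println(isStop)                    // prints "true"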

Functions

This section is empty.

Types

type Idf

type Idf struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

Idf represents a thread-safe dictionary for all words with their IDFs (Inverse Document Frequency).
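
A minimal sketch of the lookup side of this type: a freshly created Idf contains no words, and Frequency reports that through its second return value. In normal use the dictionary is filled via TagExtracter.LoadIdf / LoadIdfAt rather than by hand.

idf := NewIdf()
if _, ok := idf.Frequency("北京"); !ok {
	fmt.Println("no IDF loaded for this word yet")
}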

func NewIdf

func NewIdf() *Idf

NewIdf creates a new Idf instance.

func (*Idf) AddToken

func (i *Idf) AddToken(token dictionary.Token)

AddToken adds a new word with its IDF into the dictionary.

func (*Idf) Frequency

func (i *Idf) Frequency(key string) (float64, bool)

Frequency returns the IDF of the given word.

func (*Idf) Load

func (i *Idf) Load(tokens ...dictionary.Token)

Load loads all tokens into its dictionary.

type Segment

type Segment struct {
	// contains filtered or unexported fields
}

Segment represents a word with weight.

func (Segment) Text

func (s Segment) Text() string

Text returns the segment's text.

func (Segment) Weight

func (s Segment) Weight() float64

Weight returns the segment's weight.

type Segments

type Segments []Segment

Segments represents a slice of Segment.
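
Len, Less and Swap below make Segments satisfy sort.Interface, so a result can be re-sorted with the standard sort package. A sketch (the sort direction defined by Less is not documented here, so sort.Reverse is shown for flipping it; t and sentence are assumed to be set up as in the examples above):

segments := t.ExtractTags(sentence, 10)
sort.Sort(segments)               // order defined by Segments.Less
sort.Sort(sort.Reverse(segments)) // opposite order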

func (Segments) Len

func (ss Segments) Len() int

func (Segments) Less

func (ss Segments) Less(i, j int) bool

func (Segments) Swap

func (ss Segments) Swap(i, j int)

type StopWord

type StopWord struct {
	sync.RWMutex
	// contains filtered or unexported fields
}

StopWord is a thread-safe dictionary for all stop words.

func NewStopWord

func NewStopWord() *StopWord

NewStopWord creates a new StopWord with default stop words.

func (*StopWord) AddToken

func (s *StopWord) AddToken(token dictionary.Token)

AddToken adds a token into the StopWord dictionary.

func (*StopWord) IsStopWord

func (s *StopWord) IsStopWord(word string) bool

IsStopWord checks whether the given word is a stop word.
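
A minimal sketch combining the calls above; per NewStopWord's documentation the dictionary starts out populated with the default stop words, so a word such as "the" is expected to be reported as a stop word:

sw := NewStopWord()
fmt.Println(sw.IsStopWord("the")) // expected: true
fmt.Println(sw.IsStopWord("北京")) // expected: false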

func (*StopWord) Load

func (s *StopWord) Load(tokens ...dictionary.Token)

Load loads all tokens into the StopWord dictionary.

type TagExtracter

type TagExtracter struct {
	// contains filtered or unexported fields
}

TagExtracter is used to extract tags from a sentence.

func (*TagExtracter) ExtractTags

func (t *TagExtracter) ExtractTags(sentence string, topK int) (tags Segments)

ExtractTags extracts the topK keywords from a sentence.

func (*TagExtracter) LoadDictionary

func (t *TagExtracter) LoadDictionary(file io.Reader) (err error)

LoadDictionary reads from the given reader and creates a new dictionary.

func (*TagExtracter) LoadDictionaryAt

func (t *TagExtracter) LoadDictionaryAt(file string) (err error)

LoadDictionaryAt reads the given file and creates a new dictionary.

func (*TagExtracter) LoadIdf

func (t *TagExtracter) LoadIdf(file io.Reader) error

LoadIdf reads from the given reader and creates a new Idf dictionary.

func (*TagExtracter) LoadIdfAt

func (t *TagExtracter) LoadIdfAt(fileName string) error

LoadIdfAt reads the given file and creates a new Idf dictionary.

func (*TagExtracter) LoadStopWords

func (t *TagExtracter) LoadStopWords(file io.Reader) error

LoadStopWords reads from the given reader and creates a new StopWord dictionary.

func (*TagExtracter) LoadStopWordsAt

func (t *TagExtracter) LoadStopWordsAt(file string) error

LoadStopWordsAt reads the given file and creates a new StopWord dictionary.
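
A sketch of loading all three resources before extracting tags, with error handling; the file names, stop_words.txt in particular, are placeholders for whatever dictionary files are available:

var t TagExtracter
if err := t.LoadDictionaryAt("../dict.txt"); err != nil {
	panic(err)
}
if err := t.LoadIdfAt("idf.txt"); err != nil {
	panic(err)
}
if err := t.LoadStopWordsAt("stop_words.txt"); err != nil { // placeholder path
	panic(err)
}
tags := t.ExtractTags("我爱北京天安门", 3)
for _, tag := range tags {
	fmt.Println(tag.Text(), tag.Weight())
}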

type TextRanker

type TextRanker posseg.Segmenter

TextRanker is used to extract tags from a sentence.

func NewTextRanker

func NewTextRanker(file io.Reader) (*TextRanker, error)

NewTextRanker reads from the given reader and creates a new dictionary for TextRanker.

func NewTextRankerAt

func NewTextRankerAt(file string) (*TextRanker, error)

NewTextRankerAt reads the given file and creates a new dictionary for TextRanker.

func (*TextRanker) TextRank

func (t *TextRanker) TextRank(sentence string, topK int) Segments

TextRank extracts keywords from a sentence using the TextRank algorithm. Parameter topK specifies the maximum number of top keywords to return.

func (*TextRanker) TextRankWithPOS

func (t *TextRanker) TextRankWithPOS(sentence string, topK int, allowPOS []string) Segments

TextRankWithPOS extracts keywords from a sentence using the TextRank algorithm. Parameter allowPOS specifies a customized list of allowed POS (part-of-speech) tags.
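
A sketch of a POS-filtered call. The tags "ns", "n", "vn" and "v" (place names, nouns, verbal nouns, verbs) are common jieba part-of-speech tags, used here as an assumed filter rather than a documented default:

t, err := NewTextRankerAt("../dict.txt")
if err != nil {
	panic(err)
}
sentence := "吉林欧亚置业注册资本由7000万元增加到5亿元。"
result := t.TextRankWithPOS(sentence, 5, []string{"ns", "n", "vn", "v"})
for _, segment := range result {
	fmt.Printf("%s %f\n", segment.Text(), segment.Weight())
}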
