package relevant
Version: v0.1.3
Published: Mar 3, 2024 License: Apache-2.0 Imports: 6 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func BOW

func BOW(doc Document) []int

BOW turns a document into a bag of words. It deduplicates the document's words and returns the list of unique word IDs.
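The deduplication step can be sketched as follows. This is only an illustration: uniqueIDs is a hypothetical helper, not the package's BOW, and the order in which BOW returns IDs is not stated in this documentation (the sketch keeps first-appearance order).

```go
package main

import "fmt"

// uniqueIDs is a hypothetical helper, not the package's BOW. It returns the
// distinct word IDs of a document in order of first appearance.
func uniqueIDs(ids []int) []int {
	seen := make(map[int]struct{}, len(ids))
	out := make([]int, 0, len(ids))
	for _, id := range ids {
		if _, ok := seen[id]; ok {
			continue // already emitted this word ID
		}
		seen[id] = struct{}{}
		out = append(out, id)
	}
	return out
}

func main() {
	fmt.Println(uniqueIDs([]int{3, 1, 3, 2, 1})) // prints [3 1 2]
}
```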

func CalculateBM25Scores added in v0.1.2

func CalculateBM25Scores(query string, documents []string, avgdl float64, k1 float64, b float64) []float64

CalculateBM25Scores computes the BM25 scores of all documents in parallel.
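One way to score documents concurrently is to start one goroutine per document and let each write to its own slot of the result slice. The sketch below makes several assumptions that this documentation does not confirm: scoreAll is a hypothetical name, tokens are split on whitespace, the IDF is the common log(1 + (N - df + 0.5)/(df + 0.5)) variant, and avgdl is greater than zero.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"sync"
)

// scoreAll is a hypothetical sketch, not the package's CalculateBM25Scores.
// Tokenization (strings.Fields) and the IDF variant are assumptions.
func scoreAll(query string, docs []string, avgdl, k1, b float64) []float64 {
	terms := strings.Fields(query)
	tokens := make([][]string, len(docs))
	df := make(map[string]float64) // number of documents containing each word
	for i, d := range docs {
		tokens[i] = strings.Fields(d)
		seen := make(map[string]bool)
		for _, w := range tokens[i] {
			if !seen[w] {
				seen[w] = true
				df[w]++
			}
		}
	}

	n := float64(len(docs))
	scores := make([]float64, len(docs))
	var wg sync.WaitGroup
	for i := range docs {
		wg.Add(1)
		// Each goroutine writes only scores[i] and only reads df, so no lock is needed.
		go func(i int) {
			defer wg.Done()
			tf := make(map[string]float64)
			for _, w := range tokens[i] {
				tf[w]++
			}
			dl := float64(len(tokens[i]))
			for _, t := range terms {
				idf := math.Log(1 + (n-df[t]+0.5)/(df[t]+0.5))
				scores[i] += idf * tf[t] * (k1 + 1) / (tf[t] + k1*(1-b+b*dl/avgdl))
			}
		}(i)
	}
	wg.Wait()
	return scores
}

func main() {
	docs := []string{"the cat sat", "the dog barked loudly", "cat and cat"}
	fmt.Println(scoreAll("cat", docs, 10.0/3.0, 1.2, 0.75))
}
```

In this sketch a document that contains none of the query terms scores exactly 0, and for documents of equal length the one with more occurrences of a query term scores higher.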

func Cosine

func Cosine(a []float64, b []float64) float64
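Cosine carries no description here. Assuming it computes the standard cosine similarity of two equal-length vectors, the calculation looks roughly like the sketch below. The handling of mismatched lengths and zero vectors (returning 0) is a choice made for this illustration, not documented behavior of the package.

```go
package main

import (
	"fmt"
	"math"
)

// cosine is an illustrative sketch. The package's Cosine may treat
// mismatched lengths or zero vectors differently.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0 // similarity is undefined for a zero vector; 0 is a convention
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosine([]float64{1, 0}, []float64{0, 1})) // orthogonal vectors: prints 0
	fmt.Println(cosine([]float64{1, 2}, []float64{2, 4})) // parallel vectors: approximately 1
}
```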

func MakeCorpus

func MakeCorpus(a []string) (map[string]int, []string)
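Judging by its signature, MakeCorpus returns a word-to-ID map and an ID-to-word list. A minimal sketch of building such a pair is shown below. buildCorpus is a hypothetical stand-in, and both the whitespace tokenization and the lowercasing are assumptions that the source does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// buildCorpus is a hypothetical stand-in for MakeCorpus. It assumes
// whitespace tokenization and lowercasing, which the source does not confirm.
func buildCorpus(texts []string) (map[string]int, []string) {
	ids := make(map[string]int)
	var words []string
	for _, t := range texts {
		for _, w := range strings.Fields(strings.ToLower(t)) {
			if _, ok := ids[w]; !ok {
				ids[w] = len(words) // next free ID is the current list length
				words = append(words, w)
			}
		}
	}
	return ids, words
}

func main() {
	ids, words := buildCorpus([]string{"Call me Ishmael", "call me maybe"})
	fmt.Println(len(ids), words) // prints 4 [call me ishmael maybe]
}
```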

func TF

func TF(doc Document) []float64

TF calculates the term frequencies of the terms in a document. This is useful for scoring functions. Unlike BOW, it does not reduce the document to a unique bag of words.
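To illustrate what a term frequency is, the sketch below counts how often each word ID occurs. termCounts is a hypothetical helper; the package's TF returns a []float64 whose exact layout and normalization are not described in this documentation.

```go
package main

import "fmt"

// termCounts is a hypothetical sketch: raw occurrence counts keyed by word ID.
// It is not the package's TF, which returns a []float64.
func termCounts(ids []int) map[int]float64 {
	counts := make(map[int]float64)
	for _, id := range ids {
		counts[id]++
	}
	return counts
}

func main() {
	c := termCounts([]int{0, 1, 0, 2, 0})
	fmt.Println(c[0], c[1], c[2]) // prints 3 1 1
}
```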

Types

type Doc

type Doc []int

func (Doc) IDs

func (d Doc) IDs() []int

type DocScore

type DocScore struct {
	ID    int
	Score float64
}

DocScore is a tuple of the document ID and a score

type DocScores

type DocScores []DocScore

DocScores is a list of DocScore

func BM25

func BM25(tf *TFIDF, query Document, docs []Document, k1, b float64) DocScores

BM25 is the scoring function.

k1 should be between 1.2 and 2, and b should be around 0.75.
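To see what the two parameters do, the sketch below evaluates the term-frequency component of the textbook BM25 formula: k1 controls how quickly repeated occurrences saturate, and b controls how strongly long documents are penalized. termWeight is a hypothetical helper based on the standard formula, not on this package's source, and it assumes avgdl is greater than zero.

```go
package main

import "fmt"

// termWeight is a hypothetical helper showing the textbook BM25
// term-frequency component. It is not taken from the package source.
// tf is the number of occurrences of the term in the document, dl the
// document length, and avgdl the average document length (assumed > 0).
func termWeight(tf, dl, avgdl, k1, b float64) float64 {
	norm := 1 - b + b*dl/avgdl // length normalization; b = 0 switches it off
	return tf * (k1 + 1) / (tf + k1*norm)
}

func main() {
	// Average-length document: the weight grows with tf but stays below k1 + 1.
	for _, tf := range []float64{1, 2, 10, 100} {
		fmt.Printf("tf=%v weight=%.3f\n", tf, termWeight(tf, 100, 100, 1.2, 0.75))
	}
	// A document twice the average length is penalized for the same tf.
	fmt.Printf("long doc: %.3f\n", termWeight(2, 200, 100, 1.2, 0.75))
}
```

In the full formula this weight is multiplied by the term's IDF and summed over the query terms.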

func (DocScores) Len

func (ds DocScores) Len() int

func (DocScores) Less

func (ds DocScores) Less(i, j int) bool

func (DocScores) Swap

func (ds DocScores) Swap(i, j int)
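Len, Less, and Swap together satisfy sort.Interface, so a DocScores value can be passed to the standard sort package. The sketch below uses local stand-in types and assumes that Less orders by ascending score, which this documentation does not state; under that assumption, sort.Reverse puts the best match first.

```go
package main

import (
	"fmt"
	"sort"
)

// docScore and byScore are local stand-ins for DocScore and DocScores.
type docScore struct {
	ID    int
	Score float64
}

type byScore []docScore

func (s byScore) Len() int           { return len(s) }
func (s byScore) Less(i, j int) bool { return s[i].Score < s[j].Score } // assumed ordering
func (s byScore) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

// topID sorts the scores in place from highest to lowest and returns the
// ID of the best match, or -1 when there are no scores.
func topID(s byScore) int {
	if len(s) == 0 {
		return -1
	}
	sort.Sort(sort.Reverse(s))
	return s[0].ID
}

func main() {
	scores := byScore{{0, 0.2}, {1, 1.7}, {2, 0.9}}
	fmt.Println(topID(scores)) // prints 1
}
```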

type Document

type Document interface {
	IDs() []int
}

Document is a representation of a document.
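Because Document is a one-method interface, any type that can report its word IDs satisfies it. The sketch below redeclares the interface locally so that it runs on its own; sentence and totalLen are hypothetical names used only for illustration.

```go
package main

import "fmt"

// document mirrors the Document interface shown above.
type document interface {
	IDs() []int
}

// sentence is a hypothetical type that keeps the original text next to its word IDs.
type sentence struct {
	text string
	ids  []int
}

// IDs makes sentence satisfy the document interface.
func (s sentence) IDs() []int { return s.ids }

// totalLen sums the number of word IDs across all documents.
func totalLen(docs []document) int {
	n := 0
	for _, d := range docs {
		n += len(d.IDs())
	}
	return n
}

func main() {
	docs := []document{
		sentence{text: "call me", ids: []int{0, 1}},
		sentence{text: "call me maybe", ids: []int{0, 1, 2}},
	}
	fmt.Println(totalLen(docs)) // prints 5
}
```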

func MakeDocuments

func MakeDocuments(a []string, c map[string]int) []Document

type ScoreFn

type ScoreFn func(tf *TFIDF, doc Document) []float64

ScoreFn is any function that returns a score of the document.

type TFIDF

type TFIDF struct {
	// Term Frequency
	TF map[int]float64
	// Inverse Document Frequency
	IDF map[int]float64
	// Docs is the count of documents
	Docs int
	// Len is the total length of docs
	Len int
	sync.Mutex
}

TFIDF is a structure holding the relevant state information about TF/IDF

func New

func New() *TFIDF

New creates a new TFIDF structure

func (*TFIDF) Add

func (tf *TFIDF) Add(doc Document)

Add adds a document to the state

func (*TFIDF) CalculateIDF

func (tf *TFIDF) CalculateIDF()

CalculateIDF calculates the inverse document frequency
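As a rough illustration of inverse document frequency, the sketch below uses the textbook form log(N / df), where N is the number of documents and df is the number of documents containing the term. Whether CalculateIDF uses exactly this variant, or one with smoothing, is not stated here; idf is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// idf is a hypothetical helper using the textbook log(N/df) form.
// The package's CalculateIDF may use a different variant.
func idf(docs [][]int) map[int]float64 {
	df := make(map[int]float64) // number of documents containing each word ID
	for _, d := range docs {
		seen := make(map[int]bool)
		for _, id := range d {
			if !seen[id] {
				seen[id] = true
				df[id]++
			}
		}
	}
	n := float64(len(docs))
	out := make(map[int]float64, len(df))
	for id, f := range df {
		out[id] = math.Log(n / f)
	}
	return out
}

func main() {
	weights := idf([][]int{{0, 1}, {0, 2}, {0}})
	// Word 0 occurs in every document, so it carries no weight (log 1 = 0).
	fmt.Println(weights[0])
	fmt.Println(weights[1] == weights[2]) // prints true: both occur in one document
}
```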

func (*TFIDF) Score

func (tf *TFIDF) Score(doc Document) []float64

Score calculates the TFIDF score (TF * IDF) for the document without adding the document to the tracked document count.

This function is only useful for a handful of cases. It's recommended you write your own scoring functions.
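Since writing your own scoring function is the recommended route, here is a minimal sketch of one that multiplies each word's count in the document by its IDF. scoreDoc is a hypothetical function over plain slices and maps rather than the package's TFIDF and Document types, and the one-score-per-position output layout is an assumption.

```go
package main

import "fmt"

// scoreDoc is a hypothetical scoring function. For each position in ids it
// returns count(id) * idf[id]. IDs missing from idf score 0.
func scoreDoc(ids []int, idf map[int]float64) []float64 {
	counts := make(map[int]float64)
	for _, id := range ids {
		counts[id]++
	}
	out := make([]float64, len(ids))
	for i, id := range ids {
		out[i] = counts[id] * idf[id]
	}
	return out
}

func main() {
	idf := map[int]float64{0: 0.5, 1: 2}
	fmt.Println(scoreDoc([]int{0, 1, 0}, idf)) // prints [1 2 1]
}
```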
