minhash

package
v0.0.0-...-c9b0c51
Published: Jan 17, 2019 License: MIT Imports: 6 Imported by: 0

Documentation

Constants

const (
	SimilarityThreshold = 0.799
)

Two jobs are considered too similar if they share 79.9% of their unique words.

Variables

This section is empty.

Functions

func JaccardDistance

func JaccardDistance(left, right *WordSet) float64

JaccardDistance calculates the similarity between two WordSets. It is useful for determining whether two blocks of text are similar.

Explanation: https://en.wikipedia.org/wiki/Jaccard_index
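As an illustration of the Jaccard index itself, here is a minimal sketch over plain Go maps. It is not this package's implementation: the `wordSet` type, `newWordSetFromText`, `jaccardIndex`, and the lowercase-and-split-on-whitespace tokenisation are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet is a hypothetical stand-in for the package's WordSet type.
type wordSet map[string]struct{}

// newWordSetFromText lowercases the text and splits it on whitespace.
// This tokenisation is an assumption; the package may tokenise differently.
func newWordSetFromText(text string) wordSet {
	set := wordSet{}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		set[w] = struct{}{}
	}
	return set
}

// jaccardIndex returns |A ∩ B| / |A ∪ B|, or 0 when both sets are empty.
func jaccardIndex(a, b wordSet) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 0
	}
	intersection := 0
	for w := range a {
		if _, ok := b[w]; ok {
			intersection++
		}
	}
	union := len(a) + len(b) - intersection
	return float64(intersection) / float64(union)
}

func main() {
	a := newWordSetFromText("senior go developer wanted")
	b := newWordSetFromText("junior go developer wanted")
	// 3 shared words out of 5 distinct words overall.
	fmt.Printf("%.2f\n", jaccardIndex(a, b)) // prints 0.60
}
```

With the documented threshold of 0.799, these two example texts would not count as too similar under this sketch.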

func JaccardSimilarity

func JaccardSimilarity(left, right string) float64

JaccardSimilarity calculates the similarity between two arbitrary strings.

func MinHashSimilar

func MinHashSimilar(left, right string) bool

MinHashSimilar takes two MinHash strings as input and returns whether they are similar.
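A common way to compare two MinHash signatures is to take the fraction of positions at which they agree as an estimate of the Jaccard index. The sketch below assumes that approach and a `>=` comparison against the documented threshold; the package's actual comparison is not shown in this documentation, and `estimateSimilarity` is a hypothetical helper. Parsing the input strings into signatures is left out.

```go
package main

import "fmt"

// similarityThreshold mirrors the documented SimilarityThreshold constant.
const similarityThreshold = 0.799

// estimateSimilarity returns the fraction of positions at which two
// equal-length signatures agree. It returns 0 for empty or mismatched
// signatures (an assumption made for this sketch).
func estimateSimilarity(left, right []int) float64 {
	if len(left) == 0 || len(left) != len(right) {
		return 0
	}
	matches := 0
	for i := range left {
		if left[i] == right[i] {
			matches++
		}
	}
	return float64(matches) / float64(len(left))
}

func main() {
	left := []int{3, 17, 42, 8, 99}
	right := []int{3, 17, 42, 8, 5}
	sim := estimateSimilarity(left, right)
	// 4 of 5 positions match, so the estimate is 0.8.
	fmt.Println(sim, sim >= similarityThreshold) // prints 0.8 true
}
```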

func Similar

func Similar(left, right string) bool

Similar is based on the Jaccard index. It returns true if the similarity between the two inputs is too high.

func SimilarWordSets

func SimilarWordSets(left, right *WordSet) bool

SimilarWordSets is the same as Similar, but for word sets.

func StringsSimilar

func StringsSimilar(left, right string) bool

StringsSimilar compares two strings to see if they are similar.

Types

type MinHash

type MinHash []int

func GenerateMinHash

func GenerateMinHash(d string) MinHash

GenerateMinHash generates a MinHash from a document string. It is used to create a MinHash from scratch.
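For readers unfamiliar with the technique, the following is a minimal sketch of how a MinHash signature can be built from a document: for each of several seeded hash functions, keep the smallest hash value seen over all words. The signature size of 16, the FNV-1a hash, the seeding scheme, and the names `hashWord` and `minHashSignature` are all assumptions for illustration and are not taken from this package.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// signatureSize is a hypothetical number of hash functions.
const signatureSize = 16

// hashWord hashes a word together with a seed using 32-bit FNV-1a.
func hashWord(word string, seed int) uint32 {
	h := fnv.New32a()
	fmt.Fprintf(h, "%d:%s", seed, word)
	return h.Sum32()
}

// minHashSignature keeps, for each seed, the smallest hash over all words.
// An empty document yields math.MaxUint32 in every position.
func minHashSignature(doc string) []int {
	sig := make([]int, signatureSize)
	words := strings.Fields(strings.ToLower(doc))
	for seed := range sig {
		lowest := uint32(math.MaxUint32)
		for _, w := range words {
			if h := hashWord(w, seed); h < lowest {
				lowest = h
			}
		}
		sig[seed] = int(lowest)
	}
	return sig
}

func main() {
	a := minHashSignature("go developer wanted")
	b := minHashSignature("wanted developer go")
	// Word order does not affect the signature, only the set of words does.
	fmt.Println(len(a), fmt.Sprint(a) == fmt.Sprint(b)) // prints 16 true
}
```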

func MinHashFromStr

func MinHashFromStr(mh string) (MinHash, error)

MinHashFromStr takes a min hash string and converts it into a MinHash object

func (MinHash) Len

func (this MinHash) Len() int

Len returns the length of the MinHash signature.

func (MinHash) Str

func (this MinHash) Str() string

Str returns the string representation of a MinHash. It iterates through the MinHash signature and returns the list of MinHash values delimited by " ".
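The space-delimited format described above can be sketched as a round trip between a slice of integers and a string. This is an illustration only: `signatureToString` and `signatureFromString` are hypothetical helpers, and the package's own error handling in MinHashFromStr may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// signatureToString joins the signature values with single spaces.
func signatureToString(sig []int) string {
	parts := make([]string, len(sig))
	for i, v := range sig {
		parts[i] = strconv.Itoa(v)
	}
	return strings.Join(parts, " ")
}

// signatureFromString parses a whitespace-delimited list of integers.
func signatureFromString(s string) ([]int, error) {
	fields := strings.Fields(s)
	sig := make([]int, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid signature value %q: %w", f, err)
		}
		sig = append(sig, v)
	}
	return sig, nil
}

func main() {
	s := signatureToString([]int{3, 17, 42})
	fmt.Println(s) // prints 3 17 42

	back, err := signatureFromString(s)
	fmt.Println(back, err) // prints [3 17 42] <nil>
}
```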

type WordSet

type WordSet struct {
	// contains filtered or unexported fields
}

func NewWordSet

func NewWordSet() *WordSet

func NewWordSetFromText

func NewWordSetFromText(text string) *WordSet

func (*WordSet) Add

func (this *WordSet) Add(word string)

func (*WordSet) Contains

func (this *WordSet) Contains(word string) bool

func (*WordSet) Intersection

func (this *WordSet) Intersection(other *WordSet) int

func (*WordSet) Len

func (this *WordSet) Len() int

func (*WordSet) Remove

func (this *WordSet) Remove(word string)
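The WordSet methods are undocumented and the fields are unexported, so the sketch below only illustrates one plausible reading of the signatures: a map-backed set in which Intersection returns the number of shared words rather than a new set. The lowercase `wordSet` type and its behaviour are assumptions, not the package's implementation.

```go
package main

import "fmt"

// wordSet is a hypothetical map-backed set mirroring the WordSet method names.
type wordSet struct {
	words map[string]struct{}
}

func newWordSet() *wordSet { return &wordSet{words: map[string]struct{}{}} }

// Add inserts a word; adding an existing word has no effect.
func (s *wordSet) Add(word string) { s.words[word] = struct{}{} }

// Remove deletes a word; removing a missing word has no effect.
func (s *wordSet) Remove(word string) { delete(s.words, word) }

// Len returns the number of unique words in the set.
func (s *wordSet) Len() int { return len(s.words) }

// Contains reports whether the word is in the set.
func (s *wordSet) Contains(word string) bool {
	_, ok := s.words[word]
	return ok
}

// Intersection returns the number of words present in both sets.
func (s *wordSet) Intersection(other *wordSet) int {
	count := 0
	for w := range s.words {
		if other.Contains(w) {
			count++
		}
	}
	return count
}

func main() {
	a, b := newWordSet(), newWordSet()
	for _, w := range []string{"go", "developer", "wanted", "go"} {
		a.Add(w)
	}
	for _, w := range []string{"go", "engineer"} {
		b.Add(w)
	}
	fmt.Println(a.Len(), a.Intersection(b)) // prints 3 1
	a.Remove("go")
	fmt.Println(a.Contains("go"), a.Intersection(b)) // prints false 0
}
```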
