simhash

package
v0.0.0-...-d823e92 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 17, 2021 License: MIT Imports: 9 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	// EnStopWords is the English stop-word list.
	EnStopWords = map[string]struct{}{}/* 561 elements not displayed */

	// ChStopWords is the Chinese stop-word list.
	ChStopWords = map[string]struct{}{}/* 1513 elements not displayed */

	// SpStopWords is the stop-word list for special words.
	SpStopWords = map[string]struct{}{}/* 250 elements not displayed */
)

Functions

func Bytes2String

func Bytes2String(buf []byte) string

Bytes2String converts a byte slice to a string without copying. The returned string shares its underlying memory with buf, so buf should not be modified while the string is in use.
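The zero-copy conversion described above is commonly written with the unsafe package. The following is a minimal sketch of that idiom, not the package's actual source; bytesToString is a hypothetical stand-in for Bytes2String.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString is a hypothetical stand-in for Bytes2String (assumption:
// the real function uses a similar trick). It reinterprets the slice header
// as a string header, so the bytes are not copied.
func bytesToString(buf []byte) string {
	return *(*string)(unsafe.Pointer(&buf))
}

func main() {
	buf := []byte("simhash")
	s := bytesToString(buf)
	fmt.Println(s, len(s))

	// s and buf point at the same memory. Writing to buf after this point
	// would also change s, which breaks the usual immutability of Go strings,
	// so callers should treat buf as read-only from here on.
}
```

The trade-off is speed against safety: the conversion avoids an allocation per call, but the caller takes on the responsibility of never mutating the source slice.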

func GetByteOrder

func GetByteOrder() binary.ByteOrder

Types

type HashWeightPair

type HashWeightPair struct {
	// contains filtered or unexported fields
}

type LanguageType

type LanguageType int8
const (
	ENGLISH LanguageType = 0
	CHINESE LanguageType = 1
)

type SimHash

type SimHash struct {
	// contains filtered or unexported fields
}

SimHash implements the simhash fingerprinting technique described in "Detecting Near-Duplicates for Web Crawling".
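The fingerprinting scheme in the cited paper turns a set of weighted features into a single 64-bit value. The sketch below illustrates that general scheme only. The names weightedFeature, hash64 and simhashOf are hypothetical, FNV-1a is an arbitrary choice of hash, and none of this is taken from the package's source, which also performs tokenization and stop-word filtering not shown here.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// weightedFeature is a hypothetical (token, weight) pair, used here only
// for illustration.
type weightedFeature struct {
	token  string
	weight int
}

// hash64 hashes a token to 64 bits with FNV-1a (an assumption; any
// well-distributed 64-bit hash would do).
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// simhashOf builds a 64-bit fingerprint: for every bit position, add the
// feature weight if that bit is set in the feature's hash and subtract it
// otherwise. Positions with a positive total become 1 in the fingerprint.
func simhashOf(features []weightedFeature) uint64 {
	var counts [64]int
	for _, f := range features {
		h := hash64(f.token)
		for i := 0; i < 64; i++ {
			if h&(uint64(1)<<uint(i)) != 0 {
				counts[i] += f.weight
			} else {
				counts[i] -= f.weight
			}
		}
	}
	var fp uint64
	for i, c := range counts {
		if c > 0 {
			fp |= uint64(1) << uint(i)
		}
	}
	return fp
}

func main() {
	doc := []weightedFeature{{"near", 2}, {"duplicate", 3}, {"detection", 1}}
	fmt.Printf("%016x\n", simhashOf(doc))
}
```

Because each bit is decided by a weighted vote, documents that share most of their heavy features tend to produce fingerprints that differ in only a few bit positions.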

func NewSimHash

func NewSimHash(language LanguageType, dict string) *SimHash

func (*SimHash) Fingerprint

func (sh *SimHash) Fingerprint(text []byte, topNOpts ...uint32) uint64

func (*SimHash) FingerprintToString

func (sh *SimHash) FingerprintToString(fingerprint uint64) string

func (*SimHash) IsEqual

func (sh *SimHash) IsEqual(lhs uint64, rhs uint64, nOpts ...uint8) bool
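In the cited paper, two documents count as near-duplicates when their fingerprints differ in at most a small number of bit positions (the paper uses 3 for 64-bit fingerprints). The sketch below shows that comparison under the assumption that the optional nOpts argument of IsEqual plays the role of this distance threshold. That reading is not confirmed by the documentation, and nearDuplicate is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// nearDuplicate reports whether two 64-bit fingerprints differ in at most
// maxDist bit positions, i.e. whether their Hamming distance is <= maxDist.
// Hypothetical helper; the threshold semantics are an assumption.
func nearDuplicate(lhs, rhs uint64, maxDist uint8) bool {
	// XOR leaves a 1 in every position where the fingerprints disagree.
	return bits.OnesCount64(lhs^rhs) <= int(maxDist)
}

func main() {
	a := uint64(0xB) // binary 1011
	b := uint64(0x8) // binary 1000, differs from a in 2 bits
	fmt.Println(nearDuplicate(a, b, 3)) // true
	fmt.Println(nearDuplicate(a, b, 1)) // false
}
```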
