simhash

Version: v0.0.0-...-e206355
Published: Aug 2, 2018 License: MIT Imports: 2 Imported by: 0

README

simhash

A Go implementation of the simhash algorithm.

demo

package main

import (
	"fmt"

	"github.com/safeie/simhash"
)

func main() {
	s1 := simhash.Simhash("this is a project for golang implementation of simhash algorithm")
	s2 := simhash.Simhash("this is a project for java   implementation of simhash algorithm")

	fmt.Println("distance:", simhash.Distance(s1, s2))
	fmt.Println("similars:", simhash.Similar(s1, s2))
}

Documentation

Constants

const (
	// HashSize is the simhash length in bits; choose 32 or 64.
	HashSize = 32
)

Variables

This section is empty.

Functions

func Distance

func Distance(a, b uint64) int

Distance calculates the Hamming distance between two simhash values.
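For readers unfamiliar with the term, the Hamming distance between two fingerprints is the number of bit positions in which they differ. The sketch below shows one common way to compute it with the standard library. The helper name hammingDistance is hypothetical, and nothing here is taken from this package's source; it only illustrates the idea behind Distance.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hammingDistance is a hypothetical helper, not this package's code.
// XOR leaves a 1 bit wherever a and b differ; OnesCount64 counts those bits.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	// 0x0B is 1011 and 0x02 is 0010 in binary; they differ in two positions.
	fmt.Println(hammingDistance(0x0B, 0x02))
}
```

A smaller distance means the two inputs produced more similar fingerprints.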

func Simhash

func Simhash(input string) uint64

Simhash calculates the simhash of the given string.
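The general simhash technique can be sketched as follows: hash each token, let every bit of each hash vote for or against the corresponding bit of the result, and keep the bits with a positive tally. This is a minimal illustration under assumptions, not this package's implementation. The function name sketchSimhash, the whitespace tokenization, the FNV-1a hash, the uniform weight of 1, and the 64-bit width are all choices made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// sketchSimhash is a hypothetical illustration of the simhash idea.
// Assumptions: whitespace tokens, FNV-1a hashing, weight 1 per token, 64 bits.
func sketchSimhash(input string) uint64 {
	var counts [64]int
	for _, word := range strings.Fields(input) {
		h := fnv.New64a()
		h.Write([]byte(word))
		sum := h.Sum64()
		// Each bit of the token hash votes +1 (bit set) or -1 (bit clear).
		for i := 0; i < 64; i++ {
			if sum&(1<<uint(i)) != 0 {
				counts[i]++
			} else {
				counts[i]--
			}
		}
	}
	// A result bit is set only where the votes sum to a positive value.
	var fingerprint uint64
	for i, c := range counts {
		if c > 0 {
			fingerprint |= 1 << uint(i)
		}
	}
	return fingerprint
}

func main() {
	a := sketchSimhash("this is a project for golang implementation of simhash algorithm")
	b := sketchSimhash("this is a project for java implementation of simhash algorithm")
	fmt.Println("differing bits:", bits.OnesCount64(a^b))
}
```

Because the result depends only on the vote totals, token order does not affect the fingerprint in this sketch.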

func Similar

func Similar(a, b uint64) float64

Similar calculates how similar two simhash values are, expressed as a percentage.
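One plausible way to turn a Hamming distance into a similarity figure is to take the share of matching bits. The sketch below assumes a 32-bit fingerprint (mirroring HashSize = 32 above) and a result on a 0 to 100 scale. This page does not state the exact formula or scale that Similar uses, so both are assumptions, and similarityPercent is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hashBits mirrors the documented HashSize of 32; it is an assumption here.
const hashBits = 32

// similarityPercent is a hypothetical helper, not this package's code.
// It returns the percentage of the low hashBits bits on which a and b agree.
func similarityPercent(a, b uint64) float64 {
	mask := uint64(1)<<hashBits - 1 // ignore any bits above the fingerprint width
	d := bits.OnesCount64((a ^ b) & mask)
	return 100 * (1 - float64(d)/float64(hashBits))
}

func main() {
	// 0xFFFF differs from 0 in 16 of the 32 bits, so half the bits match.
	fmt.Println(similarityPercent(0, 0xFFFF))
}
```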

Types

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer is a word tokenizer.

func NewTokenizer

func NewTokenizer(chunkSize, overlapSize uint8) *Tokenizer

NewTokenizer creates a new tokenizer. Suggested values: chunkSize 4, overlapSize 1.

func (*Tokenizer) Tokenize

func (t *Tokenizer) Tokenize(input string) []TokenizerChunk

Tokenize tokenizes the input. For simplicity, it sets the weight of every word to 1.

type TokenizerChunk

type TokenizerChunk struct {
	Word   string
	Weight int
}

TokenizerChunk is a word chunk together with its weight.
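This page does not describe how chunkSize and overlapSize shape the output of Tokenize. One plausible reading is a sliding window over the input's runes, where each chunk is chunkSize runes long and shares overlapSize runes with the previous one. The sketch below implements that reading only. The names chunk and chunkRunes are hypothetical, and the real Tokenizer may behave differently.

```go
package main

import "fmt"

// chunk is a hypothetical stand-in for TokenizerChunk.
type chunk struct {
	Word   string
	Weight int
}

// chunkRunes is a hypothetical sketch, not this package's code.
// It splits input into windows of chunkSize runes that overlap by overlapSize
// runes, and gives every chunk a weight of 1.
func chunkRunes(input string, chunkSize, overlapSize int) []chunk {
	if chunkSize <= 0 || overlapSize < 0 || overlapSize >= chunkSize {
		return nil // the window must advance by at least one rune
	}
	runes := []rune(input)
	step := chunkSize - overlapSize
	var out []chunk
	for start := 0; start < len(runes); start += step {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		out = append(out, chunk{Word: string(runes[start:end]), Weight: 1})
		if end == len(runes) {
			break // the last window reached the end of the input
		}
	}
	return out
}

func main() {
	for _, c := range chunkRunes("abcdefgh", 4, 1) {
		fmt.Println(c.Word, c.Weight)
	}
}
```

With the suggested values of 4 and 1, the window advances three runes at a time under this reading.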
