stringy

v1.0.0
Warning

This package is not in the latest version of its module.

Published: Feb 23, 2018 License: MIT Imports: 14 Imported by: 0

README

stringy

String analysis functions for search indexing and fuzzy matching


Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func Analyze

func Analyze(in string) (tokens []string)

Analyze normalizes and tokenizes the given input string.

func AnalyzeBytes

func AnalyzeBytes(in []byte) (tokens [][]byte)

AnalyzeBytes normalizes and tokenizes the given input byte slice.
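The documentation does not spell out the normalization rules, so the following is only a minimal sketch of what a string/byte analyzer pair of this shape might do. It assumes that "normalize" means lowercasing and that any rune which is not a letter or digit separates tokens; `analyze` and `analyzeBytes` are hypothetical names, not this package's implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"unicode"
)

// notWordRune reports whether r separates tokens. Treating every rune that is
// not a letter or digit as a separator is an assumption of this sketch.
func notWordRune(r rune) bool {
	return !unicode.IsLetter(r) && !unicode.IsDigit(r)
}

// analyze is a hypothetical stand-in for Analyze: lowercase the input, then
// split it into runs of letters and digits.
func analyze(in string) []string {
	return strings.FieldsFunc(strings.ToLower(in), notWordRune)
}

// analyzeBytes mirrors the shape of AnalyzeBytes using the bytes package, so
// callers holding []byte avoid a conversion to string.
func analyzeBytes(in []byte) [][]byte {
	return bytes.FieldsFunc(bytes.ToLower(in), notWordRune)
}

func main() {
	fmt.Println(analyze("Hello, World! 42x"))            // [hello world 42x]
	fmt.Printf("%q\n", analyzeBytes([]byte("Foo-Bar"))) // ["foo" "bar"]
}
```

Offering both a `string` and a `[]byte` variant, as the package does, lets callers that read raw input skip an allocation; the sketch keeps the two variants in step by sharing one separator predicate.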

func Bigrams

func Bigrams(tokens []string) (bigrams sort.StringSlice)

Bigrams returns the unique token bigrams for a given ordered list of string tokens.
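A minimal sketch of unique, sorted token bigrams with the same `sort.StringSlice` return type as `Bigrams`. The single-space separator between the two tokens is an assumption, and `bigrams` is a hypothetical name rather than the package's code.

```go
package main

import (
	"fmt"
	"sort"
)

// bigrams is a hypothetical sketch: it joins each adjacent token pair with a
// single space (an assumed separator), drops duplicates, and sorts the result.
func bigrams(tokens []string) sort.StringSlice {
	seen := make(map[string]struct{})
	var out sort.StringSlice
	for i := 0; i+1 < len(tokens); i++ {
		bg := tokens[i] + " " + tokens[i+1]
		if _, dup := seen[bg]; dup {
			continue
		}
		seen[bg] = struct{}{}
		out = append(out, bg)
	}
	out.Sort()
	return out
}

func main() {
	// "to be" occurs twice in the input but appears once in the output.
	fmt.Printf("%q\n", bigrams([]string{"to", "be", "or", "not", "to", "be"}))
	// ["be or" "not to" "or not" "to be"]
}
```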

func MSAnalyze

func MSAnalyze(in string) (tokens []string)

MSAnalyze normalizes and tokenizes the given input string according to rules reverse-engineered to match the behavior of the MS SQL Server full-text indexer.

func MSAnalyzeBytes

func MSAnalyzeBytes(in []byte) (tokens [][]byte)

MSAnalyzeBytes normalizes and tokenizes the given input byte slice according to rules reverse-engineered to match the behavior of the MS SQL Server full-text indexer.

func NGramSimilarity

func NGramSimilarity(a string, b string, ngramLen int) float64

NGramSimilarity calculates the Jaccard similarity of the token n-grams (of length ngramLen) of two input strings.
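Jaccard similarity of two sets is |A ∩ B| / |A ∪ B|. The sketch below applies it to the sets of rune n-grams of each string. It assumes that inputs shorter than `n` are kept whole as a single gram and that two empty inputs score 0; `ngramSimilarity` and `charNGrams` are hypothetical names and may differ from the package's tokenization details.

```go
package main

import "fmt"

// charNGrams returns the set of rune n-grams of s. Keeping inputs shorter
// than n as one whole gram is an assumption of this sketch.
func charNGrams(s string, n int) map[string]struct{} {
	set := make(map[string]struct{})
	r := []rune(s)
	if len(r) == 0 {
		return set
	}
	if len(r) < n {
		set[s] = struct{}{}
		return set
	}
	for i := 0; i+n <= len(r); i++ {
		set[string(r[i:i+n])] = struct{}{}
	}
	return set
}

// ngramSimilarity computes |A ∩ B| / |A ∪ B| over the two n-gram sets.
func ngramSimilarity(a, b string, n int) float64 {
	if n < 1 {
		return 0 // assumption: non-positive lengths are treated as no match
	}
	sa, sb := charNGrams(a, n), charNGrams(b, n)
	if len(sa) == 0 && len(sb) == 0 {
		return 0 // assumption: two empty inputs are not considered similar
	}
	inter := 0
	for g := range sa {
		if _, ok := sb[g]; ok {
			inter++
		}
	}
	union := len(sa) + len(sb) - inter
	return float64(inter) / float64(union)
}

func main() {
	// {abc, bcd} vs {abc, bce}: 1 shared gram out of 3 distinct grams.
	fmt.Println(ngramSimilarity("abcd", "abce", 3)) // 0.3333333333333333
}
```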

func Shingles

func Shingles(tokens []string) (result []string)

Shingles returns a sorted slice of shingle combinations for the given input tokens.
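The documentation does not define "shingle", so this sketch assumes a shingle is any contiguous run of one or more tokens joined by a single space. Under that assumption, `n` tokens produce up to n(n+1)/2 shingles, deduplicated and then sorted. `shingles` is a hypothetical name, not the package's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// shingles is a hypothetical sketch: every contiguous token run tokens[i:j],
// joined by a space (assumed definition), deduplicated and sorted.
func shingles(tokens []string) []string {
	seen := make(map[string]struct{})
	var out []string
	for i := range tokens {
		for j := i + 1; j <= len(tokens); j++ {
			s := strings.Join(tokens[i:j], " ")
			if _, dup := seen[s]; !dup {
				seen[s] = struct{}{}
				out = append(out, s)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Printf("%q\n", shingles([]string{"new", "york", "city"}))
	// ["city" "new" "new york" "new york city" "york" "york city"]
}
```

Because the output grows quadratically with the token count, this form is practical for short inputs such as names or titles rather than whole documents.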

func TokenNGrams

func TokenNGrams(in string, ln int) (ngrams []string)

TokenNGrams turns an input like "abcd" into a series of n-grams, such as the trigrams ("abc", "bcd"). If the input is empty, the result is empty; if the input is one or two characters long, the output is padded with '$'.
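A minimal sketch of rune n-grams with `$` padding for short inputs. It assumes that `ln` is the n-gram length and that padding is appended on the right up to length `ln`. The documentation only says the output "is padded with '$'", so the exact padding position is a guess. `tokenNGrams` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenNGrams is a hypothetical sketch: emit every run of ln consecutive
// runes; an input shorter than ln is right-padded with '$' (assumed position).
func tokenNGrams(in string, ln int) []string {
	r := []rune(in)
	if len(r) == 0 || ln < 1 {
		return nil
	}
	if len(r) < ln {
		return []string{in + strings.Repeat("$", ln-len(r))}
	}
	out := make([]string, 0, len(r)-ln+1)
	for i := 0; i+ln <= len(r); i++ {
		out = append(out, string(r[i:i+ln]))
	}
	return out
}

func main() {
	fmt.Printf("%q\n", tokenNGrams("abcd", 3)) // ["abc" "bcd"]
	fmt.Printf("%q\n", tokenNGrams("ab", 3))   // ["ab$"]
	fmt.Printf("%q\n", tokenNGrams("", 3))     // []
}
```

Working on runes rather than bytes keeps multi-byte UTF-8 characters intact inside each n-gram.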

func URLAnalyze

func URLAnalyze(in string) (tokens []string)

URLAnalyze attempts to normalize a URL to a simple host name, returning an empty slice if it cannot.

func URLAnalyzeOrEmpty

func URLAnalyzeOrEmpty(in string) (analyzed string)

URLAnalyzeOrEmpty attempts to normalize a URL to a simple host name, returning an empty string if it cannot.
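One plausible way to reduce a URL to "a simple host name" uses the standard `net/url` parser. This sketch makes three assumptions: it prepends `http://` when no scheme is present, it lowercases the host, and it strips a leading `www.`. The package's actual normalization rules are not documented here. `urlAnalyzeOrEmpty` and `urlAnalyze` are hypothetical names covering the string and slice variants.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// urlAnalyzeOrEmpty is a hypothetical sketch: parse the input as a URL,
// keep only the lowercased host name, and strip a leading "www." (an assumed
// normalization). It returns "" when no host can be extracted.
func urlAnalyzeOrEmpty(in string) string {
	in = strings.TrimSpace(in)
	if in == "" {
		return ""
	}
	if !strings.Contains(in, "://") {
		in = "http://" + in // assumption: bare hosts are treated as http URLs
	}
	u, err := url.Parse(in)
	if err != nil {
		return ""
	}
	host := strings.ToLower(u.Hostname()) // Hostname drops any ":port"
	return strings.TrimPrefix(host, "www.")
}

// urlAnalyze wraps the string variant: a single token, or an empty slice.
func urlAnalyze(in string) []string {
	if h := urlAnalyzeOrEmpty(in); h != "" {
		return []string{h}
	}
	return []string{}
}

func main() {
	fmt.Println(urlAnalyzeOrEmpty("https://WWW.Example.com:8080/path?q=1")) // example.com
	fmt.Println(urlAnalyze("example.org/about"))                            // [example.org]
	fmt.Println(len(urlAnalyze("http://")))                                 // 0
}
```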

func UnigramsAndBigrams

func UnigramsAndBigrams(tokens []string) (ngrams []string)

UnigramsAndBigrams returns the unique token unigrams and bigrams for a given ordered list of string tokens.
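A minimal sketch that combines each distinct token with each distinct adjacent pair. It assumes that bigrams are joined by a single space and that the output is sorted; neither detail is stated in the documentation. `unigramsAndBigrams` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// unigramsAndBigrams is a hypothetical sketch: collect each distinct token and
// each distinct adjacent pair (space-joined, assumed), then sort (assumed).
func unigramsAndBigrams(tokens []string) []string {
	seen := make(map[string]struct{})
	var out []string
	add := func(s string) {
		if _, dup := seen[s]; !dup {
			seen[s] = struct{}{}
			out = append(out, s)
		}
	}
	for i, t := range tokens {
		add(t)
		if i+1 < len(tokens) {
			add(t + " " + tokens[i+1])
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Printf("%q\n", unigramsAndBigrams([]string{"a", "b", "a", "b"}))
	// ["a" "a b" "b" "b a"]
}
```

Indexing unigrams and bigrams together lets a search match single words while still rewarding queries that preserve word order.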

func VisitAnalyzedShingles

func VisitAnalyzedShingles(input []byte, tokenizer func(b []byte) [][]byte, visit func(b []byte) (stop bool))

VisitAnalyzedShingles applies the provided tokenizer to the input and then calls the supplied visit function for each shingle of the tokenized input. If input is an empty byte slice, the function returns immediately.

func VisitShingles

func VisitShingles(tokens [][]byte, visit func(b []byte) (stop bool))

VisitShingles calls the supplied visit function once per shingle, stopping if the visit function returns true.
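The two visitor functions above share a callback pattern. The visitor is called once per shingle and iteration halts as soon as it returns `true`. The sketch below uses the same callback signatures. It assumes a shingle is a contiguous, space-joined token run, and the visit order (all runs starting at each token in turn) is also an assumption. `visitShingles` and `visitAnalyzedShingles` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
)

// visitShingles is a hypothetical sketch: it builds each contiguous run of
// tokens in a reused buffer and stops as soon as visit returns true. Because
// the buffer is reused, visit must copy b if it needs to keep it.
func visitShingles(tokens [][]byte, visit func(b []byte) (stop bool)) {
	var buf []byte
	for i := range tokens {
		buf = buf[:0]
		for j := i; j < len(tokens); j++ {
			if j > i {
				buf = append(buf, ' ')
			}
			buf = append(buf, tokens[j]...)
			if visit(buf) {
				return
			}
		}
	}
}

// visitAnalyzedShingles tokenizes the input first and returns immediately
// on empty input, matching the documented behavior.
func visitAnalyzedShingles(input []byte, tokenizer func(b []byte) [][]byte, visit func(b []byte) (stop bool)) {
	if len(input) == 0 {
		return
	}
	visitShingles(tokenizer(input), visit)
}

func main() {
	var got []string
	visitAnalyzedShingles([]byte("new york city"), bytes.Fields, func(b []byte) bool {
		got = append(got, string(b)) // string(b) copies the reused buffer
		return len(got) == 4         // stop after four shingles
	})
	fmt.Printf("%q\n", got) // ["new" "new york" "new york city" "york"]
}
```

A callback with an early-stop result avoids materializing every shingle. This matters because the number of shingles grows quadratically with the token count. For example, a caller probing an index can stop at the first hit.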

Types

This section is empty.
