Documentation ¶
Overview ¶
Package ferret implements a fast in-memory substring search engine.
Index ¶
- Variables
- func ErrorCorrect(Word []byte, AllowedBytes []byte) [][]byte
- func ToASCII(r rune) rune
- func UnicodeToLowerASCII(s string) []byte
- type InvertedSuffix
- func New(Words, Results []string, Data []interface{}, Converter func(string) []byte) *InvertedSuffix
- func (IS *InvertedSuffix) ErrorCorrectingQuery(Word string, ResultsLimit int, ErrorCorrection func([]byte) [][]byte) ([]string, []interface{})
- func (IS *InvertedSuffix) Insert(Word, Result string, Data interface{})
- func (IS *InvertedSuffix) Query(Word string, ResultsLimit int) ([]string, []interface{})
- func (IS *InvertedSuffix) Search(Query []byte) (int, int)
- func (IS *InvertedSuffix) SortedErrorCorrectingQuery(Word string, ResultsLimit int, ErrorCorrection func([]byte) [][]byte, ...) ([]string, []interface{}, []float64)
- func (IS *InvertedSuffix) SortedQuery(Word string, ResultsLimit int, ...) ([]string, []interface{}, []float64)
Variables ¶
var AllASCII = []byte{}/* 129 elements not displayed */
AllASCII is all ASCII bytes (0-127)
var LowercaseASCII = []byte{
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105,
106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118,
119, 120, 121, 122, 123, 124, 125, 126,
}
LowercaseASCII is all printable ASCII bytes excluding capitalized letters (A-Z)
var LowercaseLetters = []byte{
97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122,
}
LowercaseLetters is all lowercase ASCII letters (a-z, bytes 97-122)
var PrintableASCII = []byte{
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82,
83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113,
114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126,
}
PrintableASCII is all printable ASCII bytes
var UnicodeToASCII = map[rune]rune{
'À': 'A', 'Á': 'A', 'Â': 'A', 'Ã': 'A', 'Ä': 'A', 'Å': 'A',
'Ç': 'C',
'È': 'E', 'É': 'E', 'Ê': 'E', 'Ë': 'E',
'Ì': 'I', 'Í': 'I', 'Î': 'I', 'Ï': 'I',
'Ñ': 'N',
'Ò': 'O', 'Ó': 'O', 'Ô': 'O', 'Õ': 'O', 'Ö': 'O', 'Ø': 'O',
'Ù': 'U', 'Ú': 'U', 'Û': 'U', 'Ü': 'U',
'Ý': 'Y',
'à': 'a', 'á': 'a', 'â': 'a', 'ã': 'a', 'ä': 'a', 'å': 'a',
'ç': 'c',
'è': 'e', 'é': 'e', 'ê': 'e', 'ë': 'e',
'ì': 'i', 'í': 'i', 'î': 'i', 'ï': 'i',
'ð': 'o',
'ñ': 'n',
'ò': 'o', 'ó': 'o', 'ô': 'o', 'õ': 'o', 'ö': 'o', 'ø': 'o',
'ù': 'u', 'ú': 'u', 'û': 'u', 'ü': 'u',
'ý': 'y', 'ÿ': 'y',
}
UnicodeToASCII maps letters in the Unicode Latin-1 Supplement block to ASCII characters without accents
Functions ¶
func ErrorCorrect ¶
func ErrorCorrect(Word []byte, AllowedBytes []byte) [][]byte
ErrorCorrect returns all byte slices that are at Levenshtein distance 1 from Word, using only the byte characters in AllowedBytes.
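To make "Levenshtein distance of 1" concrete, here is a minimal sketch of how such candidates can be enumerated. The helper editsAtDistanceOne is hypothetical and is not the package's ErrorCorrect: it covers single deletions, substitutions, and insertions, leaves duplicates in place, and makes no claim about the order or exact set the real function returns.

```go
package main

import "fmt"

// editsAtDistanceOne is a hypothetical helper, not the package's ErrorCorrect.
// It returns every candidate produced by one deletion, one substitution, or
// one insertion, drawing new bytes only from allowed. Duplicates are kept.
func editsAtDistanceOne(word, allowed []byte) [][]byte {
	var out [][]byte
	// Deletions: drop the byte at position i.
	for i := range word {
		c := make([]byte, 0, len(word)-1)
		c = append(c, word[:i]...)
		c = append(c, word[i+1:]...)
		out = append(out, c)
	}
	// Substitutions: replace the byte at position i with a different allowed byte.
	for i := range word {
		for _, b := range allowed {
			if b == word[i] {
				continue
			}
			c := append([]byte(nil), word...)
			c[i] = b
			out = append(out, c)
		}
	}
	// Insertions: place an allowed byte before position i (i == len(word) appends).
	for i := 0; i <= len(word); i++ {
		for _, b := range allowed {
			c := make([]byte, 0, len(word)+1)
			c = append(c, word[:i]...)
			c = append(c, b)
			c = append(c, word[i:]...)
			out = append(out, c)
		}
	}
	return out
}

func main() {
	for _, c := range editsAtDistanceOne([]byte("ab"), []byte("abc")) {
		fmt.Printf("%q ", c)
	}
	fmt.Println()
}
```

The number of candidates grows with both the word length and the size of the allowed set, which is one reason to pass a small set such as LowercaseLetters rather than AllASCII when that fits the data.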
func ToASCII ¶
func ToASCII(r rune) rune
ToASCII converts a single Unicode rune to its ASCII equivalent using the UnicodeToASCII table.
func UnicodeToLowerASCII ¶
func UnicodeToLowerASCII(s string) []byte
UnicodeToLowerASCII converts a Unicode string to ASCII bytes using the UnicodeToASCII table.
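The table-driven conversion can be sketched as follows. Everything here is illustrative: accentTable is a small hypothetical subset of a rune-to-ASCII table, and foldToLowerASCII is not the package's function. Lowercasing (which the name UnicodeToLowerASCII suggests) and dropping unmapped non-ASCII runes are both assumptions of this sketch.

```go
package main

import (
	"fmt"
	"unicode"
)

// accentTable is a small, hypothetical subset of a rune-to-ASCII table.
var accentTable = map[rune]rune{'É': 'E', 'é': 'e', 'ñ': 'n', 'Ö': 'O', 'ü': 'u'}

// foldToLowerASCII maps each rune through accentTable, lowercases it, and
// keeps it only if the result is ASCII. Dropping runes that are still
// non-ASCII after mapping is an assumption of this sketch.
func foldToLowerASCII(s string) []byte {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if mapped, ok := accentTable[r]; ok {
			r = mapped
		}
		r = unicode.ToLower(r)
		if r < 128 {
			out = append(out, byte(r))
		}
	}
	return out
}

func main() {
	fmt.Printf("%s\n", foldToLowerASCII("Señor ÉCLAIR"))
}
```

A function of this shape matches the Converter type func(string) []byte, so the same normalization would be applied to inserted words and to queries.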
Types ¶
type InvertedSuffix ¶
type InvertedSuffix struct {
	WordIndex   []int               // WordIndex and SuffixIndex are sorted by Words[WordIndex[i]][SuffixIndex[i]:]
	SuffixIndex []int               // WordIndex and SuffixIndex are sorted by Words[WordIndex[i]][SuffixIndex[i]:]
	Words       [][]byte            // Words is the list of words (in []byte form) to perform substring searches over
	Results     []string            // Results is the string value of the words. Used as a return value
	Values      []interface{}       // Values is some data mapped to the words. Can be used for sorting, or as a return value
	Converter   func(string) []byte // Converter converts an inserted word/query to a byte array to search for/with
}
InvertedSuffix implements the data structure for substring searches
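The field comments say that WordIndex and SuffixIndex are parallel slices sorted by Words[WordIndex[i]][SuffixIndex[i]:]. A minimal sketch of building such a pair of slices is shown here. The suffixTable type and buildSuffixTable function are hypothetical stand-ins, and the package's own construction in New may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// suffixTable is a hypothetical stand-in holding two parallel index slices.
type suffixTable struct {
	words       [][]byte
	wordIndex   []int // which word entry i belongs to
	suffixIndex []int // offset in that word where the suffix starts
}

// suffix returns the bytes that entry i is sorted by.
func (t *suffixTable) suffix(i int) []byte { return t.words[t.wordIndex[i]][t.suffixIndex[i]:] }

// Len, Less, and Swap implement sort.Interface over both slices at once.
func (t *suffixTable) Len() int           { return len(t.wordIndex) }
func (t *suffixTable) Less(i, j int) bool { return bytes.Compare(t.suffix(i), t.suffix(j)) < 0 }
func (t *suffixTable) Swap(i, j int) {
	t.wordIndex[i], t.wordIndex[j] = t.wordIndex[j], t.wordIndex[i]
	t.suffixIndex[i], t.suffixIndex[j] = t.suffixIndex[j], t.suffixIndex[i]
}

// buildSuffixTable lists every suffix of every word as a (word, offset) pair
// and sorts the pairs by the suffix bytes they refer to.
func buildSuffixTable(words [][]byte) suffixTable {
	t := suffixTable{words: words}
	for w, word := range words {
		for s := range word {
			t.wordIndex = append(t.wordIndex, w)
			t.suffixIndex = append(t.suffixIndex, s)
		}
	}
	sort.Sort(&t)
	return t
}

func main() {
	t := buildSuffixTable([][]byte{[]byte("ban"), []byte("an")})
	for i := 0; i < t.Len(); i++ {
		fmt.Printf("word %d offset %d: %s\n", t.wordIndex[i], t.suffixIndex[i], t.suffix(i))
	}
}
```

Because every substring of a word is a prefix of one of its suffixes, keeping the suffixes sorted lets a substring lookup become a prefix lookup over a sorted list.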
func New ¶
func New(Words, Results []string, Data []interface{}, Converter func(string) []byte) *InvertedSuffix
New creates an inverted suffix from a dictionary of words, the result strings returned for them, the data mapped to them, and a string->[]byte converter.
func (*InvertedSuffix) ErrorCorrectingQuery ¶
func (IS *InvertedSuffix) ErrorCorrectingQuery(Word string, ResultsLimit int, ErrorCorrection func([]byte) [][]byte) ([]string, []interface{})
ErrorCorrectingQuery returns the strings which contain the query, along with their stored values. The results are unsorted, though they may be partially sorted alphabetically. If the initial query finds no results, it searches for all the alternate substrings produced by ErrorCorrection.
Input:
- Word: The substring to search for.
- ResultsLimit: Limit the results so you don't return your whole dictionary by accident. Set to -1 for no limit.
- ErrorCorrection: Returns a list of alternate queries.
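The fallback behaviour described above (try the query as typed, and consult the alternates only when nothing is found) can be sketched with a plain linear scan standing in for the index. naiveQuery and queryWithFallback are hypothetical names. How the real method deduplicates results or applies the limit across alternates is not stated in this documentation, so those details are assumptions here.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveQuery is a hypothetical linear-scan stand-in for an index lookup.
// A limit of -1 means no limit.
func naiveQuery(dict []string, q string, limit int) []string {
	var out []string
	for _, w := range dict {
		if limit >= 0 && len(out) >= limit {
			break
		}
		if strings.Contains(w, q) {
			out = append(out, w)
		}
	}
	return out
}

// queryWithFallback tries the query as typed and consults the alternates
// from correct only when that first attempt returns nothing.
func queryWithFallback(dict []string, q string, limit int, correct func([]byte) [][]byte) []string {
	if out := naiveQuery(dict, q, limit); len(out) > 0 {
		return out
	}
	var out []string
	seen := map[string]bool{} // assumption: each word is reported once
	for _, alt := range correct([]byte(q)) {
		for _, w := range naiveQuery(dict, string(alt), -1) {
			if limit >= 0 && len(out) >= limit {
				return out
			}
			if !seen[w] {
				seen[w] = true
				out = append(out, w)
			}
		}
	}
	return out
}

func main() {
	dict := []string{"ferret", "parrot", "carrot"}
	// dropLast is a toy correction: it proposes the query minus its final byte.
	dropLast := func(b []byte) [][]byte {
		if len(b) == 0 {
			return nil
		}
		return [][]byte{b[:len(b)-1]}
	}
	fmt.Println(queryWithFallback(dict, "rotx", -1, dropLast))
}
```

In real use the ErrorCorrection argument would typically be a closure that fixes the allowed byte set, for example one that calls ErrorCorrect with LowercaseLetters.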
func (*InvertedSuffix) Insert ¶
func (IS *InvertedSuffix) Insert(Word, Result string, Data interface{})
Insert adds a word to the dictionary that IS was built on. This is pretty slow because it requires linear-time insertion into an array, so stick to New when you can.
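The cost mentioned above comes from keeping a sorted array sorted. As a rough illustration on a plain string slice, using the hypothetical helper insertSorted rather than the package's code: finding the slot is a binary search, but opening the slot means shifting everything after it.

```go
package main

import (
	"fmt"
	"sort"
)

// insertSorted is a hypothetical helper. Binary search finds the slot in
// O(log n) comparisons, but shifting the tail to open the slot is O(n).
func insertSorted(sorted []string, s string) []string {
	i := sort.SearchStrings(sorted, s) // first index whose element is >= s
	sorted = append(sorted, "")        // grow by one element
	copy(sorted[i+1:], sorted[i:])     // shift the tail right by one
	sorted[i] = s
	return sorted
}

func main() {
	words := []string{"carrot", "parrot"}
	words = insertSorted(words, "ferret")
	fmt.Println(words)
}
```

Building the whole index once and sorting it in a single pass avoids repeating that shift for every inserted entry.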
func (*InvertedSuffix) Query ¶
func (IS *InvertedSuffix) Query(Word string, ResultsLimit int) ([]string, []interface{})
Query returns the strings which contain the query, along with their stored values. The results are unsorted.
Input:
- Word: The substring to search for.
- ResultsLimit: Limit the results to some number of values. Set to -1 for no limit.
func (*InvertedSuffix) Search ¶
func (IS *InvertedSuffix) Search(Query []byte) (int, int)
Search performs an exact substring search for the query in the word dictionary. It returns the boundaries (low/high) of the sorted suffixes which have the query as a prefix. This is a low-level interface; I wouldn't recommend using this yourself.
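Finding the boundaries of the suffixes that share a prefix is a pair of binary searches over the sorted list. The sketch below works on a flat, already sorted [][]byte rather than on the WordIndex/SuffixIndex pair, and prefixRange is a hypothetical name. It returns a half-open range [low, high); whether Search's own high bound is inclusive or exclusive is not stated here.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// prefixRange returns the half-open range [low, high) of entries in the
// sorted slice suffixes that start with query. If low == high, nothing matches.
func prefixRange(suffixes [][]byte, query []byte) (int, int) {
	// First suffix that compares >= query. Every suffix with query as a
	// prefix is >= query, so the matching run can only start here.
	low := sort.Search(len(suffixes), func(i int) bool {
		return bytes.Compare(suffixes[i], query) >= 0
	})
	// Number of consecutive suffixes from low that still start with query.
	n := sort.Search(len(suffixes)-low, func(i int) bool {
		return !bytes.HasPrefix(suffixes[low+i], query)
	})
	return low, low + n
}

func main() {
	sorted := [][]byte{[]byte("an"), []byte("an"), []byte("ban"), []byte("n"), []byte("n")}
	low, high := prefixRange(sorted, []byte("an"))
	fmt.Println(low, high)
}
```

Both searches are logarithmic in the number of suffixes, so the lookup cost depends mainly on the query length and not on how many words match.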
func (*InvertedSuffix) SortedErrorCorrectingQuery ¶
func (IS *InvertedSuffix) SortedErrorCorrectingQuery(Word string, ResultsLimit int, ErrorCorrection func([]byte) [][]byte, Sorter func(string, interface{}, int, int) float64) ([]string, []interface{}, []float64)
SortedErrorCorrectingQuery returns the strings which contain the query, sorted. The function Sorter produces a value to sort by (largest first). If the initial query finds no results, it searches for all the alternate substrings produced by ErrorCorrection.
Input:
- Word: The substring to search for.
- ResultsLimit: Limit the results so you don't return your whole dictionary by accident. Set to -1 for no limit.
- ErrorCorrection: Returns a list of alternate queries.
- Sorter: Takes (Result, Value, Length, Index (where Query begins in Result)) (string, interface{}, int, int), and produces a value (float64) to sort by (largest first).
func (*InvertedSuffix) SortedQuery ¶
func (IS *InvertedSuffix) SortedQuery(Word string, ResultsLimit int, Sorter func(string, interface{}, int, int) float64) ([]string, []interface{}, []float64)
SortedQuery returns the strings which contain the query, sorted. The function Sorter produces a value to sort by (largest first).
Input:
- Word: The substring to search for.
- ResultsLimit: Limit the results to some number of values. Set to -1 for no limit.
- Sorter: Takes (Result, Value, Length, Index (where Query begins in Result)) (string, interface{}, int, int) and produces a value (float64) to sort by (largest first).
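As an illustration of the Sorter shape, here is a sketch of a scoring function with the signature func(string, interface{}, int, int) float64, plus a standalone "largest first" ordering. The names scoreMatch, scored, and rank are hypothetical. Treating the stored value as a float64 weight is an assumption, and the Length argument is left unused because this documentation does not say which length it carries.

```go
package main

import (
	"fmt"
	"sort"
)

// scoreMatch has the Sorter shape func(string, interface{}, int, int) float64.
// Assumption: value may hold a float64 weight; any other type counts as 0.
func scoreMatch(result string, value interface{}, length, index int) float64 {
	weight, _ := value.(float64)
	// Matches that begin earlier in the result score higher.
	return weight - float64(index)
}

// scored pairs a result with the score computed for it.
type scored struct {
	result string
	score  float64
}

// rank orders matches so that the largest score comes first.
func rank(matches []scored) {
	sort.SliceStable(matches, func(i, j int) bool {
		return matches[i].score > matches[j].score
	})
}

func main() {
	// The index arguments are where "rot" begins in each result.
	matches := []scored{
		{"carrot", scoreMatch("carrot", 1.0, 6, 3)},
		{"rotor", scoreMatch("rotor", 0.5, 5, 0)},
		{"parrot", scoreMatch("parrot", nil, 6, 3)},
	}
	rank(matches)
	for _, m := range matches {
		fmt.Println(m.result, m.score)
	}
}
```

A function like scoreMatch could be passed as the Sorter argument; the third return value of the sorted query methods is a []float64, which presumably carries the scores the Sorter produced.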