search

package
v0.0.0-...-23397e3
Warning

This package is not in the latest version of its module.

Published: Jul 27, 2023 License: MIT Imports: 12 Imported by: 0

README

Search Index

Search Term Normalization

The user input search term undergoes normalization:

  1. Trim space
  2. Lowercase
  3. Remove invalid UTF-8 characters
  4. Detect and remove quotes of the form ' or " (a quoted term activates exact search mode)

Wildcards are not supported.

Generic Text Normalization

  1. Trim space
  2. Lowercase
  3. Remove invalid UTF-8 characters
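The two normalizations above can be sketched as follows. This is a minimal illustration, not the package's source: the names normalizeText and normalizeSearchTerm are hypothetical, and the sketch assumes that "quotes" means one matching pair of ' or " surrounding the whole term. It drops invalid UTF-8 before lowercasing so that the case conversion only ever sees valid input, which differs slightly from the listed order.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeText applies the generic steps: trim space, drop invalid
// UTF-8, lowercase. (Hypothetical helper, not part of the package API.)
func normalizeText(s string) string {
	s = strings.TrimSpace(s)
	s = strings.ToValidUTF8(s, "") // remove invalid byte sequences
	return strings.ToLower(s)
}

// normalizeSearchTerm normalizes a user search term and reports whether
// it was quoted. Assumption: a term counts as quoted only if it starts
// and ends with the same quote character (' or ").
func normalizeSearchTerm(term string) (normalized string, exact bool) {
	normalized = normalizeText(term)
	if len(normalized) >= 2 {
		first, last := normalized[0], normalized[len(normalized)-1]
		if first == last && (first == '"' || first == '\'') {
			// Strip the quotes and any space that was inside them.
			normalized = strings.TrimSpace(normalized[1 : len(normalized)-1])
			exact = true
		}
	}
	return normalized, exact
}

func main() {
	for _, term := range []string{`  "My File.TXT"  `, "Holiday Photos"} {
		normalized, exact := normalizeSearchTerm(term)
		fmt.Printf("%q -> %q (exact: %v)\n", term, normalized, exact)
	}
}
```

Returning the exact flag alongside the normalized term mirrors the ExactSearch field of SearchSelector documented below.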

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func CamelCaseSplit

func CamelCaseSplit(src string) (entries []string)

CamelCaseSplit splits a camel-case word and returns a list of words. Digits are supported, as are both lower camel case and upper camel case. For more information see: http://en.wikipedia.org/wiki/CamelCase

Examples

"" =>                     [""]
"lowercase" =>            ["lowercase"]
"Class" =>                ["Class"]
"MyClass" =>              ["My", "Class"]
"MyC" =>                  ["My", "C"]
"HTML" =>                 ["HTML"]
"PDFLoader" =>            ["PDF", "Loader"]
"AString" =>              ["A", "String"]
"SimpleXMLParser" =>      ["Simple", "XML", "Parser"]
"vimRPCPlugin" =>         ["vim", "RPC", "Plugin"]
"GL11Version" =>          ["GL", "11", "Version"]
"99Bottles" =>            ["99", "Bottles"]
"May5" =>                 ["May", "5"]
"BFG9000" =>              ["BFG", "9000"]
"BöseÜberraschung" =>     ["Böse", "Überraschung"]
"Two  spaces" =>          ["Two", "  ", "spaces"]
"BadUTF8\xe2\xe2\xa1" =>  ["BadUTF8\xe2\xe2\xa1"]

Splitting rules

  1. (Removed in this fork.) If the string is not valid UTF-8, return it without splitting as a single-item array.
  2. Assign all Unicode characters to one of 4 sets: lower case letters, upper case letters, numbers, and all other characters.
  3. Iterate through the characters of the string, introducing splits between adjacent characters that belong to different sets.
  4. Iterate through the array of split strings. If a given string is upper case and the subsequent string is lower case, move the last character of the upper case string to the beginning of the lower case string.

Types

type SearchIndexRecord

type SearchIndexRecord struct {
	// List of selectors that found the result. Multiple keywords may find the same file.
	Selectors []SearchSelector

	// result data
	FileID            uuid.UUID
	PublicKey         *btcec.PublicKey
	BlockchainVersion uint64
	BlockNumber       uint64
}

SearchIndexRecord links a hash to a given file.

type SearchIndexStore

type SearchIndexStore struct {
	Database store.Store // The database storing the blockchain.
	sync.RWMutex
}

This database stores hashes of keywords for file search.

func InitSearchIndexStore

func InitSearchIndexStore(DatabaseDirectory string) (searchIndex *SearchIndexStore, err error)

func (*SearchIndexStore) IndexHash

func (index *SearchIndexStore) IndexHash(publicKey *btcec.PublicKey, blockchainVersion, blockNumber uint64, fileID uuid.UUID, hash []byte) (err error)

IndexHash indexes a new hash

func (*SearchIndexStore) IndexNewBlock

func (index *SearchIndexStore) IndexNewBlock(publicKey *btcec.PublicKey, blockchainVersion, blockNumber uint64, raw []byte)

func (*SearchIndexStore) IndexNewBlockDecoded

func (index *SearchIndexStore) IndexNewBlockDecoded(publicKey *btcec.PublicKey, blockchainVersion, blockNumber uint64, recordsDecoded []interface{})

IndexNewBlockDecoded indexes a new decoded block. Currently it only indexes file records.

func (*SearchIndexStore) LookupHash

func (index *SearchIndexStore) LookupHash(selector SearchSelector, resultMap map[uuid.UUID]*SearchIndexRecord) (err error)

LookupHash returns all index records stored for the hash.
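The resultMap parameter suggests that results from several selectors accumulate in one caller-supplied map keyed by file ID, so a file found by multiple keywords appears once with all matching selectors attached. The sketch below models that idea in memory. It is an assumption about the intent, not the package's implementation: the types selector and record, the lookup function, and the use of plain strings in place of uuid.UUID and hashes are all hypothetical stand-ins.

```go
package main

import "fmt"

// selector and record are simplified stand-ins for SearchSelector and
// SearchIndexRecord; file IDs are plain strings instead of uuid.UUID.
type selector struct {
	Word  string
	Exact bool
}

type record struct {
	Selectors []selector
	FileID    string
}

// lookup merges all files indexed under sel.Word into results.
// A file that is already present only gains an additional selector.
func lookup(index map[string][]string, sel selector, results map[string]*record) {
	for _, fileID := range index[sel.Word] {
		rec, ok := results[fileID]
		if !ok {
			rec = &record{FileID: fileID}
			results[fileID] = rec
		}
		rec.Selectors = append(rec.Selectors, sel)
	}
}

func main() {
	// Hypothetical index: keyword -> IDs of files containing it.
	index := map[string][]string{
		"report": {"file-1", "file-2"},
		"2023":   {"file-1"},
	}
	results := make(map[string]*record)
	lookup(index, selector{Word: "report"}, results)
	lookup(index, selector{Word: "2023"}, results)
	for id, rec := range results {
		fmt.Printf("%s matched by %d selector(s)\n", id, len(rec.Selectors))
	}
}
```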

func (*SearchIndexStore) Search

func (index *SearchIndexStore) Search(term string) (results []SearchIndexRecord)

func (*SearchIndexStore) SearchNodeIDBasedOnHash

func (index *SearchIndexStore) SearchNodeIDBasedOnHash(hash []byte) (NodeIDs [][]byte, err error)

SearchNodeIDBasedOnHash provides a list of NodeIDs based on the hash provided. This is used to find out which nodes are hosting which files.

func (*SearchIndexStore) UnindexBlockchain

func (index *SearchIndexStore) UnindexBlockchain(publicKey *btcec.PublicKey)

UnindexBlockchain deletes all index records for a given blockchain. This is intentionally not done on a version/block level, because it could easily lead to orphans.

func (*SearchIndexStore) UnindexHash

func (index *SearchIndexStore) UnindexHash(fileID uuid.UUID, hash []byte) (err error)

UnindexHash deletes an index record. If there are no more files associated with the hash, the entire hash record is deleted.
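The deletion rule can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the real store is backed by store.Store, whereas memIndex, index, and unindex here are hypothetical names, and file IDs are plain strings instead of uuid.UUID.

```go
package main

import "fmt"

// memIndex is a hypothetical in-memory stand-in for the index:
// hash (used as a string key) -> set of file IDs.
type memIndex map[string]map[string]struct{}

// index associates fileID with hash.
func (m memIndex) index(hash []byte, fileID string) {
	key := string(hash)
	if m[key] == nil {
		m[key] = make(map[string]struct{})
	}
	m[key][fileID] = struct{}{}
}

// unindex removes fileID from hash. When no files remain for the hash,
// the whole hash record is deleted.
func (m memIndex) unindex(hash []byte, fileID string) {
	key := string(hash)
	delete(m[key], fileID) // no-op if the hash is unknown
	if len(m[key]) == 0 {
		delete(m, key)
	}
}

func main() {
	idx := memIndex{}
	hash := []byte{0x01, 0x02}
	idx.index(hash, "file-1")
	idx.index(hash, "file-2")

	idx.unindex(hash, "file-1")
	fmt.Println("hash records after first unindex:", len(idx))

	idx.unindex(hash, "file-2")
	fmt.Println("hash records after second unindex:", len(idx))
}
```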

type SearchSelector

type SearchSelector struct {
	Word        string // Normalized version of the word
	Hash        []byte // Hash of the word
	ExactSearch bool   // Indicates this is an exact search term, for example a full filename.
}

A search selector is a term that discovers a file.
