trie

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 9, 2021 License: MIT Imports: 2 Imported by: 0

README

go-trie

Go-Trie is small library implementing trie datastructure with additional skipping optimization. It tries to build skipping indexes in nodes to avoid unnecessary checks. It is not suitable for use-cases with small dictionary (or single word searches). Naive implementations might be much better in those cases! Always measure!

Installation

import "github.com/MartinKuzma/go-trie"

Features and decisions

  • Case sensitive.
  • Functions: Contains, Find, HasPrefix, WordsWithPrefix, SortedWords, FromJson, ToJson
  • Trie can be exported into JSON format and parsed from it, thus saving us from building and optimization steps.
  • Nodes in trie are not saved in map due to performance reasons, binary search is performed instead.
  • Designed to be immutable after it has been created. This ensures thread-safety and less complexity.

Examples

textToSearch :=  `Lorem ipsum ...`

// Create optimized trie
myTrie := trie.NewTrie().
    WithWords(substrings...).
    Optimize(true).
    Build()

myTrie.Find(textToSearch, func(result trie.SearchResult) {
    // Do something useful with result
    fmt.Printf("Found %s at position %d\n", result.Word, result.Position)
})

if myTrie.IsContained(textToSearch) {
    // Atleast one of the words in our trie is present.
}

Benchmarks

These results were achieved by using database with 3000 words. Naive implementation uses strings.Index and strings.Contains functions. While the results represent huge performance advantage, we can easily find situations where naive implementation is much faster.

Always use measurements that fit your data and use-case.

Trie initialization included

cpu: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
BenchmarkNaiveFind-4         	      90	  13292769 ns/op	   81896 B/op	      12 allocs/op
BenchmarkTrieFind-4          	    2292	    504501 ns/op	   82282 B/op	      20 allocs/op
BenchmarkNaiveContained-4    	    4428	    250002 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrieIsContained-4   	 6020218	       201.2 ns/op	       0 B/op	       0 allocs/op

Trie initialization not included

cpu: Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
BenchmarkNaiveFind-4         	      82	  13285727 ns/op	   81896 B/op	      12 allocs/op
BenchmarkTrieFind-4          	    2392	    514459 ns/op	   81896 B/op	      12 allocs/op
BenchmarkNaiveContained-4    	    4500	    249894 ns/op	       0 B/op	       0 allocs/op
BenchmarkTrieIsContained-4   	 6042928	       198.6 ns/op	       0 B/op	       0 allocs/op

Future improvements

  • Improve performance of optimization algorithm
  • Provide better examples and tests
  • Implement naive version for small-sized database
    • Find out what constitues as a small database
  • Find out good hashing function for searching children

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Node

type Node struct {
	Key      byte // Key/Character this node represents
	Word     int  // index of word
	Skip     int  // characters to skip
	Children []*Node
}

func (*Node) AddChild

func (n *Node) AddChild(char byte) *Node

func (*Node) FindChild

func (n *Node) FindChild(char byte) *Node

func (*Node) HasWord added in v1.0.0

func (n *Node) HasWord() bool

type SearchResult

type SearchResult struct {
	Word     string
	Position int
}

type Trie

type Trie struct {
	// contains filtered or unexported fields
}

func FromJson

func FromJson(data []byte) (*Trie, error)

func (*Trie) Find

func (trie *Trie) Find(text string, f func(item SearchResult))

Find searches for all occurences in Trie datastructure.

func (*Trie) HasPrefix added in v1.0.0

func (trie *Trie) HasPrefix(prefix string) bool

func (*Trie) IsContained

func (trie *Trie) IsContained(text string) bool

IsContained searches for first occurence of any word in Trie datastructure.

func (*Trie) SortedWords added in v1.0.0

func (trie *Trie) SortedWords() []string

func (*Trie) ToJson

func (trie *Trie) ToJson() ([]byte, error)

func (*Trie) Words added in v1.0.0

func (trie *Trie) Words() []string

func (*Trie) WordsWithPrefix added in v1.0.0

func (trie *Trie) WordsWithPrefix(prefix string) []string

type TrieBuilder

type TrieBuilder struct {
	// contains filtered or unexported fields
}

func NewTrie

func NewTrie() *TrieBuilder

func (*TrieBuilder) AddWord

func (tb *TrieBuilder) AddWord(word string) *TrieBuilder

AddWord adds new word to list of words we will be searching for

func (*TrieBuilder) Build

func (tb *TrieBuilder) Build() *Trie

Build returns build search trie

func (*TrieBuilder) Optimize

func (tb *TrieBuilder) Optimize(optimize bool) *TrieBuilder

func (*TrieBuilder) WithWords

func (tb *TrieBuilder) WithWords(words ...string) *TrieBuilder

AddWord adds new word to list of words we will be searching for

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL