sffuzzy

Version: v0.0.0-...-a49092d
Published: May 29, 2020 License: MIT Imports: 5 Imported by: 0

README

Simple fast fuzzy

A fuzzy search library written in Go.

This library provides a simple fuzzy search with Unicode normalization and an arbitrary scoring system.
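As an illustration of what "Unicode normalization" can mean for fuzzy matching, here is a minimal Go sketch. It assumes normalization means lowercasing plus stripping accents. The `normalize` function and `foldTable` map are hypothetical names with a deliberately tiny character table; this is not the library's code. A complete version would use full Unicode decomposition instead of a hand-written map.

```go
package main

import (
	"fmt"
	"strings"
)

// foldTable is a hypothetical, deliberately tiny accent-folding table used
// only for illustration. A real implementation would rely on full Unicode
// decomposition (for example golang.org/x/text/unicode/norm).
var foldTable = map[rune]rune{
	'á': 'a', 'à': 'a', 'â': 'a', 'ä': 'a', 'ā': 'a',
	'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
	'í': 'i', 'ó': 'o', 'ō': 'o', 'ö': 'o', 'ô': 'o',
	'ú': 'u', 'ü': 'u', 'ç': 'c', 'ñ': 'n',
}

// normalize lowercases s, then replaces every accented letter listed in
// foldTable with its unaccented base letter. Other runes are kept unchanged.
func normalize(s string) string {
	return strings.Map(func(r rune) rune {
		if base, ok := foldTable[r]; ok {
			return base
		}
		return r
	}, strings.ToLower(s))
}

func main() {
	for _, s := range []string{"Ōsaka;Japan", "Várzea Grande;Brazil", "Villazón;Bolivia"} {
		fmt.Println(normalize(s))
	}
}
```

With both the query and the targets normalized this way, a search for "osaka" can match "Ōsaka;Japan".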

Test the library

git clone https://github.com/eregnier/sffuzzy
cd sffuzzy
make test-trace

Usage

go get github.com/eregnier/sffuzzy

Usage samples are in test.go.

A minimal usage example:

  // One-shot search
  names := []string{"super man", "super noel", "super du"}
  options := fuzzy.Options{Sort: true, Limit: 5, Normalize: true}
  results := fuzzy.SearchOnce("perdu", &names, options)

  // Use the search cache for repeated searches (faster)
  cacheTargets := fuzzy.Prepare(&names, options)
  results = fuzzy.Search("perdu", cacheTargets, options)
  // results.Results holds the matches, results.BestScore the top score

Options

  options := fuzzy.Options{Sort: true, Normalize: true, Limit: 10}

The Options structure has the following fields:

Field      Type  Description
Sort       bool  Orders results by score, highest first.
Normalize  bool  Normalizes special characters (such as accents) in both the search string and the searched texts, making the search more flexible and less strict.
Limit      int   Defines how many results are kept in the search return value.

Performance

The bundled sample file is a flat CSV file loaded as a list of strings.

A multi-threaded version performed worse than my single-threaded code, so I reverted it.

On an AMD 3600X, using a single core, a fuzzy search in SearchOnce mode takes about 40 ms.

When I build the cache with Prepare(data *[]string) and then run a Search on the sample data (~155K lines, 320 KB), Prepare takes about 36 ms and Search about 4 ms.

Execution

Running the tests in test.go produces the following output:

2020/04/29 02:07:57 TestMinimalSearch &{[{super du 8 5 1} {super man 3 3 4} {super noel 3 3 5}] 8}
2020/04/29 02:07:57 TestMinimalSearchCache &{[{super du 8 5 1} {super man 3 3 4} {super noel 3 3 5}] 8}
2020/04/29 02:07:57  + Cache search, first search is slower.
2020/04/29 02:07:57  🕑 Duration: 9.435114ms
2020/04/29 02:07:57  + Cached searches
2020/04/29 02:07:57  🕑 Duration: 3.883137ms
2020/04/29 02:07:57 [{San Francisco;United States 13 10 16} {South San Francisco;United States 12 10 22} {St. Francis;United States 12 10 19}]
2020/04/29 02:07:57  🕑 Duration: 2.100023ms
2020/04/29 02:07:57 [{Mumbai;India 11 6 0} {Mumbwa;Zambia 8 6 6} {Mount Gambier;Australia 8 6 16}]
2020/04/29 02:07:57  🕑 Duration: 3.992907ms
2020/04/29 02:07:57 [{Hong Kong;Hong Kong 16 8 0} {Xiangkhoang;Laos 13 8 3} {Mokhotlong;Lesotho 12 8 7}]
2020/04/29 02:07:57  🕑 Duration: 2.312343ms
2020/04/29 02:07:57 [{Agadez;Niger 11 6 0} {Várzea Grande;Brazil 8 6 7} {Altagracia de Orituco;Venezuela 8 6 21}]
2020/04/29 02:07:57  🕑 Duration: 2.034594ms
2020/04/29 02:07:57 [{Palmas;Brazil 10 5 0} {La Palma;Panama 10 5 0} {Las Palmas de Gran Canaria;Spain 10 5 0}]
2020/04/29 02:07:57  🕑 Duration: 4.029126ms
2020/04/29 02:07:57 [{Sucre;Bolivia 20 12 0} {Quime;Bolivia 13 7 0} {Villazón;Bolivia 13 7 0}]
2020/04/29 02:07:57  🕑 Duration: 3.910717ms
2020/04/29 02:07:57 [{Ibb;Yemen 16 8 0} {Ibb;Yemen 16 8 0} {Dhamār;Yemen 11 5 0}]
2020/04/29 02:07:57  🕑 Duration: 3.826816ms
2020/04/29 02:07:57 [{West View;United States 16 8 0} {Westview;United States 16 8 0} {Viera West;United States 14 8 3}]
2020/04/29 02:07:57  + Search all at once
2020/04/29 02:07:57  🕑 Duration: 44.714400ms
2020/04/29 02:07:57 Print plain unmarshaled json results
2020/04/29 02:07:57 [
  {
    "target": "Ōsaka;Japan",
    "score": 13,
    "matchCount": 10,
    "typos": 1
  },
  {
    "target": "Northwest Harborcreek;United States",
    "score": 5,
    "matchCount": 5,
    "typos": 29
  },
  {
    "target": "Oshakati;Namibia",
    "score": 5,
    "matchCount": 5,
    "typos": 11
  },
  {
    "target": "Colombo;Sri Lanka",
    "score": 5,
    "matchCount": 5,
    "typos": 11
  },
  {
    "target": "Coxsackie;United States",
    "score": 5,
    "matchCount": 5,
    "typos": 17
  }
]
PASS
ok  	_/home/utopman/sources/sffuzzy	0.083s
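Assuming the result tuples list their fields in the same order as the JSON keys (target, score, matchCount, typos), the third number is `matchCount`. One counting rule reproduces the values for the `perdu` query (5 for `super du`, 3 for `super man` and `super noel`): a greedy, case-insensitive, in-order character match that stops at the first query character it cannot find. The Go sketch below implements that guess. `matchCount` is a hypothetical function, and the library's actual scoring and typo counting may work differently.

```go
package main

import (
	"fmt"
	"strings"
)

// matchCount walks the target from left to right and counts how many query
// characters are found in order. It stops at the first query character that
// does not occur in the rest of the target. Matching is case-insensitive.
func matchCount(query, target string) int {
	t := []rune(strings.ToLower(target))
	pos, count := 0, 0
	for _, qc := range strings.ToLower(query) {
		found := false
		for pos < len(t) {
			pos++
			if t[pos-1] == qc {
				found = true
				break
			}
		}
		if !found {
			break
		}
		count++
	}
	return count
}

func main() {
	for _, target := range []string{"super man", "super noel", "super du"} {
		fmt.Println(target, matchCount("perdu", target))
	}
}
```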

The CLI

A simple CLI is bundled with this library. Its usage is shown in the executable's help:

go run cli.go --help
Usage: cli [--limit LIMIT] [--sort] [--normalize] SEARCH

Positional arguments:
  SEARCH                 Search terms to find in given data

Options:
  --limit LIMIT, -l LIMIT
                         Results limit, use -1 for no limit [default: 10]
  --sort, -s             Whether or not results are sorted [default: true]
  --normalize, -n        normalize search string and data string for searching. It fuzzy search with no accents/special characters [default: true]
  --help, -h             display this help and exit

Here is a sample run from this repository's sources:

head ../sample.csv | go run cli.go -s=1 -l=3 "kol"
[
  {
    "target": "Kolkata;India",
    "score": 8,
    "matchCount": 3,
    "typos": 0
  },
  {
    "target": "Tokyo;Japan",
    "score": 2,
    "matchCount": 2,
    "typos": 7
  },
  {
    "target": "Mexico City;Mexico",
    "score": 2,
    "matchCount": 0,
    "typos": 0
  }
]

It reads a flat, newline-separated database from stdin and prints the results as indented JSON, according to the given query string and options.

License

MIT

Documentation


Functions

func Prepare

func Prepare(targets *[]Target, options Options) *[]CacheTarget

Prepare prepares a data set for multiple searches.

Types

type CacheTarget

type CacheTarget struct {
	// contains filtered or unexported fields
}

CacheTarget is the data structure holding the cached search payload.

type Options

type Options struct {
	Sort      bool
	Normalize bool
	Limit     int
}

Options holds the search options.

type SearchResult

type SearchResult struct {
	Results   []algorithmResult `json:"results"`
	BestScore int               `json:"bestScore"`
}

SearchResult wraps the results of a search.

func Search

func Search(search string, cacheTargets *[]CacheTarget, options Options) *SearchResult

Search performs the fuzzy search over prepared cache targets.

func SearchOnce

func SearchOnce(search string, targets *[]Target, options Options) *SearchResult

SearchOnce is a shorthand that prepares the cache and runs the search in a single call.

type Target

type Target struct {
	// contains filtered or unexported fields
}

Target is a search element that holds a searchable token and a related arbitrary string document.
