fastfuzzy

command module
v0.0.0-...-12c090f Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 6, 2022 License: MIT Imports: 10 Imported by: 0

README

Fastfuzzy

This is command line/tool utility to fuzzy search in large text file or piped stream

  • uses Jaro-Winkler distance for fuzziness calculation with optimization to allow searching large file.

  • uses soundex to eliminate words picked within the search threshold but do not sound like the search keyword.

  • defines separator and position for the input string help narrow the search to specific field in csv input.

  • has three built in filters:

    • include: accepts output when fuzzy falls below threshold but above 0.5 and has anther word is the include list.
    • reject: reject the output regardless the match or the fuzzy search outcome.
    • remove: remove certain characters from the string like '@' or '#'

Usage:

Usage of fastfuzzy:
-config string    (configure yaml file)
-debug            (debug on|off)
-include string   (include words)
-input string     (input file or stdin if ignored)
-nosoundex        (soundex on|off if fuzzy matched)
-separator string (sparator char for csv)
-position int     (positon to process the input field when seperator provided for csv)
-reject string    (reject words)
-remove string    (remove words)
-search string    (search keyword)
-threshold float  (search threshold default 0.5)

Examples:

Example for yaml file configuration:

separator: "|"
position: 1
include:
- azzuie
reject:
- uk
- france
remove:
- '#'
- '@'
- '-'
- '/'
- '\'
debug: false
algo:
  search: beaver
  threshold: 0.85
  soundex: true
  debug: false

run using the config file:

./fastfuzzy -config myfconfig.yaml -input testfile.csv

anther example using the piped commandline:

cat testfile.csv | ./fastfuzzy -search "beaver" -include "lakes,ponds" -reject "forest,rivers" -remove "-,&,@" -separator "|" -position 1 --threshold 0.85 -nosoundex
Benchmarking:

Took ~110 seconds to process ~2.5G file with 130M rows on AMD machine 2.3Ghz.

How to build:
git clone https://github.com/ehab7/fastfuzzy
cd fastfuzzy
go build .
To-Do:
  • Adding support to rune currently it is only to English.
  • Adding support for combined words search.

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL