fastfuzzy

command module

Version: v0.0.0-...-12c090f | Published: Dec 6, 2022 | License: MIT | Imports: 10 | Imported by: 0


Repository

github.com/ehab7/fastfuzzy


README

Fastfuzzy

Fastfuzzy is a command-line utility for fuzzy searching large text files or piped streams.

- Uses Jaro-Winkler distance to calculate fuzziness, with optimizations that make searching large files practical.
- Uses Soundex to discard words that fall within the search threshold but do not sound like the search keyword.
- Lets you define a separator and a field position for the input line, narrowing the search to a specific field in CSV input.
- Has three built-in filters:
  - include: accepts a result when the fuzzy score falls below the threshold but above 0.5 and the line also contains a word from the include list.
  - reject: rejects a line containing a reject word, regardless of the match or the fuzzy search outcome.
  - remove: removes certain characters from the string, such as '@' or '#'.
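To illustrate the similarity measure the tool is built on, here is a minimal, unoptimized Jaro-Winkler sketch in Go. It is not fastfuzzy's code (its large-file optimizations are not described here). The names jaro and jaroWinkler are hypothetical, and the sketch uses the conventional prefix scale of 0.1 with a prefix cap of 4 characters, applying the prefix bonus unconditionally. Scores run from 0 (nothing in common) to 1 (identical), which is the scale a threshold such as 0.85 is compared against.

```go
package main

import "fmt"

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// jaro returns the Jaro similarity of s1 and s2: 1 for identical strings,
// 0 when no characters match. It compares runes, not bytes.
func jaro(s1, s2 string) float64 {
	a, b := []rune(s1), []rune(s2)
	if len(a) == 0 && len(b) == 0 {
		return 1
	}
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	// Two characters match only if equal and within this distance of each other.
	window := maxInt(maxInt(len(a), len(b))/2-1, 0)
	aMatched := make([]bool, len(a))
	bMatched := make([]bool, len(b))
	matches := 0
	for i := range a {
		for j := maxInt(0, i-window); j < minInt(len(b), i+window+1); j++ {
			if !bMatched[j] && a[i] == b[j] {
				aMatched[i], bMatched[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched characters that appear in a different order;
	// each transposition involves two of them.
	outOfOrder, k := 0, 0
	for i := range a {
		if !aMatched[i] {
			continue
		}
		for !bMatched[k] {
			k++
		}
		if a[i] != b[k] {
			outOfOrder++
		}
		k++
	}
	m := float64(matches)
	t := float64(outOfOrder) / 2
	return (m/float64(len(a)) + m/float64(len(b)) + (m-t)/m) / 3
}

// jaroWinkler adds a bonus for a common prefix of up to 4 characters,
// scaled by the conventional factor 0.1.
func jaroWinkler(s1, s2 string) float64 {
	sim := jaro(s1, s2)
	a, b := []rune(s1), []rune(s2)
	prefix := 0
	for prefix < 4 && prefix < len(a) && prefix < len(b) && a[prefix] == b[prefix] {
		prefix++
	}
	return sim + float64(prefix)*0.1*(1-sim)
}

func main() {
	fmt.Printf("%.4f\n", jaroWinkler("MARTHA", "MARHTA")) // 0.9611
	fmt.Printf("%.4f\n", jaroWinkler("beaver", "beavr"))
}
```

The MARTHA/MARHTA pair is the textbook example: Jaro gives about 0.9444, and the three-character common prefix lifts it to about 0.9611.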

Usage:

Usage of fastfuzzy:
-config string    (YAML configuration file)
-debug            (debug on|off)
-include string   (include words)
-input string     (input file; reads stdin if omitted)
-nosoundex        (turn off the soundex check on fuzzy matches)
-separator string (separator character for CSV input)
-position int     (position of the field to process when a separator is provided for CSV)
-reject string    (reject words)
-remove string    (characters to remove)
-search string    (search keyword)
-threshold float  (search threshold, default 0.5)
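The -nosoundex flag turns off the phonetic check that runs on fuzzy matches. For reference, here is a sketch of classic American Soundex in Go. Whether fastfuzzy applies exactly these rules is an assumption, and soundex and soundsAlike are hypothetical names. Under this scheme, a word whose code differs from the keyword's would be discarded even if its fuzzy score passes the threshold.

```go
package main

import (
	"fmt"
	"strings"
)

// Digit codes for consonants in American Soundex; vowels, Y, H and W have none.
var soundexCodes = map[rune]byte{
	'B': '1', 'F': '1', 'P': '1', 'V': '1',
	'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
	'D': '3', 'T': '3',
	'L': '4',
	'M': '5', 'N': '5',
	'R': '6',
}

// soundex returns the four-character Soundex code of word, ignoring anything
// that is not an ASCII letter. It returns "" if word has no letters.
func soundex(word string) string {
	var letters []rune
	for _, r := range strings.ToUpper(word) {
		if r >= 'A' && r <= 'Z' {
			letters = append(letters, r)
		}
	}
	if len(letters) == 0 {
		return ""
	}
	code := []byte{byte(letters[0])}
	prev := soundexCodes[letters[0]] // the first letter's own digit is not repeated
	for _, r := range letters[1:] {
		d, isCoded := soundexCodes[r]
		switch {
		case isCoded:
			if d != prev {
				code = append(code, d)
			}
			prev = d
		case r == 'H' || r == 'W':
			// H and W do not separate consonants that share a digit.
		default:
			// Vowels and Y separate consonants, so a repeated digit is kept.
			prev = 0
		}
		if len(code) == 4 {
			break
		}
	}
	for len(code) < 4 {
		code = append(code, '0')
	}
	return string(code)
}

// soundsAlike reports whether two words share a non-empty Soundex code.
func soundsAlike(a, b string) bool {
	ca, cb := soundex(a), soundex(b)
	return ca != "" && ca == cb
}

func main() {
	fmt.Println(soundex("beaver"), soundex("beavr"), soundsAlike("beaver", "beavr")) // B160 B160 true
	fmt.Println(soundex("beacon"), soundsAlike("beaver", "beacon"))                  // B250 false
}
```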

Examples:

Example YAML configuration file:

separator: "|"
position: 1
include:
- azzuie
reject:
- uk
- france
remove:
- '#'
- '@'
- '-'
- '/'
- '\'
debug: false
algo:
  search: beaver
  threshold: 0.85
  soundex: true
  debug: false

Run using the config file:

./fastfuzzy -config myfconfig.yaml -input testfile.csv

Another example, piping the input on the command line:

cat testfile.csv | ./fastfuzzy -search "beaver" -include "lakes,ponds" -reject "forest,rivers" -remove "-,&,@" -separator "|" -position 1 --threshold 0.85 -nosoundex
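The per-line handling that the flags in this command imply can be sketched as follows. This is a hypothetical reconstruction, not fastfuzzy's source. The filterConfig type and its methods are invented for illustration; treating -position as 0-based and matching include/reject words case-insensitively are assumptions. The fuzzy score is passed in rather than computed here.

```go
package main

import (
	"fmt"
	"strings"
)

// filterConfig holds hypothetical settings mirroring the CLI flags.
type filterConfig struct {
	separator string
	position  int // assumed 0-based; the real tool may count differently
	remove    []string
	reject    []string
	include   []string
	threshold float64
}

// field returns the configured field, or the whole line when no separator is set.
func (c filterConfig) field(line string) (string, bool) {
	if c.separator == "" {
		return line, true
	}
	parts := strings.Split(line, c.separator)
	if c.position < 0 || c.position >= len(parts) {
		return "", false
	}
	return parts[c.position], true
}

// clean deletes every occurrence of each string in the remove list.
func (c filterConfig) clean(s string) string {
	for _, r := range c.remove {
		s = strings.ReplaceAll(s, r, "")
	}
	return s
}

// containsAny reports whether any word matches an entry in list, ignoring case.
func containsAny(words, list []string) bool {
	for _, w := range words {
		for _, l := range list {
			if strings.EqualFold(w, l) {
				return true
			}
		}
	}
	return false
}

// accept decides whether text, whose fuzzy score is score, should be output.
func (c filterConfig) accept(text string, score float64) bool {
	words := strings.Fields(text)
	if containsAny(words, c.reject) {
		return false // reject wins regardless of the fuzzy outcome
	}
	if score >= c.threshold {
		return true
	}
	// Below the threshold: accept only above 0.5 and with an include word present.
	return score > 0.5 && containsAny(words, c.include)
}

func main() {
	// Values taken from the piped command above.
	cfg := filterConfig{
		separator: "|",
		position:  1,
		remove:    []string{"-", "&", "@"},
		reject:    []string{"forest", "rivers"},
		include:   []string{"lakes", "ponds"},
		threshold: 0.85,
	}
	f, ok := cfg.field("17|@beaver dam & ponds|ca")
	if !ok {
		return
	}
	text := cfg.clean(f)
	fmt.Printf("%q accepted: %v\n", text, cfg.accept(text, 0.7)) // 0.7 is a made-up score
}
```

Checking reject words before the score mirrors the README's statement that reject applies regardless of the fuzzy outcome.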

Benchmarking:

Processing a ~2.5 GB file with ~130M rows took ~110 seconds on a 2.3 GHz AMD machine.
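For a sense of scale, the benchmark figures convert to per-second rates as follows. The sketch reads "~2.5G" as 2.5 × 10^9 bytes, which is an assumption; rate is a hypothetical helper.

```go
package main

import "fmt"

// rate returns amount per second.
func rate(amount, seconds float64) float64 {
	return amount / seconds
}

func main() {
	const (
		seconds   = 110.0 // ~110 s, from the benchmark
		fileBytes = 2.5e9 // "~2.5G", assumed to mean 2.5 × 10^9 bytes
		rows      = 130e6 // ~130M rows
	)
	fmt.Printf("rows/s:    %.2f million\n", rate(rows, seconds)/1e6) // 1.18 million
	fmt.Printf("bytes/s:   %.1f MB\n", rate(fileBytes, seconds)/1e6) // 22.7 MB
	fmt.Printf("bytes/row: %.0f\n", fileBytes/rows)                  // 19
}
```

So the run averaged roughly 1.2 million rows per second, with rows of about 19 bytes each.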

How to build:

git clone https://github.com/ehab7/fastfuzzy
cd fastfuzzy
go build .

To-Do:

Add rune (Unicode) support; currently only English text is supported.
Add support for searching combined words.
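The rune item concerns how Go strings work: a string holds UTF-8 bytes, so byte-based lengths or indexing miscount non-English characters, while a []rune counts whole characters. Whether fastfuzzy currently indexes by bytes is not stated; this is only an illustration, and runeLengths is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLengths returns the byte length and the character (rune) count of s.
func runeLengths(s string) (byteLen, runeLen int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := runeLengths("café")
	fmt.Println(b, r) // 5 4: 'é' takes two bytes in UTF-8
	runes := []rune("café")
	fmt.Println(string(runes[3])) // é: indexing a []rune yields whole characters
}
```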

Documentation

There is no documentation for this package.

Source Files


main.go

Directories

Path	Synopsis
algo
configure
