unic

v0.3.1
Warning

This package is not in the latest version of its module.

Published: May 24, 2021 License: MIT Imports: 5 Imported by: 0

README

unic


Works like UNIX sort | uniq except you don't have to call sort first.

Works by using Cuckoo Filters - See: https://github.com/seiflotfy/cuckoofilter

Advantages over sort | uniq

Quicker output, lower memory footprint

sort, by definition, has to buffer the entire input before it can begin outputting anything. This can use a lot of memory, and nothing is output until the whole input has been read and sorted.

unic uses probabilistic filters (Cuckoo Filters) to determine whether a line has been seen before, so it can begin output after the first line of input.

Original item order is kept

Given the list 3 1 2 1 2 3, compare the output of sort | uniq

$ printf '3\n1\n2\n1\n2\n3\n' | sort | uniq
1
2
3

to unic

$ printf '3\n1\n2\n1\n2\n3\n' | unic
3
1
2

Disadvantages

Probabilistic Filtering

Because unic relies on Cuckoo Filters, there is a very small probability that a line is wrongly marked as a duplicate (a false positive). The reverse cannot happen: due to the nature of the filter, a line that has been seen before is never marked as unique.

In cases where a false positive cannot ever be tolerated, unic should not be used.

Not compatible with all of uniq's flags

unic by nature does not buffer; thus some of uniq's flags cannot be implemented.

In these cases, you should use uniq.

Installing

Binaries

See: releases

Compile
$ go get -u -v github.com/donatj/unic

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FilterCaseInsensitive

func FilterCaseInsensitive(f *Filter) error

FilterCaseInsensitive configures the Filter to be case-insensitive.

Types

type Filter

type Filter struct {
	CaseI          bool
	FilterCapacity uint
}

Filter is a unique filter utilizing Cuckoo Filters

func NewFilter

func NewFilter(options ...FilterOption) (*Filter, error)

NewFilter returns a Filter configured with the given FilterOptions

func (*Filter) Exec

func (u *Filter) Exec(input io.Reader, unique, repeated io.Writer) error

Exec executes the filter on the given input. Writes unique output to the unique stream. Writes repeated output to the repeated stream.

type FilterOption

type FilterOption func(*Filter) error

FilterOption sets an option of the passed Filter

func FilterCapacity

func FilterCapacity(capacity uint) FilterOption

FilterCapacity sets the cuckoo filter capacity for the Filter's internal cuckoo filters
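The types above follow Go's functional options pattern. The self-contained sketch below mirrors the documented signatures to show how the options compose. Only the signatures and field names come from the documentation; the function bodies, the default capacity, and the zero-capacity check are assumptions made for illustration and may differ from the real package.

```go
package main

import (
	"errors"
	"fmt"
)

// Filter mirrors the exported fields of the documented type.
type Filter struct {
	CaseI          bool
	FilterCapacity uint
}

// FilterOption sets an option of the passed Filter.
type FilterOption func(*Filter) error

// FilterCaseInsensitive already has the FilterOption signature, so it is
// passed to NewFilter directly, without being called.
func FilterCaseInsensitive(f *Filter) error {
	f.CaseI = true // assumed body
	return nil
}

// FilterCapacity takes an argument, so it returns a FilterOption closure.
func FilterCapacity(capacity uint) FilterOption {
	return func(f *Filter) error {
		if capacity == 0 {
			// Assumed validation, not taken from the documentation.
			return errors.New("capacity must be greater than zero")
		}
		f.FilterCapacity = capacity
		return nil
	}
}

// NewFilter applies each option in order and stops at the first error.
func NewFilter(options ...FilterOption) (*Filter, error) {
	f := &Filter{FilterCapacity: 1000000} // assumed default, for illustration
	for _, opt := range options {
		if err := opt(f); err != nil {
			return nil, err
		}
	}
	return f, nil
}

func main() {
	f, err := NewFilter(FilterCaseInsensitive, FilterCapacity(50000))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *f)
}
```

Going by the documented signatures, and assuming the package is imported under the name unic, the equivalent call against the real package would be unic.NewFilter(unic.FilterCaseInsensitive, unic.FilterCapacity(50000)), followed by f.Exec(os.Stdin, os.Stdout, io.Discard) to print unique lines and discard repeats.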

Directories

Path Synopsis
cmd
