ics

package module
v0.3.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 19, 2022 License: MIT Imports: 7 Imported by: 0

README

Inverval Containment Sets implemened in GO

GoDoc

Interval containment sets (ICS) provide efficient solution for testing containment in overlapping arithmetic intervals. This includes binary classification for discrete and continuous values like integer of floating point numbers, but can also be extended to arbitrary comparable and orderable values.

Description

Assume you have a set of intervals, maybe overlapping, describing a certain property of a value or attributing it to a class. To test if a value is contained within the set, you can simply iterate over all the intervals comparing it to lower and upper limits. This would be a straightforward approach that works nicely if you have just a few intervals, or if the overall speed is not important.

When performance and simplicity matters, however, this library offers a more efficient solution there the intervals are linearised and flattened into a structure that can then be used for efficient containment testing.

Some of the use cases: lookup tables for Unicode character properties, text parsers, regexp optimization, etc.

For example:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
original intervals:

   └───────┘         └───────┘   └─┘
       └─────────┘             └─────────┘  
             └─┘         └───┘       └─┘
flattened intervals:

   └─────────────┘   └───────┘ └─────────┘

Such flattened sets are represented as a sorted array of values where each interval is formed by pairing adjacent elements. Values with even indices are considered as inclusive lower boundaries, values with odd indices are considered as exclusive upper boundaries. There are no duplicate elements in this array. If the total number of elements is odd, then the last element is treaded as open-ended interval.

flattened intervals:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
   └─────────────┘   └───────┘ └─────────┘

in-memory ics representation:
elements: C J L P Q V
meaning:  [ ) [ ) [ )

Searching for containment on such a structure is extremely easy: perform a binary search on the elements, effectively finding a lower bound index, then check if the found index is an even or an odd number. Even indices then indicate containment.

The sets are composited by starting with an empty set then inserting (merging) the intervals one-by-one. The insertion routine then creates, extends, and merges the elements in the set as required to keep it consistent with the incoming intervals. This insertion/merging routine is quite efficient for high performance runtime usage.

For classification tasks that are static in nature, the resulting arrays of elements then can be stored in memory, as a file, or even code-gened into the source code. Later, then the containment lookup is required, use the array with this library or as a direct input into a binary search algorithm followed by even/odd check.

Bonus Feature

The logic of the set can be inverted by inserting or removing a minimum value (e.g. zero for unsigned integers) as a first element in the array:

values:    0 1 2 3 4 5 6 7 8 9 ... maxint
              └───┘     └───┘ 
elements:     [2,  4)   [7,  9)     

inverted:  0 1 2 3 4 5 6 7 8 9 ... maxint
          └───┘   └─────┘   └────────────┘ 

elements: [0,  2) [4,    7) [9  

Unicode Intervals

The library features a couple of containment sets specializations for Unicode codepoints and ASCII codeunits. For convenience, these sets provide insertion and enumeration API that operates with fully closed [a-z]-style ranges instead of half-open intervals.

Documentation

Automatically generated documentation for the package can be viewed online here: http://pkg.go.dev/github.com/adnsv/ics

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Contains

func Contains[S ~[]T, T constraints.Ordered](s S, e T) bool

Contains returns true if element e passes containment test within the interval set s.

func Enumerate

func Enumerate[S ~[]T, T any](s S, f func(l, h T))

Enumerate is a generic functional enumerator for a set. The callback is called with half-open boundaries for each interval within the set. If s has an odd number of elements, meaning the the last element is an open-ended interval, the callback is called with l = h.

func Hull added in v0.3.0

func Hull[S ~[]T, T constraints.Ordered](s S) S

Hull returns a set that contains at most one interval that covers all intervals in s.

func InsertInterval

func InsertInterval[S ~[]T, T constraints.Ordered](s *S, l, h T)

InsertInterval merges an interval into s.

  • if l < h, a bounded interval [l,h) is merged in
  • if l >= h, a half-open interval [l,... is merged instead

func Write

func Write[S ~[]T, T any](w io.Writer, s S)

Write writes interval to w in a human-readable form.

Types

type AsciiSet

type AsciiSet Set[byte]

AsciiSet is a containment set for 7-bit ASCII characters [\x00..\x7f].

func MergeAsciiSets

func MergeAsciiSets(ss ...AsciiSet) (m AsciiSet)

MergeRuneSets combines multiple containment sets into one.

func (AsciiSet) CountElements added in v0.3.0

func (s AsciiSet) CountElements() int

CountElements returns the number of ASCII codeunits effectively contained in s.

func (AsciiSet) EnumerateRanges

func (s AsciiSet) EnumerateRanges(f func(cmin, cmax byte))

EnumerateRanges is a functional enumerator for all the continuous inclusive [cmin,cmax] ranges contained within the set.

func (AsciiSet) Hull added in v0.3.0

func (s AsciiSet) Hull() AsciiSet

Hull returns a hull of all the ranges in s.

func (*AsciiSet) Insert

func (s *AsciiSet) Insert(c byte)

Insert adds c to the set.

func (*AsciiSet) InsertRange

func (s *AsciiSet) InsertRange(cmin, cmax byte)

InsertRange inserts an inclusive [cmin,cmax] range of ascii characters into the set. Notice, that the inserted ranges are fully inclusive on both ends, unlike intervals which are always half open.

func (AsciiSet) Inverted

func (s AsciiSet) Inverted() AsciiSet

Inverted returns a containment set with inverted logic.

func (AsciiSet) String

func (a AsciiSet) String() string

String produces a human-readable string with ranges and elements.

type RuneSet

type RuneSet Set[rune]

RuneSet is a containment set for unicode codepoints [U+00..U+10FFFF].

func MergeRuneSets

func MergeRuneSets(ss ...RuneSet) (m RuneSet)

MergeRuneSets combines multiple containment sets into one.

func (RuneSet) AsciiSplit

func (s RuneSet) AsciiSplit() (AsciiSet, RuneSet)

AsciiSplit splits r into ascii and non-ascii matchers.

func (RuneSet) CountElements added in v0.3.0

func (s RuneSet) CountElements() int

CountElements returns the number of unicode codepoints effectively contained in s.

func (RuneSet) EnumerateRanges

func (s RuneSet) EnumerateRanges(f func(rmin, rmax rune))

EnumerateRanges is a functional enumerator for all the continuous inclusive [rmin,rmax] ranges contained within the set.

func (RuneSet) Hull added in v0.3.0

func (s RuneSet) Hull() RuneSet

Hull returns a hull of all the ranges in s.

func (*RuneSet) Insert

func (s *RuneSet) Insert(r rune)

Insert adds r to the set.

func (*RuneSet) InsertRange

func (s *RuneSet) InsertRange(rmin, rmax rune)

InsertRange inserts an inclusive [rmin,rmax] range of unicode codepoints into the set. Notice, that the inserted ranges are fully inclusive on both ends, unlike intervals which are always half open.

func (RuneSet) Inverted

func (s RuneSet) Inverted() RuneSet

Inverted returns a containment set with inverted logic.

func (RuneSet) String

func (s RuneSet) String() string

String produces a human-readable string with ranges and elements.

type Set

type Set[T constraints.Ordered] []T

Set is a compacted and flattened representation of a set of arithmetic intervals.

All elements in the set are sorted and no duplicates are allowed. Each interval in the set is formed by pairing adjacent even/odd elements. Values with even indices are considered as inclusive lower boundaries, values with odd indices are considered as exclusive upper boundaries. If the total number of elements is odd, then the last element is treaded as open-ended interval:

elements: A B C D E F G
meaning:  [ ) [ ) [ ) [

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL