stringcoding

package
v0.0.0-...-d2f764f Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 24, 2019 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package stringcoding provides an easy way to deal with strings in a bit-to-bit fashion Each compressed/uncompressed string will be represented in the structure in right to left order (from the least significant bit to the most significant). i.e: The list of two strings [cia, cio] (without loss of generality we suppose to work with characters instead

bits), will be encoded as [aic|o] (the vertical bar here is simply used to divide the two encoded string)

Index

Constants

This section is empty.

Variables

View Source
var (
	// ErrEmptyString is returned when you are trying to access to an empty string using Elias Gamma's methods
	ErrEmptyString = errors.New("elias Gamma coding is undefined for the empty string")
)
View Source
var (
	// ErrTooShortString is returned when you are trying to access given an index that isn't defined
	ErrTooShortString = errors.New("the string is too short to contain a prefix of that length")
)

Functions

This section is empty.

Types

type Coding

type Coding struct {
	// Strings consists of all the concatenated bit sequences
	// corresponding to the suffixes L[i] of S's strings.
	Strings *bd.BitData
	// Starts consists of a sequence of bits in which each bit
	// set to 1 marks the first bit of each of those suffixes
	// in the aforementioned array (Strings).
	Starts *bd.BitData
	// Lengths encodes a value associated to each string in strings.
	// The value depends from the particular compression scheme used
	Lengths *bd.BitData
	// LastString contains the last processed string as a sequence
	// of bit. It can be used for a more efficient processing of the
	// current string to deal with.
	LastString *bd.BitData
	// NextIndex marks the last index in Strings (risp. Starts) arrays.
	NextIndex uint64
	// NextLengthsIndex marks the last index in the Lengths array.
	NextLengthsIndex uint64
}

Coding is a type that contains some of the data structures to run LPRC and PSRC

func New

func New(strings []string) *Coding

New creates and returns a new Coding structure inserting the strings that are in the array of strings.

func (*Coding) String

func (c *Coding) String() string

type LPRC

type LPRC struct {
	Epsilon float64
	// contains filtered or unexported fields
}

LPRC contains all the data structures to run LPRC algorithm

func NewLPRC

func NewLPRC(strings []string, epsilon float64) LPRC

NewLPRC returns a LPRC (Locality Preserving Rear Coding): a storage method based on RC (Rear Coding) that stores a string s in an uncompressed way if the latest c|s| bits do not contain an uncompressed string.

func (*LPRC) FullPrefixSearch

func (lprc *LPRC) FullPrefixSearch(prefix string) ([]string, error)

FullPrefixSearch , given a prefix *prefix* returns all the strings that start with that prefix.

func (*LPRC) GetBitDataSize

func (lprc *LPRC) GetBitDataSize() map[string]uint64

GetBitDataSize returns the size in bits of the BitData used to compress the strings

func (*LPRC) Populate

func (lprc *LPRC) Populate() error

Populate populates all the trie

func (*LPRC) Retrieval

func (lprc *LPRC) Retrieval(u uint64, l uint64) (string, error)

Retrieval (u, l) returns the prefix of the string string(u) with length l. So the returned prefix ends up in the edge (p(u), u).

func (*LPRC) String

func (lprc *LPRC) String() string

type LPRCBitDataSize

type LPRCBitDataSize struct {
	StringsSize        uint64
	StartsSize         uint64
	LengthsSize        uint64
	IsUncompressedSize uint64
}

LPRCBitDataSize contains the size of all the data structures

type PSRC

type PSRC struct {
	Epsilon float64
	// contains filtered or unexported fields
}

PSRC contains all the data structures to run PSRC algorithm

func NewPSRC

func NewPSRC(strings []string, epsilon float64) PSRC

NewPSRC return an implementation of PSRC: a storage method based on RC (Rear Coding) that stores a string s in an uncompressed way if the latest c|s| bits do not contain an uncompressed string.

func (*PSRC) FullPrefixSearch

func (psrc *PSRC) FullPrefixSearch(prefix string) ([]string, error)

FullPrefixSearch , given a prefix *prefix* returns all the strings that start with that prefix.

func (*PSRC) GetBitDataSize

func (psrc *PSRC) GetBitDataSize() map[string]uint64

GetBitDataSize returns the size in bits of the BitData used to compress the strings

func (*PSRC) Populate

func (psrc *PSRC) Populate() error

Populate populates all the trie

func (*PSRC) Retrieval

func (psrc *PSRC) Retrieval(u uint64, l uint64) (string, error)

Retrieval (u, l) returns the prefix of the string string(u) with length l. So the returned prefix ends up in the edge (p(u), u).

func (*PSRC) String

func (psrc *PSRC) String() string

type PSRCBitDataSize

type PSRCBitDataSize struct {
	StringsSize        uint64
	StartsSize         uint64
	LengthsSize        uint64
	IsUncompressedSize uint64
	PrefixOrSuffixSize uint64
}

PSRCBitDataSize contains the size of all the data structures for PSRC

type PrefixSearch

type PrefixSearch interface {
	Populate() error

	Retrieval(uint64, uint64) (string, error)
	FullPrefixSearch(prefix string) ([]string, error)
	GetBitDataSize() map[string]uint64
	// contains filtered or unexported methods
}

PrefixSearch interface contains all the methods in order to run both LPRC and PSRC

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL