package sources

v0.0.0-...-457a45f
Warning

This package is not in the latest version of its module.

Published: Jul 5, 2020 License: GPL-3.0 Imports: 15 Imported by: 0

Documentation


Constants

const (
	// DefaultAdviseSize is the default expected number of elements.
	DefaultAdviseSize = 75000

	// DefaultErrorRate is the default target error rate.
	// Range 0.0 - 1.0, default value is 1%.
	DefaultErrorRate = 0.01
)
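The two defaults interact through the standard Bloom filter sizing formulas: for n expected elements and target false-positive rate p, the optimal bit count is m = -n ln(p) / (ln 2)^2 and the optimal number of hash functions is k = (m/n) ln 2. The sketch below evaluates those textbook formulas for the default values. It assumes nothing about how this package actually sizes its unexported bit array; optimalParams is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// optimalParams returns the textbook Bloom filter sizing for n expected
// elements at target false-positive rate p (hypothetical helper):
//   m = -n * ln(p) / (ln 2)^2   total bits
//   k = (m / n) * ln 2          number of hash functions
func optimalParams(n int, p float64) (m uint64, k uint) {
	ln2 := math.Ln2
	bits := -float64(n) * math.Log(p) / (ln2 * ln2)
	m = uint64(math.Ceil(bits))
	k = uint(math.Round(float64(m) / float64(n) * ln2))
	if k < 1 {
		k = 1 // always use at least one hash function
	}
	return m, k
}

func main() {
	const adviseSize = 75000 // mirrors DefaultAdviseSize
	const errorRate = 0.01   // mirrors DefaultErrorRate
	m, k := optimalParams(adviseSize, errorRate)
	fmt.Printf("bits=%d (%.1f bits/element, ~%.1f KiB), hashes=%d\n",
		m, float64(m)/adviseSize, float64(m)/8/1024, k)
}
```

At a 1% target this works out to roughly 9.6 bits per element and 7 hash functions, regardless of n.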

Variables

This section is empty.

Functions

This section is empty.

Types

type BloomFilter

type BloomFilter struct {
	// contains filtered or unexported fields
}

BloomFilter is a probabilistic data structure that represents set membership: it can say with certainty that an item is NOT in the set, and can say, with a bounded error rate, that an item may be in the set. N.B. For the error bounds to hold, the size estimate passed to Advise should be greater than the number of elements added.

For example, the question "is X in the set?" has only two possible answers: "no" and "maybe".
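To illustrate the "no"/"maybe" behavior, here is a minimal, self-contained Bloom filter sketch. It is hypothetical and is not this package's unexported implementation: the names bloom, newBloom, and Maybe are illustrative, and it assumes k bit positions derived by double hashing from the FNV-1a and FNV-1 hashes in the standard library.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter sketch (hypothetical, not the package's code).
type bloom struct {
	bits []uint64 // bit array packed into 64-bit words
	m    uint64   // number of bits
	k    uint64   // number of hash functions (probes per value)
}

func newBloom(m, k uint64) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k bit indexes via double hashing: index_i = (a + i*c) mod m.
func (b *bloom) positions(value string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(value))
	a := h1.Sum64()
	h2 := fnv.New64()
	h2.Write([]byte(value))
	c := h2.Sum64() | 1 // force odd so the step is never zero
	idx := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		idx[i] = (a + i*c) % b.m
	}
	return idx
}

// Learn sets every bit position for the value.
func (b *bloom) Learn(value string) {
	for _, p := range b.positions(value) {
		b.bits[p/64] |= 1 << (p % 64)
	}
}

// Maybe returns false ("no") if any probed bit is unset; true means "maybe".
func (b *bloom) Maybe(value string) bool {
	for _, p := range b.positions(value) {
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1024, 7)
	f.Learn("apple")
	fmt.Println(f.Maybe("apple"))  // true: learned values always answer "maybe"
	fmt.Println(f.Maybe("banana")) // false unless all 7 probes collide (a false positive)
}
```

There are never false negatives: every bit a learned value set stays set, so Maybe can only err toward "maybe".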

func (*BloomFilter) Advise

func (b *BloomFilter) Advise(size int)

Advise the Detector on the estimated size of the data set.

func (*BloomFilter) Count

func (b *BloomFilter) Count() uint64

Count returns the number of items added to the set (if known).

func (*BloomFilter) Detect

func (b *BloomFilter) Detect(value string) (bool, float64)

Detect predicts whether the value is included in the data set. It returns true/false for the prediction, along with a confidence score from 0.0 to 1.0. A score of 0.0 means the value is most likely not in the set, and a score of 1.0 means it is most likely in the set.

func (*BloomFilter) ErrorRate

func (b *BloomFilter) ErrorRate(rate float64)

ErrorRate sets the desired error rate for the Detector.

func (*BloomFilter) EstimatedErrorRate

func (b *BloomFilter) EstimatedErrorRate() float64

EstimatedErrorRate returns the estimated error rate of the set, using the approximation (1.0 - e^(-k/b))^k, where b is the number of bits per element and k is the number of hash functions.
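The approximation above can be evaluated directly. The sketch below is a hypothetical helper (estimatedErrorRate is not part of this package) that computes (1 - e^(-k/b))^k, and shows why the BloomFilter documentation asks for an Advise size larger than the number of elements added: adding twice as many elements halves b and sharply raises the rate.

```go
package main

import (
	"fmt"
	"math"
)

// estimatedErrorRate evaluates the standard Bloom filter approximation
// (1 - e^(-k/b))^k, where b is bits per element and k is the hash count.
func estimatedErrorRate(bitsPerElement, k float64) float64 {
	return math.Pow(1-math.Exp(-k/bitsPerElement), k)
}

func main() {
	// Sized for a 1% rate: about 9.585 bits per element with 7 hashes.
	fmt.Printf("as advised:  %.4f\n", estimatedErrorRate(9.585, 7))
	// Twice as many elements as advised: half the bits per element.
	fmt.Printf("overfilled:  %.4f\n", estimatedErrorRate(9.585/2, 7))
}
```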

func (*BloomFilter) ExpectedError

func (b *BloomFilter) ExpectedError() float64

ExpectedError returns the expected error rate of the set.

func (*BloomFilter) Learn

func (b *BloomFilter) Learn(value string)

Learn adds a positive value (one known to be in the data set).

func (*BloomFilter) Name

func (b *BloomFilter) Name() string

Name returns the name of the detector instance type.

func (*BloomFilter) Pack

func (b *BloomFilter) Pack() []byte

Pack serializes the detector into a byte slice.

func (*BloomFilter) ShortString

func (b *BloomFilter) ShortString() string

func (*BloomFilter) String

func (b *BloomFilter) String() string

func (*BloomFilter) Unpack

func (b *BloomFilter) Unpack(rawbytes []byte) error

Unpack restores the detector from serialized bytes.
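The byte layout used by Pack and Unpack is not documented. As a sketch of the round-trip contract only, the example below assumes a hypothetical filterState holding a hash count and the bit words, and encodes them little-endian with encoding/binary; the real format may differ entirely.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// filterState is a hypothetical stand-in for a filter's unexported fields.
type filterState struct {
	K    uint32   // number of hash functions
	Bits []uint64 // bit array words
}

// pack writes K, the word count, then each word, all little-endian.
func (f *filterState) pack() []byte {
	var buf bytes.Buffer // writes to a bytes.Buffer cannot fail
	binary.Write(&buf, binary.LittleEndian, f.K)
	binary.Write(&buf, binary.LittleEndian, uint32(len(f.Bits)))
	binary.Write(&buf, binary.LittleEndian, f.Bits)
	return buf.Bytes()
}

// unpack is the inverse of pack and rejects truncated or padded input.
func (f *filterState) unpack(raw []byte) error {
	r := bytes.NewReader(raw)
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &f.K); err != nil {
		return err
	}
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return err
	}
	if uint64(n)*8 != uint64(r.Len()) {
		return errors.New("unpack: length mismatch")
	}
	f.Bits = make([]uint64, n)
	return binary.Read(r, binary.LittleEndian, f.Bits)
}

func main() {
	orig := filterState{K: 7, Bits: []uint64{0xdeadbeef, 1 << 63}}
	raw := orig.pack()
	var restored filterState
	if err := restored.unpack(raw); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(raw), restored.K, restored.Bits[1] == orig.Bits[1]) // 24 7 true
}
```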

type Cache

type Cache struct {
	// MaxEntries is the maximum number of cache entries before
	// an item is evicted. Zero means no limit.
	MaxEntries int
	// contains filtered or unexported fields
}

Cache is an LRU cache. It is not safe for concurrent access.

func NewCache

func NewCache(maxEntries int) *Cache

NewCache creates a new Cache. If maxEntries is zero, the cache has no limit and it's assumed that eviction is done by the caller.

func (*Cache) Add

func (c *Cache) Add(key string, value []string)

Add adds a value to the cache.

func (*Cache) Clear

func (c *Cache) Clear()

Clear purges all stored items from the cache.

func (*Cache) Get

func (c *Cache) Get(key string) (value []string, ok bool)

Get looks up a key's value from the cache.

func (*Cache) Len

func (c *Cache) Len() int

Len returns the number of items in the cache.

func (*Cache) Remove

func (c *Cache) Remove(key string)

Remove removes the provided key from the cache.

func (*Cache) RemoveOldest

func (c *Cache) RemoveOldest()

RemoveOldest removes the oldest item from the cache.

type Database

type Database struct {
	Sources map[string]*Source
	// contains filtered or unexported fields
}

A Database of source identifiers and references to mapping resources between them.

func Open

func Open(filename string) (*Database, error)

Open a source database and load it into memory.

func (*Database) DetermineSource

func (x *Database) DetermineSource(sample []string) []*SourceHit

DetermineSource examines the sample data given and tries to guess which source database it came from. It returns a sorted list of possible Sources along with additional statistics.
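How DetermineSource scores sources is not documented. One plausible approach, sketched here under that assumption, is to test every sample value against each source and rank sources by hit count. Exact Go sets stand in for the per-subset Bloom filters, hit is a trimmed hypothetical stand-in for SourceHit, and rankSources is an illustrative name.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a trimmed, hypothetical stand-in for SourceHit.
type hit struct {
	SourceName  string
	Hits        uint64
	Tested      uint64
	SampleRatio float64 // Hits / |Sample|
}

// rankSources tests each sample value against every source's member set and
// returns the sources with at least one hit, most hits first.
func rankSources(sample []string, sources map[string]map[string]bool) []hit {
	var out []hit
	for name, members := range sources {
		h := hit{SourceName: name, Tested: uint64(len(sample))}
		for _, v := range sample {
			if members[v] {
				h.Hits++
			}
		}
		if h.Tested > 0 {
			h.SampleRatio = float64(h.Hits) / float64(h.Tested)
		}
		if h.Hits > 0 {
			out = append(out, h)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Hits != out[j].Hits {
			return out[i].Hits > out[j].Hits
		}
		return out[i].SourceName < out[j].SourceName // deterministic tie-break
	})
	return out
}

func main() {
	sources := map[string]map[string]bool{
		"sourceA": {"a1": true, "a2": true},
		"sourceB": {"b1": true},
	}
	for _, h := range rankSources([]string{"a1", "a2", "b1"}, sources) {
		fmt.Printf("%s hits=%d ratio=%.2f\n", h.SourceName, h.Hits, h.SampleRatio)
	}
}
```

With real Bloom filters some hits would be false positives, which is why SourceHit also reports an ExpectedError for each source.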

func (*Database) GetMapper

func (x *Database) GetMapper(fromID, toID string) (Mapper, error)

GetMapper returns a Mapper that translates identifiers from the source fromID to the source toID.

func (*Database) Mappings

func (x *Database) Mappings(sourceName string) []string

Mappings returns a list of sources that the named Source can be mapped to.

type Mapper

type Mapper interface {
	// Get retrieves ids that map to the given id.
	Get(leftID string) (rightIDs []string, found bool)
}

Mapper represents a one-way mapping between identifier sources.
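Any type with a matching Get method satisfies Mapper. The sketch below copies the interface from this page and implements it with a plain Go map; mapMapper is a hypothetical type for illustration, not one the package provides.

```go
package main

import "fmt"

// Mapper is copied from the package documentation above.
type Mapper interface {
	// Get retrieves ids that map to the given id.
	Get(leftID string) (rightIDs []string, found bool)
}

// mapMapper is a hypothetical in-memory Mapper backed by a Go map.
type mapMapper map[string][]string

func (m mapMapper) Get(leftID string) ([]string, bool) {
	ids, ok := m[leftID]
	return ids, ok
}

// Compile-time check that mapMapper satisfies Mapper.
var _ Mapper = mapMapper(nil)

func main() {
	var m Mapper = mapMapper{"left1": {"right1", "right2"}}
	ids, found := m.Get("left1")
	fmt.Println(ids, found) // [right1 right2] true
	_, found = m.Get("missing")
	fmt.Println(found) // false
}
```

Because the mapping is one-way, looking up "right1" in this Mapper reports not found; the reverse direction needs its own Mapper.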

type Source

type Source struct {
	ID             int64
	Name           string
	Description    string
	IdentifierType string
	URL            string
	LinkoutURL     string
	Citation       string

	Subsets    map[string]*BloomFilter
	LastUpdate time.Time
}

A Source of identifiers.

func (*Source) Cite

func (s *Source) Cite() string

Cite extracts and formats a simple citation using the refman-format data.

func (*Source) Linkout

func (s *Source) Linkout(toID string) string

Linkout returns a link directly to the given identifier, if the source supports it.

type SourceHit

type SourceHit struct {
	// SourceName of the database hit.
	SourceName string
	// Subset of the database if defined.
	Subset string
	// Hits is the number of samples that hit the database.
	Hits uint64
	// UniqueHits is the number of distinct sample values that hit the database.
	UniqueHits uint64
	// Tested is the number of sample values tested.
	Tested uint64
	// SubsetRatio indicates the fraction of the subset covered by the sample,
	// i.e. Hits / |Subset|.
	SubsetRatio float64 // 0.0 - 1.0
	// SampleRatio indicates the fraction of the sample covered by the subset,
	// i.e. Hits / |Sample|.
	SampleRatio float64 // 0.0 - 1.0
	// ExpectedError rate of hits for the source tested.
	ExpectedError float64 // 0.0-1.0
	// Examples lists some sample values that were in the hit set.
	Examples []string
}

SourceHit describes a search hit and some statistics.
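The two ratios answer different questions: SampleRatio asks how much of the sample a subset explains, while SubsetRatio asks how much of the subset the sample covers. The hypothetical helper below computes both from the definitions in the field comments; how the package obtains |Subset| for a Bloom-filter-backed subset is not documented here.

```go
package main

import "fmt"

// ratios computes SubsetRatio = Hits / |Subset| and SampleRatio = Hits / |Sample|.
// A zero-sized denominator yields 0 instead of NaN.
func ratios(hits, subsetSize, sampleSize uint64) (subsetRatio, sampleRatio float64) {
	if subsetSize > 0 {
		subsetRatio = float64(hits) / float64(subsetSize)
	}
	if sampleSize > 0 {
		sampleRatio = float64(hits) / float64(sampleSize)
	}
	return
}

func main() {
	// 40 of 50 sampled values hit a subset of 1000 identifiers.
	sub, samp := ratios(40, 1000, 50)
	fmt.Printf("SubsetRatio=%.2f SampleRatio=%.2f\n", sub, samp) // SubsetRatio=0.04 SampleRatio=0.80
}
```

Here a high SampleRatio with a low SubsetRatio suggests the sample was drawn from this subset but covers only a small part of it.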
