gofiler

package module
v0.10.0 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 18, 2021 License: MIT Imports: 13 Imported by: 8

README

Go-Profiler -- gofiler

Thin Go wrapper around the profiler.

Documentation

Index

Constants

This section is empty.

Variables

var ErrorLanguageNotFound = errors.New("laguage configuration not found")

ErrorLanguageNotFound is the error that is returned if a language configuration cannot be found.

Functions

This section is empty.

Types

type Candidate

type Candidate struct {
	Suggestion   string    // Correction suggestion
	Modern       string    // Modern variant
	Dict         string    // Name of the used dictionary
	HistPatterns []Pattern // List of historical patterns
	OCRPatterns  []Pattern // List of OCR error patterns
	Distance     int       // Levenshtein distance
	Weight       float32   // The vote weight of the candidate
}

Candidate represents a correction candidate for an OCR token.

func MakeCandidate added in v0.8.0

func MakeCandidate(expr string) (Candidate, string, error)

MakeCandidate creates a candidate from a candidate expression, for example:

theyl@theil:{teil+[(t:th,0)]}+ocr[(i:y,3)],voteWeight=0.749764,levDistance=1,dict=dict_modern_hypothetic_error
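The example expression above can be read field by field. The following is a minimal sketch of that reading, not the package's parser: the types are redeclared locally so the snippet compiles on its own, `exampleCandidate` is a hypothetical helper, and treating `theyl` as the OCR token (and as the string that MakeCandidate returns) is an assumption.

```go
package main

import "fmt"

// Pattern and Candidate are redeclared from the documented definitions
// so that this sketch compiles without importing the package.
type Pattern struct {
	Left  string
	Right string
	Prob  float64
	Pos   int
}

type Candidate struct {
	Suggestion   string
	Modern       string
	Dict         string
	HistPatterns []Pattern
	OCRPatterns  []Pattern
	Distance     int
	Weight       float32
}

// exampleCandidate is a hypothetical helper. It returns the OCR token and
// the Candidate that the example expression is assumed to describe.
func exampleCandidate() (string, Candidate) {
	return "theyl", Candidate{
		Suggestion:   "theil",                                    // text between '@' and ':'
		Modern:       "teil",                                     // text after '{'
		Dict:         "dict_modern_hypothetic_error",             // dict=...
		HistPatterns: []Pattern{{Left: "t", Right: "th", Pos: 0}}, // teil -> theil
		OCRPatterns:  []Pattern{{Left: "i", Right: "y", Pos: 3}},  // theil -> theyl
		Distance:     1,                                          // levDistance=1
		Weight:       0.749764,                                   // voteWeight=...
	}
}

func main() {
	ocr, c := exampleCandidate()
	fmt.Printf("%s -> %s (modern %s, distance %d)\n", ocr, c.Suggestion, c.Modern, c.Distance)
}
```

Read this way, the historical pattern explains how the modern form `teil` becomes `theil`, and the OCR pattern explains how `theil` was misread as `theyl`.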

func (Candidate) String added in v0.3.0

func (c Candidate) String() string

type Interpretation

type Interpretation struct {
	OCR        string
	N          int
	Candidates []Candidate
}

Interpretation holds the list of candidates for an OCR token. For lexicon entries, an interpretation holds only one candidate with empty historical and OCR pattern lists.

type LanguageConfiguration

type LanguageConfiguration struct {
	Language, Path string
}

LanguageConfiguration represents a pair that consists of a language name and the corresponding configuration path in the backend directory.

func FindLanguage

func FindLanguage(backend, language string) (LanguageConfiguration, error)

FindLanguage searches the backend directory for a language configuration. It returns ErrorLanguageNotFound if the language configuration cannot be found.

func ListLanguages

func ListLanguages(backend string) ([]LanguageConfiguration, error)

ListLanguages returns a list of language configurations in the given backend directory.

type Logger

type Logger interface {
	Log(string)
}

Logger defines a simple interface for logging the stderr output of the profiler.
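Any type with a `Log(string)` method satisfies this interface. As a minimal sketch, the hypothetical `bufferLogger` below collects messages in memory, which can be handy in tests. The interface is redeclared locally so the snippet compiles on its own, and the log messages are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// Logger is redeclared from the documented definition.
type Logger interface {
	Log(string)
}

// bufferLogger is a hypothetical Logger that keeps every message in memory.
type bufferLogger struct {
	lines []string
}

// Log stores the message without its trailing newlines.
func (l *bufferLogger) Log(msg string) {
	l.lines = append(l.lines, strings.TrimRight(msg, "\n"))
}

// Compile-time check that *bufferLogger implements Logger.
var _ Logger = (*bufferLogger)(nil)

func main() {
	buf := &bufferLogger{}
	var log Logger = buf
	log.Log("loading configuration\n")
	log.Log("profiling done")
	for i, line := range buf.lines {
		fmt.Printf("%d: %s\n", i, line)
	}
}
```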

type Pattern

type Pattern struct {
	Left  string  // Left part of the pattern
	Right string  // Right part of the pattern
	Prob  float64 // Global probability of the pattern
	Pos   int     // Position
}

Pattern represents error patterns in strings. Left represents the `true` pattern (either the error correction or the modern form) and Right represents the actual pattern in the string at position Pos.

func MakePattern added in v0.8.0

func MakePattern(expr string) (Pattern, error)

MakePattern creates a pattern from a pattern expression `(left:right,pos)`.
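To make the `(left:right,pos)` format concrete, here is a minimal parsing sketch. It is not the package's implementation: `parsePattern` is a hypothetical function, the Pattern type is redeclared locally, and splitting at the first `:` and the last `,` is an assumption about how ambiguous input should be handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Pattern is redeclared from the documented definition.
type Pattern struct {
	Left  string
	Right string
	Prob  float64
	Pos   int
}

// parsePattern is a hypothetical stand-in for MakePattern. It accepts
// expressions of the form (left:right,pos) and leaves Prob at zero.
func parsePattern(expr string) (Pattern, error) {
	if !strings.HasPrefix(expr, "(") || !strings.HasSuffix(expr, ")") {
		return Pattern{}, fmt.Errorf("invalid pattern expression: %q", expr)
	}
	body := expr[1 : len(expr)-1]
	colon := strings.Index(body, ":")     // separates left from right
	comma := strings.LastIndex(body, ",") // separates right from pos
	if colon < 0 || comma < 0 || colon > comma {
		return Pattern{}, fmt.Errorf("invalid pattern expression: %q", expr)
	}
	pos, err := strconv.Atoi(body[comma+1:])
	if err != nil {
		return Pattern{}, fmt.Errorf("invalid position in %q: %w", expr, err)
	}
	return Pattern{Left: body[:colon], Right: body[colon+1 : comma], Pos: pos}, nil
}

func main() {
	p, err := parsePattern("(t:th,0)")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("left=%q right=%q pos=%d\n", p.Left, p.Right, p.Pos)
}
```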

func (Pattern) String added in v0.3.0

func (p Pattern) String() string

type Profile

type Profile map[string]Interpretation

Profile maps the unknown OCR tokens of a profiled document to the corresponding interpretations of the profiler.
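Because Profile is a plain map, a lookup and a scan over the candidates are enough to pick a correction for a token. The sketch below selects the candidate with the highest vote weight. The types are redeclared locally (Candidate is abbreviated to the fields used here), `bestCandidate` is a hypothetical helper, and the sample weights are invented. Whether the profiler already orders candidates by weight is not stated, so the helper does not rely on any order.

```go
package main

import "fmt"

// Candidate is abbreviated to the fields this sketch needs.
type Candidate struct {
	Suggestion string
	Weight     float32
}

// Interpretation and Profile are redeclared from the documented definitions.
type Interpretation struct {
	OCR        string
	N          int
	Candidates []Candidate
}

type Profile map[string]Interpretation

// bestCandidate is a hypothetical helper that returns the candidate with
// the highest weight, or false if the interpretation has no candidates.
func bestCandidate(i Interpretation) (Candidate, bool) {
	if len(i.Candidates) == 0 {
		return Candidate{}, false
	}
	best := i.Candidates[0]
	for _, c := range i.Candidates[1:] {
		if c.Weight > best.Weight {
			best = c
		}
	}
	return best, true
}

func main() {
	// Invented sample data.
	profile := Profile{
		"theyl": {OCR: "theyl", N: 2, Candidates: []Candidate{
			{Suggestion: "they", Weight: 0.25},
			{Suggestion: "theil", Weight: 0.75},
		}},
	}
	if interp, ok := profile["theyl"]; ok {
		if best, ok := bestCandidate(interp); ok {
			fmt.Printf("%s -> %s\n", interp.OCR, best.Suggestion)
		}
	}
}
```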

func (Profile) GlobalHistPatterns added in v0.5.0

func (p Profile) GlobalHistPatterns() map[string]float64

GlobalHistPatterns returns all global historical patterns with their corresponding probabilities.

func (Profile) GlobalOCRPatterns added in v0.5.0

func (p Profile) GlobalOCRPatterns() map[string]float64

GlobalOCRPatterns returns all global OCR error patterns with their corresponding probabilities.

type Profiler added in v0.2.0

type Profiler struct {
	Exe, Config     string
	Log             Logger
	Types, Adaptive bool
	// contains filtered or unexported fields
}

Profiler is a profiler executable with an optional logger and some minor options.

func (*Profiler) Run added in v0.2.0

func (p *Profiler) Run(ctx context.Context, tokens []Token) (Profile, error)

Run profiles a list of tokens. It uses the given executable with the given language configuration. The optional logger is used to write the process's stderr.

func (*Profiler) RunFunc added in v0.8.0

func (p *Profiler) RunFunc(ctx context.Context, tokens []Token, f func(string, Candidate) error) error

RunFunc profiles a list of tokens. It uses the given language configuration. The optional logger is used to write the process's stderr. The callback function is called for every Profiler suggestion.
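A callback of type `func(string, Candidate) error` can collect suggestions as they arrive. The following sketch builds such a callback and invokes it by hand, since running the real profiler needs an executable and a language configuration. `collector` is a hypothetical helper, Candidate is abbreviated, and treating the string argument as the OCR token is an assumption that the documentation does not confirm.

```go
package main

import (
	"errors"
	"fmt"
)

// Candidate is abbreviated to the fields this sketch needs.
type Candidate struct {
	Suggestion string
	Weight     float32
}

// collector is a hypothetical helper. It returns a callback with the
// signature RunFunc expects and appends each suggestion to dst, keyed by
// the string argument (assumed here to be the OCR token).
func collector(dst map[string][]Candidate) func(string, Candidate) error {
	return func(ocr string, c Candidate) error {
		if ocr == "" {
			return errors.New("empty token")
		}
		dst[ocr] = append(dst[ocr], c)
		return nil
	}
}

func main() {
	byToken := make(map[string][]Candidate)
	f := collector(byToken)

	// In real use f would be passed to RunFunc; here it is called directly
	// with invented values.
	suggestions := []Candidate{
		{Suggestion: "theil", Weight: 0.75},
		{Suggestion: "they", Weight: 0.25},
	}
	for _, c := range suggestions {
		if err := f("theyl", c); err != nil {
			fmt.Println("callback error:", err)
			return
		}
	}
	fmt.Println(len(byToken["theyl"]), "suggestions for theyl")
}
```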

func (*Profiler) RunWriter added in v0.8.0

func (p *Profiler) RunWriter(ctx context.Context, tokens []Token, w io.Writer) error

RunWriter profiles a list of tokens and writes the resulting profile into the given writer.

type Token

type Token struct {
	LE, OCR, COR string
}

Token represents an input token for the profiling. A token either contains an entry for the extended lexicon (LE) or a text token (OCR) with an optional manual correction (COR).

Tokens must never contain any whitespace in any of the strings.

func (Token) String

func (t Token) String() string

String implements the fmt.Stringer interface. The output is suitable as direct input for the profiler, i.e. each lexicon entry starts with `#`, and all other tokens contain exactly one `:` to separate the OCR token from the correction token. Tokens with no correction must still end with `:` (they contain an empty correction string).
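The input format described above can be sketched as follows. This is not the package's implementation: `formatToken` and `hasWhitespace` are hypothetical helpers, the Token type is redeclared locally, and writing a lexicon entry as `#` directly followed by the entry is an assumption about the exact layout.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Token is redeclared from the documented definition.
type Token struct {
	LE, OCR, COR string
}

// formatToken is a hypothetical stand-in for Token.String. Lexicon entries
// are prefixed with '#'; all other tokens are written as OCR:COR, where COR
// may be empty.
func formatToken(t Token) string {
	if t.LE != "" {
		return "#" + t.LE
	}
	return t.OCR + ":" + t.COR
}

// hasWhitespace reports whether any field of the token contains whitespace,
// which the documentation forbids.
func hasWhitespace(t Token) bool {
	return strings.IndexFunc(t.LE+t.OCR+t.COR, unicode.IsSpace) >= 0
}

func main() {
	tokens := []Token{
		{LE: "theil"},                // lexicon entry
		{OCR: "theyl"},               // OCR token without correction
		{OCR: "theyl", COR: "theil"}, // OCR token with manual correction
	}
	for _, t := range tokens {
		if hasWhitespace(t) {
			fmt.Println("skipping token with whitespace")
			continue
		}
		fmt.Println(formatToken(t))
	}
}
```

Checking for whitespace before formatting keeps invalid tokens out of the profiler input.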
