package proc

Version: v2.3.21
Published: Apr 30, 2024 License: Apache-2.0 Imports: 17 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var (
	ErrorTooManyParsingErrors = errors.New("too many parsing errors")
)

Functions

This section is empty.

Types

type AccumItem

type AccumItem struct {
	// contains filtered or unexported fields
}

type AttrAccumulator

type AttrAccumulator interface {
	ForEachAttr(fn func(structure string, attr string, val string) bool)
	// contains filtered or unexported methods
}

AttrAccumulator specifies an object able to collect the current structural attribute information as tokens are processed. Under the hood, you can imagine something like a non-strict, generalized stack.
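The "non-strict, generalized stack" idea can be sketched as follows. This is only an illustration under stated assumptions, not the package's implementation: the names sketchAccum, openStruct, begin and end are hypothetical, and the meaning of the callback's bool return value (false stops the iteration) is assumed, since the documentation does not specify it.

```go
package main

import "fmt"

// openStruct is a hypothetical record of one currently open structure.
type openStruct struct {
	name  string
	attrs map[string]string
}

// sketchAccum is an illustrative stand-in for an attribute accumulator.
// It is NOT the type used by the package.
type sketchAccum struct {
	items []openStruct
}

// begin pushes a newly opened structure on top of the stack.
func (a *sketchAccum) begin(name string, attrs map[string]string) {
	a.items = append(a.items, openStruct{name: name, attrs: attrs})
}

// end removes the most recently opened structure with the given name,
// even if it is not on top of the stack (the "non-strict" part).
// It reports whether such a structure was found.
func (a *sketchAccum) end(name string) bool {
	for i := len(a.items) - 1; i >= 0; i-- {
		if a.items[i].name == name {
			a.items = append(a.items[:i], a.items[i+1:]...)
			return true
		}
	}
	return false
}

// ForEachAttr visits the attributes of all open structures.
// Assumption: returning false from fn stops the iteration.
func (a *sketchAccum) ForEachAttr(fn func(structure string, attr string, val string) bool) {
	for _, it := range a.items {
		for k, v := range it.attrs {
			if !fn(it.name, k, v) {
				return
			}
		}
	}
}

func main() {
	acc := &sketchAccum{}
	acc.begin("doc", map[string]string{"type": "scifi"})
	acc.begin("p", map[string]string{"id": "p1"})
	acc.end("doc") // closes doc although p is still open
	acc.ForEachAttr(func(s, a, v string) bool {
		fmt.Printf("%s.%s=%s\n", s, a, v) // prints "p.id=p1"
		return true
	})
}
```

A strict stack would reject closing doc while p is still open; the relaxed lookup in end is what tolerates overlapping or badly nested structures.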

type LineFilter

type LineFilter interface {
	Apply(tk *vertigo.Token, attrAcc AttrAccumulator) bool
}

LineFilter allows selecting only tokens with specific accumulated structure information (e.g. doc.type='scifi' AND text.type!='meta').
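A filter for the condition mentioned above (doc.type='scifi' AND text.type!='meta') might look roughly like this. The sketch uses local stand-in types so that it compiles on its own: token and attrSource are hypothetical substitutes for vertigo.Token and AttrAccumulator, and fakeAccum exists only to feed the filter with sample data. A real filter would take the package's own types in its Apply signature.

```go
package main

import "fmt"

// token is a hypothetical stand-in for vertigo.Token.
type token struct {
	Word string
}

// attrSource mirrors only the exported part of AttrAccumulator.
type attrSource interface {
	ForEachAttr(fn func(structure string, attr string, val string) bool)
}

// triple and fakeAccum provide sample accumulator contents for the demo.
type triple struct{ structure, attr, val string }

type fakeAccum []triple

func (fa fakeAccum) ForEachAttr(fn func(structure string, attr string, val string) bool) {
	for _, t := range fa {
		if !fn(t.structure, t.attr, t.val) {
			return
		}
	}
}

// sciFiFilter accepts tokens located in a doc with type='scifi'
// unless they are also inside a text with type='meta'.
type sciFiFilter struct{}

func (f *sciFiFilter) Apply(tk *token, acc attrSource) bool {
	isSciFi, isMeta := false, false
	acc.ForEachAttr(func(structure, attr, val string) bool {
		switch {
		case structure == "doc" && attr == "type" && val == "scifi":
			isSciFi = true
		case structure == "text" && attr == "type" && val == "meta":
			isMeta = true
		}
		return true // keep iterating over all attributes
	})
	return isSciFi && !isMeta
}

func main() {
	f := &sciFiFilter{}
	body := fakeAccum{{"doc", "type", "scifi"}, {"text", "type", "body"}}
	meta := fakeAccum{{"doc", "type", "scifi"}, {"text", "type", "meta"}}
	fmt.Println(f.Apply(&token{Word: "robot"}, body)) // prints "true"
	fmt.Println(f.Apply(&token{Word: "title"}, meta)) // prints "false"
}
```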

func LoadCustomFilter

func LoadCustomFilter(libPath string, fn string) (LineFilter, error)

LoadCustomFilter loads a compiled .so plugin from a defined path and selects a function identified by fn. If libPath does not point to an existing file, the function treats it as a path suffix and tries other locations (the working directory, /usr/local/lib/gloomy).
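The fallback lookup described above can be pictured with the following sketch. It is an assumption-laden illustration: candidatePaths and resolveLib are hypothetical helpers, and the order in which the locations are tried is a guess, since the documentation only lists them. The actual loading step (presumably done with the standard library's plugin.Open and Lookup) is left out so that the example runs on any platform.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidatePaths lists the locations to try for a plugin path.
// Assumption: the path is tried as given first, then relative to the
// working directory, then under /usr/local/lib/gloomy.
func candidatePaths(libPath string, workDir string) []string {
	return []string{
		libPath,
		filepath.Join(workDir, libPath),
		filepath.Join("/usr/local/lib/gloomy", libPath),
	}
}

// resolveLib returns the first candidate that is an existing regular
// file or non-directory entry, or an error if none exists.
func resolveLib(libPath string, workDir string) (string, error) {
	for _, p := range candidatePaths(libPath, workDir) {
		if info, err := os.Stat(p); err == nil && !info.IsDir() {
			return p, nil
		}
	}
	return "", fmt.Errorf("plugin %q not found in any known location", libPath)
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		fmt.Println("cannot determine working directory:", err)
		return
	}
	path, err := resolveLib("myfilter.so", wd)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("would load plugin from", path)
}
```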

type PassAllFilter

type PassAllFilter struct{}

PassAllFilter is the default filter which returns true for any struct-attr values.

func (*PassAllFilter) Apply

func (df *PassAllFilter) Apply(tk *vertigo.Token, attrAcc AttrAccumulator) bool

Apply tests the current state of the attribute accumulator against the filter.

type Status

type Status struct {
	Datetime       time.Time
	File           string
	ProcessedAtoms int
	ProcessedLines int
	Error          error
}

Status stores some basic information about vertical file processing.
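A consumer of such status messages could be sketched as follows. The status type below is a local copy of the fields shown above so that the example is self-contained; drainStatus and errSketch are hypothetical names. Whether the extractor closes the channel when it finishes is an assumption made here to keep the loop simple.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// status is a local copy of the Status fields, used for illustration.
type status struct {
	Datetime       time.Time
	File           string
	ProcessedAtoms int
	ProcessedLines int
	Error          error
}

// errSketch is a made-up error used only by this demo.
var errSketch = errors.New("sketch failure")

// drainStatus reads messages until the channel is closed and returns
// the last reported line count together with the first error seen.
func drainStatus(ch <-chan status) (lastLines int, firstErr error) {
	for st := range ch {
		lastLines = st.ProcessedLines
		if st.Error != nil && firstErr == nil {
			firstErr = st.Error
		}
	}
	return lastLines, firstErr
}

func main() {
	ch := make(chan status, 2)
	ch <- status{Datetime: time.Now(), File: "corpus.vert", ProcessedLines: 1000}
	ch <- status{Datetime: time.Now(), File: "corpus.vert", ProcessedLines: 2000, Error: errSketch}
	close(ch)

	lines, err := drainStatus(ch)
	fmt.Println(lines, err) // prints "2000 sketch failure"
}
```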

type TTExtractor

type TTExtractor struct {
	// contains filtered or unexported fields
}

TTExtractor handles writing parsed data to a sqlite3 database. Parsed values are received passively by implementing vertigo.LineProcessor.

func NewTTExtractor

func NewTTExtractor(
	database db.Writer,
	conf *cnf.VTEConf,
	colgenFn colgen.AlignedColGenFn,
	statusChan chan Status,
	stopChan <-chan os.Signal,
) (*TTExtractor, error)

NewTTExtractor is a factory function that creates a properly initialized TTExtractor.

func (*TTExtractor) GetColCounts

func (tte *TTExtractor) GetColCounts() map[string]*ptcount.NgramCounter

func (*TTExtractor) GetNumTokens

func (tte *TTExtractor) GetNumTokens() int

func (*TTExtractor) ProcStruct

func (tte *TTExtractor) ProcStruct(st *vertigo.Structure, line int, err error) error

ProcStruct is a part of vertigo.LineProcessor implementation. It is called by Vertigo parser when an opening structure tag is encountered.

func (*TTExtractor) ProcStructClose

func (tte *TTExtractor) ProcStructClose(st *vertigo.StructureClose, line int, err error) error

ProcStructClose is a part of vertigo.LineProcessor implementation. It is called by Vertigo parser when a closing structure tag is encountered.

func (*TTExtractor) ProcToken

func (tte *TTExtractor) ProcToken(tk *vertigo.Token, line int, err error) error

ProcToken is a part of vertigo.LineProcessor implementation. It is called by Vertigo parser when a token line is encountered.
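The three callbacks above share one shape: the parser passes the parsed item, the line number and a possible parsing error, and the processor returns an error of its own. The following sketch shows that shape with local stand-in types (tokenLine, structOpen and structClose are hypothetical substitutes for the vertigo types). The error budget, and the idea that exceeding it yields an error similar to ErrorTooManyParsingErrors, is an assumption for illustration and not documented behavior of TTExtractor.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the vertigo types; field sets are illustrative only.
type tokenLine struct{ Word string }
type structOpen struct{ Name string }
type structClose struct{ Name string }

var (
	errTooManySketch = errors.New("too many parsing errors")
	errMalformed     = errors.New("malformed line") // demo input error
)

// countingProc counts what it sees and tolerates up to maxErrs
// parsing errors (an assumed policy, see the lead-in).
type countingProc struct {
	tokens, opened, closed int
	parseErrs, maxErrs     int
}

// noteErr records one parsing error and fails once the budget is exceeded.
func (p *countingProc) noteErr() error {
	p.parseErrs++
	if p.parseErrs > p.maxErrs {
		return errTooManySketch
	}
	return nil
}

func (p *countingProc) ProcToken(tk *tokenLine, line int, err error) error {
	if err != nil {
		return p.noteErr() // skip the broken line
	}
	p.tokens++
	return nil
}

func (p *countingProc) ProcStruct(st *structOpen, line int, err error) error {
	if err != nil {
		return p.noteErr()
	}
	p.opened++
	return nil
}

func (p *countingProc) ProcStructClose(st *structClose, line int, err error) error {
	if err != nil {
		return p.noteErr()
	}
	p.closed++
	return nil
}

func main() {
	p := &countingProc{maxErrs: 1}
	_ = p.ProcStruct(&structOpen{Name: "doc"}, 1, nil)
	_ = p.ProcToken(&tokenLine{Word: "hello"}, 2, nil)
	_ = p.ProcToken(nil, 3, errMalformed)    // first error, within budget
	err := p.ProcToken(nil, 4, errMalformed) // second error, over budget
	_ = p.ProcStructClose(&structClose{Name: "doc"}, 5, nil)
	fmt.Println(p.tokens, p.opened, p.closed, err)
	// prints "1 1 1 too many parsing errors"
}
```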

func (*TTExtractor) Run

func (tte *TTExtractor) Run(conf *vertigo.ParserConf) error

Run starts the parsing and metadata extraction process. The method expects a proper database schema to be ready (see database.go for details). The whole process runs within a transaction which makes sqlite3 inserts a few orders of magnitude faster.

func (*TTExtractor) WordDict

func (tte *TTExtractor) WordDict() *ptcount.WordDict
