preprocess

package
v1.9.2
Warning

This package is not in the latest version of its module.

Published: May 1, 2024 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package preprocess filters and modifies a scientific name before it is parsed.

Constants

This section is empty.

Variables

var AmbiguousException = map[string][]string{
	"Aeolesthes":     {"mihi"},
	"Agnetina":       {"den"},
	"Anisochaeta":    {"mihi"},
	"Antaplaga":      {"dela"},
	"Baeolidia":      {"dela"},
	"Bolitoglossa":   {"la"},
	"Campylosphaera": {"dela"},
	"Desmoxytes":     {"des"},
	"Dicentria":      {"dela"},
	"Eucyclops":      {"mihi"},
	"Eulaira":        {"dela"},
	"Gnathopleustes": {"den"},
	"Gobiosoma":      {"spec"},
	"Helophorus":     {"ser"},
	"Lampona":        {"spec"},
	"Leptonetela":    {"la"},
	"Malamatidia":    {"zu"},
	"Meteorus":       {"dos"},
	"Nocaracris":     {"van"},
	"Paralvinella":   {"dela"},
	"Ruteloryctes":   {"bis"},
	"Scoparia":       {"dela"},
	"Selenops":       {"ab"},
	"Semiothisa":     {"da"},
	"Serina":         {"ser", "subser"},
	"Stenoecia":      {"dos"},
	"Sympycnus":      {"du"},
	"Tortolena":      {"dela"},
	"Zodarion":       {"van"},
}
var NoParseException = map[string]string{
	"Navicula":   "bacterium",
	"Spirophora": "bacterium",
}
var VirusException = map[string]string{
	"Aspilota":      "vector",
	"Bembidion":     "satellites",
	"Bolivina":      "prion",
	"Ceylonesmus":   "vector",
	"Cryptops":      "vector",
	"Culex":         "vector",
	"Dasyproctus":   "cevirus",
	"Desmoxytes":    "vector",
	"Dicathais":     "vector",
	"Erateina":      "satellites",
	"Euragallia":    "prion",
	"Exochus":       "virus",
	"Hilara":        "vector",
	"Ithomeis":      "satellites",
	"Microgoneplax": "prion",
	"Neoaemula":     "vector",
	"Nephodia":      "satellites",
	"Ophion":        "virus",
	"Phalium":       "vector",
	"Psenulus":      "trevirus",
	"Tidabius":      "vector",
	"Turkozelotes":  "attavirus",
}
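
The exception maps above are keyed by genus, and each value is a word that would otherwise trigger special handling. The following is a minimal sketch of how such a table might be consulted. The helper isVirusException and the rule that the second word must match the stored epithet are assumptions made for illustration; they are not taken from the package source.

```go
package main

import (
	"fmt"
	"strings"
)

// virusException is a two-entry excerpt of the VirusException map above.
var virusException = map[string]string{
	"Culex":  "vector",
	"Ophion": "virus",
}

// isVirusException is a hypothetical helper. It reports whether the first
// two words of name form a genus/epithet pair listed in the table, meaning
// the virus-like word is assumed to be a legitimate species epithet.
func isVirusException(name string) bool {
	words := strings.Fields(name)
	if len(words) < 2 {
		return false
	}
	epithet, ok := virusException[words[0]]
	return ok && words[1] == epithet
}

func main() {
	fmt.Println(isVirusException("Culex vector Smith")) // true
	fmt.Println(isVirusException("Tobacco mosaic virus")) // false
}
```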

Functions

func CleanupStream

func CleanupStream(in <-chan string, out chan<- *CleanupResult, wn int)

CleanupStream reads names from the in channel and sends one *CleanupResult per name to the out channel. Each result carries the original name in Input and the cleaned-up name in Output. The wn parameter presumably sets the number of concurrent workers.
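
The streaming pattern implied by this signature can be sketched as a small worker pool. Everything below is an illustrative stand-in rather than the package's code: cleanupStream and cleanup are hypothetical, the reading of wn as a worker count is an assumption, and closing out when the workers finish is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// CleanupResult mirrors the struct documented in this package.
type CleanupResult struct {
	Input  string
	Output string
}

// cleanup is a hypothetical stand-in for the real tag removal. It drops
// only <i> and </i> so that the example stays self-contained.
func cleanup(s string) string {
	return strings.NewReplacer("<i>", "", "</i>", "").Replace(s)
}

// cleanupStream starts wn workers that read names from in and send a
// result for each name to out. It closes out after all workers finish
// (an assumption of this sketch).
func cleanupStream(in <-chan string, out chan<- *CleanupResult, wn int) {
	var wg sync.WaitGroup
	for i := 0; i < wn; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range in {
				out <- &CleanupResult{Input: name, Output: cleanup(name)}
			}
		}()
	}
	wg.Wait()
	close(out)
}

func main() {
	in := make(chan string)
	out := make(chan *CleanupResult)
	go cleanupStream(in, out, 2)
	go func() {
		defer close(in)
		for _, n := range []string{"<i>Homo sapiens</i> L.", "Pinus alba"} {
			in <- n
		}
	}()
	// With more than one worker the order of results is not guaranteed.
	for r := range out {
		fmt.Printf("%s|%s\n", r.Input, r.Output)
	}
}
```

The caller closes the input channel to signal that no more names are coming, which lets each worker's range loop end.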

func IsVirus

func IsVirus(data []byte) bool

func NoParse

func NoParse(data []byte) bool

func StripTags

func StripTags(s string) string

StripTags takes a string and returns it with common tags removed and HTML entities escaped. Uncommon tags are left intact so that the parser can deal with them.

func UnderscoreToSpace

func UnderscoreToSpace(bs []byte) (bool, error)

UnderscoreToSpace takes a slice of bytes. If the slice contains underscores but no spaces, it replaces the underscores with spaces in place. If any spaces are present, the slice is left unmodified.
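
The rule described here can be sketched in a few lines. The function underscoreToSpace below is a hypothetical reimplementation for illustration only. It returns just a bool saying whether the slice was changed, because the source does not say when the real function returns an error.

```go
package main

import (
	"bytes"
	"fmt"
)

// underscoreToSpace is a hypothetical sketch, not the package's function.
// It rewrites bs in place only when bs has underscores and no spaces, and
// reports whether any change was made.
func underscoreToSpace(bs []byte) bool {
	if bytes.IndexByte(bs, ' ') >= 0 || bytes.IndexByte(bs, '_') < 0 {
		return false
	}
	for i, b := range bs {
		if b == '_' {
			bs[i] = ' '
		}
	}
	return true
}

func main() {
	a := []byte("Homo_sapiens_L.")
	fmt.Println(underscoreToSpace(a), string(a)) // true Homo sapiens L.

	b := []byte("Homo sapiens_var")
	fmt.Println(underscoreToSpace(b), string(b)) // false Homo sapiens_var
}
```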

Types

type CleanupResult

type CleanupResult struct {
	// Input is the original name.
	Input string
	// Output is the name after the tag removal.
	Output string
}

CleanupResult holds the result of removing some HTML tags from a name.

type Preprocessor

type Preprocessor struct {
	Virus       bool
	Underscore  bool
	NoParse     bool
	DaggerChar  bool
	Approximate bool
	Annotation  bool
	Body        []byte
	Tail        []byte
	Ambiguous   ambiguous
}

Preprocessor holds the results of preprocessing a name.

func Preprocess

func Preprocess(ppr *preparser.PreParser, bs []byte) *Preprocessor

Preprocess runs a series of regular expressions over the input to determine features of the input before parsing.
